TY - GEN
T1 - Corpus annotation as a scientific task
AU - Scott, Donia
AU - Barone, Rossano
AU - Koeling, Rob
PY - 2012
Y1 - 2012
N2 - Annotation studies in CL are generally unscientific: they are mostly not reproducible, make use of too few (and often non-independent) annotators and use guidelines that are often something of a moving target. Additionally, the notion of 'expert annotators' invariably means only that the annotators have linguistic training. While this can be acceptable in some special contexts, it is often far from ideal. This is particularly the case when subtle judgements are required or when, as increasingly, one is making use of corpora originating from technical texts that have been produced by, and intended to be consumed by, an audience of technical experts in the field. We outline a more rigorous approach to collecting human annotations, using as our example a study designed to capture judgements on the meaning of hedge words in medical records.
KW - Annotation
KW - Electronic patient records
KW - Hedges
UR - http://www.scopus.com/inward/record.url?scp=84911904778&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84911904778
T3 - Proceedings of the 8th International Conference on Language Resources and Evaluation, LREC 2012
SP - 1481
EP - 1485
BT - Proceedings of the 8th International Conference on Language Resources and Evaluation, LREC 2012
A2 - Dogan, Mehmet Ugur
A2 - Mariani, Joseph
A2 - Moreno, Asuncion
A2 - Goggi, Sara
A2 - Choukri, Khalid
A2 - Calzolari, Nicoletta
A2 - Odijk, Jan
A2 - Declerck, Thierry
A2 - Maegaard, Bente
A2 - Piperidis, Stelios
A2 - Mazo, Helene
A2 - Hamon, Olivier
PB - European Language Resources Association (ELRA)
T2 - 8th International Conference on Language Resources and Evaluation, LREC 2012
Y2 - 21 May 2012 through 27 May 2012
ER -