Automatic identification of infrequent word senses

Diana McCarthy, Rob Koeling, Julie Weeds, John Carroll

Research output: Contribution to conferencePaperpeer-review

4 Scopus citations

Abstract

In this paper we show that an unsupervised method for ranking word senses automatically can be used to identify infrequently occurring senses. We demonstrate this using a ranking of noun senses derived from the BNC and evaluating on the sense-tagged text available in both SemCor and the SENSEVAL-2 English all-words task. We show that the method does well at identifying senses that do not occur in a corpus, and that those that are erroneously filtered but do occur typically have a lower frequency than the other senses. This method should be useful for word sense disambiguation systems, allowing effort to be concentrated on more frequent senses; it may also be useful for other tasks such as lexical acquisition. Whilst the results on balanced corpora are promising, our chief motivation for the method is for application to domain specific text. For text within a particular domain many senses from a generic inventory will be rare, and possibly redundant. Since a large domain specific corpus of sense annotated data is not available, we evaluate our method on domain-specific corpora and demonstrate that sense types identified for removal are predominantly senses from outside the domain.

Original languageEnglish
StatePublished - 2004
Externally publishedYes
Event20th International Conference on Computational Linguistics, COLING 2004 - Geneva, Switzerland
Duration: Aug 23 2004Aug 27 2004

Conference

Conference20th International Conference on Computational Linguistics, COLING 2004
Country/TerritorySwitzerland
CityGeneva
Period08/23/0408/27/04

Fingerprint

Dive into the research topics of 'Automatic identification of infrequent word senses'. Together they form a unique fingerprint.

Cite this