TY - JOUR
T1 - Finding predominant word senses in untagged text
AU - McCarthy, Diana
AU - Koeling, Rob
AU - Weeds, Julie
AU - Carroll, John
N1 - Publisher Copyright: © 2021 Proceedings of the Annual Meeting of the Association for Computational Linguistics. All Rights Reserved.
PY - 2004
Y1 - 2004
N2 - In word sense disambiguation (WSD), the heuristic of choosing the most common sense is extremely powerful because the distribution of the senses of a word is often skewed. The problem with using the predominant, or first sense heuristic, aside from the fact that it does not take surrounding context into account, is that it assumes some quantity of hand-tagged data. Whilst there are a few hand-tagged corpora available for some languages, one would expect the frequency distribution of the senses of words, particularly topical words, to depend on the genre and domain of the text under consideration. We present work on the use of a thesaurus acquired from raw textual corpora and the WordNet similarity package to find predominant noun senses automatically. The acquired predominant senses give a precision of 64% on the nouns of the SENSEVAL-2 English all-words task. This is a very promising result given that our method does not require any hand-tagged text, such as SemCor. Furthermore, we demonstrate that our method discovers appropriate predominant senses for words from two domain-specific corpora.
UR - http://www.scopus.com/inward/record.url?scp=85149116612&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85149116612
SN - 0736-587X
SP - 279
EP - 286
JO - Proceedings of the Annual Meeting of the Association for Computational Linguistics
JF - Proceedings of the Annual Meeting of the Association for Computational Linguistics
T2 - 42nd Annual Meeting of the Association for Computational Linguistics, ACL 2004
Y2 - 21 July 2004 through 26 July 2004
ER -