TY - GEN
T1 - Word sense disambiguation for exploiting hierarchical thesauri in text classification
AU - Mavroeidis, Dimitrios
AU - Tsatsaronis, George
AU - Vazirgiannis, Michalis
AU - Theobald, Martin
AU - Weikum, Gerhard
PY - 2005
Y1 - 2005
N2 - The introduction of hierarchical thesauri (HT) that contain significant semantic information has led researchers to investigate their potential for improving the performance of the text classification task by extending the traditional "bag of words" representation to incorporate syntactic and semantic relationships among words. In this paper we address this problem by proposing a Word Sense Disambiguation (WSD) approach based on the intuition that word proximity in the document also implies proximity in the HT graph. We argue that the high precision exhibited by our WSD algorithm on various humanly-disambiguated benchmark datasets is appropriate for the classification task. Moreover, we define a semantic kernel, based on the general concept of GVSM kernels, that captures the semantic relations contained in the hierarchical thesaurus. Finally, we conduct experiments on various corpora, achieving a systematic improvement in classification accuracy with the SVM algorithm, especially when the training set is small.
AB - The introduction of hierarchical thesauri (HT) that contain significant semantic information has led researchers to investigate their potential for improving the performance of the text classification task by extending the traditional "bag of words" representation to incorporate syntactic and semantic relationships among words. In this paper we address this problem by proposing a Word Sense Disambiguation (WSD) approach based on the intuition that word proximity in the document also implies proximity in the HT graph. We argue that the high precision exhibited by our WSD algorithm on various humanly-disambiguated benchmark datasets is appropriate for the classification task. Moreover, we define a semantic kernel, based on the general concept of GVSM kernels, that captures the semantic relations contained in the hierarchical thesaurus. Finally, we conduct experiments on various corpora, achieving a systematic improvement in classification accuracy with the SVM algorithm, especially when the training set is small.
UR - http://www.scopus.com/inward/record.url?scp=33646432735&partnerID=8YFLogxK
U2 - 10.1007/11564126_21
DO - 10.1007/11564126_21
M3 - Conference contribution
AN - SCOPUS:33646432735
SN - 3540292446
SN - 9783540292449
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 181
EP - 192
BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
T2 - 9th European Conference on Principles and Practice of Knowledge Discovery in Databases, PKDD 2005
Y2 - 3 October 2005 through 7 October 2005
ER -