Word sense disambiguation for exploiting hierarchical thesauri in text classification

Dimitrios Mavroeidis, George Tsatsaronis, Michalis Vazirgiannis, Martin Theobald, Gerhard Weikum

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

45 Scopus citations

Abstract

The introduction of hierarchical thesauri (HT) that contain significant semantic information has led researchers to investigate their potential for improving the performance of text classification by extending the traditional "bag of words" representation to incorporate syntactic and semantic relationships among words. In this paper we address this problem by proposing a Word Sense Disambiguation (WSD) approach based on the intuition that word proximity in the document also implies proximity in the HT graph. We argue that the high precision exhibited by our WSD algorithm on various human-disambiguated benchmark datasets makes it appropriate for the classification task. Moreover, we define a semantic kernel, based on the general concept of GVSM kernels, that captures the semantic relations contained in the hierarchical thesaurus. Finally, we conduct experiments on various corpora, achieving a systematic improvement in classification accuracy with the SVM algorithm, especially when the training set is small.
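To make the idea of a GVSM-style semantic kernel concrete, the following is a minimal sketch and not the kernel defined in the paper. It assumes documents are term-frequency vectors and that a hypothetical term-to-concept matrix `S` (here hand-written, standing in for relations drawn from a thesaurus) maps terms into a concept space; the kernel is then the inner product of the projected documents. The vocabulary, concepts, and the names `project` and `gvsm_kernel` are all illustrative assumptions.

```python
# Minimal GVSM-style kernel sketch: K(d1, d2) = (d1 · S) · (d2 · S)
# S is a hypothetical term-to-concept matrix; the paper's actual kernel
# is built from the hierarchical thesaurus and is not reproduced here.

def project(doc, S):
    """Map a term-frequency vector into concept space (doc · S)."""
    n_concepts = len(S[0])
    return [
        sum(doc[t] * S[t][c] for t in range(len(doc)))
        for c in range(n_concepts)
    ]

def gvsm_kernel(d1, d2, S):
    """Inner product of two documents after projection into concept space."""
    p1, p2 = project(d1, S), project(d2, S)
    return sum(a * b for a, b in zip(p1, p2))

# Toy vocabulary (assumed): ["car", "automobile", "banana"]
# Toy concepts (assumed):   ["vehicle", "fruit"]
S = [
    [1.0, 0.0],  # car        -> vehicle
    [1.0, 0.0],  # automobile -> vehicle
    [0.0, 1.0],  # banana     -> fruit
]

doc_a = [1, 0, 0]  # contains only "car"
doc_b = [0, 1, 0]  # contains only "automobile"
doc_c = [0, 0, 1]  # contains only "banana"

# Plain bag-of-words inner product: no shared terms, so no similarity.
bow_ab = sum(x * y for x, y in zip(doc_a, doc_b))

# Concept-space similarity: "car" and "automobile" share a concept.
k_ab = gvsm_kernel(doc_a, doc_b, S)
k_ac = gvsm_kernel(doc_a, doc_c, S)
```

In this toy setting the bag-of-words product of `doc_a` and `doc_b` is zero because they share no terms, while the concept-space kernel is positive because both terms map to the same concept. This is the kind of effect a semantic kernel is meant to capture.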

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 181-192
Number of pages: 12
State: Published - 2005
Externally published: Yes
Event: 9th European Conference on Principles and Practice of Knowledge Discovery in Databases, PKDD 2005 - Porto, Portugal
Duration: Oct 3 2005 to Oct 7 2005

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 3721 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 9th European Conference on Principles and Practice of Knowledge Discovery in Databases, PKDD 2005
Country/Territory: Portugal
City: Porto
Period: 10/3/05 to 10/7/05
