A knowledge-based semantic kernel for text classification

Jamal Abdul Nasir, Asim Karim, George Tsatsaronis, Iraklis Varlamis

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

23 Scopus citations

Abstract

Typically, in textual document classification the documents are represented in the vector space using the "Bag of Words" (BOW) approach. Despite its ease of use, the BOW representation cannot handle word synonymy and polysemy and does not consider semantic relatedness between words. In this paper, we overcome the shortcomings of the BOW approach by embedding a known WordNet-based semantic relatedness measure for pairs of words, namely Omiotis, into a semantic kernel. The suggested measure incorporates the TF-IDF weighting scheme, thus creating a semantic kernel which combines both semantic and statistical information from text. Empirical evaluation with real data sets demonstrates that our approach achieves improved classification accuracy with respect to the standard BOW representation when Omiotis is embedded in four different classifiers.
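
As an illustration of the general idea only, the sketch below computes a kernel of the common form k(x, y) = x^T S y, where x and y are TF-IDF document vectors and S is a term-by-term relatedness matrix. This is an assumption about the general shape of such kernels, not the paper's exact formulation. The matrix values are invented toy numbers standing in for a relatedness measure (they are not Omiotis scores), and the function names `tfidf_vectors` and `semantic_kernel` are hypothetical. For k to be a valid kernel, S should be symmetric and positive semi-definite.

```python
import math
from collections import Counter

def tfidf_vectors(docs, vocab):
    """Return one TF-IDF vector (list of floats over vocab) per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = {t: sum(1 for d in docs if t in d) for t in vocab}
    # Smoothed inverse document frequency (one common variant, chosen here as an assumption).
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for d in docs:
        counts = Counter(d)
        vectors.append([counts[t] * idf[t] for t in vocab])
    return vectors

def semantic_kernel(x, y, S):
    """Compute x^T S y for two document vectors and a term relatedness matrix S."""
    return sum(
        x[i] * S[i][j] * y[j]
        for i in range(len(x))
        for j in range(len(y))
    )

vocab = ["car", "automobile", "banana"]
docs = [["car"], ["automobile"], ["banana"]]

# Toy relatedness values (hypothetical): "car" and "automobile" are near-synonyms.
S = [
    [1.0, 0.9, 0.0],
    [0.9, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
# With the identity matrix the kernel reduces to the plain BOW dot product.
I = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]

X = tfidf_vectors(docs, vocab)
bow = semantic_kernel(X[0], X[1], I)   # "car" vs "automobile", no semantics
sem = semantic_kernel(X[0], X[1], S)   # same pair, with relatedness
unrelated = semantic_kernel(X[0], X[2], S)  # "car" vs "banana"

print(bow, sem, unrelated)
```

In this toy setup the two synonym-only documents share no terms, so the plain BOW dot product is zero, while the relatedness matrix gives them a positive similarity. A Gram matrix built this way could be passed to any classifier that accepts a precomputed kernel.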

Original language: English
Title of host publication: String Processing and Information Retrieval - 18th International Symposium, SPIRE 2011, Proceedings
Pages: 261-266
Number of pages: 6
State: Published - 2011
Externally published: Yes
Event: 18th International Symposium on String Processing and Information Retrieval, SPIRE 2011 - Pisa, Italy
Duration: Oct 17 2011 → Oct 21 2011

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 7024 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 18th International Symposium on String Processing and Information Retrieval, SPIRE 2011
Country/Territory: Italy
City: Pisa
Period: 10/17/11 → 10/21/11

Keywords

  • Semantic Kernels
  • Text Classification
  • Thesaurus
