TY - GEN
T1 - A knowledge-based semantic kernel for text classification
AU - Nasir, Jamal Abdul
AU - Karim, Asim
AU - Tsatsaronis, George
AU - Varlamis, Iraklis
PY - 2011
Y1 - 2011
AB - In text classification, documents are typically represented in the vector space using the "Bag of Words" (BOW) approach. Despite its ease of use, the BOW representation cannot handle word synonymy and polysemy and does not consider semantic relatedness between words. In this paper, we overcome the shortcomings of the BOW approach by embedding a known WordNet-based semantic relatedness measure for pairs of words, namely Omiotis, into a semantic kernel. The suggested measure incorporates the TF-IDF weighting scheme, thus creating a semantic kernel that combines both semantic and statistical information from text. Empirical evaluation with real data sets demonstrates that our approach achieves improved classification accuracy with respect to the standard BOW representation when Omiotis is embedded in four different classifiers.
KW - Semantic Kernels
KW - Text Classification
KW - Thesaurus
UR - http://www.scopus.com/inward/record.url?scp=80053966832&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-24583-1_25
DO - 10.1007/978-3-642-24583-1_25
M3 - Conference contribution
AN - SCOPUS:80053966832
SN - 9783642245824
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 261
EP - 266
BT - String Processing and Information Retrieval - 18th International Symposium, SPIRE 2011, Proceedings
T2 - 18th International Symposium on String Processing and Information Retrieval, SPIRE 2011
Y2 - 17 October 2011 through 21 October 2011
ER -