TY - JOUR
T1 - Integrating clustering with evolutionary feature selection using ENORA and SToWVector
AU - Mackenzie-Rivero, Alexander José
AU - Martínez-Béjar, Rodrigo
AU - Vegas-Meléndez, Hilarión José
N1 - Publisher Copyright: © 2025 The Authors
PY - 2025/12
Y1 - 2025/12
AB - The rapid growth of textual data from sources such as social media, blogs, and digital libraries has intensified the demand for scalable and semantically informed classification methods. This study introduces a hybrid framework that integrates unsupervised clustering, evolutionary feature selection, and semantic interpretation to enhance automatic text classification. The approach combines the SToWVector representation with a Multi-Objective Evolutionary Search (MOES) strategy optimized through the ENORA algorithm, while employing the NaiveBayesMultinomial classifier for evaluation. Semantic interpretation is incorporated via ontological reasoning, enabling the model to capture latent conceptual relationships among terms and thereby complement both the clustering and feature selection processes. Experimental evaluations on benchmark and large-scale datasets (SMS Spam and Euronews) demonstrate the robustness of the framework, including a scenario in which 100% accuracy was achieved. The proposed method outperforms traditional models and achieves competitive results against deep learning-based classifiers. These findings underscore the framework's adaptability and effectiveness in managing high-dimensional unstructured text, while preserving interpretability through symbolic reasoning.
KW - Clustering
KW - Feature selection
KW - Machine learning
KW - Text classification
UR - https://www.scopus.com/pages/publications/105015683997
U2 - 10.1016/j.array.2025.100508
DO - 10.1016/j.array.2025.100508
M3 - Article
AN - SCOPUS:105015683997
SN - 2590-0056
VL - 28
JO - Array
JF - Array
M1 - 100508
ER -