Enhancing Extreme Multi-Label Text Classification: Addressing Challenges in Model, Data, and Evaluation

Dan Li, Zi Long Zhu, Janneke van de Loo, Agnés Masip Gómez, Vikrant Yadav, Georgios Tsatsaronis, Zubair Afzal

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

Extreme multi-label text classification is a prevalent task in industry, but from a machine learning perspective it frequently runs into challenges, including model limitations, data scarcity, and time-consuming evaluation. This paper aims to mitigate these issues with three approaches. First, we propose a label ranking model as an alternative to the conventional SciBERT-based classification model, enabling efficient handling of large-scale label sets and accommodating new labels. Second, we present an active learning-based pipeline that addresses the scarcity of data for new labels when a classification system is updated. Finally, we use ChatGPT to assist with model evaluation. Our experiments demonstrate the effectiveness of these techniques for the extreme multi-label text classification task.

Original language: English
Title of host publication: EMNLP 2023 - 2023 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Industry Track
Editors: Mingxuan Wang, Imed Zitouni
Publisher: Association for Computational Linguistics (ACL)
Pages: 313-321
Number of pages: 9
ISBN (Electronic): 9788891760684
DOIs
State: Published - 2023
Event: 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track, EMNLP 2023 - Singapore, Singapore
Duration: Dec 6 2023 to Dec 10 2023

Publication series

Name: EMNLP 2023 - 2023 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Industry Track

Conference

Conference: 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track, EMNLP 2023
Country/Territory: Singapore
City: Singapore
Period: 12/6/23 to 12/10/23
