TY - GEN
T1 - Automated Synonym Discovery for Taxonomy Maintenance Using Semantic Search Techniques
AU - Moradi Fard, Maziar
AU - Thorne, Camilo
AU - Sorolla Bayod, Paula
AU - Akhondi, Saber
AU - Vlietstra, Wytze
N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.
PY - 2024
Y1 - 2024
N2 - Taxonomies group synonymous terms together into concepts, arranged into hierarchical “broader than” semantic relations. However, creating and maintaining taxonomies is labour-intensive, especially when they reach a scale of hundreds of thousands or millions of terms. Here, we present an automated solution to support taxonomy editors in identifying synonymous terms in scientific literature by leveraging semantic search techniques. Our method first encodes all taxonomy terms or phrases using a pre-trained BERT-based model. Subsequently, we employ FAISS vector search to efficiently discover synonyms for each term. We evaluate the method by comparing the terms it considers synonymous to a manually curated taxonomy that consists of more than 770,000 terms. By integrating state-of-the-art NLP and search methodologies, our approach offers a practical and efficient solution that achieves up to 0.79 precision and 0.25 recall for synonym discovery. This automation scales to large taxonomies and can be used at runtime in large taxonomy-driven document retrieval systems.
KW - Natural Language Processing
KW - Synonym Discovery
KW - Taxonomy Maintenance
KW - Taxonomy Production
UR - http://www.scopus.com/inward/record.url?scp=85205464354&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-70242-6_33
DO - 10.1007/978-3-031-70242-6_33
M3 - Conference contribution
AN - SCOPUS:85205464354
SN - 9783031702419
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 352
EP - 358
BT - Natural Language Processing and Information Systems - 29th International Conference on Applications of Natural Language to Information Systems, NLDB 2024, Proceedings
A2 - Rapp, Amon
A2 - Di Caro, Luigi
A2 - Meziane, Farid
A2 - Sugumaran, Vijayan
PB - Springer Science and Business Media Deutschland GmbH
T2 - 29th International Conference on Natural Language and Information Systems, NLDB 2024
Y2 - 25 June 2024 through 27 June 2024
ER -