Automated Synonym Discovery for Taxonomy Maintenance Using Semantic Search Techniques

Maziar Moradi Fard, Camilo Thorne, Paula Sorolla Bayod, Saber Akhondi, Wytze Vlietstra

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Taxonomies group synonymous terms together into concepts, arranged into hierarchical “broader than” semantic relations. However, creating and maintaining taxonomies is labour-intensive, especially when they reach a scale of hundreds of thousands or millions of terms. Here, we present an automated solution to support taxonomy editors in identifying synonymous terms in scientific literature by leveraging semantic search techniques. Our method first encodes all taxonomy terms or phrases using a pre-trained BERT-based model. Subsequently, we employ FAISS vector search to efficiently discover synonyms for each term. We evaluate the method by comparing the terms it considers synonymous to a manually curated taxonomy of more than 770,000 terms. By integrating state-of-the-art NLP and search methodologies, our approach offers a practical and efficient solution that achieves up to 0.79 precision and 0.25 recall for synonym discovery. This automation scales to large taxonomies and can be used at runtime in large taxonomy-driven document retrieval systems.
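The pipeline summarised in the abstract (encode every term, retrieve its nearest neighbours, score the proposed pairs against the curated taxonomy) can be sketched in a few lines. This is a minimal, stdlib-only illustration under stated assumptions, not the authors' implementation: the three-dimensional vectors in `EMBEDDINGS` are hypothetical stand-ins for BERT-based embeddings, the `TAXONOMY` dictionary is an invented toy example, the brute-force loop replaces FAISS with the exact inner-product search that a flat FAISS index (for example `IndexFlatIP` over L2-normalized vectors) would perform, and the parameters `k` and `threshold`, as well as the pair-level definition of precision and recall, are assumptions rather than details given in the abstract.

```python
import math
from itertools import combinations

# Hypothetical toy embeddings standing in for BERT-based term vectors.
EMBEDDINGS = {
    "heart attack": [0.90, 0.10, 0.00],
    "myocardial infarction": [0.85, 0.15, 0.05],
    "cardiac arrest": [0.70, 0.30, 0.10],
    "aspirin": [0.00, 0.20, 0.95],
    "acetylsalicylic acid": [0.05, 0.15, 0.90],
}

# Hypothetical curated taxonomy: concept id -> synonymous terms.
TAXONOMY = {
    "C1": ["heart attack", "myocardial infarction"],
    "C2": ["cardiac arrest"],
    "C3": ["aspirin", "acetylsalicylic acid"],
}


def normalize(vec):
    """Scale a vector to unit length so that inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def candidate_synonyms(embeddings, k=1, threshold=0.95):
    """Exact top-k inner-product search over L2-normalized vectors (the result a
    flat FAISS inner-product index would return), keeping unordered term pairs
    whose cosine similarity reaches the (hypothetical) threshold."""
    terms = list(embeddings)
    vecs = {t: normalize(embeddings[t]) for t in terms}
    pairs = set()
    for query in terms:
        scored = [
            (sum(a * b for a, b in zip(vecs[query], vecs[other])), other)
            for other in terms
            if other != query
        ]
        scored.sort(reverse=True)  # highest similarity first
        for score, other in scored[:k]:
            if score >= threshold:
                pairs.add(frozenset((query, other)))
    return pairs


def gold_pairs(taxonomy):
    """All unordered term pairs that share a concept in the curated taxonomy."""
    return {
        frozenset(pair)
        for terms in taxonomy.values()
        for pair in combinations(terms, 2)
    }


def precision_recall(predicted, gold):
    """Pair-level precision and recall (an assumed evaluation protocol)."""
    hits = len(predicted & gold)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    pred = candidate_synonyms(EMBEDDINGS)
    p, r = precision_recall(pred, gold_pairs(TAXONOMY))
    print(f"precision={p:.2f} recall={r:.2f}")
```

Running the script prints `precision=0.67 recall=1.00`: on this toy data the search proposes one false positive, `cardiac arrest` paired with `myocardial infarction` (cosine similarity about 0.97). Calling `candidate_synonyms(EMBEDDINGS, threshold=0.98)` drops that pair, illustrating the precision/recall trade-off that a similarity threshold controls.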

Original language: English
Title of host publication: Natural Language Processing and Information Systems - 29th International Conference on Applications of Natural Language to Information Systems, NLDB 2024, Proceedings
Editors: Amon Rapp, Luigi Di Caro, Farid Meziane, Vijayan Sugumaran
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 352-358
Number of pages: 7
ISBN (Print): 9783031702419
State: Published - 2024
Externally published: Yes
Event: 29th International Conference on Natural Language and Information Systems, NLDB 2024 - Turin, Italy
Duration: Jun 25 2024 - Jun 27 2024

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 14763 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 29th International Conference on Natural Language and Information Systems, NLDB 2024
Country/Territory: Italy
City: Turin
Period: 06/25/24 - 06/27/24

Keywords

  • Natural Language Processing
  • Synonym Discovery
  • Taxonomy Maintenance
  • Taxonomy Production
