Benchmarking Named Entity Recognition Approaches for Extracting Research Infrastructure Information from Text

Georgios Cheirmpos, Seyed Amin Tabatabaei, Evangelos Kanoulas, Georgios Tsatsaronis

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Named entity recognition (NER) is an important component of many information extraction and linking pipelines. The task is especially challenging in a low-resource scenario, where only a very limited amount of high-quality annotated data is available. In this paper we benchmark machine learning approaches for NER that may be very effective in such cases, and compare their performance in a novel application: information extraction of research infrastructure from scientific manuscripts. We explore approaches such as incorporating Contrastive Learning (CL), as well as Conditional Random Fields (CRF) weights, in BERT-based architectures, and demonstrate experimentally that such combinations are very efficient in few-shot learning setups, verifying similar findings that have been reported in other areas of NLP, as well as Computer Vision. More specifically, we show that the usage of CRF weights in BERT-based architectures achieves noteworthy improvements of approximately 12% in the overall NER task, and that in few-shot setups the effectiveness of CRF weights is much higher with smaller training sets.

Original language: English
Title of host publication: Machine Learning, Optimization, and Data Science - 9th International Conference, LOD 2023
Editors: Giuseppe Nicosia, Varun Ojha, Emanuele La Malfa, Gabriele La Malfa, Panos M. Pardalos, Renato Umeton
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 131-141
Number of pages: 11
ISBN (Print): 9783031539688
DOIs
State: Published - 2024
Externally published: Yes
Event: 9th International Conference on Machine Learning, Optimization, and Data Science, LOD 2023 - Grasmere, United Kingdom
Duration: Sep 22 2023 to Sep 26 2023

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 14505 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 9th International Conference on Machine Learning, Optimization, and Data Science, LOD 2023
Country/Territory: United Kingdom
City: Grasmere
Period: 09/22/23 to 09/26/23

Keywords

  • Contrastive Learning
  • Few-Shot Learning
  • Named Entity Recognition
  • Natural Language Processing
