Elsevier at SimpleText: Passage Retrieval by Fine-tuning GPL on Scientific Documents

Artemis Capari, Hosein Azarbonyad, Georgios Tsatsaronis, Zubair Afzal

Research output: Contribution to journal (conference article, peer-reviewed)

2 Scopus citations

Abstract

The CLEF SimpleText Lab centers on finding relevant passages in a large collection of scientific documents in response to a lay query, detecting and explaining difficult terminology within those passages, and finally simplifying the passages. The first task resembles ad hoc retrieval: given a topic or query, the goal is to retrieve relevant passages, but ranking models should also assess documents by their readability and complexity. This paper describes our approach to building a ranking model for the first task. To build the ranking model, we first evaluate the performance of several models on a proprietary test collection constructed from scientific documents across multiple science domains. We then fine-tune the best-performing model on a large collection of unlabelled documents using the Generative Pseudo Labeling (GPL) approach. Our key finding is that a bi-encoder model trained on the MS MARCO dataset and further fine-tuned on a large collection of unlabelled scientific passages achieves the highest performance on the proprietary dataset, which is designed specifically for scientific passage retrieval. Finally, fine-tuning a model in the same fashion, but using only the Computer Science queries from the test collection, proved successful for SimpleText Task 1.
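The abstract does not spell out the training objective, so the following is only a minimal sketch of the two ideas it names, under stated assumptions: a bi-encoder ranks passages by the dot product of independently computed query and passage embeddings, and GPL, in its commonly described form, fine-tunes that bi-encoder with a margin-MSE loss against margins produced by a cross-encoder teacher. The function names, the toy vectors, and the `teacher_margin` values are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]
# One training example: (query vec, positive vec, negative vec, teacher margin).
# The teacher margin is assumed to be cross_encoder(q, pos) - cross_encoder(q, neg).
Example = Tuple[Vector, Vector, Vector, float]


def dot(a: Vector, b: Vector) -> float:
    """Dot-product similarity between two embeddings."""
    return sum(x * y for x, y in zip(a, b))


def rank_passages(query_vec: Vector, passage_vecs: Sequence[Vector]) -> List[int]:
    """Return passage indices sorted by descending bi-encoder score."""
    scores = [dot(query_vec, p) for p in passage_vecs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def margin_mse(batch: Sequence[Example]) -> float:
    """Mean squared error between student and teacher score margins.

    Assumption: this is the margin-MSE objective usually associated with GPL;
    the paper's exact loss is not stated in the abstract.
    """
    total = 0.0
    for q, pos, neg, teacher_margin in batch:
        student_margin = dot(q, pos) - dot(q, neg)
        total += (student_margin - teacher_margin) ** 2
    return total / len(batch)


if __name__ == "__main__":
    query = [1.0, 0.0]
    passages = [[0.2, 0.9], [0.9, 0.1], [0.5, 0.5]]
    print(rank_passages(query, passages))
    print(margin_mse([(query, [1.0, 0.0], [0.0, 1.0], 3.0)]))
```

In a real setup the vectors would come from the bi-encoder itself and the loss would be minimized by gradient descent; the sketch only shows what is being scored and compared.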

Original language: English
Pages (from-to): 2923-2934
Number of pages: 12
Journal: CEUR Workshop Proceedings
Volume: 3497
State: Published - 2023
Externally published: Yes
Event: 24th Working Notes of the Conference and Labs of the Evaluation Forum, CLEF-WN 2023 - Thessaloniki, Greece
Duration: Sep 18 2023 to Sep 21 2023

Keywords

  • Domain Adaptation
  • Information Retrieval
  • Scholarly Document Processing
  • Scientific Documents
