Overview of the DagPap22 Shared Task on Detecting Automatically Generated Scientific Papers

Yury Kashnitsky, Drahomira Herrmannova, Anita de Waard, Georgios Tsatsaronis, Catriona Fennell, Cyril Labbé

Research output: Contribution to journal › Conference article › peer-review

Abstract

This paper provides an overview of the 2022 COLING Scholarly Document Processing workshop shared task on the detection of automatically generated scientific papers. We frame the detection problem as a binary classification task: given an excerpt of text, label it as either human-written or machine-generated. We shared a dataset containing excerpts from human-written papers as well as artificially generated content and suspicious documents collected by Elsevier publishing and editorial teams. As a test set, the participants were provided with a 5x larger corpus of openly accessible human-written and generated papers from the same scientific domains. The shared task saw 180 submissions across 14 participating teams and resulted in two published technical reports. We discuss our findings from the shared task in this overview paper.

Original language: English
Pages (from-to): 210-213
Number of pages: 4
Journal: Proceedings - International Conference on Computational Linguistics, COLING
Volume: 29
Issue number: 9
State: Published - 2022
Externally published: Yes
Event: 3rd Workshop on Scholarly Document Processing, SDP 2022 at 29th International Conference on Computational Linguistics, COLING 2022 - Gyeongju, Korea, Republic of
Duration: Oct 12 2022 → Oct 17 2022
