An Extended Overview of the CLEF 2020 ChEMU Lab: Information Extraction of Chemical Reactions from Patents

Jiayuan He, Dat Quoc Nguyen, Saber A. Akhondi, Christian Druckenbrodt, Camilo Thorne, Ralph Hoessel, Zubair Afzal, Zenan Zhai, Biaoyan Fang, Hiyori Yoshikawa, Ameer Albahem, Jingqi Wang, Yuankai Ren, Zhi Zhang, Yaoyun Zhang, Mai Hoang Dao, Pedro Ruas, Andre Lamurias, Francisco M. Couto, Jenny Copara, Nona Naderi, Julien Knafou, Patrick Ruch, Douglas Teodoro, Daniel Lowe, John Mayfield, Abdullatif Köksal, Hilal Dönmez, Elif Özkırımlı, Arzucan Özgür, Darshini Mahendran, Gabrielle Gurdin, Nastassja Lewinski, Christina Tang, Bridget T. McInnes, C. S. Malarkodi, Pattabhi R.K. Rao, Sobha Lalitha Devi, Lawrence Cavedon, Trevor Cohn, Timothy Baldwin, Karin Verspoor

Research output: Contribution to journal › Conference article › peer-review

2 Scopus citations

Abstract

The discovery of new chemical compounds is perceived as a key driver of the chemistry industry and many other economic sectors. Information about new discoveries is usually disclosed in scientific literature and, in particular, in chemical patents, since patents are often the first venues where new chemical compounds are publicized. Despite the significance of the information provided in chemical patents, extracting it is costly because of the large volume of existing patents and the rapid rate at which that volume grows. The Cheminformatics Elsevier Melbourne University (ChEMU) evaluation lab 2020, part of the Conference and Labs of the Evaluation Forum 2020 (CLEF2020), provides a platform to advance the state of the art in automatic information extraction systems over chemical patents. In particular, we focus on extracting the synthesis processes of new chemical compounds from chemical patents. Using the ChEMU corpus of 1500 “snippets” (text segments) sampled from 170 patent documents and annotated by chemical experts, we defined two key information extraction tasks. Task 1 targets chemical named entity recognition, i.e., the identification of chemical compounds and their specific roles in chemical reactions. Task 2 targets event extraction, i.e., the identification of reaction steps, relating the chemical compounds involved in a chemical reaction. In this paper, we provide an overview of our ChEMU2020 lab. We describe the resources created for the two tasks, the evaluation methodology adopted, and participants' results. We also provide a brief summary of the methods employed by participants of this lab and the results obtained across 46 runs from 11 teams, finding that several submissions achieve substantially better results than the baseline methods prepared by the organizers.

Original language: English
Journal: CEUR Workshop Proceedings
Volume: 2696
State: Published - 2020
Externally published: Yes
Event: 11th Conference and Labs of the Evaluation Forum, CLEF 2020 - Thessaloniki, Greece
Duration: Sep 22 2020 to Sep 25 2020

Keywords

  • Chemical reactions
  • Event extraction
  • Information extraction
  • Named entity recognition
  • Patent text mining
