TY - JOUR
T1 - An Extended Overview of the CLEF 2020 ChEMU Lab
T2 - 11th Conference and Labs of the Evaluation Forum, CLEF 2020
AU - He, Jiayuan
AU - Nguyen, Dat Quoc
AU - Akhondi, Saber A.
AU - Druckenbrodt, Christian
AU - Thorne, Camilo
AU - Hoessel, Ralph
AU - Afzal, Zubair
AU - Zhai, Zenan
AU - Fang, Biaoyan
AU - Yoshikawa, Hiyori
AU - Albahem, Ameer
AU - Wang, Jingqi
AU - Ren, Yuankai
AU - Zhang, Zhi
AU - Zhang, Yaoyun
AU - Dao, Mai Hoang
AU - Ruas, Pedro
AU - Lamurias, Andre
AU - Couto, Francisco M.
AU - Copara, Jenny
AU - Naderi, Nona
AU - Knafou, Julien
AU - Ruch, Patrick
AU - Teodoro, Douglas
AU - Lowe, Daniel
AU - Mayfield, John
AU - Köksal, Abdullatif
AU - Dönmez, Hilal
AU - Özkırımlı, Elif
AU - Özgür, Arzucan
AU - Mahendran, Darshini
AU - Gurdin, Gabrielle
AU - Lewinski, Nastassja
AU - Tang, Christina
AU - McInnes, Bridget T.
AU - Malarkodi, C. S.
AU - Rao, Pattabhi R.K.
AU - Devi, Sobha Lalitha
AU - Cavedon, Lawrence
AU - Cohn, Trevor
AU - Baldwin, Timothy
AU - Verspoor, Karin
N1 - Publisher Copyright: Copyright © 2020 for this paper by its authors.
PY - 2020
Y1 - 2020
N2 - The discovery of new chemical compounds is perceived as a key driver of the chemistry industry and many other economic sectors. Information about new discoveries is usually disclosed in the scientific literature and, in particular, in chemical patents, since patents are often the first venue where new chemical compounds are publicized. Despite the significance of the information provided in chemical patents, extracting it is costly because of the large volume of existing patents and their rapid growth rate. The Cheminformatics Elsevier Melbourne University (ChEMU) evaluation lab 2020, part of the Conference and Labs of the Evaluation Forum 2020 (CLEF2020), provides a platform to advance the state of the art in automatic information extraction systems over chemical patents. In particular, we focus on extracting the synthesis processes of new chemical compounds from chemical patents. Using the ChEMU corpus of 1500 “snippets” (text segments) sampled from 170 patent documents and annotated by chemical experts, we defined two key information extraction tasks. Task 1 targets chemical named entity recognition, i.e., the identification of chemical compounds and their specific roles in chemical reactions. Task 2 targets event extraction, i.e., the identification of reaction steps that relate the chemical compounds involved in a chemical reaction. In this paper, we provide an overview of our ChEMU2020 lab. We describe the resources created for the two tasks, the evaluation methodology adopted, and the participants' results. We also provide a brief summary of the methods employed by participants of this lab and the results obtained across 46 runs from 11 teams, finding that several submissions achieve substantially better results than the baseline methods prepared by the organizers.
KW - Chemical reactions
KW - Event extraction
KW - Information extraction
KW - Named entity recognition
KW - Patent text mining
UR - http://www.scopus.com/inward/record.url?scp=85121760827&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85121760827
SN - 1613-0073
VL - 2696
JO - CEUR Workshop Proceedings
JF - CEUR Workshop Proceedings
Y2 - 22 September 2020 through 25 September 2020
ER -