TY - GEN
T1 - Annotating Research Infrastructure in Scientific Papers
T2 - 61st Annual Meeting of the Association for Computational Linguistics, ACL 2023
AU - Tabatabaei, Seyed Amin
AU - Cheirmpos, Georgios
AU - Doornenbal, Marius
AU - Zigoni, Alberto
AU - Moore, Veronique
AU - Tsatsaronis, Georgios
N1 - Publisher Copyright:
© ACL 2023. All rights reserved.
PY - 2023
Y1 - 2023
N2 - In this work, we present a natural language processing (NLP) pipeline for the identification, extraction and linking of Research Infrastructure (RI) used in scientific publications. Links between scientific equipment and publications where the equipment was used can support multiple use cases, such as evaluating the impact of RI investment, and supporting Open Science and research reproducibility. These links can also be used to establish a profile of the RI portfolio of each institution and associate each equipment with scientific output. The system we are describing here is already in production, and has been used to address real business use cases, some of which we discuss in this paper. The computational pipeline at the heart of the system comprises both supervised and unsupervised modules to detect the usage of research equipment by processing the full text of the articles. Additionally, we have created a knowledge graph of RI, which is utilized to annotate the articles with metadata. Finally, examples of the business value of the insights made possible by this NLP pipeline are illustrated.
AB - In this work, we present a natural language processing (NLP) pipeline for the identification, extraction and linking of Research Infrastructure (RI) used in scientific publications. Links between scientific equipment and publications where the equipment was used can support multiple use cases, such as evaluating the impact of RI investment, and supporting Open Science and research reproducibility. These links can also be used to establish a profile of the RI portfolio of each institution and associate each equipment with scientific output. The system we are describing here is already in production, and has been used to address real business use cases, some of which we discuss in this paper. The computational pipeline at the heart of the system comprises both supervised and unsupervised modules to detect the usage of research equipment by processing the full text of the articles. Additionally, we have created a knowledge graph of RI, which is utilized to annotate the articles with metadata. Finally, examples of the business value of the insights made possible by this NLP pipeline are illustrated.
UR - http://www.scopus.com/inward/record.url?scp=85174256794&partnerID=8YFLogxK
U2 - 10.18653/v1/2023.acl-industry.44
DO - 10.18653/v1/2023.acl-industry.44
M3 - Conference contribution
AN - SCOPUS:85174256794
T3 - Proceedings of the Annual Meeting of the Association for Computational Linguistics
SP - 457
EP - 463
BT - Industry Track
PB - Association for Computational Linguistics (ACL)
Y2 - 9 July 2023 through 14 July 2023
ER -