Abstract
Many new chemical compounds are reported each year in patent documents, increasing the demand for methods that automatically extract information about chemical compounds and reactions from patents. Chemical patents often describe a number of similar compounds that share a common substructure and can be synthesized in analogous ways. To avoid repeating common reaction conditions, they therefore contain many references that connect descriptions of similar chemical reactions. This leads to the problem of reaction reference resolution: given a reaction description, identify the other reaction descriptions it refers to. In this paper, we formally introduce the task and propose baseline methods that address it by analogy with co-reference resolution. To evaluate performance, we create a large-scale silver-standard dataset based on a commercial database of chemical reactions. The experimental results show that an approach based on a state-of-the-art co-reference resolution method struggles to outperform a simple heuristic in detecting reference links, demonstrating the difficulty of the proposed task and how fundamentally it differs from co-reference resolution.
Original language | English |
---|---|
Pages (from-to) | 10-17 |
Number of pages | 8 |
Journal | CEUR Workshop Proceedings |
Volume | 2909 |
State | Published - 2021 |
Externally published | Yes |
Event | 2nd Workshop on Patent Text Mining and Semantic Technologies, PatentSemTech 2021 - Virtual, Online (Duration: Jul 15 2021 → …) |
Keywords
- Information extraction
- Natural language processing
- Reaction reference resolution