TY - JOUR
T1 - Extraction of chemical-induced diseases using prior knowledge and textual information
AU - Pons, Ewoud
AU - Becker, Benedikt F.H.
AU - Akhondi, Saber A.
AU - Afzal, Zubair
AU - Van Mulligen, Erik M.
AU - Kors, Jan A.
N1 - Publisher Copyright:
© The Author(s) 2016. Published by Oxford University Press.
PY - 2016
Y1 - 2016
AB - We describe our approach to the chemical-disease relation (CDR) task in the BioCreative V challenge. The CDR task consists of two subtasks: automatic disease named entity recognition and normalization (DNER), and extraction of chemical-induced diseases (CIDs) from Medline abstracts. For the DNER subtask, we used our concept recognition tool Peregrine, in combination with several optimization steps. For the CID subtask, our system, which we named RELigator, was trained on a rich feature set comprising features derived from a graph database containing prior knowledge about chemicals and diseases, together with linguistic and statistical features derived from the abstracts in the CDR training corpus. We describe the systems that were developed and present evaluation results for both subtasks on the CDR test set. For DNER, our Peregrine system reached an F-score of 0.757. For CID, the system achieved an F-score of 0.526, which ranked second among 18 participating teams. Several post-challenge modifications of the systems resulted in substantially improved F-scores (0.828 for DNER and 0.602 for CID).
UR - http://www.scopus.com/inward/record.url?scp=84971006887&partnerID=8YFLogxK
U2 - 10.1093/database/baw046
DO - 10.1093/database/baw046
M3 - Article
C2 - 27081155
AN - SCOPUS:84971006887
SN - 1758-0463
VL - 2016
JO - Database
JF - Database
M1 - baw046
ER -