Abstract
We participated in Task 2 of the CLEF eHealth 2016 challenge. We addressed two subtasks: entity recognition and normalization in a corpus of French drug labels and Medline titles, and ICD-10 coding of French death certificates. For both subtasks we used a dictionary-based approach. For entity recognition and normalization, we used Peregrine, our open-source indexing engine, with a dictionary based on French terms in the Unified Medical Language System (UMLS), supplemented with English UMLS terms that were translated into French with automatic translators. For ICD-10 coding, we used the Solr text tagger together with one of two ICD-10 terminologies derived from the task training material. To reduce the number of false-positive detections, we implemented several post-processing steps. On the challenge test set, our best system obtained F-scores of 0.702 and 0.651 for entity recognition in the drug labels and the Medline titles, respectively. For entity normalization, the corresponding F-scores were 0.529 and 0.474. On the test set for ICD-10 coding, our system achieved an F-score of 0.848 (precision 0.886, recall 0.813). These scores were substantially higher than the average scores of the systems that participated in the challenge.
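To make the general idea of dictionary-based coding concrete, here is a minimal sketch of greedy longest-match lookup followed by a precision/recall/F-score evaluation. It is not the implementation of Peregrine or the Solr text tagger, and it omits the post-processing steps described above. The dictionary entries, the helper names (`tag`, `evaluate`, `f_measure`), and the set-based scoring are illustrative assumptions rather than the task's terminologies or official scorer.

```python
import re

# Illustrative entries only (NOT the terminologies derived from the task data):
# each key is a tokenized, lowercased French term; each value is an ICD-10 code.
DICTIONARY = {
    ("infarctus", "du", "myocarde"): "I21.9",
    ("infarctus",): "I21.9",
    ("insuffisance", "cardiaque"): "I50.9",
    ("arrêt", "cardiaque"): "I46.9",
}
MAX_TERM_LEN = max(len(term) for term in DICTIONARY)


def tokenize(text):
    """Lowercase and split into word tokens (\\w also matches accented letters)."""
    return re.findall(r"\w+", text.lower())


def tag(text):
    """Greedy longest-match lookup: at each position, try the longest n-gram first."""
    tokens = tokenize(text)
    codes = []
    i = 0
    while i < len(tokens):
        for n in range(min(MAX_TERM_LEN, len(tokens) - i), 0, -1):
            code = DICTIONARY.get(tuple(tokens[i:i + n]))
            if code is not None:
                codes.append(code)
                i += n  # skip past the matched term
                break
        else:
            i += 1  # no dictionary term starts at this token
    return codes


def f_measure(precision, recall):
    """Balanced F-score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def evaluate(predicted, gold):
    """Simplified set-based precision, recall and F-score over codes."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall, f_measure(precision, recall)


if __name__ == "__main__":
    print(tag("Arrêt cardiaque sur infarctus du myocarde"))
    print(round(f_measure(0.886, 0.813), 3))
```

Running the script tags the example sentence as `['I46.9', 'I21.9']`. Because the longer term is tried first, "infarctus du myocarde" yields a single code instead of also matching "infarctus" on its own. Applying `f_measure` to the reported ICD-10 precision (0.886) and recall (0.813) gives 0.848, consistent with the F-score stated in the abstract.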
Original language | English |
---|---|
Pages (from-to) | 171-178 |
Number of pages | 8 |
Journal | CEUR Workshop Proceedings |
Volume | 1609 |
State | Published (2016) |
Externally published | Yes |
Event | 2016 Working Notes of Conference and Labs of the Evaluation Forum, CLEF 2016, Evora, Portugal (Sep 5-8, 2016) |
Keywords
- Concept identification
- Entity recognition
- French terminology
- ICD-10 coding
- Term translation