Erasmus MC at CLEF eHealth 2016: Concept recognition and coding in French texts

Erik M. Van Mulligen, Zubair Afzal, Saber A. Akhondi, Dang Vo, Jan A. Kors

Research output: Contribution to journal › Conference article › peer-review

16 Scopus citations

Abstract

We participated in task 2 of the CLEF eHealth 2016 challenge. Two subtasks were addressed: entity recognition and normalization in a corpus of French drug labels and Medline titles, and ICD-10 coding of French death certificates. For both subtasks we used a dictionary-based approach. For entity recognition and normalization, we used Peregrine, our open-source indexing engine, with a dictionary based on French terms in the Unified Medical Language System (UMLS) supplemented with English UMLS terms that were translated into French with automatic translators. For ICD-10 coding, we used the Solr text tagger, together with one of two ICD-10 terminologies derived from the task training material. To reduce the number of false-positive detections, we implemented several post-processing steps. On the challenge test set, our best system obtained F-scores of 0.702 and 0.651 for entity recognition in the drug labels and in the Medline titles, respectively. For entity normalization, F-scores were 0.529 and 0.474. On the test set for ICD-10 coding, our system achieved an F-score of 0.848 (precision 0.886, recall 0.813). These scores were substantially higher than the average score of the systems that participated in the challenge.
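The ICD-10 coding figures can be checked against each other with a short calculation. The sketch below assumes the reported F-score is the balanced F1 measure, the harmonic mean of precision and recall; the abstract does not state this explicitly, but the reported numbers are consistent with it. The function name `f_score` is illustrative and does not come from the paper.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score (F1): harmonic mean of precision and recall.

    Assumption: the abstract's "F-score" is F1; this is not stated in the source.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall reported in the abstract for ICD-10 coding.
icd10_f = f_score(0.886, 0.813)
print(f"{icd10_f:.3f}")  # 0.848
```

Rounded to three decimals, the result agrees with the F-score of 0.848 reported for the ICD-10 coding subtask.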

Original language: English
Pages (from-to): 171-178
Number of pages: 8
Journal: CEUR Workshop Proceedings
Volume: 1609
State: Published - 2016
Externally published: Yes
Event: 2016 Working Notes of Conference and Labs of the Evaluation Forum, CLEF 2016 - Evora, Portugal
Duration: Sep 5 2016 to Sep 8 2016

Keywords

  • Concept identification
  • Entity recognition
  • French terminology
  • ICD-10 Coding
  • Term translation
