Concept recognition in French biomedical text using automatic translation

Zubair Afzal, Saber A. Akhondi, Herman H.H.B.M. van Haagen, Erik M. van Mulligen, Jan A. Kors

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review


Abstract

We describe the development of a concept recognition system for French documents and its application in task 1b of the 2015 CLEF eHealth challenge. This community challenge included recognition of entities in a French medical corpus, normalization of the recognized entities, and normalization of entity mentions that had been manually annotated. Normalization had to be based on the Unified Medical Language System (UMLS). We addressed all three subtasks with a dictionary-based approach using Peregrine, our open-source indexing engine. To increase the coverage of our initial French terminology, we explored the use of two automatic translators, Google Translate and Microsoft Translator, to translate English UMLS terms into French. The corpus consisted of 1665 titles of French Medline abstracts and 6 French drug labels of the European Medicines Agency (EMEA). The corpus was manually annotated with concepts from the UMLS and split into equally sized training and test sets. The best performance on the training set was obtained with a terminology containing the intersection of the terms produced by the two translators, combined with several post-processing steps to reduce the number of false-positive detections. When evaluated on the test set, our system achieved F-scores of 0.756 and 0.665 for entity recognition on the EMEA documents and Medline titles, respectively. For subsequent entity normalization, the F-scores were 0.711 and 0.587. Entity normalization given the manually annotated entity mentions resulted in F-scores of 0.872 and 0.671. Our system obtained the highest F-scores among the systems that participated in the challenge.
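The abstract describes the terminology-building idea only at a high level, so the sketch below is one possible reading of it under stated assumptions. It assumes that "intersection" means a French term is kept only when both translators return the same string (after lowercasing and tokenizing) for the same English UMLS term. It is not Peregrine's API: the greedy longest-match tagger is a generic stand-in for dictionary-based indexing, the translation dictionaries and concept IDs (`CUI_1`, `CUI_2`) are hypothetical toy data, and the paper's post-processing steps are not reproduced. The `f_score` helper shows the standard harmonic mean of precision and recall over exact-match annotations.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase and split on non-word characters; \w is Unicode-aware,
    # so accented French letters (é, è, ë, ...) stay inside tokens.
    return re.findall(r"\w+", text.lower())


def build_terminology(source_terms, translator_a, translator_b):
    """Keep a French term only when both translators agree (assumed reading of 'intersection').

    source_terms: iterable of (concept_id, english_term) pairs (hypothetical input format).
    translator_a, translator_b: dicts mapping english_term -> French translation.
    Returns a dict mapping a token tuple -> set of concept IDs.
    """
    terminology = defaultdict(set)
    for concept_id, english in source_terms:
        fr_a, fr_b = translator_a.get(english), translator_b.get(english)
        if fr_a is None or fr_b is None:
            continue  # a term missing from either translation cannot be in the intersection
        tokens_a, tokens_b = tuple(tokenize(fr_a)), tuple(tokenize(fr_b))
        if tokens_a and tokens_a == tokens_b:
            terminology[tokens_a].add(concept_id)
    return dict(terminology)


def tag(text, terminology):
    """Greedy longest-match dictionary lookup; returns (start, end, concept_id) token spans."""
    tokens = tokenize(text)
    max_len = max((len(key) for key in terminology), default=0)
    mentions = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in terminology:
                mentions.extend((i, i + n, cid) for cid in sorted(terminology[key]))
                i += n
                break
        else:
            i += 1  # no term starts at this token
    return mentions


def f_score(predicted, gold):
    """F1 over exact-match annotations: 2PR / (P + R)."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(predicted), tp / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    source = [("CUI_1", "heart failure"), ("CUI_2", "fever")]
    google = {"heart failure": "insuffisance cardiaque", "fever": "fièvre"}
    microsoft = {"heart failure": "insuffisance cardiaque", "fever": "la fièvre"}
    terms = build_terminology(source, google, microsoft)
    print(terms)  # only the term both translators agree on survives
    found = tag("Patient avec insuffisance cardiaque aiguë", terms)
    print(found)
    print(f_score(found, [(2, 4, "CUI_1"), (0, 1, "CUI_3")]))
```

In this toy run, "fièvre" is dropped because the two translations differ ("fièvre" vs. "la fièvre"). That illustrates the trade-off the abstract points to: intersecting translator outputs shrinks the terminology but keeps the terms on which both systems agree, which plausibly lowers false positives.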

Original language: English
Title of host publication: Experimental IR Meets Multilinguality, Multimodality, and Interaction - 7th International Conference of the CLEF Association, CLEF 2016, Proceedings
Editors: Birger Larsen, Linda Cappellato, Nicola Ferro, Norbert Fuhr, Krisztian Balog, Craig Macdonald, Paulo Quaresma, Teresa Gonçalves
Publisher: Springer Verlag
Pages: 162-173
Number of pages: 12
ISBN (Print): 9783319445632
State: Published - 2016
Externally published: Yes
Event: 7th International Conference of the CLEF Association, CLEF 2016 - Evora, Portugal
Duration: Sep 5 2016 to Sep 8 2016

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9822 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 7th International Conference of the CLEF Association, CLEF 2016
Country/Territory: Portugal
City: Evora
Period: 09/5/16 to 09/8/16

Keywords

  • Concept identification
  • Entity recognition
  • French terminology
  • Term translation

