Evaluation of string normalisation modules for string-based biomedical vocabularies alignment with AnAGram

Anique Van Berne, Veronique Malaisé

Research output: Contribution to journal › Conference article › peer-review

Abstract

Biomedical vocabularies have specific characteristics that make their lexical alignment challenging. We have built a string-based vocabulary alignment tool, AnAGram, dedicated to efficiently comparing terms in the biomedical domain, and we evaluate this tool's results against an algorithm based on the Jaro-Winkler edit distance. AnAGram is modular, enabling us to evaluate the precision and recall of different normalization procedures. Overall, our normalization and replacement strategy improves the F-measure score from the edit-distance experiment by more than 100%. Most of this increase can be explained by targeted transformations of the strings, and the use of a dictionary of adjective/noun correspondences yields useful results. However, we found that the classic Porter stemming algorithm needs to be adapted to the biomedical domain to give good-quality results in this area.
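To make the idea of modular normalization concrete, the following is a minimal Python sketch, not AnAGram's actual implementation. It assumes a pipeline of interchangeable token-level modules (lowercasing, adjective-to-noun replacement, token sorting), exact matching on the normalized strings, and the standard precision, recall and F-measure definitions. The module names, the three dictionary entries and the sample terms are hypothetical and chosen only for illustration.

```python
import re
from typing import Callable, Dict, Iterable, List, Sequence, Set, Tuple

# Hypothetical adjective -> noun correspondences (illustrative entries only).
ADJ_TO_NOUN: Dict[str, str] = {
    "renal": "kidney",
    "hepatic": "liver",
    "cardiac": "heart",
}

# A normalization module maps a token list to a token list.
Module = Callable[[List[str]], List[str]]
Pair = Tuple[str, str]


def lowercase(tokens: List[str]) -> List[str]:
    return [t.lower() for t in tokens]


def replace_adjectives(tokens: List[str]) -> List[str]:
    # Expects lowercased tokens; unknown tokens pass through unchanged.
    return [ADJ_TO_NOUN.get(t, t) for t in tokens]


def sort_tokens(tokens: List[str]) -> List[str]:
    # Makes the comparison insensitive to word order.
    return sorted(tokens)


def normalize(term: str, modules: Sequence[Module]) -> str:
    # Tokenize on alphanumeric runs, then apply each module in order.
    tokens = re.findall(r"[A-Za-z0-9]+", term)
    for module in modules:
        tokens = module(tokens)
    return " ".join(tokens)


def align(source: Iterable[str], target: Iterable[str],
          modules: Sequence[Module]) -> Set[Pair]:
    # Index target terms by normalized form, then look up each source term.
    index: Dict[str, Set[str]] = {}
    for t in target:
        index.setdefault(normalize(t, modules), set()).add(t)
    return {(s, t) for s in source
            for t in index.get(normalize(s, modules), set())}


def f_measure(found: Set[Pair], gold: Set[Pair]) -> Tuple[float, float, float]:
    # Returns (precision, recall, F1); each is 0.0 when undefined.
    tp = len(found & gold)
    precision = tp / len(found) if found else 0.0
    recall = tp / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy data, invented for this sketch.
source = ["Renal failure", "Hepatic cyst"]
target = ["Failure, kidney", "Liver cyst", "Heart attack"]
gold = {("Renal failure", "Failure, kidney"), ("Hepatic cyst", "Liver cyst")}

baseline = align(source, target, [lowercase])
full = align(source, target, [lowercase, replace_adjectives, sort_tokens])

print(f_measure(baseline, gold))  # (0.0, 0.0, 0.0)
print(f_measure(full, gold))      # (1.0, 1.0, 1.0)
```

Because each module is an ordinary function, one can be added or removed from the list to measure its individual effect on precision and recall, which is the kind of per-module evaluation the abstract describes. The toy scores say nothing about the results reported in the paper.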

Original language: English
Pages (from-to): 237-240
Number of pages: 4
Journal: CEUR Workshop Proceedings
Volume: 1272
State: Published - 2014
Externally published: Yes
Event: ISWC 2014 Posters and Demonstrations Track, ISWC-P and D 2014, 13th International Semantic Web Conference, ISWC 2014 - Riva del Garda, Italy
Duration: Oct 21 2014 → Oct 21 2014
