Abstract
Biomedical vocabularies have specific characteristics that make their lexical alignment challenging. We have built a string-based vocabulary alignment tool, AnAGram, designed to compare terms in the biomedical domain efficiently, and we evaluate its results against an algorithm based on the Jaro-Winkler edit distance. AnAGram is modular, which enables us to evaluate the precision and recall of different normalization procedures. Overall, our normalization and replacement strategy improves the F-measure obtained in the edit-distance experiment by more than 100%. Most of this increase can be explained by targeted transformations of the strings, with a dictionary of adjective/noun correspondences yielding useful results. However, we found that the classic Porter stemming algorithm needs to be adapted to the biomedical domain to give good-quality results in this area.
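
To make the idea of normalization followed by string matching more concrete, the following is a minimal sketch and not AnAGram's actual implementation. The adjective/noun entries, the function names, and the choice to sort tokens so that word order is ignored are all assumptions made for illustration. The last function computes precision, recall, and F-measure for a set of proposed alignments against a reference set.

```python
import re

# Hypothetical adjective -> noun correspondences (illustrative entries only,
# not taken from the dictionary used in the paper).
ADJ_TO_NOUN = {
    "renal": "kidney",
    "hepatic": "liver",
    "cardiac": "heart",
    "pulmonary": "lung",
}


def normalize(term, replacements=ADJ_TO_NOUN):
    """Lowercase, drop punctuation, apply replacements, and sort the tokens.

    Sorting the tokens is an assumption of this sketch; it makes the
    comparison insensitive to word order.
    """
    tokens = re.findall(r"[a-z0-9]+", term.lower())
    tokens = [replacements.get(token, token) for token in tokens]
    return " ".join(sorted(tokens))


def align(source_terms, target_terms):
    """Return the set of (source, target) pairs whose normalized forms match."""
    index = {}
    for target in target_terms:
        index.setdefault(normalize(target), []).append(target)
    return {
        (source, target)
        for source in source_terms
        for target in index.get(normalize(source), [])
    }


def precision_recall_f1(predicted, gold):
    """Compute precision, recall, and F-measure of predicted vs. gold pairs."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

With these assumed entries, `align(["Renal Failure"], ["Kidney failure", "Liver disease"])` pairs "Renal Failure" with "Kidney failure", because both normalize to the same string. Each normalization step is a separate, swappable piece, which is one way a modular design allows the effect of individual procedures on precision and recall to be measured.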
Original language | English |
---|---|
Pages (from-to) | 237-240 |
Number of pages | 4 |
Journal | CEUR Workshop Proceedings |
Volume | 1272 |
State | Published - 2014 |
Externally published | Yes |
Event | ISWC 2014 Posters and Demonstrations Track, ISWC-P and D 2014, 13th International Semantic Web Conference, ISWC 2014 - Riva del Garda, Italy; Duration: Oct 21 2014 → Oct 21 2014 |