Abstract
Biomedical vocabularies have specific characteristics that make their lexical alignment challenging. We have built a string-based vocabulary alignment tool, AnAGram, dedicated to efficiently comparing terms in the biomedical domain, and we evaluate its results against an algorithm based on the Jaro-Winkler edit distance. AnAGram is modular, enabling us to evaluate the precision and recall of different normalization procedures. Overall, our normalization and replacement strategy improves the F-measure obtained in the edit-distance experiment by more than 100%. Most of this increase can be explained by targeted transformations of the strings using a dictionary of adjective/noun correspondences. However, we found that the classic Porter stemming algorithm needs to be adapted to the biomedical domain to give good-quality results.
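
The modular normalization idea described in the abstract can be illustrated with a minimal sketch. This is not AnAGram's actual implementation: the step functions, the tiny `ADJ_TO_NOUN` dictionary, and the token-sorting step are all hypothetical, chosen only to show how interchangeable normalization steps can feed an exact-match alignment whose F-measure is then computed against a reference.

```python
import re
from typing import Callable, Dict, Iterable, List, Set, Tuple

# Hypothetical adjective -> noun correspondences (illustrative entries only).
ADJ_TO_NOUN: Dict[str, str] = {
    "renal": "kidney",
    "hepatic": "liver",
    "cardiac": "heart",
}

# A normalization step maps a token list to a token list.
Step = Callable[[List[str]], List[str]]


def lowercase(tokens: List[str]) -> List[str]:
    return [t.lower() for t in tokens]


def replace_adjectives(tokens: List[str]) -> List[str]:
    # Swap an adjective for its noun counterpart when the dictionary has one.
    return [ADJ_TO_NOUN.get(t, t) for t in tokens]


def sort_tokens(tokens: List[str]) -> List[str]:
    # Assumed step: make the comparison insensitive to word order.
    return sorted(tokens)


def normalize(term: str, steps: Iterable[Step]) -> str:
    # Tokenize on alphanumeric runs, then apply each step in order.
    tokens = re.findall(r"[A-Za-z0-9]+", term)
    for step in steps:
        tokens = step(tokens)
    return " ".join(tokens)


def align(source: List[str], target: List[str],
          steps: List[Step]) -> Set[Tuple[str, str]]:
    # Index target terms by normalized form, then look up each source term.
    index: Dict[str, List[str]] = {}
    for t in target:
        index.setdefault(normalize(t, steps), []).append(t)
    return {(s, t) for s in source for t in index.get(normalize(s, steps), [])}


def f_measure(found: Set[Tuple[str, str]],
              reference: Set[Tuple[str, str]]) -> float:
    # Harmonic mean of precision and recall over alignment pairs.
    true_positives = len(found & reference)
    precision = true_positives / len(found) if found else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Because each step is an independent function, different pipelines (for example, with and without `replace_adjectives`) can be run over the same vocabularies and their precision, recall, and F-measure compared.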
| Original language | English |
|---|---|
| Pages (from-to) | 237-240 |
| Number of pages | 4 |
| Journal | CEUR Workshop Proceedings |
| Volume | 1272 |
| State | Published - 2014 |
| Externally published | Yes |
| Event | ISWC 2014 Posters and Demonstrations Track, ISWC-P and D 2014, 13th International Semantic Web Conference, ISWC 2014 - Riva del Garda, Italy. Duration: Oct 21 2014 → Oct 21 2014 |