On the semantic similarity of disease mentions in MEDLINE® and Twitter

Camilo Thorne, Roman Klinger

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Social media mining is becoming an important technique to track the spread of infectious diseases and to understand the specific needs of people affected by a medical condition. A common approach is to select a variety of synonyms for a disease from the scientific literature and then use them to retrieve social media posts for subsequent analysis. With this paper, we question the underlying assumption that user-generated text always makes use of such names, or assigns them the same meaning as in the scientific literature. We analyze the most frequently used concepts in MEDLINE® for semantic similarity to their use on Twitter and compare their normalized entropy and cosine similarities based on a simple distributional model. We find that diseases are referred to in semantically different ways in the two corpora, a difference that increases in inverse proportion to the frequency of the synonym and to the commonness of the disease or condition. These results imply that, when sampling social media for disease-related micro-blogs, query expressions must be chosen carefully, and even more so for rarely mentioned diseases or conditions.
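The two measures named in the abstract can be illustrated with a minimal sketch. It assumes a bag-of-words context model over whitespace tokens; the function names, the window size, and the toy sentences are hypothetical and are not taken from the paper, whose actual distributional model may differ.

```python
import math
from collections import Counter


def context_counts(docs, term, window=2):
    """Count the words found within `window` tokens of each occurrence of `term`."""
    counts = Counter()
    for doc in docs:
        tokens = doc.lower().split()
        for i, tok in enumerate(tokens):
            if tok != term:
                continue
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            # Add every token in the window except the term occurrence itself.
            counts.update(t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i)
    return counts


def normalized_entropy(counts):
    """Shannon entropy of the context distribution, divided by its maximum, log(vocabulary size)."""
    total = sum(counts.values())
    if total == 0 or len(counts) < 2:
        return 0.0
    h = -sum((c / total) * math.log(c / total) for c in counts.values())
    return h / math.log(len(counts))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy sentences invented for illustration only; they are not data from the paper.
abstracts = ["seasonal influenza infection in elderly patients",
             "vaccination against influenza infection reduces mortality"]
tweets = ["stuck in bed with influenza again ugh",
          "influenza season is the worst"]

sci = context_counts(abstracts, "influenza")
soc = context_counts(tweets, "influenza")
print(normalized_entropy(sci), normalized_entropy(soc), cosine_similarity(sci, soc))
```

Under these assumptions, a low cosine similarity between the two context vectors would suggest that the same disease name keeps different company in the two corpora, and normalized entropy indicates how spread out each context distribution is.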

Original language: English
Title of host publication: Natural Language Processing and Information Systems - 23rd International Conference on Applications of Natural Language to Information Systems, NLDB 2018, Proceedings
Editors: Farid Meziane, Max Silberztein, Faten Atigui, Elena Kornyshova, Elisabeth Metais
Publisher: Springer Verlag
Pages: 324-332
Number of pages: 9
ISBN (Print): 9783319919461
State: Published - 2018
Externally published: Yes
Event: 23rd International Conference on Natural Language and Information Systems, NLDB 2018 - Paris, France
Duration: Jun 13 2018 → Jun 15 2018

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10859 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 23rd International Conference on Natural Language and Information Systems, NLDB 2018
Country/Territory: France
City: Paris
Period: 06/13/18 → 06/15/18

Keywords

  • Disease names
  • Medline®
  • Social media mining
  • Twitter
