Automatically estimating the incidence of symptoms recorded in GP free text notes

Rob Koeling, A. Rosemary Tate, John A. Carroll

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

    15 Scopus citations

    Abstract

    The UK General Practice Research Database (GPRD) is a valuable source of information for health services research. It contains coded data supplemented by free text (physicians' notes and letters). However, due to the difficulty of extracting useful information and the cost of anonymisation, this text is seldom utilised in epidemiological research. We annotated the records of 344 women in the year prior to a diagnosis of ovarian cancer and developed a method for automatically detecting mentions of symptoms in text. We estimated the incidence of five commonly presenting symptoms using: (1) coded symptoms, (2) codes augmented by symptoms automatically extracted from text, and (3) a 'gold standard' dataset of codes and text tagged by three clinically trained annotators. The estimates of incidence of each symptom increased by at least 40% when coded information was enhanced using the manually tagged free text. Our automatic method extracted a significant proportion of this extra information. Our straightforward approach should be extremely useful for medical researchers who wish to validate studies based on codes, or to accurately assess symptoms, using information that can be automatically extracted from unanonymised free text.
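The comparison between estimate (1), codes only, and estimate (2), codes augmented by text, can be illustrated with a minimal sketch. It assumes that incidence is measured as the proportion of the cohort with at least one record of a symptom, which the abstract does not state explicitly. The patient IDs, cohort size, and function name below are hypothetical toy values, not data from the study.

```python
def incidence(patients_with_symptom, cohort_size):
    """Proportion of the cohort with at least one record of the symptom."""
    return len(patients_with_symptom) / cohort_size


# Hypothetical toy cohort; these IDs are illustrative, not study data.
cohort_size = 10
coded = {1, 2, 3, 4, 5}    # patients with a coded record of the symptom
from_text = {4, 5, 6, 7}   # patients with a mention found in free text

# A patient counts once, whether the symptom is coded, in text, or both.
combined = coded | from_text

coded_rate = incidence(coded, cohort_size)
combined_rate = incidence(combined, cohort_size)

# Relative increase in the estimate when text is added to the codes.
relative_increase = (combined_rate - coded_rate) / coded_rate

print(f"codes only:      {coded_rate:.2f}")
print(f"codes plus text: {combined_rate:.2f}")
print(f"relative change: {relative_increase:.0%}")
```

Taking the set union keeps patients who appear in both sources from being counted twice, so the augmented estimate can only equal or exceed the codes-only estimate.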

    Original language: English
    Title of host publication: CIKM 2011 Glasgow
    Subtitle of host publication: MIXHS'11 - Proceedings of the 1st International Workshop on Managing Interoperability and Complexity in Health Systems
    Pages: 43-49
    Number of pages: 7
    State: Published - 2011
    Event: 1st International Workshop on Managing Interoperability and compleXity in Health Systems, MIXHS'11, Collocated with the 20th ACM International Conference on Information and Knowledge Management, CIKM 2011 - Glasgow, United Kingdom
    Duration: Oct 28 2011 → Oct 28 2011

    Publication series

    Name: International Conference on Information and Knowledge Management, Proceedings

    Conference

    Conference: 1st International Workshop on Managing Interoperability and compleXity in Health Systems, MIXHS'11, Collocated with the 20th ACM International Conference on Information and Knowledge Management, CIKM 2011
    Country/Territory: United Kingdom
    City: Glasgow
    Period: 10/28/11 → 10/28/11

    Keywords

    • clinical data
    • epidemiology
    • information extraction
    • primary care health records
