TY - JOUR
T1 - Automatic generation of case-detection algorithms to identify children with asthma from large electronic health record databases
AU - Afzal, Zubair
AU - Engelkes, Marjolein
AU - Verhamme, Katia M.C.
AU - Janssens, Hettie M.
AU - Sturkenboom, Miriam C.J.M.
AU - Kors, Jan A.
AU - Schuemie, Martijn J.
PY - 2013/8
Y1 - 2013/8
N2 - Purpose: Most electronic health record databases contain unstructured free-text narratives, which cannot be easily analyzed. Case-detection algorithms are usually created manually and often rely only on using coded information such as International Classification of Diseases version 9 codes. We applied a machine-learning approach to generate and evaluate an automated case-detection algorithm that uses both free-text and coded information to identify asthma cases. Methods: The Integrated Primary Care Information (IPCI) database was searched for potential asthma patients aged 5-18 years using a broad query on asthma-related codes, drugs, and free text. A training set of 5032 patients was created by manually annotating the potential patients as definite, probable, or doubtful asthma cases or non-asthma cases. The rule-learning program RIPPER was then used to generate algorithms to distinguish cases from non-cases. An over-sampling method was used to balance the performance of the automated algorithm to meet our study requirements. Performance of the automated algorithm was evaluated against the manually annotated set. Results: The selected algorithm yielded a positive predictive value (PPV) of 0.66, sensitivity of 0.98, and specificity of 0.95 when identifying only definite asthma cases; a PPV of 0.82, sensitivity of 0.96, and specificity of 0.90 when identifying both definite and probable asthma cases; and a PPV of 0.57, sensitivity of 0.95, and specificity of 0.67 for the scenario identifying definite, probable, and doubtful asthma cases. Conclusions: The automated algorithm shows good performance in detecting cases of asthma utilizing both free-text and coded data. This algorithm will facilitate large-scale studies of asthma in the IPCI database.
AB - Purpose: Most electronic health record databases contain unstructured free-text narratives, which cannot be easily analyzed. Case-detection algorithms are usually created manually and often rely only on using coded information such as International Classification of Diseases version 9 codes. We applied a machine-learning approach to generate and evaluate an automated case-detection algorithm that uses both free-text and coded information to identify asthma cases. Methods: The Integrated Primary Care Information (IPCI) database was searched for potential asthma patients aged 5-18 years using a broad query on asthma-related codes, drugs, and free text. A training set of 5032 patients was created by manually annotating the potential patients as definite, probable, or doubtful asthma cases or non-asthma cases. The rule-learning program RIPPER was then used to generate algorithms to distinguish cases from non-cases. An over-sampling method was used to balance the performance of the automated algorithm to meet our study requirements. Performance of the automated algorithm was evaluated against the manually annotated set. Results: The selected algorithm yielded a positive predictive value (PPV) of 0.66, sensitivity of 0.98, and specificity of 0.95 when identifying only definite asthma cases; a PPV of 0.82, sensitivity of 0.96, and specificity of 0.90 when identifying both definite and probable asthma cases; and a PPV of 0.57, sensitivity of 0.95, and specificity of 0.67 for the scenario identifying definite, probable, and doubtful asthma cases. Conclusions: The automated algorithm shows good performance in detecting cases of asthma utilizing both free-text and coded data. This algorithm will facilitate large-scale studies of asthma in the IPCI database.
KW - Automated case definition
KW - Case-detection algorithms
KW - Electronic medical records
KW - Machine learning
KW - Pharmacoepidemiology
UR - https://www.scopus.com/pages/publications/84880712055
U2 - 10.1002/pds.3438
DO - 10.1002/pds.3438
M3 - Article
C2 - 23592573
AN - SCOPUS:84880712055
SN - 1053-8569
VL - 22
SP - 826
EP - 833
JO - Pharmacoepidemiology and Drug Safety
JF - Pharmacoepidemiology and Drug Safety
IS - 8
ER -