Combining pattern matching with word embeddings for the extraction of experimental variables from scientific literature

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Scientists frequently use experiments published in other articles or reports by governing entities (e.g., NIH) as templates for reporting on their own experiments. Those templates occasionally change to reflect new discoveries. For creating retrospective studies and meta-analyses, finding the template parameters associated with scientific results can be critical. To aid in the extraction of experimental parameters (e.g., animal housing temperature) in a corpus of ∼8M scientific reports, we used a combination of pattern matching, part-of-speech tagging, units and measures extraction, and machine learning. We describe a use case where the housing temperature used for experiments involving mice was shown to impact their response to tumor reduction drugs. We show that 1) combining deep learning and pattern matching is a good model to address the problem described and 2) researchers' behavior and experimental template usage take a while to change after the publication of an important discovery.
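As an illustration of the pattern-matching stage only, the following is a minimal sketch, not the authors' implementation. It assumes that a housing temperature appears as a number or range followed by a Celsius unit, in a sentence that also contains a housing-related cue word. The names `TEMPERATURE`, `HOUSING_CUE`, and `extract_housing_temperatures` are hypothetical, and the part-of-speech tagging, word-embedding, and Spark components mentioned in the record are not modeled here.

```python
import re

# Hypothetical pattern: a number, optionally a range ("20-26", "20 to 26"),
# followed by a Celsius unit. Real units-and-measures extraction is broader.
TEMPERATURE = re.compile(
    r"(?P<low>\d+(?:\.\d+)?)"
    r"(?:\s*(?:-|\u2013|to)\s*(?P<high>\d+(?:\.\d+)?))?"
    r"\s*(?:°\s*C|degrees\s+(?:C|Celsius))\b",
    re.IGNORECASE,
)

# Hypothetical cue words standing in for a learned notion of "housing context".
HOUSING_CUE = re.compile(r"\b(?:housed|housing|maintained|kept)\b", re.IGNORECASE)


def extract_housing_temperatures(text):
    """Return (low, high) Celsius pairs from sentences with a housing cue."""
    results = []
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        if not HOUSING_CUE.search(sentence):
            continue  # skip temperatures unrelated to animal housing
        for match in TEMPERATURE.finditer(sentence):
            low = float(match.group("low"))
            high = float(match.group("high")) if match.group("high") else low
            results.append((low, high))
    return results


if __name__ == "__main__":
    sample = (
        "Mice were housed at 22 °C. Tumors were measured at 37 °C. "
        "Animals were kept at 20-26 °C under a 12 h light cycle."
    )
    print(extract_housing_temperatures(sample))
```

The cue-word filter is the crude part of this sketch: it is where a learned model, such as the word embeddings named in the title, could plausibly replace a fixed word list.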

Original language: English
Title of host publication: Proceedings - 2017 IEEE International Conference on Big Data, Big Data 2017
Editors: Zoran Obradovic, Ricardo Baeza-Yates, Jeremy Kepner, Raghunath Nambiar, Chonggang Wang, Masashi Toyoda, Toyotaro Suzumura, Xiaohua Hu, Alfredo Cuzzocrea, Ricardo Baeza-Yates, Jian Tang, Hui Zang, Jian-Yun Nie, Rumi Ghosh
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 4287-4292
Number of pages: 6
ISBN (Electronic): 9781538627143
State: Published - Jan 12 2018
Event: 5th IEEE International Conference on Big Data, Big Data 2017 - Boston, United States
Duration: Dec 11 2017 → Dec 14 2017

Publication series

Name: Proceedings - 2017 IEEE International Conference on Big Data, Big Data 2017
Volume: 2018-January

Conference

Conference: 5th IEEE International Conference on Big Data, Big Data 2017
Country/Territory: United States
City: Boston
Period: 12/11/17 → 12/14/17

Keywords

  • biomedical
  • machine learning
  • neural networks
  • pattern matching
  • regular expressions
  • Spark
  • units and measures
