When the How Outweighs the What: The Pivotal Importance of Context

Cole A. Lyman, Connor Anderson, Matt Morris, Umesh K. Nandal, Marianna J. Martindale, Mark Clement, Gordon Broderick

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

A growing body of knowledge about biological mechanisms and the interaction of biological components is contained in the peer-reviewed scientific literature. In order to leverage this knowledge towards the development of predictive models, one must first extract these relationships from the text. However, the context in which an interaction was reported is critical in ensuring that it is used in a manner consistent with the model's intended application. Here we assess the applicability of two generic automated methods for leveraging a broader contextual structure in the more specific domain of a biological experiment, using only the paper's title and abstract. In an example use case, a Support Vector Machine (SVM) and two variants of the broadly used Bidirectional Encoder Representations from Transformers (BERT) neural network model serve to distinguish mouse from human subject experiments in a corpus of over 12,000 papers documenting mechanistic interactions in a regulatory model of mucosal immune signaling. BERT and the domain-specific BioBERT yielded essentially equivalent classification accuracy, with both neural network models performing only marginally better than the SVM. Words occurring frequently in abstracts were largely non-specific, whereas words unique to each class were used in 4% or less of the abstracts. These high-specificity words were used in very similar contexts that separated mouse and human study abstracts on the basis of study design and experimental procedure rather than species or basic biological markers.
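The abstract's observation about class-unique words can be illustrated with a small sketch. It is not the authors' pipeline: the tokenizer, the function names (`doc_frequency`, `class_unique_words`), and the four toy abstracts are all assumptions invented here for illustration. It simply finds words that appear in one class of abstracts but never in the other, and reports the fraction of abstracts in that class that use each one.

```python
import re
from collections import Counter


def doc_frequency(abstracts):
    """Count, for each word, how many abstracts contain it at least once."""
    counts = Counter()
    for text in abstracts:
        # Hypothetical tokenizer: lowercase alphabetic runs only.
        counts.update(set(re.findall(r"[a-z]+", text.lower())))
    return counts


def class_unique_words(class_a, class_b):
    """Map each word found only in class_a to the fraction of class_a abstracts using it."""
    df_a = doc_frequency(class_a)
    df_b = doc_frequency(class_b)
    return {w: n / len(class_a) for w, n in df_a.items() if w not in df_b}


# Toy abstracts, invented for illustration (not taken from the paper's corpus).
mouse = [
    "Knockout mice were injected with antigen",
    "Mice were sacrificed and spleens harvested",
]
human = [
    "Patients were enrolled in a cohort study",
    "Serum from patients and healthy donors was collected",
]

unique_mouse = class_unique_words(mouse, human)
for word, share in sorted(unique_mouse.items()):
    print(f"{word}: {share:.0%} of mouse abstracts")
```

On a real corpus one would expect most class-unique words to have a small share, in line with the 4% figure reported above; the toy data here is too small to show that effect.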

Original language: English
Title of host publication: Proceedings - 2019 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2019
Editors: Illhoi Yoo, Jinbo Bi, Xiaohua Tony Hu
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 2149-2156
Number of pages: 8
ISBN (Electronic): 9781728118673
State: Published - Nov 2019
Externally published: Yes
Event: 2019 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2019 - San Diego, United States
Duration: Nov 18 2019 to Nov 21 2019

Publication series

Name: Proceedings - 2019 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2019

Conference

Conference: 2019 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2019
Country/Territory: United States
City: San Diego
Period: 11/18/19 to 11/21/19

Keywords

  • causal modelling
  • contextual embedding
  • document classification
  • immune signaling
  • natural language processing
