TY - GEN
T1 - When the How Outweighs the What
T2 - 2019 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2019
AU - Lyman, Cole A.
AU - Anderson, Connor
AU - Morris, Matt
AU - Nandal, Umesh K.
AU - Martindale, Marianna J.
AU - Clement, Mark
AU - Broderick, Gordon
N1 - Publisher Copyright: © 2019 IEEE.
PY - 2019/11
Y1 - 2019/11
AB - A growing body of knowledge about biological mechanisms and interaction of biological components is contained in the peer-reviewed scientific literature. In order to leverage this knowledge towards the development of predictive models, one must first extract these relationships from the text. However, the context in which the interaction was reported is critical in ensuring that it is used in a manner consistent with the model's intended application. Here we assess the applicability of two generic automated methods for leveraging a broader contextual structure in the more specific domain of a biological experiment using only the paper's title and abstract. In an example use case, a Support Vector Machine (SVM) and two variants of the broadly-used Bidirectional Encoder Representations from Transformers (BERT) neural network model serve to distinguish mouse from human subject experiments in a corpus of over 12,000 papers documenting mechanistic interactions in a regulatory model of mucosal immune signaling. The BERT and domain-specific BioBERT yielded essentially equivalent classification accuracy with both neural network models performing only marginally better than the SVM. Words occurring frequently in abstracts were largely non-specific, whereas words unique to each class were used in 4% or less of the abstracts. These high-specificity words were used in very similar contexts that separated mouse and human study abstracts on the basis of study design and experimental procedure rather than species or basic biological markers.
KW - causal modelling
KW - contextual embedding
KW - document classification
KW - immune signaling
KW - natural language processing
UR - http://www.scopus.com/inward/record.url?scp=85084341414&partnerID=8YFLogxK
U2 - 10.1109/BIBM47256.2019.8983294
DO - 10.1109/BIBM47256.2019.8983294
M3 - Conference contribution
AN - SCOPUS:85084341414
T3 - Proceedings - 2019 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2019
SP - 2149
EP - 2156
BT - Proceedings - 2019 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2019
A2 - Yoo, Illhoi
A2 - Bi, Jinbo
A2 - Hu, Xiaohua Tony
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 18 November 2019 through 21 November 2019
ER -