TY - GEN
T1 - Exploring Pre-Trained Language Models to Build Knowledge Graph for Metal-Organic Frameworks (MOFs)
AU - An, Yuan
AU - Greenberg, Jane
AU - Hu, Xiaohua
AU - Kalinowski, Alex
AU - Fang, Xiao
AU - Zhao, Xintong
AU - McClellan, Scott
AU - Uribe-Romo, Fernando J.
AU - Langlois, Kyle
AU - Furst, Jacob
AU - Gomez-Gualdron, Diego A.
AU - Fajardo-Rojas, Fernando
AU - Ardila, Katherine
AU - Saikin, Semion K.
AU - Harper, Corey A.
AU - Daniel, Ron
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - Building a knowledge graph is a time-consuming and costly process that often requires complex natural language processing (NLP) methods to extract knowledge graph triples from text corpora. Pre-trained Language Models (PLMs) have emerged as a crucial class of approaches that provide readily available knowledge for a range of AI applications. However, it is unclear whether it is feasible to construct domain-specific knowledge graphs from PLMs. Motivated by the capacity of knowledge graphs to accelerate data-driven materials discovery, we explored a set of state-of-the-art pre-trained general-purpose and domain-specific language models to extract knowledge triples for metal-organic frameworks (MOFs). We created a knowledge graph benchmark with 7 relations for 1248 published MOF synonyms. Our experimental results showed that domain-specific PLMs consistently outperformed the general-purpose PLMs in predicting MOF-related triples. The overall benchmarking results, however, show that using present PLMs to create domain-specific knowledge graphs is still far from practical, motivating the need to develop more capable and knowledgeable pre-trained language models for particular applications in materials science.
AB - Building a knowledge graph is a time-consuming and costly process that often requires complex natural language processing (NLP) methods to extract knowledge graph triples from text corpora. Pre-trained Language Models (PLMs) have emerged as a crucial class of approaches that provide readily available knowledge for a range of AI applications. However, it is unclear whether it is feasible to construct domain-specific knowledge graphs from PLMs. Motivated by the capacity of knowledge graphs to accelerate data-driven materials discovery, we explored a set of state-of-the-art pre-trained general-purpose and domain-specific language models to extract knowledge triples for metal-organic frameworks (MOFs). We created a knowledge graph benchmark with 7 relations for 1248 published MOF synonyms. Our experimental results showed that domain-specific PLMs consistently outperformed the general-purpose PLMs in predicting MOF-related triples. The overall benchmarking results, however, show that using present PLMs to create domain-specific knowledge graphs is still far from practical, motivating the need to develop more capable and knowledgeable pre-trained language models for particular applications in materials science.
KW - Knowledge Graph
KW - Materials Science
KW - Metal-Organic Frameworks
KW - Pre-trained Language Model
KW - Prompt Probing
UR - http://www.scopus.com/inward/record.url?scp=85147906256&partnerID=8YFLogxK
U2 - 10.1109/BigData55660.2022.10020568
DO - 10.1109/BigData55660.2022.10020568
M3 - Conference contribution
AN - SCOPUS:85147906256
T3 - Proceedings - 2022 IEEE International Conference on Big Data, Big Data 2022
SP - 3651
EP - 3658
BT - Proceedings - 2022 IEEE International Conference on Big Data, Big Data 2022
A2 - Tsumoto, Shusaku
A2 - Ohsawa, Yukio
A2 - Chen, Lei
A2 - Van den Poel, Dirk
A2 - Hu, Xiaohua
A2 - Motomura, Yoichi
A2 - Takagi, Takuya
A2 - Wu, Lingfei
A2 - Xie, Ying
A2 - Abe, Akihiro
A2 - Raghavan, Vijay
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2022 IEEE International Conference on Big Data, Big Data 2022
Y2 - 17 December 2022 through 20 December 2022
ER -