Exploring Pre-Trained Language Models to Build Knowledge Graph for Metal-Organic Frameworks (MOFs)

Yuan An, Jane Greenberg, Xiaohua Hu, Alex Kalinowski, Xiao Fang, Xintong Zhao, Scott McClellan, Fernando J. Uribe-Romo, Kyle Langlois, Jacob Furst, Diego A. Gomez-Gualdron, Fernando Fajardo-Rojas, Katherine Ardila, Semion K. Saikin, Corey A. Harper, Ron Daniel

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

5 Scopus citations

Abstract

Building a knowledge graph is a time-consuming and costly process that often relies on complex natural language processing (NLP) methods to extract knowledge graph triples from text corpora. Pre-trained large language models (PLMs) have emerged as an important approach that provides readily available knowledge for a range of AI applications. However, it is unclear whether it is feasible to construct domain-specific knowledge graphs from PLMs. Motivated by the capacity of knowledge graphs to accelerate data-driven materials discovery, we explored a set of state-of-the-art pre-trained general-purpose and domain-specific language models to extract knowledge triples for metal-organic frameworks (MOFs). We created a knowledge graph benchmark with 7 relations for 1248 published MOF synonyms. Our experimental results showed that domain-specific PLMs consistently outperformed general-purpose PLMs in predicting MOF-related triples. The overall benchmarking results, however, showed that using present PLMs to create domain-specific knowledge graphs is still far from practical, motivating the need to develop more capable and knowledgeable pre-trained language models for particular applications in materials science.
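
As a rough illustration of what probing a PLM for triples could look like, the sketch below turns a (subject, relation) pair into a cloze prompt and scores a predictor by hits@k. It is a minimal sketch under stated assumptions, not the method of the paper: the relation name `has_metal`, the template wording, the `[MASK]` token, the hits@k metric, and the stub predictor are all hypothetical choices made for illustration. In practice the stub would be replaced by a call to a masked language model.

```python
from typing import Callable, Dict, List, Tuple

MASK = "[MASK]"  # assumed mask token; real models may use a different one

# Hypothetical cloze templates, one per relation (names are illustrative only).
TEMPLATES: Dict[str, str] = {
    "has_metal": "The metal in {subject} is " + MASK + ".",
}

Triple = Tuple[str, str, str]  # (subject, relation, gold object)


def build_prompt(relation: str, subject: str) -> str:
    """Fill the relation's template with the subject, leaving the object masked."""
    return TEMPLATES[relation].format(subject=subject)


def hits_at_k(triples: List[Triple],
              predict: Callable[[str], List[str]],
              k: int) -> float:
    """Fraction of triples whose gold object is among the top-k predictions."""
    if not triples:
        return 0.0
    hits = 0
    for subject, relation, gold in triples:
        candidates = predict(build_prompt(relation, subject))[:k]
        if gold.lower() in [c.lower() for c in candidates]:
            hits += 1
    return hits / len(triples)


def stub_predict(prompt: str) -> List[str]:
    """Stand-in for a PLM: returns the same ranked guesses for every prompt."""
    return ["zinc", "copper", "iron"]


# Toy gold triples for two well-known MOFs (not taken from the benchmark).
toy_triples: List[Triple] = [
    ("MOF-5", "has_metal", "zinc"),
    ("HKUST-1", "has_metal", "copper"),
]

print(build_prompt("has_metal", "MOF-5"))
print(hits_at_k(toy_triples, stub_predict, k=1))
print(hits_at_k(toy_triples, stub_predict, k=2))
```

Keeping the predictor behind a plain callable means the same scoring loop can compare general-purpose and domain-specific models without changing the evaluation code.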

Original language: English
Title of host publication: Proceedings - 2022 IEEE International Conference on Big Data, Big Data 2022
Editors: Shusaku Tsumoto, Yukio Ohsawa, Lei Chen, Dirk Van den Poel, Xiaohua Hu, Yoichi Motomura, Takuya Takagi, Lingfei Wu, Ying Xie, Akihiro Abe, Vijay Raghavan
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 3651-3658
Number of pages: 8
ISBN (Electronic): 9781665480451
State: Published - 2022
Externally published: Yes
Event: 2022 IEEE International Conference on Big Data, Big Data 2022 - Osaka, Japan
Duration: Dec 17 2022 to Dec 20 2022

Publication series

Name: Proceedings - 2022 IEEE International Conference on Big Data, Big Data 2022

Conference

Conference: 2022 IEEE International Conference on Big Data, Big Data 2022
Country/Territory: Japan
City: Osaka
Period: 12/17/22 to 12/20/22

Keywords

  • Knowledge Graph
  • Materials Science
  • Metal-Organic Frameworks
  • Pre-trained Language Model
  • Prompt Probing
