TY - JOUR
T1 - A survey of large language models for data challenges in graphs
AU - Li, Mengran
AU - Zhang, Pengyu
AU - Xing, Wenbin
AU - Zheng, Yijia
AU - Zaporojets, Klim
AU - Chen, Junzhou
AU - Zhang, Ronghui
AU - Zhang, Yong
AU - Gong, Siyuan
AU - Hu, Jia
AU - Ma, Xiaolei
AU - Liu, Zhiyuan
AU - Groth, Paul
AU - Worring, Marcel
N1 - Publisher Copyright: © 2025 Elsevier Ltd
PY - 2026/3/1
Y1 - 2026/3/1
AB - Graphs are a widely used paradigm for representing non-Euclidean data, with applications ranging from social network analysis to biomolecular prediction. While graph learning has achieved remarkable progress, real-world graph data presents a number of challenges that significantly hinder the learning process. In this survey, we focus on four fundamental data-centric challenges: (1) Incompleteness: real-world graphs have missing nodes, edges, or attributes; (2) Imbalance: the label distributions of nodes or edges, and their structures, are highly skewed in real-world graphs; (3) Cross-domain Heterogeneity: graphs from different domains exhibit incompatible feature spaces or structural patterns; and (4) Dynamic Instability: graphs evolve over time in unpredictable ways. Recent Large Language Models (LLMs) offer the potential to tackle these challenges by leveraging rich semantic reasoning and external knowledge. This survey examines how LLMs can address these four challenges in graph-structured data, thereby improving the effectiveness of graph learning. For each challenge, we review both traditional solutions and modern LLM-driven approaches, highlighting how LLMs contribute unique advantages. Finally, we discuss open research questions and promising future directions in this emerging interdisciplinary field. To support further exploration, we have curated a repository of recent advances on graph learning challenges: https://github.com/limengran98/Awesome-Literature-Graph-Learning-Challenges.
KW - Cross-domain graph heterogeneity
KW - Data imbalance
KW - Dynamic graph instability
KW - Graph incompleteness
KW - Graph learning
KW - Large language models
UR - https://www.scopus.com/pages/publications/105015635501
U2 - 10.1016/j.eswa.2025.129643
DO - 10.1016/j.eswa.2025.129643
M3 - Review article
AN - SCOPUS:105015635501
SN - 0957-4174
VL - 298
JO - Expert Systems with Applications
JF - Expert Systems with Applications
M1 - 129643
ER -