TY - GEN
T1 - Understanding the Impact of Entity Linking on the Topology of Entity Co-occurrence Networks for Social Media Analysis
AU - Nevin, James
AU - Zhang, Pengyu
AU - Dimitrov, Dimitar
AU - Lees, Michael
AU - Groth, Paul
AU - Dietze, Stefan
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
PY - 2025
Y1 - 2025
N2 - A common form of analysis of textual data is entity co-occurrence analysis, in which networks of entities and their connections within the text are constructed and their topology is analysed. Because the analysis focuses on the entities and their relations, the tools used to extract them can have a large effect on the results. A method frequently used in these analyses is entity linking, where extracted entities are mapped to a knowledge graph. Many established entity linking tools were designed for long text that follows standard spelling and grammar rules; as a result, they struggle on short, unstructured text such as tweets. On such text, it can be difficult to choose between tools and parameter settings, especially since ground truth is often unavailable. Given these challenges and the direct influence of extracted entities on subsequent network analysis, we argue that multiple tools should be applied to obtain a more holistic set of results. We verify this assertion through a set of experiments. Using a dataset of approximately 21 million English-language tweets, we construct multiple entity co-occurrence networks using two tools (Fast Entity Linker and DBpedia Spotlight) and numerous confidence thresholds for each. We find that standard network analysis metrics, such as size, connectivity, and centrality, are all heavily influenced by the choice of entity linking tool.
KW - Co-occurrence networks
KW - Entity linking
KW - Network analysis
KW - Social media
UR - http://www.scopus.com/inward/record.url?scp=85210854271&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-77792-9_5
DO - 10.1007/978-3-031-77792-9_5
M3 - Conference contribution
AN - SCOPUS:85210854271
SN - 9783031777912
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 69
EP - 85
BT - Knowledge Engineering and Knowledge Management - 24th International Conference, EKAW 2024, Proceedings
A2 - Alam, Mehwish
A2 - Rospocher, Marco
A2 - van Erp, Marieke
A2 - Hollink, Laura
A2 - Gesese, Genet Asefa
PB - Springer Science and Business Media Deutschland GmbH
T2 - 24th International Conference on Knowledge Engineering and Knowledge Management, EKAW 2024
Y2 - 26 November 2024 through 28 November 2024
ER -