TY - JOUR
T1 - Structural properties as proxy for semantic relevance in RDF graph sampling
AU - Rietveld, Laurens
AU - Hoekstra, Rinke
AU - Schlobach, Stefan
AU - Guéret, Christophe
AU - Beek, Wouter
N1 - Publisher Copyright: © 2014 University of Groningen. All rights reserved.
PY - 2014
Y1 - 2014
N2 - The Linked Data cloud has grown to become the largest knowledge base ever constructed. Its size is now turning into a major bottleneck for many applications. In order to facilitate access to this structured information, this paper proposes an automatic sampling method targeted at maximizing answer coverage for applications using SPARQL querying. The approach presented in this paper is novel: no similar RDF sampling approach exists. Additionally, the concept of creating a sample aimed at maximizing SPARQL answer coverage is unique. We empirically show that the relevance of triples for sampling (a semantic notion) is influenced by the topology of the graph (purely structural), and can be determined without prior knowledge of the queries. Experiments show a significantly higher recall for topology-based sampling methods than for random and naive baseline approaches (e.g. up to 90% for Open-BioMed at a sample size of 6%).
AB - The Linked Data cloud has grown to become the largest knowledge base ever constructed. Its size is now turning into a major bottleneck for many applications. In order to facilitate access to this structured information, this paper proposes an automatic sampling method targeted at maximizing answer coverage for applications using SPARQL querying. The approach presented in this paper is novel: no similar RDF sampling approach exists. Additionally, the concept of creating a sample aimed at maximizing SPARQL answer coverage is unique. We empirically show that the relevance of triples for sampling (a semantic notion) is influenced by the topology of the graph (purely structural), and can be determined without prior knowledge of the queries. Experiments show a significantly higher recall for topology-based sampling methods than for random and naive baseline approaches (e.g. up to 90% for Open-BioMed at a sample size of 6%).
UR - http://www.scopus.com/inward/record.url?scp=85072672810&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85072672810
SN - 1568-7805
SP - 145
EP - 146
JO - Belgian/Netherlands Artificial Intelligence Conference
JF - Belgian/Netherlands Artificial Intelligence Conference
T2 - 26th Benelux Conference on Artificial Intelligence, BNAIC 2014
Y2 - 6 November 2014 through 7 November 2014
ER -