Structural properties as proxy for semantic relevance in RDF graph sampling

Laurens Rietveld, Rinke Hoekstra, Stefan Schlobach, Christophe Guéret

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

21 Scopus citations

Abstract

The Linked Data cloud has grown to become the largest knowledge base ever constructed. Its size is now turning into a major bottleneck for many applications. To facilitate access to this structured information, this paper proposes an automatic sampling method aimed at maximizing answer coverage for applications that use SPARQL querying. The approach presented in this paper is novel: no similar RDF sampling approach exists. Additionally, the concept of creating a sample aimed at maximizing SPARQL answer coverage is unique. We empirically show that the relevance of triples for sampling (a semantic notion) is influenced by the topology of the graph (a purely structural property) and can be determined without prior knowledge of the queries. Experiments show significantly higher recall for topology-based sampling methods than for random and naive baseline approaches (e.g., up to 90% for Open-BioMed at a sample size of 6%).
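The core idea, ranking triples by a structural property of the graph and keeping only the top fraction without consulting any queries, can be illustrated with a minimal sketch. It assumes triples are plain (subject, predicate, object) string tuples, uses node degree as the structural measure, and weights each triple by the larger degree of its subject and object; the names `node_degrees`, `topology_sample` and `recall`, the weighting rule, and the coverage definition (an answer counts as covered only if every triple it needs is in the sample) are illustrative assumptions, not the authors' pipeline.

```python
from collections import Counter

# Hypothetical representation: an RDF triple as a (subject, predicate, object) tuple.
Triple = tuple[str, str, str]


def node_degrees(triples: list[Triple]) -> Counter:
    """Count how often each node occurs as a subject or an object (undirected degree)."""
    deg: Counter = Counter()
    for s, _, o in triples:
        deg[s] += 1
        deg[o] += 1
    return deg


def topology_sample(triples: list[Triple], fraction: float) -> set[Triple]:
    """Keep the top `fraction` of triples, ranked by a purely structural weight.

    Assumed weight: the larger degree of the triple's subject and object.
    No query workload is consulted when building the sample.
    """
    deg = node_degrees(triples)
    ranked = sorted(triples, key=lambda t: max(deg[t[0]], deg[t[2]]), reverse=True)
    k = max(1, round(len(ranked) * fraction))
    return set(ranked[:k])


def recall(sample: set[Triple], answers: list[list[Triple]]) -> float:
    """Fraction of query answers whose required triples all occur in the sample."""
    if not answers:
        return 0.0
    covered = sum(1 for needed in answers if set(needed) <= sample)
    return covered / len(answers)


# Toy graph: three triples share the well-connected node "ex:hub".
graph = [
    ("ex:a", "ex:p", "ex:hub"),
    ("ex:b", "ex:p", "ex:hub"),
    ("ex:c", "ex:p", "ex:hub"),
    ("ex:d", "ex:q", "ex:e"),
]
sample = topology_sample(graph, 0.75)
print(recall(sample, [[graph[0]], [graph[3]]]))  # 0.5
```

Here the 75% sample keeps the three triples attached to the hub, so an answer needing a hub triple is covered while one needing the isolated triple is not. Degree is only a stand-in; another structural measure, such as PageRank, could replace `node_degrees` without changing the rest of the sketch.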

Original language: English
Title of host publication: The Semantic Web - ISWC 2014 - 13th International Semantic Web Conference, Proceedings
Editors: Peter Mika, Tania Tudorache, Abraham Bernstein, Chris Welty, Craig Knoblock, Denny Vrandečić, Natasha Noy, Paul Groth, Krzysztof Janowicz, Carole Goble
Publisher: Springer Verlag
Pages: 81-96
Number of pages: 16
ISBN (Electronic): 9783319119144
State: Published - 2014
Externally published: Yes
Event: 13th International Semantic Web Conference, ISWC 2014 - Riva del Garda, Italy
Duration: Oct 19, 2014 to Oct 23, 2014

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 8797
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 13th International Semantic Web Conference, ISWC 2014
Country/Territory: Italy
City: Riva del Garda
Period: 10/19/14 to 10/23/14

Keywords

  • Graph analysis
  • Linked data
  • Ranking
  • Sampling
  • Subgraphs
