Executing provenance-enabled queries over web data

Marcin Wylot, Philippe Cudré-Mauroux, Paul Groth

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

18 Scopus citations

Abstract

The proliferation of heterogeneous Linked Data on the Web poses new challenges to database systems. In particular, because of this heterogeneity, the capacity to store, track, and query provenance data is becoming a pivotal feature of modern triple stores. In this paper, we tackle the problem of efficiently executing provenance-enabled queries over RDF data. We propose, implement, and empirically evaluate five different query execution strategies for RDF queries that incorporate knowledge of provenance. The evaluation is conducted on Web Data obtained from two different Web crawls (the Billion Triple Challenge and the Web Data Commons). Our evaluation shows that an adaptive query materialization execution strategy performs best in our context. Interestingly, we find that because provenance is prevalent within Web Data and is highly selective, it can be used to improve query processing performance. This is a counterintuitive result, as provenance is often associated with additional overhead.

Original language: English
Title of host publication: WWW 2015 - Proceedings of the 24th International Conference on World Wide Web
Publisher: Association for Computing Machinery, Inc
Pages: 1275-1285
Number of pages: 11
ISBN (Electronic): 9781450334693
State: Published - May 18 2015
Externally published: Yes
Event: 24th International Conference on World Wide Web, WWW 2015 - Florence, Italy
Duration: May 18 2015 to May 22 2015

Publication series

Name: WWW 2015 - Proceedings of the 24th International Conference on World Wide Web

Conference

Conference: 24th International Conference on World Wide Web, WWW 2015
Country/Territory: Italy
City: Florence
Period: 05/18/15 to 05/22/15

Keywords

  • Linked Data
  • Provenance
  • Provenance Queries
  • RDF
  • RDF Data Management
  • Web Data
