Storing, Tracking, and Querying Provenance in Linked Data

Marcin Wylot, Philippe Cudre-Mauroux, Manfred Hauswirth, Paul Groth

Research output: Contribution to journal › Article › peer-review

27 Scopus citations

Abstract

The proliferation of heterogeneous Linked Data on the Web poses new challenges to database systems. In particular, the capacity to store, track, and query provenance data is becoming a pivotal feature of modern triplestores. We present methods extending a native RDF store to efficiently handle the storage, tracking, and querying of provenance in RDF data. We describe a reliable and understandable specification of the way results were derived from the data and how particular pieces of data were combined to answer a query. Subsequently, we present techniques to tailor queries with provenance data. We empirically evaluate the presented methods and show that the overhead of storing and tracking provenance is acceptable. Finally, we show that tailoring a query with provenance information can also significantly improve the performance of query execution.
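The ideas of recording which sources were combined to produce an answer, and of restricting a query to chosen sources, can be illustrated with a small sketch. It is not taken from the article and does not reflect the authors' storage model or implementation. The `Quad` type, the sample data, the `src:` identifiers, and the function names are all hypothetical, and source combinations are simplified to sets.

```python
from collections import namedtuple

# A quad is an RDF-style triple plus the source (named graph) it came from.
# All identifiers below are made-up illustration data.
Quad = namedtuple("Quad", ["s", "p", "o", "source"])

QUADS = [
    Quad("ex:alice", "ex:worksFor", "ex:acme", "src:A"),
    Quad("ex:alice", "ex:worksFor", "ex:acme", "src:B"),
    Quad("ex:acme", "ex:locatedIn", "ex:zurich", "src:C"),
    Quad("ex:bob", "ex:worksFor", "ex:initech", "src:A"),
]


def match(quads, predicate, allowed_sources=None):
    """Yield quads with the given predicate, optionally limited to some sources."""
    for q in quads:
        if q.p == predicate and (allowed_sources is None or q.source in allowed_sources):
            yield q


def people_and_cities(quads, allowed_sources=None):
    """Join worksFor with locatedIn and track provenance of each result.

    Returns {(person, city): set of frozensets}, where each frozenset holds
    the sources that were combined to derive that result once.
    """
    results = {}
    for works in match(quads, "ex:worksFor", allowed_sources):
        for located in match(quads, "ex:locatedIn", allowed_sources):
            if works.o == located.s:  # join on the company
                key = (works.s, located.o)
                combo = frozenset({works.source, located.source})
                results.setdefault(key, set()).add(combo)
    return results


def as_polynomial(combos):
    """Render combinations as a sum (alternatives) of products (joins)."""
    terms = sorted(" * ".join(sorted(c)) for c in combos)
    return " + ".join(f"({t})" for t in terms)


if __name__ == "__main__":
    untailored = people_and_cities(QUADS)
    print(as_polynomial(untailored[("ex:alice", "ex:zurich")]))

    # "Tailoring" the query: only consider data from sources B and C.
    tailored = people_and_cities(QUADS, allowed_sources={"src:B", "src:C"})
    print(as_polynomial(tailored[("ex:alice", "ex:zurich")]))
```

In this toy version, restricting the allowed sources shrinks the set of quads that the join has to inspect, which gives an intuition for why a provenance-tailored query may execute faster than an untailored one.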

Original language: English
Article number: 7891631
Pages (from-to): 1751-1764
Number of pages: 14
Journal: IEEE Transactions on Knowledge and Data Engineering
Volume: 29
Issue number: 8
State: Published - Aug 2017
Externally published: Yes

Keywords

  • BigData
  • linked data
  • provenance
  • RDF
  • triplestores
