TY - GEN
T1 - Executing provenance-enabled queries over web data
AU - Wylot, Marcin
AU - Cudré-Mauroux, Philippe
AU - Groth, Paul
PY - 2015/5/18
Y1 - 2015/5/18
N2 - The proliferation of heterogeneous Linked Data on the Web poses new challenges to database systems. In particular, because of this heterogeneity, the capacity to store, track, and query provenance data is becoming a pivotal feature of modern triple stores. In this paper, we tackle the problem of efficiently executing provenance-enabled queries over RDF data. We propose, implement and empirically evaluate five different query execution strategies for RDF queries that incorporate knowledge of provenance. The evaluation is conducted on Web Data obtained from two different Web crawls (The Billion Triple Challenge, and the Web Data Commons). Our evaluation shows that using an adaptive query materialization execution strategy performs best in our context. Interestingly, we find that because provenance is prevalent within Web Data and is highly selective, it can be used to improve query processing performance. This is a counterintuitive result as provenance is often associated with additional overhead.
KW - Linked Data
KW - Provenance
KW - Provenance Queries
KW - RDF
KW - RDF Data Management
KW - Web Data
UR - http://www.scopus.com/inward/record.url?scp=84953918044&partnerID=8YFLogxK
U2 - 10.1145/2736277.2741143
DO - 10.1145/2736277.2741143
M3 - Conference contribution
AN - SCOPUS:84953918044
T3 - WWW 2015 - Proceedings of the 24th International Conference on World Wide Web
SP - 1275
EP - 1285
BT - WWW 2015 - Proceedings of the 24th International Conference on World Wide Web
PB - Association for Computing Machinery, Inc
T2 - 24th International Conference on World Wide Web, WWW 2015
Y2 - 18 May 2015 through 22 May 2015
ER -