TY - GEN
T1 - DAW
T2 - 12th International Semantic Web Conference, ISWC 2013
AU - Saleem, Muhammad
AU - Ngonga Ngomo, Axel Cyrille
AU - Xavier Parreira, Josiane
AU - Deus, Helena F.
AU - Hauswirth, Manfred
PY - 2013
Y1 - 2013
AB - Over the last few years, the Web of Data has developed into a large compendium of interlinked datasets from multiple domains. Due to the decentralised architecture of this compendium, several of these datasets contain duplicated data. Yet, so far, little attention has been paid to the effect of duplicated data on federated querying. This work presents DAW, a novel duplicate-aware approach to federated querying over the Web of Data. DAW is based on a combination of min-wise independent permutations and compact data summaries. It can be directly combined with existing federated query engines in order to achieve the same query recall values while querying fewer data sources. We extend three well-known federated query processing engines - DARQ, SPLENDID, and FedX - with DAW and compare our extensions with the original approaches. The comparison shows that DAW can greatly reduce the number of queries sent to the endpoints while keeping high query recall values. Therefore, it can significantly improve the performance of federated query processing engines. Moreover, DAW provides a source selection mechanism that maximises the query recall when query processing is limited to a subset of the sources.
KW - federated query processing
KW - min-wise independent permutations
KW - SPARQL
KW - Web of Data
UR - http://www.scopus.com/inward/record.url?scp=84891960336&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-41335-3_36
DO - 10.1007/978-3-642-41335-3_36
M3 - Conference contribution
AN - SCOPUS:84891960336
SN - 9783642413346
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 574
EP - 590
BT - The Semantic Web, ISWC 2013 - 12th International Semantic Web Conference, Proceedings
Y2 - 21 October 2013 through 25 October 2013
ER -