A distributed algorithm for determining the provenance of data

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

6 Scopus citations

Abstract

As computational techniques for tracking provenance have become more widely used, applications are beginning to produce large quantities of provenance information. Furthermore, many of these applications are composed of distributed components (e.g., scientific workflows) that may, for reasons of scalability, security, or policy, need to store this information across multiple sites. In this paper, we describe an algorithm, D-PQuery, for determining the provenance of data from distributed sources of provenance information in a parallel fashion. To enable scientists to use D-PQuery on existing Grid infrastructure, we present an implementation of the algorithm as a Condor DAGMan workflow that works across Kickstart records, which are produced in several production e-Science applications, including the example application used in this paper, the astronomy application Montage. Initial performance benchmarks are also presented.

Original language: English
Title of host publication: Proceedings - 4th IEEE International Conference on eScience, eScience 2008
Pages: 166-173
Number of pages: 8
State: Published - 2008
Externally published: Yes
Event: 4th IEEE International Conference on eScience, eScience 2008 - Indianapolis, IN, United States
Duration: Dec 7 2008 to Dec 12 2008

Publication series

Name: Proceedings - 4th IEEE International Conference on eScience, eScience 2008

Conference

Conference: 4th IEEE International Conference on eScience, eScience 2008
Country/Territory: United States
City: Indianapolis, IN
Period: 12/7/08 to 12/12/08
