TY - GEN
T1 - A distributed algorithm for determining the provenance of data
AU - Groth, Paul T.
PY - 2008
Y1 - 2008
N2 - As computational techniques for tracking provenance have become more widely used, applications are beginning to produce large quantities of provenance information. Furthermore, many of these applications are composed from distributed components (e.g. scientific workflows) that may, for reasons of scalability, security or policy, need to store this information across multiple sites. In this paper, we describe an algorithm, D-PQuery, for determining the provenance of data from distributed sources of provenance information in a parallel fashion. To enable scientists to use D-PQuery on already existing Grid infrastructure, we present an implementation of the algorithm as a Condor DAGMan workflow that works across Kickstart records, which are produced in several production e-Science applications including the example application used in this paper, the astronomy application, Montage. Initial performance benchmarks are also presented.
AB - As computational techniques for tracking provenance have become more widely used, applications are beginning to produce large quantities of provenance information. Furthermore, many of these applications are composed from distributed components (e.g. scientific workflows) that may, for reasons of scalability, security or policy, need to store this information across multiple sites. In this paper, we describe an algorithm, D-PQuery, for determining the provenance of data from distributed sources of provenance information in a parallel fashion. To enable scientists to use D-PQuery on already existing Grid infrastructure, we present an implementation of the algorithm as a Condor DAGMan workflow that works across Kickstart records, which are produced in several production e-Science applications including the example application used in this paper, the astronomy application, Montage. Initial performance benchmarks are also presented.
UR - http://www.scopus.com/inward/record.url?scp=62749159100&partnerID=8YFLogxK
U2 - 10.1109/eScience.2008.38
DO - 10.1109/eScience.2008.38
M3 - Conference contribution
AN - SCOPUS:62749159100
SN - 9780769535357
T3 - Proceedings - 4th IEEE International Conference on eScience, eScience 2008
SP - 166
EP - 173
BT - Proceedings - 4th IEEE International Conference on eScience, eScience 2008
T2 - 4th IEEE International Conference on eScience, eScience 2008
Y2 - 7 December 2008 through 12 December 2008
ER -