Looking inside the black-box: Capturing data provenance using dynamic instrumentation

Manolis Stamatogiannakis, Paul Groth, Herbert Bos

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Peer-review

29 Scopus citations

Abstract

Knowing the provenance of a data item helps in ascertaining its trustworthiness. Various approaches have been proposed to track or infer data provenance. However, these approaches either treat an executing program as a black box, limiting the fidelity of the captured provenance, or require developers to modify the program to make it provenance-aware. In this paper, we introduce DataTracker, a new approach to capturing data provenance based on taint tracking, a technique widely used in the security and reverse engineering fields. Our system identifies data provenance relations through dynamic instrumentation of unmodified binaries, without requiring access to, or knowledge of, their source code. Hence, we can track provenance for a variety of well-known applications. Because DataTracker looks inside the executing program, it captures accurate, high-fidelity data provenance.
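
To make the taint-tracking idea concrete, the following is a minimal Python sketch, not DataTracker's implementation. It assumes byte-granularity shadow memory and an instrumentation layer that reports file reads, memory copies, arithmetic on values, and file writes through callbacks; the names Label, ShadowMemory, on_read, on_copy, on_combine and on_write are hypothetical. Each input byte is tagged with its source file and offset, tags propagate through copies (inherit) and arithmetic (union), and a write to an output file emits PROV-style wasDerivedFrom relations.

```python
from __future__ import annotations
from dataclasses import dataclass

# Hypothetical taint label: the input file and byte offset a value came from.
@dataclass(frozen=True)
class Label:
    path: str
    offset: int

class ShadowMemory:
    """Hypothetical shadow memory: maps each address to its set of taint labels."""

    def __init__(self) -> None:
        self.tags: dict[int, frozenset[Label]] = {}

    def on_read(self, path: str, file_offset: int, addr: int, data: bytes) -> None:
        # Taint source: each byte read from a file is tagged with its origin.
        for i in range(len(data)):
            self.tags[addr + i] = frozenset({Label(path, file_offset + i)})

    def on_copy(self, dst: int, src: int, n: int) -> None:
        # Propagation for data movement: destination bytes inherit source labels.
        for i in range(n):
            self.tags[dst + i] = self.tags.get(src + i, frozenset())

    def on_combine(self, dst: int, srcs: list[int]) -> None:
        # Propagation for arithmetic: the result carries the union of operand labels.
        self.tags[dst] = frozenset().union(*(self.tags.get(s, frozenset()) for s in srcs))

    def on_write(self, out_path: str, addr: int, n: int) -> list[tuple[str, str, str]]:
        # Taint sink: derive output-to-input provenance edges from the written bytes.
        sources: set[str] = set()
        for i in range(n):
            sources |= {lbl.path for lbl in self.tags.get(addr + i, frozenset())}
        return sorted((out_path, "prov:wasDerivedFrom", src) for src in sources)

# Usage: two inputs flow into one output buffer, which is then written out.
mem = ShadowMemory()
mem.on_read("a.txt", 0, 0x1000, b"hi")
mem.on_read("b.txt", 10, 0x2000, b"x")
mem.on_copy(0x3000, 0x1000, 2)              # copy "hi" into the output buffer
mem.on_combine(0x3002, [0x1001, 0x2000])    # combine a byte from each input
edges = mem.on_write("out.txt", 0x3000, 3)
print(edges)
```

Because labels follow the data itself rather than the program's file handles, an output derived from only one of several opened inputs would yield a single edge; this is the fidelity gain over black-box capture that the abstract describes, here shown only under the stated assumptions.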

Original language: English
Title of host publication: Provenance and Annotation of Data and Processes - 5th International Provenance and Annotation Workshop, IPAW 2014, Revised Selected Papers
Editors: Beth Plale, Bertram Ludäscher
Publisher: Springer Verlag
Pages: 155-167
Number of pages: 13
ISBN (Electronic): 9783319164618
State: Published - 2015
Externally published: Yes
Event: 5th International Provenance and Annotation Workshop, IPAW 2014 - Cologne, Germany
Duration: Jun 10 2014 to Jun 11 2014

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 8628
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 5th International Provenance and Annotation Workshop, IPAW 2014
Country/Territory: Germany
City: Cologne
Period: 06/10/14 to 06/11/14

Keywords

  • Data provenance
  • Dynamic
  • PROV
  • Taint analysis
  • Taint tracking
