Repurposing benchmark corpora for reconstructing provenance

Sara Magliacane, Paul Groth

Research output: Contribution to journal, Conference article, peer-review

Abstract

Provenance is a critical aspect in evaluating scientific output, yet it is still often overlooked or not comprehensively produced by practitioners. This incomplete and partial nature of provenance has been recognized in the literature, which has led to the development of new methods for reconstructing missing provenance. Unfortunately, there is currently no agreed-upon evaluation framework for testing these methods. Moreover, there is a paucity of datasets to which these methods can be applied. To begin to address this gap, we present a survey of existing benchmark corpora from other computer science communities that could be applied to evaluate provenance reconstruction techniques. The survey identifies, for each corpus, a mapping between the data available and common provenance concepts. In addition to their applicability to provenance reconstruction, we also argue that these corpora could be reused for other tasks pertaining to provenance.

Original language: English
Pages (from-to): 39-50
Number of pages: 12
Journal: CEUR Workshop Proceedings
Volume: 994
State: Published - 2013
Externally published: Yes
Event: 3rd Workshop on Semantic Publishing, SePublica 2013 - 10th Extended Semantic Web Conference - Montpellier, France
Duration: May 26 2013 → …
