Generating scientific documentation for computational experiments using provenance

Adianto Wibisono, Peter Bloem, Gerben K.D. de Vries, Paul Groth, Adam Belloum, Marian Bubak

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

3 Scopus citations

Abstract

Electronic notebooks are a common mechanism for scientists to document and investigate their work. With the advent of tools such as IPython Notebooks and Knitr, these notebooks allow code and data to be mixed together and published online. However, these approaches assume that all work is done in the same notebook environment. In this work, we look at generating notebook documentation from multi-environment workflows by using provenance represented in the W3C PROV model. Specifically, using PROV generated from the Ducktape workflow system, we are able to generate IPython notebooks that include results tables, provenance visualizations, and references to the software and datasets used. The notebooks are interactive and editable, so that the user can explore and analyze the results of the experiment without re-running the workflow. We identify specific extensions to PROV necessary for facilitating documentation generation. To evaluate our approach, we recreate the documentation website for a paper that won the Open Science Award at the ECML/PKDD 2013 machine learning conference. We show that the documentation produced automatically by our system provides more detail and greater experimental insight than the original hand-crafted documentation. Our approach bridges the gap between user-friendly notebook documentation and provenance generated by distributed, heterogeneous components.

Original language: English
Title of host publication: Provenance and Annotation of Data and Processes - 5th International Provenance and Annotation Workshop, IPAW 2014, Revised Selected Papers
Editors: Beth Plale, Bertram Ludäscher
Publisher: Springer Verlag
Pages: 168-179
Number of pages: 12
ISBN (Electronic): 9783319164618
State: Published - 2015
Externally published: Yes
Event: 5th International Provenance and Annotation Workshop, IPAW 2014 - Cologne, Germany
Duration: Jun 10 2014 to Jun 11 2014

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 8628
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 5th International Provenance and Annotation Workshop, IPAW 2014
Country/Territory: Germany
City: Cologne
Period: 06/10/14 to 06/11/14
