TY - JOUR
T1 - Structure-based knowledge acquisition from electronic lab notebooks for research data provenance documentation
AU - Schröder, Max
AU - Staehlke, Susanne
AU - Groth, Paul
AU - Nebe, J. Barbara
AU - Spors, Sascha
AU - Krüger, Frank
N1 - Publisher Copyright:
© 2022, The Author(s).
PY - 2022/12
Y1 - 2022/12
N2 - Background: Electronic Laboratory Notebooks (ELNs) are used to document experiments and investigations in the wet-lab. Protocols in ELNs contain a detailed description of the conducted steps including the necessary information to understand the procedure and the raised research data as well as to reproduce the research investigation. The purpose of this study is to investigate whether such ELN protocols can be used to create semantic documentation of the provenance of research data by the use of ontologies and linked data methodologies. Methods: Based on an ELN protocol of a biomedical wet-lab experiment, a retrospective provenance model of the raised research data describing the details of the experiment in a machine-interpretable way is manually engineered. Furthermore, an automated approach for knowledge acquisition from ELN protocols is derived from these results. This structure-based approach exploits the structure in the experiment’s description such as headings, tables, and links, to translate the ELN protocol into a semantic knowledge representation. To satisfy the Findable, Accessible, Interoperable, and Reuseable (FAIR) guiding principles, a ready-to-publish bundle is created that contains the research data together with their semantic documentation. Results: While the manual modelling efforts serve as proof of concept by employing one protocol, the automated structure-based approach demonstrates the potential generalisation with seven ELN protocols. For each of those protocols, a ready-to-publish bundle is created and, by employing the SPARQL query language, it is illustrated that questions about the processes and the obtained research data can be answered. Conclusions: The semantic documentation of research data obtained from the ELN protocols allows for the representation of the retrospective provenance of research data in a machine-interpretable way. Research Object Crate (RO-Crate) bundles including these models enable researchers to easily share the research data including the corresponding documentation, but also to search and relate the experiment to each other.
AB - Background: Electronic Laboratory Notebooks (ELNs) are used to document experiments and investigations in the wet-lab. Protocols in ELNs contain a detailed description of the conducted steps including the necessary information to understand the procedure and the raised research data as well as to reproduce the research investigation. The purpose of this study is to investigate whether such ELN protocols can be used to create semantic documentation of the provenance of research data by the use of ontologies and linked data methodologies. Methods: Based on an ELN protocol of a biomedical wet-lab experiment, a retrospective provenance model of the raised research data describing the details of the experiment in a machine-interpretable way is manually engineered. Furthermore, an automated approach for knowledge acquisition from ELN protocols is derived from these results. This structure-based approach exploits the structure in the experiment’s description such as headings, tables, and links, to translate the ELN protocol into a semantic knowledge representation. To satisfy the Findable, Accessible, Interoperable, and Reuseable (FAIR) guiding principles, a ready-to-publish bundle is created that contains the research data together with their semantic documentation. Results: While the manual modelling efforts serve as proof of concept by employing one protocol, the automated structure-based approach demonstrates the potential generalisation with seven ELN protocols. For each of those protocols, a ready-to-publish bundle is created and, by employing the SPARQL query language, it is illustrated that questions about the processes and the obtained research data can be answered. Conclusions: The semantic documentation of research data obtained from the ELN protocols allows for the representation of the retrospective provenance of research data in a machine-interpretable way. Research Object Crate (RO-Crate) bundles including these models enable researchers to easily share the research data including the corresponding documentation, but also to search and relate the experiment to each other.
KW - Electronic laboratory notebooks
KW - FAIR
KW - Knowledge acquisition
KW - Provenance
KW - RO-Crate
KW - Research data
KW - Semantic documentation
UR - https://www.scopus.com/pages/publications/85124034638
U2 - 10.1186/s13326-021-00257-x
DO - 10.1186/s13326-021-00257-x
M3 - Artículo
C2 - 35101121
AN - SCOPUS:85124034638
SN - 2041-1480
VL - 13
JO - Journal of Biomedical Semantics
JF - Journal of Biomedical Semantics
IS - 1
M1 - 4
ER -