Towards data-centric what-if analysis for native machine learning pipelines

Stefan Grafberger, Paul Groth, Sebastian Schelter

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

4 Scopus citations

Abstract

An important task of data scientists is to understand how sensitive their models are to changes in the data on which the models are trained and tested. Currently, conducting such data-centric what-if analyses requires significant and costly manual development and testing, with a corresponding risk of introducing bugs. We discuss the problem of data-centric what-if analysis over whole ML pipelines (including data preparation and feature encoding), propose optimisations that reuse trained models and intermediate data to reduce the runtime of such analysis, and finally conduct preliminary experiments on three complex example pipelines, where our approach reduces the runtime by a factor of up to six.
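
The idea of reusing a trained model across what-if variants can be illustrated with a minimal sketch. This is not the authors' system or API: the nearest-centroid classifier, the toy data, and every function name below are hypothetical and chosen only to keep the example self-contained. It assumes the simplest case, where each variant perturbs only the test data, so a single model fit can serve all variants instead of one fit per variant.

```python
from statistics import mean

# Hypothetical toy data: (feature vector, label) pairs.
TRAIN = [([1.0, 1.0], "a"), ([1.2, 0.8], "a"), ([4.0, 4.0], "b"), ([3.8, 4.2], "b")]
TEST = [([1.1, 0.9], "a"), ([0.9, 1.1], "a"), ([4.1, 3.9], "b"), ([3.9, 4.1], "b")]


def train_centroids(rows):
    """Fit a nearest-centroid classifier: one mean feature vector per label."""
    by_label = {}
    for features, label in rows:
        by_label.setdefault(label, []).append(features)
    return {
        label: [mean(column) for column in zip(*vectors)]
        for label, vectors in by_label.items()
    }


def predict(centroids, features):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def accuracy(centroids, rows):
    hits = sum(predict(centroids, features) == label for features, label in rows)
    return hits / len(rows)


# What-if variants (assumed for illustration); each perturbs only the test data.
def identity(rows):
    return rows


def scale_first_feature(rows):
    return [([f[0] * 3, f[1]], label) for f, label in rows]


def zero_second_feature(rows):
    return [([f[0], 0.0], label) for f, label in rows]


VARIANTS = {
    "baseline": identity,
    "scaled": scale_first_feature,
    "missing": zero_second_feature,
}


def what_if_naive(train, test, variants):
    """Re-run the whole pipeline, including training, for every variant."""
    results, fits = {}, 0
    for name, corrupt in variants.items():
        model = train_centroids(train)
        fits += 1
        results[name] = accuracy(model, corrupt(test))
    return results, fits


def what_if_reuse(train, test, variants):
    """Train once and reuse the model, since no variant changes the training data."""
    model = train_centroids(train)
    results = {name: accuracy(model, corrupt(test)) for name, corrupt in variants.items()}
    return results, 1


if __name__ == "__main__":
    print(what_if_naive(TRAIN, TEST, VARIANTS))
    print(what_if_reuse(TRAIN, TEST, VARIANTS))
```

Both functions return the same per-variant accuracies; the sketch differs only in how many times the model is fitted. Variants that change the training data would still need retraining, and deciding what can be shared in that case is beyond this illustration.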

Original language: English
Title of host publication: Proceedings of the 6th Workshop on Data Management for End-To-End Machine Learning, DEEM 2022 - In conjunction with the 2022 ACM SIGMOD/PODS Conference
Publisher: Association for Computing Machinery, Inc
ISBN (Electronic): 9781450393751
State: Published - Jun 12 2022
Event: 6th Workshop on Data Management for End-To-End Machine Learning, DEEM 2022 - In conjunction with the 2022 ACM SIGMOD/PODS Conference - Philadelphia, United States
Duration: Jun 12 2022 - Jun 12 2022

Publication series

Name: Proceedings of the 6th Workshop on Data Management for End-To-End Machine Learning, DEEM 2022 - In conjunction with the 2022 ACM SIGMOD/PODS Conference

Conference

Conference: 6th Workshop on Data Management for End-To-End Machine Learning, DEEM 2022 - In conjunction with the 2022 ACM SIGMOD/PODS Conference
Country/Territory: United States
City: Philadelphia
Period: 06/12/22 - 06/12/22
