TY - GEN
T1 - Towards data-centric what-if analysis for native machine learning pipelines
AU - Grafberger, Stefan
AU - Groth, Paul
AU - Schelter, Sebastian
N1 - Publisher Copyright: © 2022 ACM.
PY - 2022/6/12
Y1 - 2022/6/12
N2 - An important task of data scientists is to understand the sensitivity of their models to changes in the data that the models are trained and tested upon. Currently, conducting such data-centric what-if analyses requires significant and costly manual development and testing with the corresponding chance for the introduction of bugs. We discuss the problem of data-centric what-if analysis over whole ML pipelines (including data preparation and feature encoding), propose optimisations that reuse trained models and intermediate data to reduce the runtime of such analysis, and finally conduct preliminary experiments on three complex example pipelines, where our approach reduces the runtime by a factor of up to six.
UR - http://www.scopus.com/inward/record.url?scp=85133167380&partnerID=8YFLogxK
U2 - 10.1145/3533028.3533303
DO - 10.1145/3533028.3533303
M3 - Conference contribution
AN - SCOPUS:85133167380
T3 - Proceedings of the 6th Workshop on Data Management for End-To-End Machine Learning, DEEM 2022 - In conjunction with the 2022 ACM SIGMOD/PODS Conference
BT - Proceedings of the 6th Workshop on Data Management for End-To-End Machine Learning, DEEM 2022 - In conjunction with the 2022 ACM SIGMOD/PODS Conference
PB - Association for Computing Machinery, Inc
T2 - 6th Workshop on Data Management for End-To-End Machine Learning, DEEM 2022 - In conjunction with the 2022 ACM SIGMOD/PODS Conference
Y2 - 12 June 2022
ER -