TY - GEN
T1 - Towards Interactively Improving ML Data Preparation Code via "Shadow Pipelines"
AU - Grafberger, Stefan
AU - Groth, Paul
AU - Schelter, Sebastian
N1 - Publisher Copyright: © 2024 Owner/Author.
PY - 2024/6/9
Y1 - 2024/6/9
AB - Data scientists develop ML pipelines in an iterative manner: they repeatedly screen a pipeline for potential issues, debug it, and then revise and improve its code according to their findings. However, this manual process is tedious and error-prone. Therefore, we propose to support data scientists during this development cycle with automatically derived interactive suggestions for pipeline improvements. We discuss our vision to generate these suggestions with so-called shadow pipelines, hidden variants of the original pipeline that modify it to auto-detect potential issues, try out modifications for improvements, and suggest and explain these modifications to the user. We envision applying optimisations based on incremental view maintenance to ensure low-latency computation and maintenance of the shadow pipelines. We conduct preliminary experiments to showcase the feasibility of our envisioned approach and the potential benefits of our proposed optimisations.
UR - http://www.scopus.com/inward/record.url?scp=85196623088&partnerID=8YFLogxK
U2 - 10.1145/3650203.3663327
DO - 10.1145/3650203.3663327
M3 - Conference contribution
AN - SCOPUS:85196623088
T3 - Proceedings of the 8th Workshop on Data Management for End-to-End Machine Learning, DEEM 2024 - In conjunction with the 2024 ACM SIGMOD/PODS Conference
SP - 7
EP - 11
BT - Proceedings of the 8th Workshop on Data Management for End-to-End Machine Learning, DEEM 2024 - In conjunction with the 2024 ACM SIGMOD/PODS Conference
PB - Association for Computing Machinery, Inc
T2 - 8th Workshop on Data Management for End-to-End Machine Learning, DEEM 2024
Y2 - 9 June 2024 through 9 June 2024
ER -