TY - GEN
T1 - Predicting breakdowns in cloud services (with SPIKE)
AU - Chen, Jianfeng
AU - Chakraborty, Joymallya
AU - Clark, Philip
AU - Haverlock, Kevin
AU - Cherian, Snehit
AU - Menzies, Tim
N1 - Publisher Copyright: © 2019 ACM.
PY - 2019/8/12
Y1 - 2019/8/12
N2 - Maintaining web services is a mission-critical task where any downtime means loss of revenue and reputation (of being a reliable service provider). In the current competitive web services market, such a loss of reputation causes extensive loss of future revenue. To address this issue, we developed SPIKE, a data mining tool which can predict upcoming service breakdowns half an hour into the future. Such predictions let an organization alert and assemble a tiger team to address the problem (e.g., by reconfiguring cloud hardware in order to reduce the likelihood of that breakdown). SPIKE utilizes (a) regression tree learning (with CART); (b) synthetic minority over-sampling (to handle how rare spikes are in our data); (c) hyperparameter optimization (to learn the best settings for our local data); and (d) a technique we call topology sampling, where training vectors are built from extensive details of an individual node plus summary details on all its neighbors. In the experiments reported here, SPIKE predicted service spikes 30 minutes into the future with recall and precision of 75% and above. Also, SPIKE performed relatively better than other widely used learning methods (neural nets, random forests, logistic regression).
KW - Cloud
KW - Data mining
KW - Optimization
KW - Parameter tuning
UR - http://www.scopus.com/inward/record.url?scp=85071909451&partnerID=8YFLogxK
U2 - 10.1145/3338906.3340450
DO - 10.1145/3338906.3340450
M3 - Conference contribution
AN - SCOPUS:85071909451
T3 - ESEC/FSE 2019 - Proceedings of the 2019 27th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering
SP - 916
EP - 924
BT - ESEC/FSE 2019 - Proceedings of the 2019 27th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering
A2 - Apel, Sven
A2 - Dumas, Marlon
A2 - Russo, Alessandra
A2 - Pfahl, Dietmar
PB - Association for Computing Machinery, Inc
T2 - 27th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2019
Y2 - 26 August 2019 through 30 August 2019
ER -