TY - GEN
T1 - Predicting MOOC Student Success Using Aggregated Behavioral Features and Machine Learning Under the CRISP-ML(Q) Framework
AU - Toledo, María Belén
AU - Roa, Henry N.
AU - Loza-Aguirre, Edison
AU - Espinosa-Avila, Eduardo
N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2026.
PY - 2026
Y1 - 2026
AB - This study examines the application of machine learning to predict academic performance in Massive Open Online Courses (MOOCs), aiming to address the persistent challenge of low completion rates despite high enrollment rates. Identifying at-risk students early in the course is crucial for enabling targeted interventions to improve engagement and retention. The research is essential because it offers a practical and scalable approach for early detection using only behavioral data readily available during the initial stages of learning. The study utilized data from a Coursera MOOC, comprising over 8,000 anonymized student records and more than 80 event log files. Aggregated behavioral features—such as total sessions, number of quizzes completed, lecture views, supplementary activity access, and average session duration—were extracted from the first 50% of course participation. Following the CRISP-ML(Q) framework, three predictive models were developed and compared: logistic regression, decision trees, and support vector machines. The results showed that the decision tree classifier achieved the highest performance, with 90.9% accuracy and balanced precision and recall. These findings suggest that early-stage, aggregated behavioral metrics can predict course outcomes without relying on complex sequence modeling. The implications are significant for MOOC providers and educational institutions seeking real-time, interpretable, and scalable solutions to support student success. The proposed model can guide early interventions and improve overall course completion rates in online learning environments.
KW - Academic performance prediction
KW - CRISP-ML(Q)
KW - Learning analytics
KW - Machine learning
KW - MOOCs
UR - https://www.scopus.com/pages/publications/105028361288
U2 - 10.1007/978-3-032-07995-4_16
DO - 10.1007/978-3-032-07995-4_16
M3 - Conference contribution
AN - SCOPUS:105028361288
SN - 9783032079947
T3 - Lecture Notes in Networks and Systems
SP - 226
EP - 242
BT - Proceedings of the Future Technologies Conference, FTC 2025, Volume 3
A2 - Arai, Kohei
PB - Springer Science and Business Media Deutschland GmbH
T2 - Future Technologies Conference, FTC 2025
Y2 - 6 November 2025 through 7 November 2025
ER -