Machine Learning-Based Prediction of LGS Scores from Middle School Exam Results in Kütahya


Keywords:
Student Performance Prediction, Machine Learning, Regression Models, Educational Data Analysis

Abstract
Accurate prediction of student performance is crucial for improving educational outcomes and enabling early interventions. This study examines how well national high school entrance exam (LGS) scores can be predicted from in-school exam results in six core subjects, using data from 818 students in grades 6 to 8 at 15 middle schools in Kütahya, Turkey. Fourteen supervised machine learning regression models, including ensemble methods such as Extra Trees, Random Forest, and XGBoost, were applied independently to each subject to forecast LGS net scores. Performance was evaluated using Mean Squared Error (MSE) and the Coefficient of Determination (R²). The results show that ensemble-based models significantly outperform traditional algorithms and achieve high predictive accuracy in all subjects. The findings highlight the effectiveness of these models in capturing complex patterns in educational data and their potential for early identification of at-risk students. This research supports the integration of machine learning techniques into educational assessment systems to foster data-driven, personalized interventions.
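For readers unfamiliar with the two evaluation metrics named in the abstract, the following minimal sketch shows how MSE and R² are commonly computed for a set of observed and predicted net scores. It is an illustration only, not the study's code: the function names and the four sample values are hypothetical and are not drawn from the dataset.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between observed and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left after prediction.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: variation of the observed values around their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical net scores for illustration only (not data from the study).
observed_nets = [12.0, 15.0, 9.0, 18.0]
predicted_nets = [11.0, 16.0, 10.0, 17.0]

mse = mean_squared_error(observed_nets, predicted_nets)
r2 = r2_score(observed_nets, predicted_nets)
print(f"MSE = {mse:.3f}, R2 = {r2:.3f}")
```

A lower MSE indicates smaller prediction errors in squared net-score units, while an R² closer to 1 indicates that the model explains more of the variation in the observed scores.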