Journal article
Predictive Insights into U.S. Students’ Mathematics Performance on PISA 2022 Using Ensemble Tree-Based Machine Learning Models
International journal of educational research, Vol.130, 102537
2025
DOI: 10.1016/j.ijer.2025.102537
Abstract
• Provide innovative insight into using ML models for predictor selection and predictive analysis of math performance.
• Identify key malleable factors that pose challenges to improving students’ math performance, yielding data-driven recommendations for educators and policymakers.
• Contribute to shifts in instructional practices, targeted interventions, curriculum development, and policy decisions, ultimately enhancing the overall quality of math education in the U.S.
In the latest Programme for International Student Assessment (PISA) 2022 results, U.S. students earned their lowest math scores in two decades. Educators and stakeholders have sought to identify key malleable factors in an attempt to raise scores. Although researchers are gradually incorporating machine learning (ML) techniques, most still rely on manual literature reviews to identify important predictors. Here we focus on providing innovative insights into how to use ML models to identify the predictors most strongly associated with students’ math performance.
The dataset comprises 4,552 U.S. students in 154 schools from PISA 2022. We used three ensemble tree-based ML models (Random Forest, XGBoost, and LightGBM) to select the most influential predictors from 143 derived variables in the student and school questionnaires. All three models showed high accuracy in predicting students’ math performance, with XGBoost performing best (RMSE = 69.82, training time = 4.14 seconds) and identifying 10 significant predictors. According to the accumulated local effects (ALE) plots, three of them have generally positive effects, five have largely negative effects, and two have mixed effects on students’ math performance. Compared with predictors identified by literature review, the ML-identified predictors significantly improved the accuracy of predictor selection (p-value < .05) but offered lower interpretability.
We conclude that ML predictor selection is an effective alternative to literature review (LR) for obtaining influential factors affecting student learning outcomes. Among the factors identified, math self-efficacy, economic, social, and cultural status (ESCS), and math anxiety are strongly correlated with students’ math performance. The results provide valuable insights for implementing shifts in instructional practices, targeted interventions, curriculum development, and policy decisions, ultimately contributing to enhancing the overall quality of U.S. math education.
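For readers unfamiliar with the two quantities the abstract reports, the following is a minimal sketch of how RMSE and a first-order accumulated local effects (ALE) curve can be computed. It is not the authors' code: the function names `rmse` and `ale_1d`, the toy data, and the stand-in linear model `toy_model` are all hypothetical, and the study itself used tree-based models on PISA 2022 variables.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted scores."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def ale_1d(predict, rows, feature, n_bins=5):
    """First-order ALE of one feature for any predict(row) -> float function.

    Returns (edges, ale) with one centered ALE value per bin edge.
    """
    values = sorted(row[feature] for row in rows)
    last = len(values) - 1
    # Quantile-based edges taken from observed values; the set drops duplicates.
    edges = sorted({values[round(q * last / n_bins)] for q in range(n_bins + 1)})
    if len(edges) < 2:
        raise ValueError("feature is constant; ALE is undefined")

    ale, counts = [0.0], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Bins are (lo, hi], except the first, which also includes the minimum.
        in_bin = [r for r in rows
                  if lo < r[feature] <= hi or (lo == edges[0] and r[feature] == lo)]
        # Local effect: average prediction change when the feature moves lo -> hi
        # while every other feature keeps its observed value.
        diffs = [predict(r[:feature] + [hi] + r[feature + 1:])
                 - predict(r[:feature] + [lo] + r[feature + 1:]) for r in in_bin]
        ale.append(ale[-1] + sum(diffs) / len(diffs))
        counts.append(len(in_bin))

    # Center so the data-weighted average effect is zero.
    mean = sum(c * (a + b) / 2 for c, a, b in zip(counts, ale[:-1], ale[1:])) / sum(counts)
    return edges, [a - mean for a in ale]


# Toy data and a stand-in linear "model"; both are hypothetical.
random.seed(0)
rows = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(200)]
toy_model = lambda r: 2.0 * r[0] + r[1]

edges, ale = ale_1d(toy_model, rows, feature=0)
for e, a in zip(edges, ale):
    print(f"x0 = {e:+.2f}  ALE = {a:+.2f}")
```

A rising ALE curve corresponds to what the abstract calls a positive effect, a falling curve to a negative effect, and a curve that changes direction to a mixed effect. In practice `predict` would wrap a fitted Random Forest, XGBoost, or LightGBM model.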
Details
- Title: Subtitle
- Predictive Insights into U.S. Students’ Mathematics Performance on PISA 2022 Using Ensemble Tree-Based Machine Learning Models
- Creators
- Li Zhu - Mathematics Education, Department of Teaching and Learning, College of Education, The University of Iowa, 240 S Madison St, Iowa City, IA 52242
- Hyesun You - Science Education, Department of Teaching and Learning, College of Education, The University of Iowa, 240 S Madison St, Iowa City, IA 52242
- Minju Hong - University of Arkansas at Fayetteville
- Zhenhan Fang - Statistics, Department of Statistics and Actuarial Science, College of Liberal Arts and Sciences, The University of Iowa, 241 Schaeffer Hall, Iowa City, IA 52242
- Resource Type
- Journal article
- Publication Details
- International journal of educational research, Vol.130, 102537
- DOI
- 10.1016/j.ijer.2025.102537
- ISSN
- 0883-0355
- eISSN
- 1873-538X
- Publisher
- Elsevier Ltd
- Language
- English
- Date published
- 2025
- Academic Unit
- Teaching and Learning
- Record Identifier
- 9984773418202771