Evaluating Inferential Statistics Filtering in High-Dimensional Item Feature Spaces for Predicting IRT Parameters
Journal article   Open access   Peer reviewed

Juyoung Jung, Yeonju Lee, Ae Kyong Jung, Seungwon Shin and Won-Chan Lee
Mathematics (Basel), Vol.14(10), 1662
05/13/2026
DOI: 10.3390/math14101662
URL: https://doi.org/10.3390/math14101662
Published (Version of record) Open Access

Abstract

Predicting parameter estimates under item response theory (IRT) from expert-coded item features offers a scalable alternative to resource-intensive field testing. This study evaluates whether inferential feature selection can improve predictive accuracy for item difficulty and item discrimination using five filter methods: the Analysis of Variance (ANOVA) F-test, Kendall’s Tau, the Kolmogorov–Smirnov test, the Anderson–Darling test, and the Energy Distance test. Models were trained using K-Nearest Neighbors (KNN) and Support Vector Regression (SVR) under random split and fixed-form cold-start partitioning strategies. Results show that the distributional properties of item features, rather than train–test splitting alone, drive predictive gains: distribution-based filter approaches, particularly the Kolmogorov–Smirnov test, consistently outperformed mean-based approaches by better capturing the full probability structure of the feature-parameter relationship. KNN benefited substantially from feature selection given its reliance on Euclidean distance, while SVR showed smaller gains due to its inherent regularization. Item discrimination generalized well to previously unseen test forms that share no calibration data with the training set, whereas item difficulty prediction was considerably more sensitive to distributional shifts when predicting entirely new, operationally administered forms. The main finding is that the distributional properties of item features are more important than the quantity of features for obtaining robust IRT parameter predictions.
Keywords: feature selection; machine learning; item response theory; item parameter prediction; inferential statistics; high-dimensional data
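
As an illustration of the kind of pipeline the abstract describes, the following is a minimal sketch of a Kolmogorov–Smirnov filter followed by KNN regression. It is not the authors' implementation. The abstract does not say how the test is applied to a continuous IRT parameter, so the median split of the target into "low" and "high" item groups is an assumption, as are the synthetic data, the function names (`ks_statistic`, `ks_filter`, `knn_predict`, `mse`), and the choices of k = 2 features and 5 neighbors.

```python
import random
from bisect import bisect_right
from statistics import median


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    return max(
        abs(bisect_right(a, v) / len(a) - bisect_right(b, v) / len(b))
        for v in a + b
    )


def ks_filter(X, y, k):
    """Return the k feature indices whose distributions differ most between
    low- and high-parameter items (median split is an assumption)."""
    cut = median(y)
    low = [row for row, t in zip(X, y) if t <= cut]
    high = [row for row, t in zip(X, y) if t > cut]
    scores = [
        ks_statistic([r[j] for r in low], [r[j] for r in high])
        for j in range(len(X[0]))
    ]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def knn_predict(X_train, y_train, x, n_neighbors=5):
    """Average the targets of the nearest training items (Euclidean distance)."""
    dists = sorted(
        (sum((u - v) ** 2 for u, v in zip(row, x)), t)
        for row, t in zip(X_train, y_train)
    )
    return sum(t for _, t in dists[:n_neighbors]) / n_neighbors


def mse(X_train, y_train, X_test, y_test, cols):
    """Test MSE of KNN restricted to the feature columns in `cols`."""
    cols = list(cols)
    train = [[row[j] for j in cols] for row in X_train]
    errors = [
        (knn_predict(train, y_train, [x[j] for j in cols]) - t) ** 2
        for x, t in zip(X_test, y_test)
    ]
    return sum(errors) / len(errors)


# Synthetic stand-in for coded item features (hypothetical, not the study's data).
random.seed(0)
n_items, n_features = 300, 30
X = [[random.gauss(0, 1) for _ in range(n_features)] for _ in range(n_items)]
# Only features 0 and 1 drive the simulated "difficulty"; the rest are noise.
y = [row[0] + row[1] + random.gauss(0, 0.3) for row in X]

# Items are i.i.d., so taking the last 60 as test items acts as a random split.
X_train, y_train = X[:240], y[:240]
X_test, y_test = X[240:], y[240:]

# Fit the filter on training items only to avoid leaking test information.
selected = ks_filter(X_train, y_train, k=2)
mse_all = mse(X_train, y_train, X_test, y_test, range(n_features))
mse_selected = mse(X_train, y_train, X_test, y_test, selected)
print("selected features:", sorted(selected))
print("KNN test MSE, all features:", round(mse_all, 3))
print("KNN test MSE, KS-selected features:", round(mse_selected, 3))
```

Because KNN weighs every feature equally in its distance computation, uninformative columns dilute the neighborhood structure, which is the mechanism the abstract gives for KNN's larger gains from filtering. A fixed-form cold-start evaluation would instead hold out all items from one test form rather than a random subset of items.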
