Book chapter
Discrepancy-Based Model Selection Criteria Using Cross-Validation
Statistical Models and Methods for Biomedical and Technical Systems, pp.473-486
Statistics for Industry and Technology, Birkhäuser Boston
2008
DOI: 10.1007/978-0-8176-4619-6_33
Abstract
A model selection criterion is often formulated by constructing an approximately unbiased estimator of an expected discrepancy, a measure that gauges the separation between the true model and a fitted approximating model. The expected discrepancy reflects how well, on average, the fitted approximating model predicts “new” data generated under the true model. A related measure, the estimated discrepancy, reflects how well the fitted approximating model predicts the data at hand.
In general, a model selection criterion consists of a goodness-of-fit term and a penalty term. The natural estimator of the expected discrepancy, the estimated discrepancy, corresponds to the goodness-of-fit term of the criterion. However, the estimated discrepancy yields an overly optimistic assessment of how effectively the fitted model predicts new data. It therefore serves as a negatively biased estimator of the expected discrepancy. Correcting for this bias leads to the penalty term.
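As a familiar illustration of this goodness-of-fit plus penalty structure, the two traditional criteria named later in this abstract are usually written in the following textbook forms. The notation here is an assumption made for illustration and may differ from the chapter's: k is the number of estimated parameters, p the number of regression coefficients in the candidate model, n the sample size, and \hat{\sigma}^2 an error variance estimate, typically taken from the largest candidate model.

```latex
\begin{align*}
\mathrm{AIC} &= \underbrace{-2 \log L(\hat{\theta} \mid y)}_{\text{goodness of fit}}
               + \underbrace{2k}_{\text{penalty}}, \\
C_p &= \underbrace{\frac{\mathrm{SSE}_p}{\hat{\sigma}^2}}_{\text{goodness of fit}}
      \; \underbrace{-\, n + 2p}_{\text{bias adjustment}}.
\end{align*}
```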
Cross-validation provides a technique for developing an estimator of an expected discrepancy which need not be adjusted for bias. The basic idea is to construct an empirical discrepancy that evaluates an approximating model by assessing how accurately each case-deleted fitted model predicts the deleted case.
The preceding approach is illustrated in the linear regression framework by formulating estimators of the expected discrepancy based on Kullback’s I-divergence and the Gauss (error sum of squares) discrepancy. The traditional criteria that arise by augmenting the estimated discrepancy with a bias adjustment term are the Akaike information criterion and Mallows’ conceptual predictive statistic. A simulation study is presented.
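The cross-validation idea for the Gauss discrepancy in linear regression can be sketched as follows. This is a minimal illustration, not the chapter's own code or simulation design: the function names, the simulated data, and the candidate model are all assumptions. It compares the in-sample error sum of squares (the estimated discrepancy) with the leave-one-out sum of squared prediction errors, often called PRESS, and checks the explicit case-deletion loop against the standard leverage identity.

```python
import numpy as np

def sse_in_sample(X, y):
    """Gauss discrepancy evaluated on the same data used for fitting."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def press_loo(X, y):
    """Leave-one-out: refit without case i, then predict the deleted case i."""
    n = X.shape[0]
    total = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        beta_i, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        total += float((y[i] - X[i] @ beta_i) ** 2)
    return total

def press_hat(X, y):
    """Same quantity via the leverage identity e_i / (1 - h_ii)."""
    H = X @ np.linalg.pinv(X)          # hat matrix for a full-rank design
    resid = y - H @ y
    return float(np.sum((resid / (1.0 - np.diag(H))) ** 2))

# Hypothetical simulated data: the true model is linear in x,
# the candidate model also includes an unnecessary quadratic term.
rng = np.random.default_rng(0)
n = 40
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x, x**2])
y = 1.0 + 2.0 * x + rng.normal(size=n)

sse = sse_in_sample(X, y)
press = press_loo(X, y)
print("in-sample SSE:", sse)
print("leave-one-out PRESS:", press)
print("PRESS via leverages:", press_hat(X, y))
```

The in-sample value is never larger than the cross-validated one, which is the optimism the abstract describes; the cross-validated quantity is used directly, with no separate penalty term.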
Details
- Title
- Discrepancy-Based Model Selection Criteria Using Cross-Validation
- Creators
- Joseph E Cavanaugh - Department of Biostatistics, The University of Iowa, Iowa City, USA; Simon L Davies - Pfizer Global Research and Development, Pfizer, Inc., New York, USA; Andrew A Neath - Department of Mathematics and Statistics, Southern Illinois University, Edwardsville, USA
- Resource Type
- Book chapter
- Publication Details
- Statistical Models and Methods for Biomedical and Technical Systems, pp.473-486
- Series
- Statistics for Industry and Technology
- DOI
- 10.1007/978-0-8176-4619-6_33
- Publisher
- Birkhäuser Boston; Boston, MA
- Language
- English
- Date published
- 2008
- Academic Unit
- Statistics and Actuarial Science; Biostatistics; Injury Prevention Research Center
- Record Identifier
- 9984214688602771