Journal article
Best-subset model selection based on multitudinal assessments of likelihood improvements
Journal of applied statistics, Vol.47(13-15), pp.2384-2420
11/17/2020
DOI: 10.1080/02664763.2019.1645097
PMCID: PMC9041924
PMID: 35707408
Abstract
A common model selection approach is to select the best model, according to some criterion, from among the collection of models defined by all possible subsets of the explanatory variables. Identifying an optimal subset has proven to be a challenging problem, both statistically and computationally. Our model selection procedure allows the researcher to nominate, a priori, the probability at which models containing false or spurious variables will be selected from among all possible subsets. The procedure determines whether inclusion of each candidate variable results in a sufficiently improved fitting term - and is hence named the SIFT procedure. Two variants are proposed: a naive method based on a set of restrictive assumptions and an empirical permutation-based method. Properties of these methods are investigated within the standard linear modeling framework and performance is evaluated against other model selection techniques. The SIFT procedure behaves as designed - asymptotically selecting variables that characterize the underlying data generating mechanism, while limiting selection of spurious variables to the desired level. The SIFT methodology offers researchers a promising new approach to model selection, providing the ability to control the probability of selecting a model that includes spurious variables to a level based on the context of the application.
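The general idea in the abstract, accepting a variable only when it improves the fitting term by more than a permutation-calibrated amount, can be illustrated with a minimal sketch. This is not the authors' SIFT algorithm, whose details the abstract does not give. The function names (`fit_term`, `permutation_threshold`, `select_subset`), the Gaussian linear model, the use of the largest single-variable gain as the calibration statistic, and the simulated data are all assumptions made for illustration.

```python
import itertools
import numpy as np


def fit_term(y, X, cols):
    """-2 * maximized Gaussian log-likelihood (up to a constant) of an OLS fit
    with an intercept and the columns of X listed in `cols`."""
    n = y.shape[0]
    design = np.column_stack([np.ones(n), X[:, np.asarray(cols, dtype=int)]])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    return n * np.log(rss / n)


def permutation_threshold(y, X, alpha=0.05, n_perm=500, seed=0):
    """Assumed calibration: permute y so that every column is spurious, record
    the largest gain any single column gives over the intercept-only model,
    and return the empirical (1 - alpha) quantile of those gains."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    best_gain = np.empty(n_perm)
    for b in range(n_perm):
        y_perm = rng.permutation(y)
        null_fit = fit_term(y_perm, X, ())
        best_gain[b] = max(null_fit - fit_term(y_perm, X, (j,)) for j in range(p))
    return float(np.quantile(best_gain, 1.0 - alpha))


def select_subset(y, X, threshold):
    """Exhaustive best-subset search minimizing fit_term + threshold * size.
    In the minimizer, dropping any included variable worsens the fit term by
    at least `threshold`, so each inclusion is 'sufficiently improving'."""
    p = X.shape[1]
    best_cols, best_score = (), fit_term(y, X, ())
    for k in range(1, p + 1):
        for cols in itertools.combinations(range(p), k):
            score = fit_term(y, X, cols) + threshold * k
            if score < best_score:
                best_cols, best_score = cols, score
    return best_cols


# Simulated example (hypothetical data): columns 0 and 1 drive the response,
# columns 2 to 4 are spurious.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = 2.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=200)

threshold = permutation_threshold(y, X, alpha=0.05, n_perm=200)
selected = select_subset(y, X, threshold)
print("threshold:", round(threshold, 2), "selected columns:", selected)
```

In this sketch `alpha` plays the role of the nominated probability: a smaller value raises the threshold and makes spurious inclusions rarer. The calibration is only approximate, and the exhaustive search is practical only for a small number of candidate variables.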
Details
- Title: Best-subset model selection based on multitudinal assessments of likelihood improvements
- Creators: Knute D Carter (Department of Biostatistics, University of Iowa); Joseph E Cavanaugh (Department of Biostatistics, University of Iowa)
- Resource Type: Journal article
- Publication Details: Journal of applied statistics, Vol.47(13-15), pp.2384-2420
- DOI: 10.1080/02664763.2019.1645097
- PMID: 35707408
- PMCID: PMC9041924
- NLM abbreviation: J Appl Stat
- ISSN: 0266-4763
- eISSN: 1360-0532
- Publisher: Taylor & Francis
- Language: English
- Date published: 11/17/2020
- Academic Unit: Statistics and Actuarial Science; Biostatistics; Injury Prevention Research Center
- Record Identifier: 9984214694802771