Dissertation
Modern computational and methodological advances for best subset selection and backward elimination
University of Iowa
Doctor of Philosophy (PhD), University of Iowa
Autumn 2024
DOI: 10.25820/etd.007557
Abstract
For characterizing the mean structure in regression modeling, variable determination is often facilitated by the use of a model selection criterion, a statistic that gauges the propriety of a fitted model based on how well it balances the competing objectives of fidelity to the data and parsimony. In this context, best subset selection is frequently the preferred method of practitioners, since it is guaranteed to find the model that corresponds to the optimal value of the criterion. However, if this method is performed by an exhaustive search, in which a model is fit to every possible subset of the variables, the algorithm can be prohibitively slow. Thus, we present branch and bound algorithms to make best subset selection much faster. The branch and bound algorithms are exact, in that they always identify the optimal model, and work by systematically ruling out models that cannot possibly be optimal.
Additionally, because backward elimination is a very popular alternative to best subset selection, we present multiple modifications to backward elimination to enhance computational efficiency and to better approximate the solution produced by best subset selection. Lastly, to improve inference with these variable selection methods, we present an approach to quantifying variable importance and calculating accompanying p-values that incorporates the entirety of the variable selection process.
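To make concrete the contrast the abstract draws between exhaustive search and branch and bound, the following is a minimal sketch under stated assumptions; it is not the dissertation's algorithm. It assumes a Gaussian linear model scored by AIC (up to an additive constant), and it uses a generic textbook bound: within a subtree, no model can fit better than the largest model in that subtree, and no model can carry a smaller penalty than the variables already forced in. The function names `rss`, `aic`, `exhaustive_search`, and `branch_and_bound`, and the simulated data, are all hypothetical.

```python
import itertools
import numpy as np

def rss(X, y, cols):
    """Residual sum of squares for an intercept plus the chosen columns."""
    design = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)

def aic(X, y, cols):
    """Gaussian AIC up to a constant: n*log(RSS/n) + 2*(number of variables)."""
    n = len(y)
    return n * np.log(rss(X, y, cols) / n) + 2 * len(cols)

def exhaustive_search(X, y):
    """Fit all 2^p subsets and return (best score, best subset)."""
    p = X.shape[1]
    best_score, best_cols = np.inf, ()
    for k in range(p + 1):
        for cols in itertools.combinations(range(p), k):
            score = aic(X, y, cols)
            if score < best_score:
                best_score, best_cols = score, cols
    return best_score, best_cols

def branch_and_bound(X, y):
    """Return (best score, best subset, number of model fits performed)."""
    n, p = X.shape
    best = {"score": np.inf, "cols": ()}
    fits = [0]

    def visit(included, next_var):
        # Variables next_var..p-1 are still undecided in this subtree.
        largest = included + tuple(range(next_var, p))
        fits[0] += 1
        # Lower bound for every model in the subtree: best attainable fit
        # (all undecided variables added) with the smallest possible penalty.
        bound = n * np.log(rss(X, y, largest) / n) + 2 * len(included)
        if bound >= best["score"]:
            return  # no model in this subtree can beat the incumbent
        if next_var == p:
            # Leaf: nothing is undecided, so the bound is the model's AIC.
            best["score"], best["cols"] = bound, included
            return
        visit(included + (next_var,), next_var + 1)  # branch: include variable
        visit(included, next_var + 1)                # branch: exclude variable

    visit((), 0)
    return best["score"], best["cols"], fits[0]

# Simulated data (an assumption for illustration): only columns 0 and 3 matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 8))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(size=60)

score_ex, subset_ex = exhaustive_search(X, y)
score_bb, subset_bb, n_fits = branch_and_bound(X, y)
```

Because a subtree is discarded only when its lower bound cannot improve on the incumbent, this sketch returns the same subset as the exhaustive search. It refits a model at every node it visits, so it illustrates exactness and the pruning rule rather than any speed gain; `n_fits` is returned so the amount of pruning on a given data set can be inspected.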
Details
- Title: Subtitle
- Modern computational and methodological advances for best subset selection and backward elimination
- Creators
- Jacob Seedorff
- Contributors
- Joe Cavanaugh (Advisor); Patrick Breheny (Committee Member); Grant Brown (Committee Member); Jacob Oleson (Committee Member)
- Resource Type
- Dissertation
- Degree Awarded
- Doctor of Philosophy (PhD), University of Iowa
- Degree in
- Biostatistics
- Date degree season
- Autumn 2024
- DOI
- 10.25820/etd.007557
- Publisher
- University of Iowa
- Number of pages
- viii, 85 pages
- Copyright
- Copyright 2024 Jacob Seedorff
- Language
- English
- Date submitted
- 12/08/2024
- Description illustrations
- Illustrations, tables, graphs, charts
- Description bibliographic
- Includes bibliographical references (pages 82-85).
- Public Abstract (ETD)
In statistical modeling, practitioners often seek to identify the variables that have a meaningful relationship with an outcome of interest. Algorithms for determining the optimal set of variables are often computationally intensive, may produce lower-quality solutions, or both. We improve upon existing variable selection algorithms to make them more computationally efficient and to make them yield higher-quality solutions. In the context of our improved algorithms, we also develop and present a novel method for quantifying the relative importance of each variable via a single index.
- Academic Unit
- Biostatistics
- Record Identifier
- 9984774959502771