A new class of information criteria for improved prediction in the presence of training/validation data heterogeneity
Abstract
Details
- Title
- A new class of information criteria for improved prediction in the presence of training/validation data heterogeneity
- Creators
- Javier E Flores
- Contributors
- Joseph E. Cavanaugh (Advisor)
- Andrew A. Neath (Committee Member)
- Gideon K. D. Zamba (Committee Member)
- Jacob Oleson (Committee Member)
- Hyunkeun Cho (Committee Member)
- Resource Type
- Dissertation
- Degree Awarded
- Doctor of Philosophy (PhD), University of Iowa
- Degree in
- Biostatistics
- Degree season
- Spring 2021
- DOI
- 10.17077/etd.006094
- Publisher
- University of Iowa
- Number of pages
- xix, 176 pages
- Copyright
- Copyright 2021 Javier E Flores
- Language
- English
- Description (illustrations)
- illustrations (some color)
- Description (bibliographic)
- Includes bibliographical references (pages 173-176).
- Public Abstract (ETD)
In an era where data-driven reasoning is paramount, predictive modeling is an important tool for leveraging data to extract the actionable insights that drive scientific innovation. However, given an abundance of data and modeling techniques, the enterprising analyst is often faced with the challenge of identifying the optimal way to model these data so as to distill the knowledge they convey. One answer to this challenge is found in model selection (i.e., information) criteria.
Model selection criteria allow for the rank-ordering of a candidate collection of models according to a joint measure of their complexity and fidelity to the underlying data. Generally speaking, predictive models that are overly complex yield predictions that are less systematically biased but highly imprecise, varying wildly with even the slightest changes to the data. Conversely, models that are too simplistic insufficiently characterize the underlying data and yield biased (but more precise) predictions that may be far from the truth. Thus, the use of model selection criteria allows one to identify the model that strikes the best balance along the complexity/simplicity spectrum, yielding predictions that are neither highly variable nor systematically too high or too low.
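The complexity/fidelity trade-off described above can be illustrated with a classical criterion such as AIC, which scores each candidate model by its lack of fit plus a penalty on its number of parameters. The sketch below is purely illustrative (it uses simulated data and standard AIC, not the new criteria this dissertation proposes):

```python
import numpy as np

def aic(n, rss, k):
    """AIC for a Gaussian linear model, up to an additive constant:
    n * log(RSS / n) + 2k, where k is the number of estimated parameters."""
    return n * np.log(rss / n) + 2 * k

# Simulated data from a quadratic trend with noise (illustrative only).
rng = np.random.default_rng(0)
n = 100
x = np.linspace(-2, 2, n)
y = 1.0 + 0.5 * x - 1.5 * x**2 + rng.normal(scale=0.5, size=n)

# Fit polynomial models of increasing complexity and score each with AIC.
scores = {}
for degree in range(1, 7):
    coeffs = np.polyfit(x, y, degree)
    rss = np.sum((y - np.polyval(coeffs, x)) ** 2)
    scores[degree] = aic(n, rss, degree + 1)  # degree + 1 coefficients

# The lowest score balances fit against complexity: an underfit linear model
# is penalized for its large RSS, while high-degree models pay for extra terms.
best = min(scores, key=scores.get)
```

Here the quadratic model typically wins: the linear fit misses the curvature entirely, while higher-degree fits reduce the residual sum of squares only marginally, at the cost of the complexity penalty.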
This dissertation introduces a new class of selection criteria that improve upon the abilities of existing criteria to select models that best strike the bias/variability balance. Currently available criteria rely on the assumption that the target of prediction (i.e., the validation data) and the data used to construct each model (i.e., the training data) follow identical distributions. This assumption is clearly misaligned with the premise of prediction, where one desires to predict a set of new data that is likely characteristically different from the data at hand. Recognizing this disconnect, the class of model selection criteria introduced in this thesis leads to the selection of good predictive models regardless of the relationship between the validation and training datasets. We demonstrate the utility of our criteria across a variety of popular modeling frameworks and predictive scenarios, and we compare their performance to a subset of widely implemented selection criteria.
- Academic Unit
- Biostatistics
- Record Identifier
- 9984097169902771