High-dimensional intervals for penalized regression
Details
- Title: Subtitle
- High-dimensional intervals for penalized regression
- Creators
- Logan Harris
- Contributors
- Patrick Breheny (Advisor); Joseph Lang (Committee Member); Joseph Cavanaugh (Committee Member); Daniel Sewell (Committee Member)
- Resource Type
- Dissertation
- Degree Awarded
- Doctor of Philosophy (PhD), University of Iowa
- Degree in
- Biostatistics
- Date degree season
- Autumn 2025
- DOI
- 10.25820/etd.008201
- Publisher
- University of Iowa
- Number of pages
- xiv, 121 pages
- Copyright
- Copyright 2025 Logan Harris
- Language
- English
- Date submitted
- 10/23/2025
- Description illustrations
- illustrations, tables, graphs
- Description bibliographic
- Includes bibliographical references (pages 119-121).
- Public Abstract (ETD)
- As data become increasingly available, analyzing datasets with a large number of predictors has become commonplace. This is particularly difficult when the number of observations is small by comparison. For example, say a physician at a research hospital wants to build a tool to predict whether patients with a rare cancer will achieve remission. In an effort to build the best tool possible, the physician applies for a grant to collect patient samples and obtain biomarkers through lab tests. Over the course of the study, data from 73 patients are obtained, with over 10,000 biomarkers collected on each.
In this setting, traditional methods break down. A broad class of tools called sparse penalized regression is popular because it can simultaneously select important features and produce reliable predictions. Penalized regression introduces a small amount of bias in exchange for stability in the estimation process; this trade-off is what allows it to be used when the number of predictors exceeds the number of observations.
Once estimates are obtained, it is often of interest to know how important the selected features are or, more precisely, how much uncertainty surrounds their estimated effects. One approach is to construct confidence intervals, which provide a plausible range of values for the true effects. However, the bias and sparsity introduced by sparse penalized regression complicate this. Current approaches address the issue by debiasing the estimates, but in doing so they often produce intervals that do not contain the original estimates being used for prediction. This work proposes a coherent framework for interval construction that yields intervals consistent with the corresponding estimates. Several methods are then proposed and evaluated, both against one another and against existing debiased approaches. Finally, the methods are extended to a wider range of sparse penalties and outcome types, including binary outcomes such as remission in the example above.
- Academic Unit
- Biostatistics
- Record Identifier
- 9985135048602771