High-dimensional intervals for penalized regression

Logan Harris

doi:10.25820/etd.008201

Dissertation

Open access

University of Iowa

Doctor of Philosophy (PhD), University of Iowa

Autumn 2025

Files and links (1)

harris-logan-2025-10-27 (PDF, 1.69 MB)

Abstract

The lasso (least absolute shrinkage and selection operator) and, more broadly, sparse penalized regression methods are widely used for datasets with many predictors because they perform variable selection and coefficient estimation simultaneously. Inference for sparse penalized regression is difficult, however, which has led to a wide array of procedures, many of which focus on significance through false discovery rate (FDR) control. Methods for constructing confidence intervals exist, but the bias introduced by the penalties poses a particular challenge, and debiasing is required to obtain traditional frequentist coverage (i.e., correct 1 − α coverage for each parameter individually). The drawback to these approaches is that the intervals are not constructed under the same assumptions as were used to obtain the point estimates. As an alternative, this dissertation develops a coherent framework for constructing intervals that are consistent with the corresponding point estimates. Specifically, the intervals target average rather than individual coverage, allowing for bias in their construction just as in the corresponding point estimates. We refer to these intervals as high-dimensional intervals (HDIs). HDIs have properties that fall between those of confidence intervals and credible intervals. In particular, we show that the idea of average coverage aligns with Bayesian inference. Additionally, the proposed interval construction methods leverage the connection between the lasso penalty and a prior. Chapter 2 outlines the alternative framework and provides an initial approach for interval construction, Chapter 3 adds an additional criterion that improves the alignment of HDIs with the corresponding estimates and introduces two alternative approaches, and Chapter 4 extends the proposed methods to a broader range of penalties and outcome types.
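As a rough illustration of two ideas named in the abstract, the sketch below contrasts individual with average coverage and writes out the standard correspondence between the lasso penalty and a Laplace prior. The notation (p coefficients β_j, intervals C_j, penalty level λ, error variance σ², sample size n) is assumed here for illustration and is not taken from the dissertation, whose formal definitions may differ.

```latex
% Individual (traditional frequentist) coverage: every interval covers its own parameter.
\[
  \Pr\left(\beta_j \in C_j\right) \ge 1 - \alpha \quad \text{for each } j = 1, \dots, p .
\]

% Average coverage: the guarantee holds only on average over the p parameters.
\[
  \frac{1}{p} \sum_{j=1}^{p} \Pr\left(\beta_j \in C_j\right) \ge 1 - \alpha .
\]

% Lasso estimate (one common scaling of the objective).
\[
  \hat{\beta} = \arg\min_{\beta} \;
  \frac{1}{2n} \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1 .
\]

% With Gaussian errors of variance \sigma^2, the same \hat{\beta} is the posterior mode
% under independent Laplace (double-exponential) priors:
\[
  p(\beta_j) \propto \exp\left( -\frac{n\lambda}{\sigma^2} \, \lvert \beta_j \rvert \right).
\]
```

Under average coverage, some individual intervals may cover at a rate below 1 − α as long as others cover above it, which is what leaves room for the bias that the penalty introduces.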

confidence intervals

high-dimensional inference

lasso

penalized regression

Details

Title: High-dimensional intervals for penalized regression
Creators: Logan Harris
Contributors: Patrick Breheny (Advisor)
Joseph Lang (Committee Member)
Joseph Cavanaugh (Committee Member)
Daniel Sewell (Committee Member)
Resource Type: Dissertation
Degree Awarded: Doctor of Philosophy (PhD), University of Iowa
Degree in: Biostatistics
Date degree season: Autumn 2025
DOI: 10.25820/etd.008201
Publisher: University of Iowa
Number of pages: xiv, 121 pages
Language: English
Date submitted: 10/23/2025
Description illustrations: illustrations, tables, graphs
Description bibliographic: Includes bibliographical references (pages 119-121).
Public Abstract (ETD): As data become increasingly available, analyzing datasets with a large number of predictors has become commonplace. Such analyses are particularly difficult when the number of observations is small in comparison. For example, suppose a physician at a research hospital wants to build a tool to predict whether patients with a rare cancer will achieve remission. In an effort to build the best tool possible, the physician applies for a grant to collect patient samples and measure biomarkers through lab tests. Over the course of the study, data from 73 patients are obtained, with over 10,000 biomarkers collected on each.

In this setting, traditional methods break down, and a broad class of tools called sparse penalized regression is popular because it can simultaneously select important features and produce reliable predictive estimates. Penalized regression accepts a small amount of bias in exchange for stability in the estimation process, which is what allows it to be used when the number of predictors exceeds the number of observations.

Once estimates are obtained, it is often of interest to know how important the selected features are or, rather, how much their estimated effects vary. One approach is to construct confidence intervals, which provide a plausible range of values for the estimates. However, the bias and sparsity introduced by sparse penalized regression lead to difficulties. Current approaches debias the estimates to address this issue, but in doing so they produce intervals that often do not contain the original estimates being used for prediction. This work proposes a coherent framework for interval construction that yields intervals consistent with the corresponding estimates. A number of methods are then proposed and compared both with each other and with existing debiased approaches. Finally, the methods are extended to a wider range of sparse penalties and outcome types, including the one in this example.
Academic Unit: Biostatistics
Record Identifier: 9985135048602771