Topics in statistical learning methods and algorithms

Qian Tang

doi:10.25820/etd.008172

Dissertation

University of Iowa

Doctor of Philosophy (PhD), University of Iowa

Summer 2025

Files and links (1)

pdf

QianTang_Thesis (2), 833.92 kB

Embargoed Access, Embargo ends: 08/28/2027

Abstract

Quantile regression is a powerful tool for robust and heterogeneous learning that has seen applications in a diverse range of applied areas. However, its broader application is often hindered by the substantial computational demands arising from the non-smooth quantile loss function. In Chapter 2, we introduce a novel algorithm named fastkqr, which significantly advances the computation of quantile regression in reproducing kernel Hilbert spaces. The core of fastkqr is a finite smoothing algorithm that magically produces exact regression quantiles, rather than approximations. To further accelerate the algorithm, we equip fastkqr with a novel spectral technique that reuses matrix computations. In addition, we extend fastkqr to accommodate a flexible kernel quantile regression with a data-driven crossing penalty, addressing the interpretability challenges of crossing quantile curves at multiple levels. We have implemented fastkqr in a publicly available R package on CRAN. Extensive simulations and real applications show that fastkqr matches the accuracy of state-of-the-art algorithms but can operate up to an order of magnitude faster. In Chapter 3, we introduce Quantile-based Discriminant Analysis (QuanDA) to tackle the challenging problem of binary classification under severe class imbalance and high dimensionality. QuanDA leverages a novel connection with quantile regression and inherently accommodates class imbalance by selecting appropriate quantile levels. We provide comprehensive theoretical analysis to validate QuanDA in ultra-high dimensional settings. Through extensive simulation studies and applications to high-dimensional benchmark datasets, we demonstrate that QuanDA consistently outperforms existing classification methods for imbalanced data. In Chapter 4, we introduce a novel transfer learning framework for high-dimensional clustering that effectively leverages structural information from multiple related yet distinct source datasets. 
Unlike many existing approaches that depend on labeled data, our method operates in an unsupervised setting and is designed to identify latent similarities between the target task and various source tasks. This makes the proposed framework particularly suitable for analyzing complex, heterogeneous data. The effectiveness of the proposed method is demonstrated through extensive simulation studies.
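
As a rough illustration of the "non-smooth quantile loss" mentioned in the abstract, the sketch below writes out the standard check loss and one quadratically smoothed surrogate. The function names, the smoothing parameter `gamma`, and the particular smoothed form are assumptions made for illustration; the record does not specify the smoothing that fastkqr uses, and this is not the package's API.

```python
def check_loss(t, tau):
    """Quantile (check) loss: tau * t for t >= 0, (tau - 1) * t otherwise.

    It has a kink at t = 0, which is the source of the non-smoothness.
    """
    return tau * t if t >= 0 else (tau - 1.0) * t


def smoothed_check_loss(t, tau, gamma):
    """Illustrative smoothed surrogate (an assumption, not taken from the record).

    Outside [-gamma, gamma] it equals the check loss; inside, a quadratic
    piece joins the two linear pieces with matching values and slopes.
    """
    if t < -gamma:
        return (tau - 1.0) * t
    if t > gamma:
        return tau * t
    return t * t / (4.0 * gamma) + t * (tau - 0.5) + gamma / 4.0


def best_constant(values, tau):
    """Return the data point that minimizes the total check loss.

    For a constant-only model this minimizer is a sample tau-quantile.
    """
    return min(values, key=lambda c: sum(check_loss(v - c, tau) for v in values))
```

For example, `best_constant([1, 2, 3, 4, 5], 0.5)` picks the sample median, and the two losses agree whenever the residual lies outside the smoothing window.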

Details

Title: Topics in statistical learning methods and algorithms
Creators: Qian Tang
Contributors: Boxiang Wang (Advisor)
Kung-Sik Chan (Committee Member)
Aixin Tan (Committee Member)
Sanvesh Srivastava (Committee Member)
Nathan Wikle (Committee Member)
Resource Type: Dissertation
Degree Awarded: Doctor of Philosophy (PhD), University of Iowa
Degree in: Statistics
Date degree season: Summer 2025
DOI: 10.25820/etd.008172
Publisher: University of Iowa
Number of pages: ix, 116 pages
Language: English
Date submitted: 07/28/2025
Description illustrations: illustrations (some color)
Description bibliographic: Includes bibliographical references (pages 59-72).
Public Abstract (ETD): Quantile regression is a powerful tool for understanding how different factors influence various points in the distribution of an outcome, making it especially useful in situations where effects are not uniform. However, its application has been limited by heavy computational demands. In Chapter 2, we present a new algorithm, fastkqr, which makes quantile regression significantly faster and more practical to use. Unlike traditional methods that produce rough approximations, fastkqr accurately computes regression results and includes smart techniques that reduce redundant calculations. We also enhance the method to improve interpretability when multiple quantile levels are involved. The algorithm is available through a public R package, and experiments show it achieves the same accuracy as leading methods, while being up to ten times faster.

In Chapter 3, we introduce QuanDA, a new method for classifying data when one group is much smaller than the other, a common issue in fields like medical research or cybersecurity. QuanDA uses ideas from quantile regression to handle such imbalance naturally and performs especially well when the data includes a large number of variables. Through theory, simulations, and real-world datasets, we show that QuanDA consistently beats existing approaches, such as decision trees and weighted classifiers.

Chapter 4 introduces a novel framework for transfer learning in clustering tasks, where the goal is to group data without labels. Our method extracts and integrates information from related datasets to improve clustering accuracy in high-dimensional settings. It is particularly suited for situations where the relationships between datasets are not explicitly known. We validate the proposed approach through comprehensive simulation studies.

Together, these contributions provide efficient and reliable methods for modern data analysis, especially in high-dimensional and complex settings.
Academic Unit: Statistics and Actuarial Science
Record Identifier: 9984948238002771
