Topics in statistical learning methods and algorithms
Details
- Title: Subtitle
- Topics in statistical learning methods and algorithms
- Creators
- Qian Tang
- Contributors
- Boxiang Wang (Advisor)
- Kung-Sik Chan (Committee Member)
- Aixin Tan (Committee Member)
- Sanvesh Srivastava (Committee Member)
- Nathan Wikle (Committee Member)
- Resource Type
- Dissertation
- Degree Awarded
- Doctor of Philosophy (PhD), University of Iowa
- Degree in
- Statistics
- Date degree season
- Summer 2025
- DOI
- 10.25820/etd.008172
- Publisher
- University of Iowa
- Number of pages
- ix, 116 pages
- Copyright
- Copyright 2025 Qian Tang
- Language
- English
- Date submitted
- 07/28/2025
- Description illustrations
- illustrations (some color)
- Description bibliographic
- Includes bibliographical references (pages 59-72).
- Public Abstract (ETD)
Quantile regression is a powerful tool for understanding how different factors influence various points in the distribution of an outcome, making it especially useful in situations where effects are not uniform. However, its application has been limited by heavy computational demands. In Chapter 2, we present a new algorithm, fastkqr, which makes quantile regression significantly faster and more practical to use. Unlike traditional methods that produce rough approximations, fastkqr accurately computes regression results and includes smart techniques that reduce redundant calculations. We also enhance the method to improve interpretability when multiple quantile levels are involved. The algorithm is available through a public R package, and experiments show it achieves the same accuracy as leading methods, while being up to ten times faster.
In Chapter 3, we introduce QuanDA, a new method for classifying data when one group is much smaller than the other, a common issue in fields like medical research or cybersecurity. QuanDA uses ideas from quantile regression to handle such imbalance naturally and performs especially well when the data includes a large number of variables. Through theory, simulations, and real-world datasets, we show that QuanDA consistently beats existing approaches, such as decision trees and weighted classifiers.
Chapter 4 introduces a novel framework for transfer learning in clustering tasks, where the goal is to group data without labels. Our method extracts and integrates information from related datasets to improve clustering accuracy in high-dimensional settings. It is particularly suited for situations where the relationships between datasets are not explicitly known. We validate the proposed approach through comprehensive simulation studies.
Together, these contributions provide efficient and reliable methods for modern data analysis, especially in high-dimensional and complex settings.
- Academic Unit
- Statistics and Actuarial Science
- Record Identifier
- 9984948238002771
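Illustrative note on the public abstract: the abstract describes quantile regression as targeting "various points in the distribution of an outcome". A minimal sketch of the underlying idea is given below, using the standard check (pinball) loss, whose minimizer over a constant is the sample quantile at level tau. This is not the fastkqr algorithm and is not taken from the dissertation; the function names `pinball_loss` and `empirical_quantile_by_loss` and the toy data are assumptions made only for illustration.

```python
def pinball_loss(residual, tau):
    """Check (pinball) loss: tau * r for r >= 0, (tau - 1) * r otherwise."""
    return tau * residual if residual >= 0 else (tau - 1.0) * residual


def empirical_quantile_by_loss(values, tau):
    """Return the sample value that minimizes the total pinball loss.

    Hypothetical helper: a brute-force search over the observed values,
    which suffices because a minimizer always exists among them.
    """
    def total_loss(candidate):
        return sum(pinball_loss(v - candidate, tau) for v in values)

    return min(sorted(values), key=total_loss)


# Toy data with one large outlier (assumed, not from the dissertation).
data = [1.0, 2.0, 3.0, 4.0, 100.0]
median = empirical_quantile_by_loss(data, 0.5)  # central point of the sample
upper = empirical_quantile_by_loss(data, 0.9)   # upper-tail point of the sample
print(median, upper)
```

Changing tau moves the fitted point through the distribution, which is why different quantile levels can reveal effects that a single mean-based fit would hide. In a regression setting the constant is replaced by a function of the covariates, and the same loss is minimized over that function.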