Transfer learning for high-dimensional stochastic regression
Dissertation


Ting-Hung Yu
University of Iowa
Doctor of Philosophy (PhD), University of Iowa
Autumn 2025
DOI: 10.25820/etd.008211
Embargoed Access, Embargo ends: 01/23/2028

Abstract

In this dissertation, we investigate methods that improve statistical efficiency in estimation and prediction for a target dataset by leveraging information from multiple source datasets, a paradigm commonly known as "transfer learning." We focus on linear stochastic regression and generalized stochastic regression models in high-dimensional settings, where the number of covariates may exceed the number of observations. Existing transfer learning approaches typically assume independent, light-tailed data; their validity for data subject to temporal dependence and/or heavy-tailed distributions is unexplored. To address this gap, we adopt functional dependence measures to quantify the effects of temporal dependence and heavy tails, and employ relevant concentration and moment inequalities to establish convergence rates for transfer learning with serially dependent data, beginning with the oracle scenario in which all source datasets are informative. These results extend and generalize existing transfer learning methodologies. In practice, the source datasets may bear a wide spectrum of similarity to the target dataset, ranging from an identical data-generating process (DGP) to a dissimilar one. Incorporating weakly informative or even dissimilar source datasets may lead to negative transfer, resulting in poorer estimation and/or prediction than using the target data alone. To prevent negative transfer, we develop a self-normalized test statistic for screening out non-informative source datasets and derive its asymptotic null distribution. The self-normalization technique avoids estimating the long-run covariance matrix, which is especially desirable in the high-dimensional setting, and the critical values of the test can be readily computed via Monte Carlo simulation.
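The abstract does not state the exact form of the dissertation's test statistic, but the two ingredients it names — a self-normalizer built from recursive partial sums (so no long-run covariance estimate is needed) and Monte Carlo critical values for the limiting functional of Brownian motion — can be illustrated with a generic Lobato/Shao-type statistic for testing a zero mean in a dependent univariate series. This is a hedged sketch of the general technique, not the dissertation's procedure; the function names are invented for illustration.

```python
import numpy as np

def sn_statistic(x):
    """Self-normalized statistic for testing mean zero in a time series.
    The normalizer uses recursive partial sums of the same data,
    so no long-run variance estimate is required."""
    n = len(x)
    s = np.cumsum(x)                       # partial sums S_1, ..., S_n
    k = np.arange(1, n + 1)
    # self-normalizer: n^{-2} * sum_k (S_k - (k/n) S_n)^2
    v = np.sum((s - k / n * s[-1]) ** 2) / n**2
    return s[-1] ** 2 / (n * v)

def mc_critical_value(alpha=0.05, n_grid=1000, n_rep=5000, seed=0):
    """Monte Carlo critical value for the pivotal limit
    B(1)^2 / int_0^1 (B(t) - t*B(1))^2 dt, B a standard Brownian motion."""
    rng = np.random.default_rng(seed)
    stats = np.empty(n_rep)
    for r in range(n_rep):
        incr = rng.standard_normal(n_grid) / np.sqrt(n_grid)
        b = np.cumsum(incr)                # discretized Brownian path
        t = np.arange(1, n_grid + 1) / n_grid
        bridge = b - t * b[-1]             # B(t) - t*B(1) on the grid
        stats[r] = b[-1] ** 2 / np.mean(bridge ** 2)
    return np.quantile(stats, 1 - alpha)
```

Because the limiting distribution is pivotal (free of nuisance parameters), the critical value is simulated once and reused, which is what makes the screening step computationally cheap.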
This procedure is computationally efficient and takes full advantage of the temporal dependence in the data, offering an effective alternative to conventional cross-validation methods, which are designed for independent observations. We demonstrate the proposed methods using the thirty constituent stock price series of the Dow Jones index, taking each constituent series in turn as the target while the remaining series serve as sources. We propose several selection rules for screening out non-informative source data, allowing users to fine-tune the inclusion of source data. The analysis shows that incorporating informative source data yields substantial improvements in predictive performance in most cases. Finally, we summarize the main findings and discuss potential extensions, highlighting the applicability of the framework to high-dimensional, dependent, and heavy-tailed datasets, and outlining directions for future research in transfer learning for time series analysis.
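Once non-informative sources are screened out, a common pool-then-correct scheme (of the kind underlying Trans-Lasso-style transfer estimators) fits a pooled model on the target plus the retained sources, then fits a bias-correction term on the target data alone. The sketch below uses ridge penalties for closed-form solutions; it is an assumed illustration of the general two-step idea, not the dissertation's estimator, and all names are hypothetical.

```python
import numpy as np

def transfer_ridge(X_tgt, y_tgt, sources, lam_pool=1.0, lam_bias=1.0):
    """Two-step transfer estimator (ridge sketch of pool-then-correct).
    sources: list of (X, y) pairs already screened as informative."""
    X_pool = np.vstack([X_tgt] + [X for X, _ in sources])
    y_pool = np.concatenate([y_tgt] + [y for _, y in sources])
    p = X_tgt.shape[1]
    # step 1: pooled ridge estimate borrows strength from the sources
    w = np.linalg.solve(X_pool.T @ X_pool + lam_pool * np.eye(p),
                        X_pool.T @ y_pool)
    # step 2: correct the pooled fit using target residuals only,
    # absorbing the (assumed small) target-source discrepancy
    r = y_tgt - X_tgt @ w
    delta = np.linalg.solve(X_tgt.T @ X_tgt + lam_bias * np.eye(p),
                            X_tgt.T @ r)
    return w + delta
```

When the source DGPs are close to the target's, step 1 shrinks variance by enlarging the effective sample size, while step 2 removes the bias that pooling introduces; a dissimilar source left in the pool would inflate that bias, which is exactly the negative transfer the screening test guards against.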
Keywords: Asymptotic Theory; Financial Data Analysis; Functional Dependence Measures; High-Dimensional Time Series; Self-Normalization Test; Transfer Learning
