Transfer learning for high-dimensional stochastic regression
Dissertation


Ting-Hung Yu
University of Iowa
Doctor of Philosophy (PhD), University of Iowa
Autumn 2025
DOI: 10.25820/etd.008211
Embargoed Access, Embargo ends: 01/23/2028

Abstract

In this dissertation, we investigate methods that improve statistical efficiency in estimation and prediction for a target dataset by leveraging information from multiple source datasets, a paradigm commonly known as "transfer learning." We focus on linear stochastic regression and generalized stochastic regression models in high-dimensional settings, where the number of covariates may exceed the number of observations. Existing transfer learning approaches typically assume independent, light-tailed data; their validity for data subject to temporal dependence and/or heavy-tailed distributions is unexplored. To address this gap, we adopt functional dependence measures to quantify the effects of temporal dependence and heavy tails, and employ relevant concentration and moment inequalities to establish convergence rates for transfer learning with serially dependent data, beginning with the oracle scenario in which all source datasets are informative. These results extend and generalize existing transfer learning methodologies. In practice, the source datasets may bear a wide spectrum of similarity to the target dataset, ranging from an identical data-generating process (DGP) to a dissimilar one. Incorporating weakly informative or even dissimilar source datasets may lead to negative transfer, resulting in poorer estimation and/or prediction than using the target data alone. To prevent negative transfer, we develop a self-normalized test statistic for screening out non-informative source datasets and derive its asymptotic null distribution. The self-normalization technique avoids estimating the long-run covariance matrix, which is especially desirable in the high-dimensional setting, and the critical values of the test can be readily computed via Monte Carlo simulation.
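The abstract does not state the exact form of the dissertation's test statistic, but the two ingredients it names — a self-normalizer built from recursive partial sums (so no long-run covariance estimate is needed) and Monte Carlo critical values for the limiting functional of Brownian motion — can be illustrated with a generic Lobato/Shao-type statistic for testing a zero mean in a dependent univariate series. This is a hedged sketch of the general technique, not the dissertation's procedure; the function names are invented for illustration.

```python
import numpy as np

def sn_statistic(x):
    """Self-normalized statistic for testing mean zero in a time series.
    The normalizer uses recursive partial sums of the same data,
    so no long-run variance estimate is required."""
    n = len(x)
    s = np.cumsum(x)                       # partial sums S_1, ..., S_n
    k = np.arange(1, n + 1)
    # self-normalizer: n^{-2} * sum_k (S_k - (k/n) S_n)^2
    v = np.sum((s - k / n * s[-1]) ** 2) / n**2
    return s[-1] ** 2 / (n * v)

def mc_critical_value(alpha=0.05, n_grid=1000, n_rep=5000, seed=0):
    """Monte Carlo critical value for the pivotal limit
    B(1)^2 / int_0^1 (B(t) - t*B(1))^2 dt, B a standard Brownian motion."""
    rng = np.random.default_rng(seed)
    stats = np.empty(n_rep)
    for r in range(n_rep):
        incr = rng.standard_normal(n_grid) / np.sqrt(n_grid)
        b = np.cumsum(incr)                # discretized Brownian path
        t = np.arange(1, n_grid + 1) / n_grid
        bridge = b - t * b[-1]             # B(t) - t*B(1) on the grid
        stats[r] = b[-1] ** 2 / np.mean(bridge ** 2)
    return np.quantile(stats, 1 - alpha)
```

Because the limiting distribution is pivotal (free of nuisance parameters), the critical value is simulated once and reused, which is what makes the screening step computationally cheap.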
This procedure is computationally efficient and takes full advantage of the temporal dependence in the data, offering an effective alternative to conventional cross-validation methods, which are designed for independent observations. We demonstrate the proposed methods using the thirty constituent stock price series of the Dow Jones index, taking each constituent series in turn as the target while the remaining series serve as sources. We propose several selection rules for screening out non-informative source data, allowing users to fine-tune the inclusion of source data. The analysis shows that incorporating informative source data yields substantial improvements in predictive performance in most cases. Finally, we summarize the main findings and discuss potential extensions, highlighting the applicability of the framework to high-dimensional, dependent, and heavy-tailed datasets, and outlining directions for future research in transfer learning for time series analysis.
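Once non-informative sources are screened out, a common pool-then-correct scheme (of the kind underlying Trans-Lasso-style transfer estimators) fits a pooled model on the target plus the retained sources, then fits a bias-correction term on the target data alone. The sketch below uses ridge penalties for closed-form solutions; it is an assumed illustration of the general two-step idea, not the dissertation's estimator, and all names are hypothetical.

```python
import numpy as np

def transfer_ridge(X_tgt, y_tgt, sources, lam_pool=1.0, lam_bias=1.0):
    """Two-step transfer estimator (ridge sketch of pool-then-correct).
    sources: list of (X, y) pairs already screened as informative."""
    X_pool = np.vstack([X_tgt] + [X for X, _ in sources])
    y_pool = np.concatenate([y_tgt] + [y for _, y in sources])
    p = X_tgt.shape[1]
    # step 1: pooled ridge estimate borrows strength from the sources
    w = np.linalg.solve(X_pool.T @ X_pool + lam_pool * np.eye(p),
                        X_pool.T @ y_pool)
    # step 2: correct the pooled fit using target residuals only,
    # absorbing the (assumed small) target-source discrepancy
    r = y_tgt - X_tgt @ w
    delta = np.linalg.solve(X_tgt.T @ X_tgt + lam_bias * np.eye(p),
                            X_tgt.T @ r)
    return w + delta
```

When the source DGPs are close to the target's, step 1 shrinks variance by enlarging the effective sample size, while step 2 removes the bias that pooling introduces; a dissimilar source left in the pool would inflate that bias, which is exactly the negative transfer the screening test guards against.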
Keywords: Asymptotic Theory; Financial Data Analysis; Functional Dependence Measures; High-Dimensional Time Series; Self-Normalization Test; Transfer Learning
