Heterogeneity pursuit in regression analysis
Details
- Title: Subtitle
- Heterogeneity pursuit in regression analysis
- Creators
- Wenda Tu
- Contributors
- Kung-Sik Chan (Advisor)
- Kate Cowles (Committee Member)
- Joseph Lang (Committee Member)
- Dale Zimmerman (Committee Member) - University of Iowa, Biostatistics
- Yuan Huang (Committee Member)
- Resource Type
- Dissertation
- Degree Awarded
- Doctor of Philosophy (PhD), University of Iowa
- Degree in
- Statistics
- Date degree season
- Autumn 2019
- DOI
- 10.17077/etd.005202
- Publisher
- University of Iowa
- Number of pages
- xiv, 185 pages
- Copyright
- Copyright 2019 Wenda Tu
- Language
- English
- Description illustrations
- color illustrations
- Description bibliographic
- Includes bibliographical references (pages 182-185).
- Public Abstract (ETD)
The applicability of linear regression hinges on the assumption that a single regression model holds for the entire dataset. In reality, however, a dataset may comprise multiple groups, each of which follows a distinct linear relationship between the response variable and the covariates. Such heterogeneity in regression functions may arise from a latent variable (e.g. an unknown genetic factor) that induces subtle interactions in the regression relationship. Alternatively, the regression function may be inherently nonlinear yet well approximated by multiple linear functions, each of which applies to a different subdomain of some threshold variable; for instance, height is approximately a piecewise linear function of age. We propose a penalized regression method to pursue heterogeneity in regression functions. Through simulation studies, we demonstrate the effectiveness of our method and its superiority to several existing methods. We also establish the theoretical properties of our method. Finally, we apply our method to a variety of real-world datasets, including a spatial dataset on climate change and global warming, a housing price dataset on the economic effects of a mining operation on the real estate market, and a medical dataset on the potential causes of pulmonary diseases among former nuclear weapons workers. The group structure recovered by our method is consistent with some well-established scientific findings.
- Academic Unit
- Statistics and Actuarial Science
- Record Identifier
- 9983779599702771