Penalized linear mixed models for structured genetic data
Details
- Title: Subtitle
- Penalized linear mixed models for structured genetic data
- Creators
- Anna C Reisetter
- Contributors
- Patrick Breheny (Advisor); Michael Jones (Committee Member); Jacob Michaelson (Committee Member); Kelli Ryckman (Committee Member); Kai Wang (Committee Member)
- Resource Type
- Dissertation
- Degree Awarded
- Doctor of Philosophy (PhD), University of Iowa
- Degree in
- Biostatistics
- Date degree season
- Summer 2021
- DOI
- 10.17077/etd.005994
- Publisher
- University of Iowa
- Number of pages
- xi, 115 pages
- Copyright
- Copyright 2021 Anna C Reisetter
- Language
- English
- Description illustrations
- illustrations (chiefly color)
- Description bibliographic
- Includes bibliographical references (pages 108-115)
- Public Abstract (ETD)
Genetic association studies have enhanced our understanding of the genetic basis of quantitative traits and disease. To that end, accurately identifying genotype-phenotype associations is of critical importance. Such associations inform a wide range of medical and scientific research, including drug discovery, predictive models of disease, and the development of genetic risk scores. Penalized regression methods are a valuable tool for identifying such associations when the number of variables exceeds the number of observations, as is common in genetic data. However, these methods face added complexity when applied to genome-wide association study (GWAS) data, which are often subject to relatedness among samples and unobserved environmental effects. These factors produce complex sample structures, which hinder analysis when left unaccounted for.
Penalized linear mixed models (LMMs) have been developed to accurately identify genotype-phenotype associations in the presence of dependent samples. Even so, the statistical properties of these models are not well understood, and little software is available for their implementation. The first objective of this dissertation is to provide a detailed review of penalized LMMs for the analysis of structured genetic data, while examining their statistical properties in the genetic association setting. Second, we consider the statistical properties of penalized LMMs in a general setting and provide recommendations for key components of their implementation, including appropriate data preprocessing. We demonstrate the benefits of our recommendations in both a general setting and one specific to genetic data. We conclude with a detailed analysis of a large, empirical GWAS data set that contains complex correlation among samples. We use this analysis to illustrate the benefits and potential pitfalls of penalized LMMs compared to traditional GWAS methods, and to demonstrate the utility of penalizedLMM, an R package we have developed for the flexible and user-friendly implementation of penalized LMMs.
- Academic Unit
- Biostatistics
- Record Identifier
- 9984124172902771