Contributions towards the Bayesian analysis of large and complex data
Details
- Title: Subtitle
- Contributions towards the Bayesian analysis of large and complex data
- Creators
- Chunlei Wang
- Contributors
- Sanvesh Srivastava (Advisor)
- Kung-Sik Chan (Committee Member)
- Jian Huang (Committee Member)
- Cheng Li (Committee Member)
- Aixin Tan (Committee Member)
- Dale Zimmerman (Committee Member)
- Resource Type
- Dissertation
- Degree Awarded
- Doctor of Philosophy (PhD), University of Iowa
- Degree in
- Statistics
- Date degree season
- Spring 2022
- Publisher
- University of Iowa
- DOI
- 10.25820/etd.006494
- Number of pages
- xiii, 228 pages
- Copyright
- Copyright 2022 Chunlei Wang
- Language
- English
- Description illustrations
- illustrations
- Description bibliographic
- Includes bibliographical references (pages 218-228).
- Public Abstract (ETD)
With the development of information technologies, acquiring large data with complex structures is much easier than before. Areas such as genomics, biomedical imaging, and social media continuously produce massive amounts of complex, extremely high-dimensional data. This abundance of information has led us into a new era of data analysis focused primarily on large and complex data. It is important to develop efficient methods that discover the relation between the features and the response while also making accurate predictions for future observations.
Bayesian inference is a formal statistical procedure for analyzing data whose central feature is uncertainty quantification. It starts by imposing a prior distribution on the unknown parameters, then incorporates the observed data through the likelihood function, and finally arrives at an updated quantification of uncertainty about the unknown parameters, formally defined as the posterior distribution. A large number of sampling algorithms, such as Markov chain Monte Carlo, further strengthen the power of the Bayesian approach in a wide range of application areas. In modern data practice, however, the high computational cost of Bayesian methods partly limits their applicability when the data are very large and have complex structures.
This dissertation contributes to the understanding of Bayesian methods for large and complex data. In large data settings, I propose a scalable Bayesian method for analyzing large dependent data. The proposed method is rooted in the divide-and-conquer (D&C) technique and comes with a strong theoretical guarantee. In complex data settings, I study a nonparametric regression problem using Gaussian process (GP) priors in the high-dimensional regime, allowing for noise in the input vector. The prediction accuracy under this complex data model is established rigorously.
- Academic Unit
- Statistics and Actuarial Science
- Record Identifier
- 9984271055102771