Contributions towards the Bayesian analysis of large and complex data

Chunlei Wang

doi:10.25820/etd.006494

Dissertation

Open access

University of Iowa

Doctor of Philosophy (PhD), University of Iowa

Spring 2022

DOI: 10.25820/etd.006494

Files and links (1)

pdf

Wang_Chunlei_Dissertation_Revised, 4.14 MB

Free to read and download, Open Access

Abstract

With the development of information technologies, acquiring data of large size and complex structure is much easier than before. Areas such as genomics, biomedical imaging, and social media continuously produce massive amounts of complex data with extremely high dimensions. This abundance of information has led us into a new era of data analysis focused primarily on large and complex data. It is important to develop efficient methods that discover the relation between the features and the response while making accurate predictions for future observations.

Bayesian inference, a formal statistical procedure for analyzing data, centers on uncertainty quantification: it imposes a prior distribution on the unknown parameters, incorporates the observed data via the likelihood function, and arrives at an updated quantification of uncertainty about the unknown parameters, formally defined as the posterior distribution. A large number of sampling algorithms (Markov chain Monte Carlo) further strengthen the power of the Bayesian approach in a wide range of application areas. However, in modern data practice, the high computational cost of Bayesian methods partly limits their applicability when the data are enormously large with complex structures.

This dissertation contributes to the understanding of Bayesian methods for large and complex data. In large data settings, I propose a scalable Bayesian method for analyzing large dependent data. The proposed method is rooted in the divide-and-conquer (D&C) technique and has a strong theoretical guarantee. In complex data settings, I study a nonparametric regression problem using Gaussian process (GP) priors under the high-dimensional regime by allowing certain noise on the input vector. The prediction accuracy under this complex data model is justified rigorously.
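
The prior-to-posterior update described in the abstract can be illustrated with a minimal sketch. It is not taken from the dissertation: it assumes a toy Beta-Bernoulli model, chosen only because the posterior has a closed form, and the function names and the data are hypothetical.

```python
from fractions import Fraction


def beta_bernoulli_update(alpha, beta, observations):
    """Combine a Beta(alpha, beta) prior with 0/1 observations.

    For a Bernoulli likelihood the Beta prior is conjugate, so the
    posterior is again a Beta distribution whose parameters are the
    prior parameters plus the observed success and failure counts.
    """
    successes = sum(observations)
    failures = len(observations) - successes
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution, kept as an exact fraction."""
    return Fraction(alpha, alpha + beta)


# Hypothetical illustration: a uniform Beta(1, 1) prior and eight made-up trials.
prior = (1, 1)
data = [1, 0, 1, 1, 0, 1, 1, 1]

posterior = beta_bernoulli_update(*prior, data)
print(posterior, beta_mean(*posterior))  # prints "(7, 3) 7/10"
```

Here the posterior is available exactly. The sampling algorithms mentioned in the abstract are needed when no such closed form exists, which is where the computational cost on large data arises.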

Big Data

Bayesian Inference

Complex Data

Gaussian Process

Hidden Markov Model

High Dimension

Details

Title: Contributions towards the Bayesian analysis of large and complex data
Creators: Chunlei Wang
Contributors: Sanvesh Srivastava (Advisor)
Kung-Sik Chan (Committee Member)
Jian Huang (Committee Member)
Cheng Li (Committee Member)
Aixin Tan (Committee Member)
Dale Zimmerman (Committee Member)
Resource Type: Dissertation
Degree Awarded: Doctor of Philosophy (PhD), University of Iowa
Degree in: Statistics
Date degree season: Spring 2022
Publisher: University of Iowa
DOI: 10.25820/etd.006494
Number of pages: xiii, 228 pages
Language: English
Description illustrations: illustrations
Description bibliographic: Includes bibliographical references (pages 218-228).
Academic Unit: Statistics and Actuarial Science
Record Identifier: 9984271055102771