Bayesian subgroup analysis in regression using mixture models

Yunju Im

doi:10.17077/etd.005354

Dissertation

Open access

University of Iowa

Doctor of Philosophy (PhD), University of Iowa

Spring 2020

Abstract

Heterogeneity occurs in many regression problems, where members of different latent subgroups respond differently to the covariates of interest (e.g., treatments) even after adjusting for other covariates. Our work adopts a Bayesian model called the mixture of finite mixtures (MFM) to identify these subgroups. A key feature of this model is that the number of subgroups need not be known a priori and is instead modeled as a random variable. The Bayesian MFM model was not commonly used in earlier applications, largely because of computational difficulties. In comparison, an alternative infinite mixture model, the Dirichlet Process Mixture (DPM) model, has been a main Bayesian tool for clustering even though it is a mis-specified model for many applications. The popularity of the DPM is partly due to its convenient mathematical properties, which enable efficient computing algorithms. We propose a class of conditional MFMs (cMFM) that are tailored to regression problems, and we solve the corresponding computing problem by extending the MCMC scheme for general MFMs in Miller and Harrison (2018). We also address the important question of prior specification for the cMFM by searching for good values of the hyperparameter under the empirical Bayes criterion. For computation, we propose a new algorithm that combines the simulated tempering technique with a coordinate-wise search algorithm to search efficiently for good hyperparameter values. Using simulation and real data examples, we demonstrate the advantages of our cMFM, notably more reasonable clustering results, compared with existing frequentist methods, the DPM, and the original MFM models in various setups. We also illustrate the benefits of using the coordinate-wise search algorithm for the choice of hyperparameter values with simulated data examples.
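
As a rough illustration of the idea that the number of subgroups is itself random, the following Python sketch simulates data from a generic mixture-of-finite-mixtures regression. It is not the dissertation's cMFM: the function name `simulate_mfm_regression`, the shifted Poisson prior on the number of subgroups, the symmetric Dirichlet weights, and the normal priors on the coefficients are all assumptions chosen only for illustration.

```python
import numpy as np

def simulate_mfm_regression(n, p=2, lam=1.0, gamma=1.0, sigma=0.5, seed=0):
    """Simulate from a generic MFM regression (illustrative assumptions only)."""
    rng = np.random.default_rng(seed)
    # The number of subgroups K is random: K - 1 ~ Poisson(lam) (assumed prior).
    k = 1 + rng.poisson(lam)
    # Given K, mixture weights follow a symmetric Dirichlet(gamma, ..., gamma).
    weights = rng.dirichlet(np.full(k, gamma))
    # Subgroup-specific regression coefficients (assumed normal prior).
    betas = rng.normal(0.0, 2.0, size=(k, p))
    # Latent subgroup label for each observation, and its covariates.
    z = rng.choice(k, size=n, p=weights)
    x = rng.normal(size=(n, p))
    # Each response uses the coefficients of its own latent subgroup.
    y = np.einsum("ij,ij->i", x, betas[z]) + rng.normal(0.0, sigma, size=n)
    return {"k": k, "weights": weights, "betas": betas, "z": z, "x": x, "y": y}
```

Calling `simulate_mfm_regression(200)` returns the drawn number of subgroups, the latent labels, and the data; in a subgroup analysis only `x` and `y` would be observed, and the number of subgroups and the labels would have to be inferred.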

Details

Title: Bayesian subgroup analysis in regression using mixture models
Creators: Yunju Im
Contributors: Aixin Tan (Advisor)
Jian Huang (Advisor)
Joyee Ghosh (Committee Member)
Brian J Smith (Committee Member)
Luke Tierney (Committee Member)
Resource Type: Dissertation
Degree Awarded: Doctor of Philosophy (PhD), University of Iowa
Degree in: Statistics
Date degree season: Spring 2020
DOI: 10.17077/etd.005354
Publisher: University of Iowa
Number of pages: xi, 76 pages
Language: English
Description illustrations: illustrations (chiefly color)
Description bibliographic: Includes bibliographical references (pages 74-76).
Public Abstract (ETD): Regression has long been used to study the association between individuals and the variables of interest (e.g., medical treatments). In many regression problems, individuals react differently to those variables of interest because they come from different latent subgroups. Identifying such latent subgroups is important, especially in the medical field, because it allows us to better estimate and understand group-specific treatment effects. However, recovering such latent subgroups is not an easy task. First, we do not know how many subgroups exist among individuals, so the number of subgroups needs to be estimated. Second, even after the number of subgroups is estimated, it is not easy to determine which individual belongs to which subgroup.

To answer the questions above, our work adopts a Bayesian model based on a mixture of finite mixtures (MFM), for which the number of subgroups need not be specified a priori and is modeled as a random variable. That is, our model lets the data tell us how many subgroups are present in the observed sample. We further study the issue of prior specification, which is critical in any Bayesian modeling problem. We use the Bayes factor criterion to compare different priors, and we develop an algorithm to search efficiently for the optimal one. Using simulated and real data, we demonstrate the advantages of the proposed model and its computing method compared with existing methods.
Academic Unit: Statistics and Actuarial Science
Record Identifier: 9983949694402771