Advancing flexible methods for interpretation and forecasting of correlated data

Nicholas James Seedorff

doi:10.17077/etd.006287

Dissertation

Open access

University of Iowa

Doctor of Philosophy (PhD), University of Iowa

Autumn 2021

Abstract

Correlated data arise in a variety of situations, and while analysis goals can be situation dependent, a common theme is the need to account for the structural dependencies in the data. This thesis advances methods for serially correlated multivariate data of mixed outcome type, as well as interpretation tools for black-box models with highly correlated features. In this work, we first developed a Bayesian vector autoregressive model that can be used to analyze any combination of ordinal, binary, and continuous time series. This was followed by a Bayesian hierarchical model intended for longitudinal data, where we accounted for serial dependence through subject-specific effects and an autoregressive error structure. Both approaches can be used for forecasting, gaining insight into the process under study, or adjusting for autocorrelation while estimating treatment effects. The longitudinal work was motivated by data from a study of leishmaniasis in U.S. foxhounds, and software for both the time series and longitudinal implementations is available in the bmrarm R package. In our final paper, we detailed a model-agnostic interpretation tool for black-box models. Conceptually, this technique groups covariates through a principal components analysis, and then interprets the total effects of the groups. This tool, along with additional functionality, is delivered through the totalvis R package.
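The grouping idea in the final paper can be illustrated with a small sketch. This is not the totalvis implementation and does not use its API. It is a standard-library Python illustration that assumes a "total effect" can be approximated by shifting every observation along a principal component direction and averaging the black-box predictions. The names `leading_component`, `total_effect_curve`, and `black_box` are hypothetical, and the data are simulated.

```python
import random


def leading_component(rows, iters=200):
    """Return (column means, unit vector) for the first principal component.

    Uses power iteration on the sample covariance matrix, so it only
    recovers the leading component (an assumption made to stay stdlib-only).
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    v = [1.0] * p
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return means, v


def total_effect_curve(predict, rows, direction, shifts):
    """Average prediction after moving every row by s along `direction`."""
    curve = []
    for s in shifts:
        moved = [[x + s * d for x, d in zip(r, direction)] for r in rows]
        curve.append(sum(predict(m) for m in moved) / len(moved))
    return curve


# Simulated data: two highly correlated features driven by one latent factor.
random.seed(0)
rows = []
for _ in range(200):
    z = random.gauss(0, 1)
    rows.append([z + random.gauss(0, 0.1), z + random.gauss(0, 0.1)])


def black_box(x):
    """Stand-in for a fitted model; any callable on a feature row would do."""
    return 2.0 * x[0] + 1.0 * x[1]


means, pc1 = leading_component(rows)
shifts = [-1.0, 0.0, 1.0]
curve = total_effect_curve(black_box, rows, pc1, shifts)
```

Because the two features move together, shifting along the component changes both at once, so the curve reflects their joint effect rather than the effect of one feature with the other held fixed.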

Details

Title: Advancing flexible methods for interpretation and forecasting of correlated data
Creators: Nicholas James Seedorff
Contributors: Grant D Brown (Advisor)
Jacob J Oleson (Committee Member)
Brian J Smith (Committee Member)
Patrick J Breheny (Committee Member)
Mary K Cowles (Committee Member)
Resource Type: Dissertation
Degree Awarded: Doctor of Philosophy (PhD), University of Iowa
Degree in: Biostatistics
Date degree season: Autumn 2021
DOI: 10.17077/etd.006287
Publisher: University of Iowa
Number of pages: xiv, 129 pages
Language: English
Description illustrations: color illustrations
Description bibliographic: Includes bibliographical references (pages 118-126).
Public Abstract (ETD): Many studies of complex disease processes measure multiple facets of the illness, which can be quantified using different types of data. That data is often collected multiple times for the same set of subjects, which allows researchers to study disease progression and patient-specific trajectories. For ordered variables where differences are not meaningful, such as 'low', 'medium', and 'high', there are fewer analytical tools available, and the associated techniques may focus on a single aspect of the disease. We developed a statistical method that can jointly analyze multiple facets of an illness in order to better understand the disease process, and can be used to predict future disease status so that proactive measures can be taken. Motivation for this work came from a study of a neglected tropical disease, leishmaniasis, and its progression in U.S. foxhounds.

While the above method was designed for data that is collected across a number of subjects, some studies are based on a single individual. These single individual studies could be used to select optimal patient-specific treatment options, and observations may be taken with high frequency. Values obtained at one time point tend to be similar to those taken at nearby times, which needs to be accounted for in a statistical analysis. Once again interested in situations where multiple facets of an illness are measured, we developed a separate tool for use when a single subject is the focus.

Lastly, we developed a tool to help interpret machine learning algorithms, where model architectures are flexible, but interpretation is often difficult. When there are a large number of variables that are related, our method groups those variables and provides a graphical interpretation of the effect that group has on the outcome variable. To facilitate adoption of the proposed statistical methods, we developed software for all three of the described scenarios, which is publicly available for use in the R programming language.
Academic Unit: Biostatistics
Record Identifier: 9984210944702771
