Advancing flexible methods for interpretation and forecasting of correlated data
Details
- Title: Subtitle
- Advancing flexible methods for interpretation and forecasting of correlated data
- Creators
- Nicholas James Seedorff
- Contributors
- Grant D Brown (Advisor); Jacob J Oleson (Committee Member); Brian J Smith (Committee Member); Patrick J Breheny (Committee Member); Mary K Cowles (Committee Member)
- Resource Type
- Dissertation
- Degree Awarded
- Doctor of Philosophy (PhD), University of Iowa
- Degree in
- Biostatistics
- Degree date (season)
- Autumn 2021
- DOI
- 10.17077/etd.006287
- Publisher
- University of Iowa
- Number of pages
- xiv, 129 pages
- Copyright
- Copyright 2021 Nicholas James Seedorff
- Language
- English
- Description illustrations
- color illustrations
- Description bibliographic
- Includes bibliographical references (pages 118-126).
- Public Abstract (ETD)
Many studies of complex disease processes measure multiple facets of the illness, which can be quantified using different types of data. These data are often collected multiple times for the same set of subjects, which allows researchers to study disease progression and patient-specific trajectories. For ordered variables where the differences between levels are not numerically meaningful, such as 'low', 'medium', and 'high', fewer analytical tools are available, and existing techniques may focus on a single aspect of the disease. We developed a statistical method that jointly analyzes multiple facets of an illness to better understand the disease process, and that can predict future disease status so that proactive measures can be taken. This work was motivated by a study of a neglected tropical disease, leishmaniasis, and its progression in U.S. foxhounds.
While the method above was designed for data collected across many subjects, some studies focus on a single individual. Such single-subject studies could be used to select optimal patient-specific treatment options, and their observations may be taken at high frequency. Values obtained at one time point tend to be similar to those taken at nearby times, and a statistical analysis must account for this dependence. Again focusing on settings where multiple facets of an illness are measured, we developed a separate tool for the case in which a single subject is the focus.
Lastly, we developed a tool to help interpret machine learning algorithms, whose flexible model architectures often make interpretation difficult. When there is a large number of related variables, our method groups those variables and provides a graphical interpretation of each group's effect on the outcome variable. To facilitate adoption of the proposed statistical methods, we developed software for all three scenarios described above, which is publicly available in the R programming language.
- Academic Unit
- Biostatistics
- Record Identifier
- 9984210944702771