Avoiding the redundant effect on regression analyses of including an outcome in the imputation model

Monelle Tamegnon

doi:10.17077/etd.3aofo5uh

Dissertation

Open access

University of Iowa

Doctor of Philosophy (PhD), University of Iowa

Spring 2018

Abstract

Imputation is a well-recognized method for handling missing data. Multiple imputation provides a framework for imputing missing data that incorporates uncertainty about the imputations at the analysis stage. An important factor to consider when performing multiple imputation is the imputation model. In particular, a careful choice of the covariates to include in the model is crucial. The current recommendation by several authors in the literature (van Buuren, 2012; Moons et al., 2006; Little and Rubin, 2002) is to include all variables that will appear in the analytical model, including the outcome, as covariates in the imputation model. When the goal of the analysis is to explore the relationship between the outcome and the variable with missing data (the target variable), this recommendation seems questionable. Should we use the outcome to fill in the target variable's missing observations and then use these filled-in observations, along with the observed data on the target variable, to explore the relationship of the target variable with the outcome? We believe that this approach is circular. Instead, we have designed multiple imputation approaches rooted in machine learning techniques that avoid the use of the outcome at the imputation stage and maintain reasonable inferential properties. We also compare the performance of our approaches with that of currently available methods.
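
As a rough illustration of how multiple imputation carries imputation uncertainty into the analysis stage, the sketch below pools estimates from several imputed data sets using Rubin's rules, the standard combining rules for multiple imputation. It is not taken from the dissertation and does not implement its clustering or spline approaches. The function name `pool_rubin` and the numeric inputs are hypothetical and chosen only for illustration.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m completed-data estimates and their variances with Rubin's rules."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")
    pooled = mean(estimates)       # average of the m point estimates
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical slope estimates and squared standard errors from m = 3 imputed data sets.
slopes = [1.0, 2.0, 3.0]
squared_ses = [0.5, 0.5, 0.5]
pooled_slope, total_variance = pool_rubin(slopes, squared_ses)
```

The between-imputation term is what makes the pooled variance larger than the variance from any single filled-in data set. It reflects how much the estimate changes from one set of imputations to the next.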

Biostatistics

clustering

imputation model

multiple imputation

penalized splines

Details

Title: Avoiding the redundant effect on regression analyses of including an outcome in the imputation model
Creators: Monelle Tamegnon - University of Iowa
Contributors: Michael Jones (Advisor)
Gideon Zamba (Advisor)
Yusuf Menda (Committee Member)
Eric Foster (Committee Member)
Grant Brown (Committee Member)
Resource Type: Dissertation
Degree Awarded: Doctor of Philosophy (PhD), University of Iowa
Degree in: Biostatistics
Date degree season: Spring 2018
DOI: 10.17077/etd.3aofo5uh
Publisher: University of Iowa
Number of pages: xiii, 277 pages
Language: English
Date submitted: 08/29/2018
Description illustrations: color illustrations
Description bibliographic: Includes bibliographical references (pages 275-277).
Public Abstract (ETD): Multiple imputation is a statistical tool for dealing with missing data. It provides educated guesses to fill in the missing data. These guesses are based on predictions from the imputation model. The quality of the imputations depends greatly on the imputation model, in particular on the covariates used in this model. The recommendation in the multiple imputation literature is to use all variables that will appear in the analytic model, including the outcome, as covariates in the imputation model. When the goal of the analytic model is to explore the relationship of the outcome with the variable to be imputed, it appears redundant to use the outcome to predict the missing values and then use the filled-in variable to explore its relationship with the outcome. In this dissertation, we have designed three different multiple imputation approaches, based on clustering and splines, that avoid the use of the outcome at the imputation stage, and we have compared the performance of our approaches with that of currently available methods.
Academic Unit: Biostatistics
Record Identifier: 9983776858602771
