Preprint
Gaussian Process Regression and Classification using International Classification of Disease Codes as Covariates
ArXiv.org
Cornell University
08/03/2021
DOI: 10.48550/arXiv.2108.01813
Abstract
International Classification of Disease (ICD) codes are widely used for
encoding diagnoses in electronic health records (EHR). Automated methods have
been developed over the years for predicting biomedical responses using EHR
that borrow information among diagnostically similar patients. Relatively little
attention has been paid to developing patient similarity measures that model
the structure of ICD codes and the presence of multiple chronic conditions,
where a chronic condition is defined as a set of ICD codes. Motivated by this
problem, we first develop a type of string kernel function for measuring
similarity between a pair of subsets of ICD codes, which uses the definition of
chronic conditions. Second, we extend this similarity measure to define a
family of covariance functions on subsets of ICD codes. Using this family, we
develop Gaussian process (GP) priors for Bayesian nonparametric regression and
classification using diagnoses and other demographic information as covariates.
Markov chain Monte Carlo (MCMC) algorithms are used for posterior inference and
predictions. The proposed methods are free of any tuning parameters and are
well-suited for automated prediction of continuous and categorical biomedical
responses that depend on chronic conditions. We evaluate the practical
performance of our method on EHR data collected from 1660 patients at the
University of Iowa Hospitals and Clinics (UIHC) with six different primary
cancer sites. Our method has better sensitivity and specificity than its
competitors in classifying different primary cancer sites and estimates the
marginal associations between chronic conditions and primary cancer sites.
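As a rough illustration of the idea in the abstract, the sketch below builds a similarity measure between two subsets of ICD codes and turns it into a Gram (covariance) matrix over patients. It is not the authors' kernel, which this record does not specify. The chronic-condition table `CONDITIONS`, the function names, the common-prefix code kernel, and the cosine-style normalization are all assumptions chosen for illustration. Like the method described above, the sketch has no tuning parameters.

```python
from math import sqrt

# Hypothetical chronic-condition definitions (assumption, not from the paper):
# condition name -> ICD-10 code prefixes that belong to it.
CONDITIONS = {
    "diabetes": ("E10", "E11"),
    "hypertension": ("I10", "I11"),
}


def condition_of(code):
    """Return the assumed chronic condition of a code, or None if unmapped."""
    for name, prefixes in CONDITIONS.items():
        if code.startswith(prefixes):
            return name
    return None


def code_kernel(a, b):
    """Similarity of two non-empty ICD codes.

    Codes in different conditions get 0. Otherwise the score is the length
    of the shared prefix (dots ignored), normalized so code_kernel(a, a) == 1.
    """
    if condition_of(a) != condition_of(b):
        return 0.0
    a, b = a.replace(".", ""), b.replace(".", "")
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1
    return shared / sqrt(len(a) * len(b))


def set_kernel(A, B):
    """Normalized sum of pairwise code similarities between two code subsets."""
    if not A or not B:
        return 0.0

    def raw(S, T):
        return sum(code_kernel(s, t) for s in S for t in T)

    return raw(A, B) / sqrt(raw(A, A) * raw(B, B))


# Three made-up patients, each represented by a subset of ICD-10 codes.
patients = [{"E11.9", "I10"}, {"E11.65", "I10"}, {"C50.9"}]

# Gram matrix that could serve as a GP covariance over these patients.
gram = [[set_kernel(p, q) for q in patients] for p in patients]
```

Because the code-level kernel is a product of a normalized prefix-count kernel and a same-condition indicator, and the set-level kernel sums it over all pairs, the resulting Gram matrix is symmetric and positive semidefinite, which is what a GP covariance function requires.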
Details
- Title: Subtitle
- Gaussian Process Regression and Classification using International Classification of Disease Codes as Covariates
- Creators
- Sanvesh Srivastava; Zongyi Xu; Yunyi Li; W. Nick Street; Stephanie Gilbertson-White
- Resource Type
- Preprint
- Publication Details
- ArXiv.org
- DOI
- 10.48550/arXiv.2108.01813
- ISSN
- 2331-8422
- Publisher
- Cornell University
- Language
- English
- Date posted
- 08/03/2021
- Academic Unit
- Statistics and Actuarial Science; Bus Admin College; Nursing; Computer Science; Business Analytics; Internal Medicine
- Record Identifier
- 9984293095202771