Journal article
Fairness gaps in Machine learning models for hospitalization and emergency department visit risk prediction in home healthcare patients with heart failure
International journal of medical informatics (Shannon, Ireland), Vol.191, 105534
06/2024
DOI: 10.1016/j.ijmedinf.2024.105534
Abstract
• First study to assess fairness and biases in ML risk prediction models for heart failure patients in home healthcare settings.
• Identified significant disparities in model performance across demographic subgroups (e.g., gender, race/ethnicity, socioeconomic status).
• Emphasizes the urgent need to address biases in ML models to ensure equitable healthcare delivery and mitigate disparities.
• Advances ethical considerations in ML for risk prediction, promoting fairness and inclusivity in patient care.
This study evaluates the fairness of machine learning (ML) models that predict hospitalization and emergency department (ED) visits in heart failure patients receiving home healthcare. We analyze biases, assess performance disparities, and propose solutions to improve model performance in diverse subpopulations.
The study used a dataset of 12,189 episodes of home healthcare collected between 2015 and 2017, including structured data (e.g., a standard assessment tool) and unstructured data (i.e., clinical notes). ML risk prediction models, including Light Gradient-Boosting Machine (LightGBM) and AutoGluon, were developed using demographic information, vital signs, comorbidities, service utilization data, and the area deprivation index (ADI) associated with the patient’s home address. Fairness metrics, such as Equal Opportunity, Predictive Equality, Predictive Parity, and Statistical Parity, were calculated to evaluate model performance across subpopulations.
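The four fairness metrics named above can be illustrated with a minimal sketch. It assumes the common confusion-matrix definitions (Equal Opportunity as the true positive rate, Predictive Equality as the false positive rate, Predictive Parity as the positive predictive value, and Statistical Parity as the rate of positive predictions); the abstract does not spell out the exact formulas the authors used. The function name `fairness_metrics_by_group` and the toy labels are hypothetical and are not taken from the study.

```python
from collections import defaultdict


def fairness_metrics_by_group(y_true, y_pred, groups):
    """Compute per-subgroup rates from binary labels and binary predictions.

    Assumed definitions (not confirmed by the abstract):
      equal_opportunity   = TP / (TP + FN)   (true positive rate)
      predictive_equality = FP / (FP + TN)   (false positive rate)
      predictive_parity   = TP / (TP + FP)   (positive predictive value)
      statistical_parity  = (TP + FP) / N    (rate of positive predictions)
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for truth, pred, group in zip(y_true, y_pred, groups):
        if pred == 1 and truth == 1:
            key = "tp"
        elif pred == 1 and truth == 0:
            key = "fp"
        elif pred == 0 and truth == 0:
            key = "tn"
        else:
            key = "fn"
        counts[group][key] += 1

    def ratio(numerator, denominator):
        # Return NaN instead of raising when a subgroup has an empty denominator.
        return numerator / denominator if denominator else float("nan")

    results = {}
    for group, c in counts.items():
        n = c["tp"] + c["fp"] + c["tn"] + c["fn"]
        results[group] = {
            "equal_opportunity": ratio(c["tp"], c["tp"] + c["fn"]),
            "predictive_equality": ratio(c["fp"], c["fp"] + c["tn"]),
            "predictive_parity": ratio(c["tp"], c["tp"] + c["fp"]),
            "statistical_parity": ratio(c["tp"] + c["fp"], n),
        }
    return results


# Toy example with two hypothetical subgroups, "A" and "B".
y_true = [1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 1, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
metrics = fairness_metrics_by_group(y_true, y_pred, groups)
```

In this toy data, subgroup "A" has an Equal Opportunity value of 0.5 and subgroup "B" has 1.0, which is the kind of between-group difference the study reports for real subpopulations.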
Our study revealed significant disparities in model performance across diverse demographic subgroups. For example, the Hispanic, Male, High-ADI subgroup excelled in terms of Equal Opportunity with a metric value of 0.825, which was 28% higher than the lowest-performing Other, Female, Low-ADI subgroup, which scored 0.644. In Predictive Parity, the gap between the highest and lowest-performing groups was 29%, and in Statistical Parity, the gap reached 69%. In Predictive Equality, the difference was 45%.
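The 28% figure for Equal Opportunity can be reproduced with a short calculation. This sketch assumes the gap is the difference between the highest and lowest subgroup values expressed relative to the lowest value, which matches the two values quoted above; the abstract does not state how the other gaps were computed. The helper name `relative_gap` is hypothetical.

```python
def relative_gap(highest, lowest):
    """Percentage by which the highest value exceeds the lowest (assumed formula)."""
    return (highest - lowest) / lowest * 100


# Equal Opportunity values quoted in the abstract.
gap = relative_gap(0.825, 0.644)
print(f"{gap:.1f}%")  # prints "28.1%"
```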
The findings highlight substantial differences in fairness metrics across diverse patient subpopulations in ML risk prediction models for heart failure patients receiving home healthcare services. Ongoing monitoring and improvement of fairness metrics are essential to mitigate biases.
Details
- Title: Subtitle
- Fairness gaps in Machine learning models for hospitalization and emergency department visit risk prediction in home healthcare patients with heart failure
- Creators
- Anahita Davoudi - Merseburg University of Applied Sciences
- Sena Chae - University of Iowa
- Lauren Evans - Merseburg University of Applied Sciences
- Sridevi Sridharan - Merseburg University of Applied Sciences
- Jiyoun Song - University of Pennsylvania
- Kathryn H. Bowles - Merseburg University of Applied Sciences
- Margaret V. McDonald - Merseburg University of Applied Sciences
- Maxim Topaz - Merseburg University of Applied Sciences
- Resource Type
- Journal article
- Publication Details
- International journal of medical informatics (Shannon, Ireland), Vol.191, 105534
- Publisher
- Elsevier B.V.
- DOI
- 10.1016/j.ijmedinf.2024.105534
- ISSN
- 1386-5056
- eISSN
- 1872-8243
- Grant note
- Agency for Healthcare Research and Quality [AHRQ]: R01HS027742
This study was funded by the Agency for Healthcare Research and Quality [AHRQ] (R01HS027742), "Building risk models for preventable hospitalizations and emergency department visits in homecare (Homecare-CONCERN)." The content is solely the responsibility of the authors and does not necessarily represent the official views of the Agency for Healthcare Research and Quality.
- Language
- English
- Electronic publication date
- 06/2024
- Academic Unit
- Nursing
- Record Identifier
- 9984652157102771