Investigating racial/ethnic disparities in machine learning models for non-small cell lung cancer using SEER registry data

Cameron R. Trentz

doi:10.17077/etd.006394


Thesis

Open access


University of Iowa

Master of Science (MS), University of Iowa

Spring 2022


Full text: Cameron Trentz Final Thesis (PDF, 1.72 MB), free to read and download.

Abstract

Massive amounts of data are collected each day, and how this data is used can lead to both positive and negative outcomes for particular groups of individuals. When data is used to model real-world decisions and systems, complications may arise because the trends, disparities, and biases in datasets are often perpetuated in artificial intelligence (AI) models. These models can then produce unfair outcomes with respect to a sensitive attribute such as race, ethnicity, gender identity, age, or religion. Unfairness has been found in AI models across many industries, including healthcare. In this study, we investigated the impact of racial/ethnic disparities on machine learning models for non-small cell lung cancer, using SEER registry data. We used multivariable logistic regression models to predict whether a non-small cell lung cancer patient would be recommended for surgery. Three models were built: one including race/ethnicity as a covariate, one interacting race/ethnicity with the remaining covariates, and one excluding race/ethnicity as a covariate. These models were then evaluated for both performance (accuracy, F1-score, and ROC AUC) and fairness (demographic parity, disparate impact, equalized odds, and equal opportunity). Finally, relevant open-source unfairness mitigation techniques were applied to these models in an attempt to reduce unfairness.
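The four fairness criteria named in the abstract all compare how a binary classifier behaves across groups. The following is a minimal sketch in plain Python. It is not the thesis's code, and the function name `fairness_report` is hypothetical. It assumes a positive label of 1 means "surgery recommended" and that each group is compared with a single chosen reference ("privileged") group. Equalized odds is summarized here as the larger of the true-positive-rate and false-positive-rate gaps. That is one common convention, and the thesis may define these metrics differently.

```python
import math
from collections import defaultdict


def _rate(values):
    """Share of 1s in a list of 0/1 predictions; NaN if the list is empty."""
    return sum(values) / len(values) if values else math.nan


def fairness_report(y_true, y_pred, groups, privileged):
    """Compare every group with a reference ("privileged") group (hypothetical helper).

    y_true, y_pred: 0/1 labels; 1 is assumed to mean "surgery recommended".
    groups: one race/ethnicity label per patient.
    privileged: the reference group label (a modelling choice, assumed here).
    """
    selected = defaultdict(list)    # all predictions, used for the selection rate
    actual_pos = defaultdict(list)  # predictions where y_true == 1, used for TPR
    actual_neg = defaultdict(list)  # predictions where y_true == 0, used for FPR
    for t, p, g in zip(y_true, y_pred, groups):
        selected[g].append(p)
        (actual_pos if t == 1 else actual_neg)[g].append(p)

    ref_sel = _rate(selected[privileged])
    ref_tpr = _rate(actual_pos[privileged])
    ref_fpr = _rate(actual_neg[privileged])

    report = {}
    for g in selected:
        if g == privileged:
            continue
        sel, tpr, fpr = _rate(selected[g]), _rate(actual_pos[g]), _rate(actual_neg[g])
        tpr_gap, fpr_gap = abs(tpr - ref_tpr), abs(fpr - ref_fpr)
        report[g] = {
            # Demographic parity: difference in selection rates (0 means parity).
            "demographic_parity_diff": sel - ref_sel,
            # Disparate impact: ratio of selection rates (1 means parity).
            "disparate_impact": sel / ref_sel if ref_sel > 0 else math.nan,
            # Equal opportunity: difference in true positive rates.
            "equal_opportunity_diff": tpr - ref_tpr,
            # Equalized odds: larger of the TPR and FPR gaps (one common summary).
            "equalized_odds_diff": (
                math.nan if math.isnan(tpr_gap) or math.isnan(fpr_gap)
                else max(tpr_gap, fpr_gap)
            ),
        }
    return report


# Toy usage with made-up labels (not SEER data).
if __name__ == "__main__":
    groups = ["A"] * 4 + ["B"] * 4
    y_true = [1, 1, 0, 0, 1, 1, 0, 0]
    y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
    print(fairness_report(y_true, y_pred, groups, privileged="A"))
```

In the toy data, group "B" is predicted positive at a rate of 0.25, against 0.75 for group "A". That gives a demographic parity difference of -0.5 and a disparate impact of 1/3. The true positive rates are 0.5 and 1.0, so the equal opportunity difference is -0.5. Both the TPR and FPR gaps are 0.5, so the equalized odds difference is 0.5. Differences near 0 and a ratio near 1 indicate parity on that criterion.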

Keywords: Lung Cancer; Machine Learning; Public Health; Artificial Intelligence; Disparities and Bias; Fairness

Details

Title: Investigating racial/ethnic disparities in machine learning models for non-small cell lung cancer using SEER registry data
Creators: Cameron R. Trentz
Contributors: Guadalupe Canahuate (Advisor)
Tyler Bell (Committee Member)
Thomas Casavant (Committee Member)
Resource Type: Thesis
Degree Awarded: Master of Science (MS), University of Iowa
Degree in: Electrical and Computer Engineering
Degree date: Spring 2022
Publisher: University of Iowa
DOI: 10.17077/etd.006394
Number of pages: vii, 51 pages
Language: English
Illustrations: illustrations (some color), tables, graphs
Bibliography: Includes bibliographical references (pages 43-45).
Public Abstract (ETD): Data has become one of the most valuable assets in modern society. Companies and organizations collect enormous amounts of data from the public and use it to predict future behaviors and outcomes. For example, online retailers such as Amazon track your item viewing, interaction, and purchase history, and show you the offers you’re most likely to engage with. Predictions like these can be problematic, because the data used to create them may contain underlying disparities and bias. Artificial intelligence (AI) models learn these disparities, which then influence their predictions, and the result can be unfair outcomes for the groups of people affected. Unfairness has been found in AI models across many industries, including healthcare. To investigate how unfairness in AI software can affect diagnosis and treatment in medicine, we built AI models using non-small cell lung cancer data. We then evaluated whether these models generated fair predictions for all race/ethnicity groups. Based on these findings, we attempted to reduce unfairness in these models using unfairness mitigation tools.
Academic Unit: Electrical and Computer Engineering
Record Identifier: 9984271254002771
