Thesis
Investigating racial/ethnic disparities in machine learning models for non-small cell lung cancer using SEER registry data
University of Iowa
Master of Science (MS), University of Iowa
Spring 2022
DOI: 10.17077/etd.006394
Abstract
Massive amounts of data are collected each day, and the way this data is utilized can lead to both positive and negative outcomes for certain groups of individuals. When data is used to model real-world decisions and systems, complications may arise since the trends, disparities, and biases in datasets are often perpetuated in artificial intelligence (AI) models. These models then produce unfair outcomes relative to an identified sensitive attribute such as race, ethnicity, gender identity, age, and religion (to name a few). Unfairness is present in AI models applied to a variety of industries, including the healthcare sector. In this study, we investigated the impact racial/ethnic disparities have on machine learning models, specifically looking at non-small cell lung cancer and utilizing SEER registry data. We used multivariate logistic regression models to predict whether a non-small cell lung cancer patient would be recommended surgery. Three models were built, one including race/ethnicity as a covariate, one interacting race/ethnicity with the remaining covariates, and one excluding race/ethnicity as a covariate. These models were then evaluated for both performance (using accuracy, F1-score, and ROC AUC) and fairness (using demographic parity, disparate impact, equalized odds, and equal opportunity). Lastly, relevant open-source unfairness mitigation techniques were applied to these models in an attempt to reduce unfairness.
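The four fairness measures named in the abstract can be illustrated with a small sketch. This is not the thesis's code: the function names, the toy labels, and the group names "A" and "B" are hypothetical, and the equalized odds measure below uses one common convention (the larger of the true positive rate gap and the false positive rate gap), which may differ from the definition used in the thesis.

```python
from collections import defaultdict


def group_rates(y_true, y_pred, groups):
    """Per-group selection rate, true positive rate, and false positive rate.

    Assumes binary 0/1 labels and that every group contains at least one
    positive and one negative example (otherwise a division by zero occurs).
    """
    counts = defaultdict(
        lambda: {"n": 0, "pred_pos": 0, "pos": 0, "tp": 0, "neg": 0, "fp": 0}
    )
    for t, p, g in zip(y_true, y_pred, groups):
        c = counts[g]
        c["n"] += 1
        c["pred_pos"] += p
        if t == 1:
            c["pos"] += 1
            c["tp"] += p
        else:
            c["neg"] += 1
            c["fp"] += p
    return {
        g: {
            "selection_rate": c["pred_pos"] / c["n"],
            "tpr": c["tp"] / c["pos"],
            "fpr": c["fp"] / c["neg"],
        }
        for g, c in counts.items()
    }


def fairness_summary(rates, privileged, unprivileged):
    """Compare an unprivileged group against a privileged reference group."""
    a, b = rates[unprivileged], rates[privileged]
    tpr_gap = a["tpr"] - b["tpr"]
    fpr_gap = a["fpr"] - b["fpr"]
    return {
        # Demographic parity: difference in rates of positive predictions.
        "demographic_parity_difference": a["selection_rate"] - b["selection_rate"],
        # Disparate impact: ratio of rates of positive predictions.
        "disparate_impact_ratio": a["selection_rate"] / b["selection_rate"],
        # Equal opportunity: difference in true positive rates.
        "equal_opportunity_difference": tpr_gap,
        # Equalized odds (one convention): worst of the TPR and FPR gaps.
        "equalized_odds_difference": max(abs(tpr_gap), abs(fpr_gap)),
    }


# Hypothetical toy data: 1 = surgery recommended, 0 = not recommended.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

rates = group_rates(y_true, y_pred, groups)
summary = fairness_summary(rates, privileged="A", unprivileged="B")
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

A demographic parity difference or equal opportunity difference near 0, and a disparate impact ratio near 1, would indicate similar treatment of the two groups under these measures; the toy data is constructed to show a gap.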
Details
- Title: Subtitle
- Investigating racial/ethnic disparities in machine learning models for non-small cell lung cancer using SEER registry data
- Creators
- Cameron R. Trentz
- Contributors
- Guadalupe Canahuate (Advisor); Tyler Bell (Committee Member); Thomas Casavant (Committee Member)
- Resource Type
- Thesis
- Degree Awarded
- Master of Science (MS), University of Iowa
- Degree in
- Electrical and Computer Engineering
- Date degree season
- Spring 2022
- Publisher
- University of Iowa
- DOI
- 10.17077/etd.006394
- Number of pages
- vii, 51 pages
- Copyright
- Copyright 2022 Cameron Trentz
- Language
- English
- Description illustrations
- illustration (some color), tables, graphs
- Description bibliographic
- Includes bibliographical references (pages 43-45).
- Public Abstract (ETD)
- Data has become one of the most valuable assets in modern society. Companies and organizations collect enormous amounts of data from the public, and then use this data to predict future behaviors and outcomes. For example, online retailers such as Amazon track your item viewing, interaction, and purchase history, and show you offers you’re most likely to engage with. A problem arises from generating predictions such as this, since the data used to create these predictions can have some underlying level of disparity and bias. These disparities are then learned by artificial intelligence (AI) models and influence predictions. Subsequently, unfair outcomes are produced for certain groups of people affected by these disparities. Unfairness is present in AI models applied to a variety of industries, including the healthcare sector. To investigate how unfairness in AI software can impact diagnosis and treatment in the medical industry, we built AI models using non-small cell lung cancer data. We then evaluated whether these models generated fair predictions for all race/ethnicity groups. Based on these findings, we attempted to reduce unfairness in these models by utilizing unfairness mitigation tools.
- Academic Unit
- Electrical and Computer Engineering
- Record Identifier
- 9984271254002771