Bayesian optimization of Random Forest hyperparameters for radiomics-based detection of clinically significant prostate cancer

Ivan Everett Johnson-Eversoll

doi:10.25820/etd.008082

Thesis

Open access

University of Iowa

Master of Science (MS), University of Iowa

Summer 2025

Abstract

This thesis presents a case study in enhancing a medical imaging machine learning (ML) system for prostate cancer detection. Prostate cancer affects approximately one in eight men in the United States, making early and accurate detection critical for patient outcomes. While ML systems show promise for improving diagnostic accuracy, their effectiveness depends fundamentally on the quality and integrity of the data. Through systematic analysis of DICOM medical imaging data from clinical environments, this research identifies critical gaps between theoretical ML advancements and practical implementation. The study investigates how clinical data often violates assumptions made during work on research data, thus undermining ML performance when deployed in healthcare environments. By implementing comprehensive validation methodologies, the research demonstrates techniques for establishing data trustworthiness, including duplicate identification, metadata verification, and cross-database synchronization. The work further establishes frameworks for ground truth standardization, addressing continuity challenges in annotation that impact model training. By documenting the complete process—from DICOM file management to training pipeline reconstruction—this research contributes to practical knowledge rarely addressed in academic literature but commonly encountered in clinical settings. Results indicate significant improvements in both data integrity and model performance after implementation of the proposed methodologies. The findings emphasize the importance of rigorous data validation processes before ML training. This work provides actionable frameworks for medical imaging professionals and ML developers to ensure translational success from research innovations to clinical applications, ultimately improving diagnostic capabilities for prostate cancer detection.

Details

Title: Bayesian optimization of Random Forest hyperparameters for radiomics-based detection of clinically significant prostate cancer
Creators: Ivan Everett Johnson-Eversoll
Contributors: Gary Christensen (Advisor); Hans J. Johnson (Committee Member); Xiaodong Wu (Committee Member); Guadalupe Canahuate (Committee Member)
Resource Type: Thesis
Degree Awarded: Master of Science (MS), University of Iowa
Degree in: Electrical and Computer Engineering
Date degree season: Summer 2025
DOI: 10.25820/etd.008082
Publisher: University of Iowa
Number of pages: x, 80 pages
Language: English
Date submitted: 07/11/2025
Description illustrations: Illustrations, tables, graphs, charts
Description bibliographic: Includes bibliographical references (pages 69-80).
Public Abstract (ETD): Prostate cancer affects approximately one in eight men in the United States during their lifetime, making it one of the most common cancers in men. Early and accurate detection is crucial for successful treatment outcomes. Machine learning (ML) holds tremendous promise for improving detection rates and reducing diagnostic variability. However, implementing these technologies broadly in clinical settings has faced significant challenges.

This thesis presents a comprehensive case study of preparing a medical imaging ML system for prostate cancer detection from foundational assumptions. It addresses the fundamental issue that ML systems are only as effective as the data on which they are trained. Through systematic analysis of DICOM medical imaging data, ground truth standardization, and rigorous validation methodologies, this work demonstrates how untrustworthy data and undocumented assumptions can undermine ML performance in healthcare applications.

This research presents practical frameworks for ensuring data integrity, standardizing ground-truth annotations, and building maintainable machine learning training pipelines while addressing real-world clinical constraints. This work focuses on the process of improving an existing clinical data infrastructure for the purpose of ML refinements. This approach provides unique insights into challenges that are rarely addressed in academic literature but are commonly encountered in practice.

The findings underscore the crucial need to bridge the gap between theoretical advancements and clinical implementation. This work provides actionable methodologies for medical imaging professionals and ML developers to ensure that promising research innovations can effectively translate into technologies that benefit patients in clinical settings.
Academic Unit: Electrical and Computer Engineering
Record Identifier: 9984948341202771
