Performance comparison of four published lung cancer prediction models applied to a cohort from the National Lung Screening Trial
Journal article   Open access   Peer reviewed

Kimberly E. Schroeder, Kevin Knoernschild, Sarah L. Averill, Richard M. Hoffman and Jessica C. Sieren
Translational lung cancer research, Vol.14(9), pp.3577-3588
09/2025
DOI: 10.21037/tlcr-2025-439
PMCID: PMC12541660
PMID: 41132971
URL: https://doi.org/10.21037/tlcr-2025-439
Published (Version of record) Open Access

Abstract

Background: Mathematical prediction models (MPMs) based on clinical and radiologist-assessed features have been developed to assist with lung cancer risk assessment for imaging-detected lung nodules. However, MPMs were developed using different datasets, thresholds, and feature sets, making it difficult to cross-compare the published performance metrics and determine prospective performance stability. This study uses a large lung cancer screening cohort with identified pulmonary nodules to compare the performance of four MPMs at a standardized sensitivity value, with the goal of reducing the false-positive rate of lung cancer screening exams.
Methods: This retrospective study used lung nodules identified on low-dose computed tomography (LDCT) in the National Lung Screening Trial (NLST) to evaluate four MPMs [Mayo Clinic (MC), Veterans Affairs (VA), Peking University (PU), and Brock University (BU)]. For cross-comparison, a small NLST sub-cohort (n=270) was used to determine a calibrated decision threshold for each model, targeting a sensitivity for detecting lung cancer of 95%. Performance was evaluated using area under the receiver-operating-characteristic curve (AUC-ROC), area under the precision-recall curve (AUC-PR), sensitivity, and specificity. The calibrated thresholds were then applied to the remaining NLST cohort (n=1,083) to demonstrate the stability of the performance metrics.
Results: A total of 1,353 patients [mean ± standard deviation (SD) age, 62.3±5.2 years; 746 male] were included, of whom 122 (9.0%) had a malignant nodule. At the target sensitivity of 95%, the highest specificity in the testing cohort (the proportion of benign nodules correctly identified) was seen in the BU and MC models (55% and 52%, respectively), compared to the VA (45%) and the PU (16%) models. The AUC-ROCs for BU (83%), MC (83%), PU (76%), and VA (77%) suggest moderate-to-high performance, while the AUC-PR more accurately reflects that all the models have sub-optimal precision (27–33%).
Conclusions: Tuning the calibration thresholds of existing MPMs aids in performance comparison and stability for application in the lung cancer screening setting. However, when targeting high sensitivity (95%), the achievable specificity of the MPMs is low (16–55%), which may limit clinical utility.
Keywords: Prediction models; computed tomography (CT); screening; malignancy; lung cancer
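
The threshold calibration described in the Methods can be illustrated with a minimal sketch. This is not the authors' code: the function names, the "score >= threshold means positive" convention, and the tie handling are all assumptions made for illustration. It picks the highest threshold that reaches a target sensitivity on a calibration set, evaluates sensitivity and specificity at a fixed threshold, and converts an operating point into an implied precision using Bayes' rule.

```python
import math


def calibrate_threshold(scores, labels, target_sensitivity=0.95):
    """Highest threshold t such that flagging score >= t as malignant
    reaches at least the target sensitivity on the calibration set.

    Hypothetical helper: labels are 1 (malignant) or 0 (benign).
    """
    positives = sorted(
        (s for s, y in zip(scores, labels) if y == 1), reverse=True
    )
    if not positives:
        raise ValueError("calibration set contains no malignant cases")
    # Number of malignant cases that must be flagged; the small epsilon
    # guards against floating-point noise pushing ceil() one step too far.
    needed = max(1, math.ceil(target_sensitivity * len(positives) - 1e-9))
    return positives[needed - 1]


def sensitivity_specificity(scores, labels, threshold):
    """Sensitivity and specificity when score >= threshold is called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def positive_predictive_value(sensitivity, specificity, prevalence):
    """Precision implied by an operating point at a given prevalence (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)
```

In the study's design, the threshold would be calibrated on the n=270 sub-cohort and `sensitivity_specificity` then applied, with that threshold held fixed, to the remaining n=1,083 patients. As a purely illustrative calculation (not a figure reported by the study), assuming the overall 9.0% malignancy prevalence also holds in the testing cohort, `positive_predictive_value(0.95, 0.55, 0.09)` gives roughly 0.17 and `positive_predictive_value(0.95, 0.16, 0.09)` roughly 0.10, which is consistent with the conclusion that low specificity at 95% sensitivity limits precision.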
