Comparative Evaluation of Deep Learning Models for 3D Segmentation and Volumetry of Vestibular Schwannomas Using Large Heterogeneous Datasets with External Validation
Journal article   Peer reviewed

Parv M Mehta, Sahika B Yayli, Pranjal Rai, Milan Sonka, Daniel J Blezek, Honghai Zhang, Victoria M Silvera, John C Benson, Matthew L Carlson, Bradley J Erickson, …
American journal of neuroradiology : AJNR, Vol.47(5), pp.1266-1272
05/01/2026
DOI: 10.3174/ajnr.A9112
PMID: 41260669

Abstract

BACKGROUND AND PURPOSE: 3D segmentation and volumetry of vestibular schwannomas (VS) is a more accurate method to determine tumor growth on serial imaging, but manual annotation is too time-consuming to implement in routine clinical practice. We evaluated and compared five deep learning-based segmentation models [nnUNet (base, ResEncL), U-Mamba, UNETR, and MedSAM] for 3D VS segmentation and volumetry, and examined robustness to acquisition heterogeneity and generalization on an external cohort.

MATERIALS AND METHODS: Our refined internal dataset consisted of T1 contrast-enhanced images, including 2,692 scans (n = 383 patients) for training and 277 scans (n = 97 patients) for testing. After model training and validation, performance was evaluated on both the internal test set and a publicly available external test set (n = 241) using the Dice similarity coefficient, Hausdorff distance, surface-to-surface (S2S) distance, and relative volume error (RVE). A subanalysis evaluated the impact of tumor volume and dataset heterogeneity on model performance.

RESULTS: The median Dice score on the external test set ranged from 0.899 to 0.927, with U-Mamba achieving the highest performance, followed by nnUNet (base and ResEncL). For these top three models, the median Hausdorff distance was 3.59 mm, while the Hausdorff95 was 1.6 mm. The S2S distance was <1 mm, and the median RVE (%) ranged from 0.07 to 0.08. Median Dice scores were lower (0.848-0.854) for smaller tumors (<200 mm³) and higher (0.925-0.932) for tumors >400 mm³.

CONCLUSIONS: Models based on convolutional neural networks (CNNs), transformer networks, and foundation models all showed robust performance for VS segmentation. Given the consistently high performance and self-optimizing frameworks of CNN-based models (U-Mamba, nnUNet), these may be more suitable for clinical applications.
ABBREVIATIONS: VS = vestibular schwannoma; CPA = cerebellopontine angle; IAC = internal auditory canal; DL = deep learning; CNN = convolutional neural network; SRS = stereotactic radiosurgery; RVE = relative volume error; OLS = ordinary least squares; LMM = linear mixed effects models; ANOVA = analysis of variance.
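
For readers unfamiliar with the overlap and volume metrics named in the abstract, the following is a minimal sketch of how a Dice similarity coefficient and a relative volume error could be computed from two binary 3D masks. It is not the authors' implementation. The function names, the voxel spacing, the toy masks, and the definition of RVE as an absolute percentage difference from the reference volume are all assumptions made for illustration; the abstract does not state the exact formulas used.

```python
import numpy as np


def dice_coefficient(pred, ref):
    """Dice = 2 * |pred AND ref| / (|pred| + |ref|) for binary masks."""
    pred = np.asarray(pred, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    denom = pred.sum() + ref.sum()
    if denom == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    return float(2.0 * np.logical_and(pred, ref).sum() / denom)


def volume_mm3(mask, spacing_mm):
    """Volume = number of foreground voxels * volume of one voxel."""
    voxel_volume = float(np.prod(spacing_mm))
    return float(np.asarray(mask, dtype=bool).sum()) * voxel_volume


def relative_volume_error(pred, ref, spacing_mm):
    """Assumed definition: |V_pred - V_ref| / V_ref, as a percentage."""
    v_ref = volume_mm3(ref, spacing_mm)
    if v_ref == 0:
        raise ValueError("reference mask is empty; RVE is undefined")
    v_pred = volume_mm3(pred, spacing_mm)
    return abs(v_pred - v_ref) / v_ref * 100.0


# Toy example (hypothetical data): a 4x4x4 reference block and a
# prediction that extends one voxel further along the last axis.
spacing = (0.5, 0.5, 1.0)  # assumed voxel spacing in mm
ref_mask = np.zeros((10, 10, 10), dtype=bool)
ref_mask[2:6, 2:6, 2:6] = True   # 64 voxels
pred_mask = np.zeros((10, 10, 10), dtype=bool)
pred_mask[2:6, 2:6, 2:7] = True  # 80 voxels, fully covering the reference

dice = dice_coefficient(pred_mask, ref_mask)
rve = relative_volume_error(pred_mask, ref_mask, spacing)
print(f"Dice = {dice:.3f}, RVE = {rve:.1f}%")
```

In this toy case the prediction over-segments by 16 voxels, so the overlap stays fairly high while the volume error is large. With a fixed boundary error in voxels, a smaller structure loses proportionally more overlap, which is in line with the lower Dice scores the abstract reports for small tumors.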
