Journal article
Comparative Evaluation of Deep Learning Models for 3D Segmentation and Volumetry of Vestibular Schwannomas Using Large Heterogeneous Datasets with External Validation
American journal of neuroradiology : AJNR, Vol.47(5), pp.1266-1272
05/01/2026
DOI: 10.3174/ajnr.A9112
PMID: 41260669
Abstract
BACKGROUND AND PURPOSE: 3D segmentation and volumetry of vestibular schwannomas (VS) is a more accurate method for determining tumor growth on serial imaging, but manual annotation is too time-consuming to implement in routine clinical practice. We evaluated and compared five deep learning-based segmentation models [nnUNet (base, ResEncL), U-Mamba, UNETR, and MedSAM] for 3D VS segmentation and volumetry, and examined their robustness to acquisition heterogeneity and their generalization to an external cohort.
MATERIALS AND METHODS: Our refined internal dataset consisted of contrast-enhanced T1 images, including 2,692 scans (n = 383 patients) for training and 277 scans (n = 97 patients) for testing. After model training and validation, performance was evaluated on both the internal test set and a publicly available external test set (n = 241) using the Dice similarity coefficient, Hausdorff distance, surface-to-surface (S2S) distance, and relative volume error (RVE). A sub-analysis was also performed to evaluate the impact of tumor volume and dataset heterogeneity on model performance.
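Three of the evaluation metrics named above can be illustrated with a minimal sketch. It is not the authors' implementation. The following are assumptions made for illustration: each segmentation is represented as a set of (x, y, z) voxel indices, the Hausdorff distance is taken over all foreground voxels rather than extracted surface voxels, RVE is defined as the absolute volume difference divided by the reference volume, and all function names are hypothetical.

```python
import math

def dice(pred, ref):
    """Dice similarity coefficient between two sets of voxel indices."""
    if not pred and not ref:
        return 1.0  # both masks empty: treated as perfect agreement (assumed convention)
    return 2 * len(pred & ref) / (len(pred) + len(ref))

def volume_mm3(mask, spacing):
    """Volume as voxel count times the volume of a single voxel."""
    return len(mask) * spacing[0] * spacing[1] * spacing[2]

def relative_volume_error(pred, ref, spacing):
    """|V_pred - V_ref| / V_ref (assumed definition; needs a non-empty reference)."""
    v_ref = volume_mm3(ref, spacing)
    return abs(volume_mm3(pred, spacing) - v_ref) / v_ref

def _to_mm(mask, spacing):
    """Convert voxel indices to physical coordinates in mm."""
    return [tuple(i * s for i, s in zip(voxel, spacing)) for voxel in mask]

def hausdorff(pred, ref, spacing):
    """Symmetric Hausdorff distance in mm (brute force; small masks only)."""
    p, r = _to_mm(pred, spacing), _to_mm(ref, spacing)
    d_pred_to_ref = max(min(math.dist(a, b) for b in r) for a in p)
    d_ref_to_pred = max(min(math.dist(b, a) for a in p) for b in r)
    return max(d_pred_to_ref, d_ref_to_pred)

# Toy example: a 4x4x4 cube and the same cube shifted by one voxel along x.
ref = {(x, y, z) for x in range(4) for y in range(4) for z in range(4)}
pred = {(x + 1, y, z) for (x, y, z) in ref}
spacing = (1.0, 1.0, 1.0)  # voxel size in mm (assumed)

print(dice(pred, ref))                            # 0.75
print(relative_volume_error(pred, ref, spacing))  # 0.0
print(hausdorff(pred, ref, spacing))              # 1.0
```

The toy case shows why overlap and volume metrics are reported together: a shifted mask can have zero volume error while still losing a quarter of its Dice overlap.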
RESULTS: The median Dice score on the external test set ranged from 0.899 to 0.927, with U-Mamba achieving the highest performance, followed by nnUNet (base and ResEncL). For these top three models, the median Hausdorff distance was 3.59 mm, while the Hausdorff95 was 1.6 mm. The S2S distance was <1 mm, and the median RVE (%) ranged from 0.07 to 0.08. Median Dice scores were lower (0.848 to 0.854) for smaller tumors (<200 mm³) and higher (0.925 to 0.932) for tumors >400 mm³.
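The volume-stratified comparison described above can be sketched as a simple grouping step. This is an illustration only: the 200 mm³ and 400 mm³ thresholds come from the abstract, but the middle bin, the function names, and the example values are assumptions and are not data from the study.

```python
from statistics import median

def volume_bin(volume_mm3):
    """Assign a tumor to a volume bin; the middle bin is an assumption."""
    if volume_mm3 < 200:
        return "<200 mm3"
    if volume_mm3 > 400:
        return ">400 mm3"
    return "200-400 mm3"

def median_dice_by_bin(cases):
    """cases: iterable of (reference_volume_mm3, dice_score) pairs."""
    groups = {}
    for volume, score in cases:
        groups.setdefault(volume_bin(volume), []).append(score)
    return {name: median(scores) for name, scores in groups.items()}

# Made-up values for illustration, not results from the paper.
cases = [
    (120.0, 0.80), (150.0, 0.85), (180.0, 0.90),
    (300.0, 0.90),
    (950.0, 0.91), (1200.0, 0.93), (2000.0, 0.95),
]
result = median_dice_by_bin(cases)
print(result["<200 mm3"])  # 0.85
print(result[">400 mm3"])  # 0.93
```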
CONCLUSIONS: Models based on convolutional neural networks (CNNs), transformer networks, and foundational models all show robust performance for VS segmentation. Given the consistently high performance and self-optimizing frameworks of CNN-based models (U-Mamba, nnUNet), these may be more suitable for clinical applications.
ABBREVIATIONS: VS = vestibular schwannoma; CPA = cerebello-pontine angle; IAC = internal auditory canal; DL = deep learning; CNN = convolutional neural network; SRS = stereotactic radiosurgery; RVE = relative volume error; OLS = ordinary least squares; LMM = linear mixed effects models; ANOVA = analysis of variance.
Details
- Title: Subtitle
- Comparative Evaluation of Deep Learning Models for 3D Segmentation and Volumetry of Vestibular Schwannomas Using Large Heterogeneous Datasets with External Validation
- Creators
- Parv M Mehta - Mayo Clinic; Sahika B Yayli - Mayo Clinic; Pranjal Rai - Mayo Clinic; Milan Sonka - University of Iowa; Daniel J Blezek - Mayo Clinic; Honghai Zhang - University of Iowa; Victoria M Silvera - Mayo Clinic; John C Benson - Mayo Clinic; Matthew L Carlson - Mayo Clinic; Bradley J Erickson - Mayo Clinic; Girish Bathla - Mayo Clinic
- Resource Type
- Journal article
- Publication Details
- American journal of neuroradiology : AJNR, Vol.47(5), pp.1266-1272
- DOI
- 10.3174/ajnr.A9112
- PMID
- 41260669
- NLM abbreviation
- AJNR Am J Neuroradiol
- ISSN
- 1936-959X
- eISSN
- 1936-959X
- Publisher
- American Society of Neuroradiology
- Language
- English
- Electronic publication date
- 11/19/2025
- Date published
- 05/01/2026
- Academic Unit
- Roy J. Carver Department of Biomedical Engineering; Electrical and Computer Engineering; Radiation Oncology; The Iowa Institute for Biomedical Imaging; Fraternal Order of Eagles Diabetes Research Center; Injury Prevention Research Center; Ophthalmology and Visual Sciences
- Record Identifier
- 9985033760702771