Real-world benchmarking and validation of foundation model transformers for endometrial cancer subtyping from histopathology

Vincent M Wagner; Casey M Cosgrove; Stephanie J Chen; Daniel T Griffin; Megan I Samuelson; Michael J Goodheart; Jesus Gonzalez-Bosquet

doi:10.1038/s41698-026-01402-4

Journal article

Open access

Peer reviewed

NPJ precision oncology

04/04/2026

DOI: 10.1038/s41698-026-01402-4

PMID: 41935172

https://doi.org/10.1038/s41698-026-01402-4

Published (Version of record) Open Access

Abstract

We benchmarked histopathology foundation encoders paired with attention-based multiple instance learning (MIL) against convolutional neural networks (CNNs) to assess their robustness for endometrial cancer molecular classification (MMR-deficient, p53 aberrant, POLE pathogenic mutation, and no specific molecular profile) from whole-slide images (WSIs) in a real-world cohort. A public cohort of 815 patients (1195 WSIs) was assembled for model development. Generalizability was evaluated using an external cohort of 720 patients (1357 WSIs). Models were trained using five-fold cross-validation and tested on the external cohort. Performance was summarized using macro-area under the receiver operating characteristic curve (AUC), macro-F1 score, and balanced accuracy. In cross-validation, foundation encoder models outperformed CNNs (macro-AUC 0.799-0.860 vs 0.715-0.829). The best configuration (Virchow2 with CLAM MIL) achieved macro-AUC 0.860, macro-F1 score 0.607, and balanced accuracy 0.647. On external validation, CNN performance degraded substantially, whereas foundation models retained higher discrimination. UNI2 with CLAM MIL achieved the highest external macro-AUC 0.780 with a macro-F1 score of 0.416 and balanced accuracy of 0.507. Subtype-level performance was highest for p53abn (AUC 0.851). When evaluated within a benchmarking framework, foundation encoders paired with attention-based MIL demonstrate improved generalization for endometrial cancer molecular subtyping from WSIs compared with CNNs, supporting their potential for subtype inference.
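The three summary metrics named in the abstract can be illustrated with a minimal, dependency-free sketch. Everything below is an assumption made for illustration: the class abbreviations, the toy labels and the toy probabilities are invented and are not data or code from the study. Macro-AUC is taken here as the unweighted mean of one-vs-rest AUCs, which is one common definition; the abstract does not say which variant the authors used.

```python
# Illustrative sketch only: toy data, not results from the study.
# Shorthand (assumed) for: MMR-deficient, p53 aberrant,
# POLE pathogenic mutation, no specific molecular profile.
CLASSES = ["MMRd", "p53abn", "POLEmut", "NSMP"]


def binary_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties = 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_auc(y_true, probs, n_classes):
    """Unweighted mean of one-vs-rest AUCs (assumes every class occurs in y_true)."""
    aucs = []
    for c in range(n_classes):
        labels = [1 if y == c else 0 for y in y_true]
        scores = [p[c] for p in probs]
        aucs.append(binary_auc(labels, scores))
    return sum(aucs) / n_classes


def class_counts(y_true, y_pred, c):
    """True positives, false positives and false negatives for class c."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
    return tp, fp, fn


def macro_f1(y_true, y_pred, n_classes):
    """Unweighted mean of per-class F1 scores."""
    f1s = []
    for c in range(n_classes):
        tp, fp, fn = class_counts(y_true, y_pred, c)
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / n_classes


def balanced_accuracy(y_true, y_pred, n_classes):
    """Unweighted mean of per-class recall (assumes every class occurs in y_true)."""
    recalls = []
    for c in range(n_classes):
        tp, _, fn = class_counts(y_true, y_pred, c)
        recalls.append(tp / (tp + fn))
    return sum(recalls) / n_classes


# Toy slide-level labels (indices into CLASSES) and predicted probabilities.
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
probs = [
    [0.7, 0.1, 0.1, 0.1],
    [0.3, 0.4, 0.2, 0.1],
    [0.1, 0.6, 0.2, 0.1],
    [0.2, 0.5, 0.1, 0.2],
    [0.1, 0.1, 0.6, 0.2],
    [0.1, 0.2, 0.3, 0.4],
    [0.1, 0.1, 0.2, 0.6],
    [0.2, 0.1, 0.1, 0.6],
]
# Hard predictions: the class with the highest probability.
y_pred = [max(range(len(CLASSES)), key=lambda c: p[c]) for p in probs]

print("macro-AUC:", macro_auc(y_true, probs, len(CLASSES)))
print("macro-F1:", macro_f1(y_true, y_pred, len(CLASSES)))
print("balanced accuracy:", balanced_accuracy(y_true, y_pred, len(CLASSES)))
```

All three metrics weight each subtype equally regardless of its frequency. Macro-AUC is computed from the ranking of predicted probabilities, while macro-F1 and balanced accuracy depend on the hard class assignments, so the latter two can be considerably lower than macro-AUC, as they are in the figures reported in the abstract.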

Details

Title: Real-world benchmarking and validation of foundation model transformers for endometrial cancer subtyping from histopathology
Creators: Vincent M Wagner - University of Iowa
Casey M Cosgrove - The Ohio State University Comprehensive Cancer Center – Arthur G. James Cancer Hospital and Richard J. Solove Research Institute
Stephanie J Chen - University of Iowa
Daniel T Griffin - University of Iowa
Megan I Samuelson - University of Iowa
Michael J Goodheart - University of Iowa
Jesus Gonzalez-Bosquet - University of Iowa, Obstetrics and Gynecology
Resource Type: Journal article
Publication Details: NPJ precision oncology
DOI: 10.1038/s41698-026-01402-4
PMID: 41935172
NLM abbreviation: NPJ Precis Oncol
ISSN: 2397-768X
eISSN: 2397-768X
Publisher: Springer Nature
Grant note: K12TR004382 / National Center for Advancing Translational Sciences of the National Institutes of Health (The Institute for Clinical and Translational Science at the University of Iowa K12 Award Program)
Language: English
Electronic publication date: 04/04/2026
Academic Unit: Pathology; Obstetrics and Gynecology
Record Identifier: 9985151594202771
