Real-world benchmarking and validation of foundation model transformers for endometrial cancer subtyping from histopathology
Journal article · Open access · Peer reviewed

Vincent M Wagner, Casey M Cosgrove, Stephanie J Chen, Daniel T Griffin, Megan I Samuelson, Michael J Goodheart and Jesus Gonzalez-Bosquet
npj Precision Oncology
04/04/2026
DOI: 10.1038/s41698-026-01402-4
PMID: 41935172
URL: https://doi.org/10.1038/s41698-026-01402-4
Published (Version of record) Open Access

Abstract

We benchmarked histopathology foundation encoders paired with attention-based multiple instance learning (MIL) against convolutional neural networks (CNNs) to assess their robustness for endometrial cancer molecular classification (mismatch repair [MMR]-deficient, p53 aberrant [p53abn], POLE pathogenic mutation, and no specific molecular profile) from whole-slide images (WSIs) in a real-world cohort. A public cohort of 815 patients (1195 WSIs) was assembled for model development, and generalizability was evaluated on an external cohort of 720 patients (1357 WSIs). Models were trained using five-fold cross-validation and then tested on the external cohort. Performance was summarized using macro-area under the receiver operating characteristic curve (AUC), macro-F1 score, and balanced accuracy. In cross-validation, foundation encoder models outperformed CNNs (macro-AUC 0.799-0.860 vs 0.715-0.829). The best configuration (Virchow2 with CLAM MIL) achieved a macro-AUC of 0.860, a macro-F1 score of 0.607, and a balanced accuracy of 0.647. On external validation, CNN performance degraded substantially, whereas foundation models retained higher discrimination. UNI2 with CLAM MIL achieved the highest external macro-AUC (0.780), with a macro-F1 score of 0.416 and a balanced accuracy of 0.507. At the subtype level, performance was highest for p53abn (AUC 0.851). Within this benchmarking framework, foundation encoders paired with attention-based MIL generalized better than CNNs for endometrial cancer molecular subtyping from WSIs, supporting their potential for subtype inference.
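The abstract summarizes performance with three metrics. The following is a minimal sketch that assumes their common definitions: macro-AUC as the unweighted mean of one-vs-rest AUCs, macro-F1 as the unweighted mean of per-class F1, and balanced accuracy as the mean per-class recall. It computes all three from slide-level class probabilities using only the Python standard library. The paper's actual implementation is not described here. The class labels (MMRd, p53abn, POLEmut, NSMP), the function names, and the toy probabilities are hypothetical and serve only as illustration.

```python
"""Hypothetical sketch: macro-AUC, macro-F1 and balanced accuracy for 4 subtypes."""
from typing import List, Sequence

# Column order of the probability vectors (labels chosen for illustration only).
CLASSES = ["MMRd", "p53abn", "POLEmut", "NSMP"]


def binary_auc(scores: Sequence[float], is_positive: Sequence[bool]) -> float:
    """One-vs-rest AUC: chance a random positive outranks a random negative (ties = 0.5)."""
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative slide")
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0 for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


def macro_auc(y_true: Sequence[int], probs: Sequence[Sequence[float]]) -> float:
    """Unweighted mean of the per-class one-vs-rest AUCs."""
    aucs = [
        binary_auc([row[k] for row in probs], [y == k for y in y_true])
        for k in range(len(CLASSES))
    ]
    return sum(aucs) / len(aucs)


def _counts(y_true: Sequence[int], y_pred: Sequence[int], k: int):
    """True positives, false positives and false negatives for class k."""
    tp = sum(t == k and p == k for t, p in zip(y_true, y_pred))
    fp = sum(t != k and p == k for t, p in zip(y_true, y_pred))
    fn = sum(t == k and p != k for t, p in zip(y_true, y_pred))
    return tp, fp, fn


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN); 0 when undefined."""
    scores = []
    for k in range(len(CLASSES)):
        tp, fp, fn = _counts(y_true, y_pred, k)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean per-class recall over the classes present in y_true."""
    recalls = []
    for k in range(len(CLASSES)):
        tp, _, fn = _counts(y_true, y_pred, k)
        if tp + fn:
            recalls.append(tp / (tp + fn))
    return sum(recalls) / len(recalls)


def argmax(row: Sequence[float]) -> int:
    """Index of the highest-probability class (the hard prediction)."""
    return max(range(len(row)), key=lambda i: row[i])


# Made-up slide-level predictions: two slides per subtype, rows sum to 1.
y_true: List[int] = [0, 0, 1, 1, 2, 2, 3, 3]
probs = [
    [0.70, 0.10, 0.10, 0.10],
    [0.30, 0.40, 0.10, 0.20],
    [0.10, 0.80, 0.05, 0.05],
    [0.20, 0.60, 0.10, 0.10],
    [0.10, 0.10, 0.60, 0.20],
    [0.20, 0.10, 0.30, 0.40],
    [0.10, 0.10, 0.20, 0.60],
    [0.25, 0.25, 0.10, 0.40],
]
y_pred = [argmax(row) for row in probs]

if __name__ == "__main__":
    print(f"macro-AUC         {macro_auc(y_true, probs):.3f}")  # 0.990
    print(f"macro-F1          {macro_f1(y_true, y_pred):.3f}")  # 0.733
    print(f"balanced accuracy {balanced_accuracy(y_true, y_pred):.3f}")  # 0.750
```

AUC depends only on how the probabilities rank slides, while macro-F1 and balanced accuracy depend on the hard argmax decision. In this toy example the four subtypes are ranked almost perfectly (macro-AUC about 0.99), yet two of eight slides are misclassified. This is one plausible way a model can pair a relatively high macro-AUC with a lower macro-F1. In practice, scikit-learn's `roc_auc_score(..., multi_class="ovr", average="macro")`, `f1_score(..., average="macro")`, and `balanced_accuracy_score` provide equivalent, well-tested implementations.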