Integrating genomic and non-genomic data to stratify the risk of contralateral breast cancer after radiotherapy
Journal article   Peer reviewed

Sangkyu Lee, Xiang Shu, Andriy Derkach, Anne S Reiner, Xiaolin Liang, Meghan Woods, Patrick Concannon, Charles F Lynch, Kathleen E Malone, Julia A Knight, …
International journal of radiation oncology, biology, physics
02/26/2026
DOI: 10.1016/j.ijrobp.2026.02.237
PMCID: PMC13051349
PMID: 41763494

Abstract

PURPOSE: Women treated with radiation therapy (RT) for breast cancer have an increased risk of developing radiation-associated contralateral breast cancer (CBC). Predicting CBC events is challenging due to the complex interplay of genomic, treatment, personal, and clinical factors. This study investigated computational methods that integrate genome-wide single nucleotide polymorphisms (SNPs) and non-genomic data to develop a model that stratifies the risk of CBC in women treated with RT for their first primary breast cancer.

METHODS: This study used a subset of the population-based WECARE Study that included 633 CBC cases and 1,253 individually matched unilateral breast cancer (UBC) controls who were treated with RT and had SNP data available from a genome-wide association study (GWAS). The study population was split into training, validation, and test sets for rigorous modeling and validation. Three data integration methods were compared in terms of their ability to stratify CBC risk: 1) naive integration, 2) sequential integration, and 3) sequential iterative integration. A biological analysis of the final model was performed using gene set enrichment analysis (GSEA) and protein-protein interaction (PPI) analysis with gene annotation information informed by the model.

RESULTS: The best-performing integration method was sequential iterative integration equipped with the mixed-effects random forest (MERF) algorithm. This approach achieved an area under the curve of 0.64 for stratifying CBC risk in the test set, representing moderate predictive power. Calibration analysis showed good agreement between the lowest and highest risk bins stratified using sorted predicted values in the test set, resulting in an odds ratio of 3.27 for both predicted and observed CBC occurrence. GSEA and PPI analysis revealed that genes with high importance scores were associated with pathways relevant to lipid and fatty acid metabolism as well as breast cancer sensitivity to tamoxifen.

CONCLUSION: The MERF approach demonstrated the potential for integrating high-dimensional genomic and low-dimensional non-genomic data to stratify CBC risk.
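
The two summary statistics reported in the abstract, the area under the curve and the odds ratio between the lowest and highest risk bins, can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `auc` and `extreme_bin_odds_ratio`, the default of four bins, and the 0.5 continuity correction are assumptions made for illustration, since the abstract does not state how the bins were formed.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random case scores above a random control (ties count 0.5)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for u in controls:
            if c > u:
                wins += 1.0
            elif c == u:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def extreme_bin_odds_ratio(
    scores: Sequence[float], labels: Sequence[int], n_bins: int = 4
) -> float:
    """Observed odds ratio of the highest-risk bin versus the lowest-risk bin.

    Subjects are sorted by predicted score and cut into n_bins equal-sized
    bins (n_bins is an assumption, not a value taken from the abstract).
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    size = len(order) // n_bins
    if size == 0:
        raise ValueError("too few subjects for the requested number of bins")
    low = [labels[i] for i in order[:size]]
    high = [labels[i] for i in order[-size:]]

    def odds(bin_labels: Sequence[int]) -> float:
        events = sum(bin_labels)
        non_events = len(bin_labels) - events
        # 0.5 continuity correction (an assumption) avoids division by zero.
        return (events + 0.5) / (non_events + 0.5)

    return odds(high) / odds(low)


if __name__ == "__main__":
    # Toy data only: 1 marks a CBC case, 0 a UBC control.
    toy_scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
    toy_labels = [0, 0, 1, 0, 1, 0, 1, 1]
    print(auc(toy_scores, toy_labels))
    print(extreme_bin_odds_ratio(toy_scores, toy_labels))
```

Comparing the observed odds ratio from `extreme_bin_odds_ratio` with the same ratio computed from mean predicted probabilities in each bin gives one simple calibration check of the kind the abstract describes.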
