Journal article
Integrating genomic and non-genomic data to stratify the risk of contralateral breast cancer after radiotherapy
International journal of radiation oncology, biology, physics
02/26/2026
DOI: 10.1016/j.ijrobp.2026.02.237
PMCID: PMC13051349
PMID: 41763494
Abstract
PURPOSE: Women treated with radiation therapy (RT) for breast cancer have an increased risk of developing radiation-associated contralateral breast cancer (CBC). Predicting CBC events is challenging due to the complex interplay of genomic, treatment, personal, and clinical factors. This study investigated computational methods that integrate genome-wide single nucleotide polymorphisms (SNPs) and non-genomic data to develop a risk stratification model for developing CBC in women treated with RT for their first primary breast cancer.
METHODS: This study used a subset of the population-based WECARE Study that included 633 CBC cases and 1,253 individually matched unilateral breast cancer (UBC) controls who were treated with RT and had SNP data available from a genome-wide association study (GWAS). The study population was split into training, validation, and test sets for rigorous modeling and validation. Three data integration methods were compared in terms of their ability to stratify CBC risk: 1) Naive integration, 2) Sequential integration, and 3) Sequential iterative integration. A biological analysis of the final model was performed using gene set enrichment analysis (GSEA) and protein-protein interaction (PPI) analysis with gene annotation information informed by the model.
RESULTS: The best-performing integration method was the sequential iterative integration equipped with the mixed effect random forest (MERF) algorithm. This approach achieved an area under the curve of 0.64 to stratify CBC risk in the test set, representing moderate predictive power. Calibration analysis showed good agreement between the lowest and highest risk bins stratified using sorted predicted values in the test set, resulting in an odds ratio of 3.27 for both predicted and observed CBC occurrence. GSEA and PPI analysis revealed that genes with high importance scores were associated with pathways relevant to lipid and fatty acid metabolism as well as breast cancer sensitivity to tamoxifen.
CONCLUSION: The MERF approach demonstrated the potential for integrating high-dimensional genomic and low-dimensional non-genomic data to stratify CBC risk.
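The calibration check mentioned in the abstract (sorting test-set predictions, binning them, and comparing the lowest and highest risk bins by odds ratio) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the number of bins, and the toy data are all assumptions made for illustration, and the record does not state how many bins the study used.

```python
from typing import List, Sequence, Tuple


def bin_by_predicted_risk(
    preds: Sequence[float], labels: Sequence[int], n_bins: int
) -> List[Tuple[float, float]]:
    """Sort samples by predicted risk, split them into n_bins near-equal bins,
    and return (mean predicted risk, observed event rate) for each bin,
    lowest-risk bin first."""
    order = sorted(range(len(preds)), key=lambda i: preds[i])
    bins = []
    for b in range(n_bins):
        start = b * len(order) // n_bins
        stop = (b + 1) * len(order) // n_bins
        idx = order[start:stop]
        mean_pred = sum(preds[i] for i in idx) / len(idx)
        obs_rate = sum(labels[i] for i in idx) / len(idx)
        bins.append((mean_pred, obs_rate))
    return bins


def odds(p: float) -> float:
    """Convert a probability to odds; undefined when p equals 1."""
    return p / (1.0 - p)


def extreme_bin_odds_ratios(
    preds: Sequence[float], labels: Sequence[int], n_bins: int = 4
) -> Tuple[float, float]:
    """Return (predicted OR, observed OR) of the highest-risk bin relative to
    the lowest-risk bin. n_bins=4 is an arbitrary default, not a study value.
    Fails with ZeroDivisionError if the lowest bin has no events or
    the highest bin has only events."""
    bins = bin_by_predicted_risk(preds, labels, n_bins)
    (low_pred, low_obs), (high_pred, high_obs) = bins[0], bins[-1]
    return odds(high_pred) / odds(low_pred), odds(high_obs) / odds(low_obs)


# Hypothetical toy data (not from the WECARE Study): 1 = CBC case, 0 = control.
toy_preds = [0.5, 0.25, 0.5, 0.25, 0.25, 0.5, 0.25, 0.5]
toy_labels = [1, 0, 0, 1, 0, 1, 0, 0]
predicted_or, observed_or = extreme_bin_odds_ratios(toy_preds, toy_labels, n_bins=2)
print(f"predicted OR: {predicted_or:.2f}, observed OR: {observed_or:.2f}")
```

In this toy example the predicted and observed odds ratios coincide, which is the kind of agreement between extreme bins that the abstract describes; real data would rarely match this exactly.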
Details
- Title: Subtitle
- Integrating genomic and non-genomic data to stratify the risk of contralateral breast cancer after radiotherapy
- Creators
- Sangkyu Lee - New York University
- Xiang Shu - Memorial Sloan Kettering Cancer Center
- Andriy Derkach - Memorial Sloan Kettering Cancer Center
- Anne S Reiner - Memorial Sloan Kettering Cancer Center
- Xiaolin Liang
- Meghan Woods - Memorial Sloan Kettering Cancer Center
- Patrick Concannon - University of Florida
- Charles F Lynch - University of Iowa
- Kathleen E Malone - Fred Hutch Cancer Center
- Julia A Knight - Lunenfeld-Tanenbaum Research Institute
- Esther M John - Stanford University
- Joseph O Deasy - Memorial Sloan Kettering Cancer Center
- Jonine L Bernstein - Memorial Sloan Kettering Cancer Center
- Jung Hun Oh - Memorial Sloan Kettering Cancer Center
- Resource Type
- Journal article
- Publication Details
- International journal of radiation oncology, biology, physics
- DOI
- 10.1016/j.ijrobp.2026.02.237
- PMID
- 41763494
- PMCID
- PMC13051349
- ISSN
- 1879-355X
- eISSN
- 1879-355X
- Publisher
- Elsevier
- Language
- English
- Electronic publication date
- 02/26/2026
- Academic Unit
- Epidemiology
- Record Identifier
- 9985139269002771