Promoting similarity of model sparsity structures in integrative analysis of cancer genetic data

Yuan Huang; Jin Liu; Huangdi Yi; Ben-Chang Shia; Shuangge Ma

doi:10.1002/sim.7138

Journal article

Peer reviewed

Statistics in Medicine, Vol.36(3), pp.509-559

02/10/2017

PMCID: PMC5209260

PMID: 27667129

Abstract

In profiling studies, the analysis of a single dataset often leads to unsatisfactory results because of small sample sizes. Multi-dataset analysis utilizes information from multiple independent datasets and outperforms single-dataset analysis. Among the available multi-dataset analysis methods, integrative analysis methods, which aggregate and analyze raw data, outperform meta-analysis methods, which analyze multiple datasets separately and then pool summary statistics. In this study, we conduct integrative analysis and marker selection under the heterogeneity structure, which allows different datasets to have overlapping but not necessarily identical sets of markers. Under certain scenarios, it is reasonable to expect some similarity of identified marker sets (or, equivalently, similarity of model sparsity structures) across multiple datasets. However, existing methods have no mechanism to explicitly promote such similarity. To tackle this problem, we develop a sparse boosting method. This method uses a BIC/HDBIC criterion to select weak learners in boosting, thereby encouraging sparsity. A new penalty is introduced to promote the similarity of model sparsity structures across datasets. The proposed method has an intuitive formulation and is broadly applicable and computationally affordable. In numerical studies, we analyze right-censored survival data under the accelerated failure time model. Simulation shows that the proposed method outperforms alternative boosting and penalization methods, with more accurate marker identification. The analysis of three breast cancer prognosis datasets shows that the proposed method can identify marker sets with increased similarity across datasets and improved prediction performance. Copyright (c) 2016 John Wiley & Sons, Ltd.
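The abstract describes sparse boosting in which a BIC-type criterion selects the weak learner at each step and an extra penalty favors similar sparsity structures across datasets. The Python sketch below illustrates that general idea under simplifying assumptions and is not the authors' algorithm: it uses componentwise least-squares base learners on centered, uncensored responses (the accelerated failure time fitting of censored data is omitted), counts active features as degrees of freedom, uses plain BIC rather than HDBIC, and scores cross-dataset similarity with a hypothetical mismatch count weighted by `lam`. The names `similarity_boost`, `support` and `lam` are illustrative only.

```python
import math
import random


def support(beta):
    """Indices of nonzero coefficients (the model's sparsity structure)."""
    return {j for j, b in enumerate(beta) if b != 0.0}


def similarity_boost(datasets, nu=0.1, lam=1.0, n_rounds=200):
    """Hypothetical sketch, not the method of Huang et al.

    Componentwise L2 boosting run jointly over several datasets that share
    the same p candidate features. Responses are assumed centered (no
    intercept). Returns one coefficient vector per dataset, taken at the
    round with the smallest summed BIC.
    """
    M = len(datasets)
    p = len(datasets[0][0][0])
    cols = [[[row[j] for row in X] for j in range(p)] for X, _ in datasets]
    resid = [list(y) for _, y in datasets]
    betas = [[0.0] * p for _ in range(M)]
    best_bic, best_betas = math.inf, [b[:] for b in betas]

    for _ in range(n_rounds):
        for m in range(M):
            n = len(resid[m])
            active = support(betas[m])
            others = [support(betas[k]) for k in range(M) if k != m]
            best_crit, best_j, best_step = math.inf, None, 0.0
            for j in range(p):
                x = cols[m][j]
                xx = sum(v * v for v in x)
                if xx == 0.0:
                    continue
                # Shrunken least-squares fit of the residual on feature j.
                step = nu * sum(v * r for v, r in zip(x, resid[m])) / xx
                rss = sum((r - step * v) ** 2 for v, r in zip(x, resid[m]))
                # BIC-type cost: a new feature adds one degree of freedom.
                df = len(active) + (j not in active)
                # Assumed similarity penalty: change in the number of
                # cross-dataset sparsity mismatches if j becomes active in m.
                delta = 0
                if j not in active:
                    n_on = sum(j in s for s in others)
                    delta = (len(others) - n_on) - n_on
                crit = n * math.log(max(rss, 1e-12) / n) + df * math.log(n) + lam * delta
                if crit < best_crit:
                    best_crit, best_j, best_step = crit, j, step
            if best_j is None:
                continue
            betas[m][best_j] += best_step
            x = cols[m][best_j]
            resid[m] = [r - best_step * v for v, r in zip(x, resid[m])]
        # Summed BIC over all datasets decides the stopping round.
        bic = 0.0
        for m in range(M):
            n = len(resid[m])
            rss = sum(r * r for r in resid[m])
            bic += n * math.log(max(rss, 1e-12) / n) + len(support(betas[m])) * math.log(n)
        if bic < best_bic:
            best_bic, best_betas = bic, [b[:] for b in betas]
    return best_betas


# Usage on simulated data: three datasets sharing two true markers (0 and 1).
random.seed(0)


def make_dataset(n, p=6):
    X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [2.0 * row[0] - 1.5 * row[1] + 0.1 * random.gauss(0, 1) for row in X]
    return X, y


data = [make_dataset(60) for _ in range(3)]
fits = similarity_boost(data)
for beta in fits:
    print(sorted(support(beta)), [round(b, 2) for b in beta[:2]])
```

With `lam = 0`, feature selection in each dataset is independent apart from the shared stopping round; a positive `lam` makes a feature cheaper to add in one dataset once other datasets already use it, and more expensive when no other dataset does.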

Life Sciences & Biomedicine

Mathematical & Computational Biology

Mathematics

Medical Informatics

Medicine, Research & Experimental

Physical Sciences

Public, Environmental & Occupational Health

Research & Experimental Medicine

Science & Technology

Statistics & Probability

Details

Title: Promoting similarity of model sparsity structures in integrative analysis of cancer genetic data
Creators: Yuan Huang - Yale University
Jin Liu - Duke-NUS Medical School
Huangdi Yi - Yale University
Ben-Chang Shia - Taipei Medical University
Shuangge Ma - Yale University
Resource Type: Journal article
Publication Details: Statistics in Medicine, Vol.36(3), pp.509-559
Publisher: Wiley
DOI: 10.1002/sim.7138
PMID: 27667129
PMCID: PMC5209260
ISSN: 0277-6715
eISSN: 1097-0258
Number of pages: 51
Grant note: 71471152; 71201139; 71301162 / National Natural Science Foundation of China (NSFC)
WBS: R-913-200-098-263 / Duke-NUS Graduate Medical School; National University of Singapore
13ZD148; 13CTJ001 / National Social Science Foundation of China
CA142774; CA016359 / National Institutes of Health (NIH), United States Department of Health & Human Services
VA Cooperative Studies Program of the Department of Veterans Affairs, Office of Research and Development
R01CA142774 / NIH National Cancer Institute (NCI), United States Department of Health & Human Services
UL1TR001863 / NIH National Center for Advancing Translational Sciences (NCATS), United States Department of Health & Human Services
Language: English
Date published: 02/10/2017
Academic Unit: Biostatistics
Record Identifier: 9984363600702771

Metrics

12 Record Views

13 Times Cited - Web of Science

See more details