Journal article
Alternating EM algorithm for a bilinear model in isoform quantification from RNA-seq data
Bioinformatics, Vol.36(3), pp.805-812
02/01/2020
DOI: 10.1093/bioinformatics/btz640
PMCID: PMC9883676
PMID: 31400221
Abstract
Motivation
Estimation of isoform-level gene expression from RNA-seq data depends on simplifying assumptions, such as uniform read distribution, that are easily violated in real data. Such violations typically lead to biased estimates. Most existing methods provide bias-correction steps that are based on biological considerations, such as GC content, and are applied to each sample separately. The main problem is that not all biases are known.
Results
We have developed a novel method called XAEM based on a more flexible and robust statistical model. Existing methods are essentially based on a linear model Xβ, where the design matrix X is known and is computed based on the simplifying assumptions. In contrast, XAEM considers Xβ as a bilinear model with both X and β unknown. Joint estimation of X and β is made possible by a simultaneous analysis of multi-sample RNA-seq data. Compared to existing methods, XAEM automatically performs empirical correction of potentially unknown biases. We use an alternating expectation-maximization (AEM) algorithm, alternating between estimation of X and β. For speed, XAEM uses quasi-mapping for read alignment. Overall, XAEM performs favorably compared to recent advanced methods. For simulated datasets, XAEM obtains higher accuracy for multiple-isoform genes. In a differential-expression analysis of a real single-cell RNA-seq dataset, XAEM achieves substantially better rediscovery rates in independent validation sets.
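The alternating idea in the Results paragraph can be illustrated with a small toy sketch. This is not the XAEM implementation: it assumes a Poisson model for a count matrix Y (read classes by samples) with mean Xβ, uses the standard multiplicative EM updates for that model, and starts X from a hypothetical "uniform-assumption" design. All names (`alternating_em`, `poisson_loglik`, `X0`) and the simulated data are illustrative assumptions.

```python
import numpy as np


def poisson_loglik(Y, X, B, eps=1e-12):
    """Poisson log-likelihood of Y under mean X @ B (constant term dropped)."""
    M = X @ B + eps
    return float(np.sum(Y * np.log(M) - M))


def alternating_em(Y, X0, n_iter=100, eps=1e-12):
    """Alternate EM-style updates of B (expression) and X (design).

    Y  : (n_classes, n_samples) read counts, one column per sample.
    X0 : (n_classes, n_isoforms) starting design; zeros stay zero.
    Columns of X are kept normalised to sum to 1.
    """
    Y = np.asarray(Y, dtype=float)
    X = np.asarray(X0, dtype=float)
    X = X / X.sum(axis=0, keepdims=True)
    n_iso = X.shape[1]
    # Start every isoform with an equal share of each sample's reads.
    B = np.tile(Y.sum(axis=0) / n_iso, (n_iso, 1))
    history = [poisson_loglik(Y, X, B)]
    for _ in range(n_iter):
        # Update B with X held fixed.
        R = Y / (X @ B + eps)
        B = B * (X.T @ R) / X.sum(axis=0)[:, None]
        # Update X with B held fixed; this step needs multiple samples.
        R = Y / (X @ B + eps)
        X = X * (R @ B.T) / (B.sum(axis=1) + eps)
        # Renormalise columns of X and rescale B so X @ B is unchanged.
        s = X.sum(axis=0)
        X = X / s
        B = B * s[:, None]
        history.append(poisson_loglik(Y, X, B))
    return X, B, history


# Simulated example: 4 read classes, 2 isoforms, 30 samples (all made up).
rng = np.random.default_rng(0)
X_true = np.array([[0.6, 0.0], [0.3, 0.2], [0.1, 0.5], [0.0, 0.3]])
B_true = rng.uniform(50, 500, size=(2, 30))
Y = rng.poisson(X_true @ B_true)

# Hypothetical starting design computed under a uniform-read assumption.
X0 = np.array([[0.5, 0.0], [0.25, 0.25], [0.25, 0.25], [0.0, 0.5]])
X_hat, B_hat, history = alternating_em(Y, X0)
print("log-likelihood gain:", history[-1] - history[0])
```

With a single sample the X update would be unidentifiable, which is why the sketch, like the method described above, relies on several samples sharing the same X.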
Availability and implementation
The method and pipeline are implemented as a tool and freely available for use at http://fafner.meb.ki.se/biostatwiki/xaem/.
Supplementary information
Supplementary data are available at Bioinformatics online.
Details
- Title
- Alternating EM algorithm for a bilinear model in isoform quantification from RNA-seq data
- Creators
- Wenjiang Deng - Karolinska Institutet; Tian Mou - Karolinska Institutet; Krishna R Kalari - Department of Health Sciences Research, MN 55905, USA; Nifang Niu - Mayo Clinic; Liewei Wang - Mayo Clinic; Yudi Pawitan - Karolinska Institutet; Trung Nghia Vu - Karolinska Institutet
- Resource Type
- Journal article
- Publication Details
- Bioinformatics, Vol.36(3), pp.805-812
- Publisher
- Oxford University Press
- DOI
- 10.1093/bioinformatics/btz640
- PMID
- 31400221
- PMCID
- PMC9883676
- ISSN
- 1367-4803
- eISSN
- 1460-2059
- Number of pages
- 8
- Grant note
- Swedish Cancer Fonden; the Swedish Research Council (VR); China Scholarship Council (CSC); Swedish Foundation for Strategic Research (SSF)
- Language
- English
- Date published
- 02/01/2020
- Academic Unit
- Stead Family Department of Pediatrics; Medical Genetics and Genomics
- Record Identifier
- 9984701547202771