Journal article
Differential gene expression analysis for multi-subject single-cell RNA-sequencing studies with aggregateBioVar
Bioinformatics, Vol.37(19), pp.3243-3251
10/11/2021
DOI: 10.1093/bioinformatics/btab337
PMCID: PMC8504643
PMID: 33970215
Abstract
Motivation
Single-cell RNA-sequencing (scRNA-seq) provides more granular biological information than bulk RNA-sequencing, but bulk RNA-sequencing remains popular because its lower cost allows researchers to process more biological replicates and design more powerful studies. As scRNA-seq costs have decreased, collecting data from more than one biological replicate has become more feasible, but careful modeling of different layers of biological variation remains challenging for many users. Here, we propose a statistical model for scRNA-seq gene counts, describe a simple method for estimating model parameters and show that failing to account for additional biological variation in scRNA-seq studies can inflate false discovery rates (FDRs) of statistical tests.
Results
First, in a simulation study, we show that when the gene expression distribution of a population of cells varies between subjects, a naïve approach to differential expression analysis will inflate the FDR. We then compare multiple differential expression testing methods on scRNA-seq datasets from human samples and from animal models. These analyses suggest that a naïve approach to differential expression testing could lead to many false discoveries; in contrast, an approach based on pseudobulk counts has better FDR control.
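The pseudobulk idea mentioned above can be illustrated with a minimal sketch: counts from all cells belonging to the same subject are summed gene by gene, so that the subject (the biological replicate) rather than the cell becomes the unit of analysis. The sketch below uses NumPy only and is not the aggregateBioVar implementation; the function name `pseudobulk`, the toy count matrix and the subject labels are assumptions made for illustration.

```python
import numpy as np


def pseudobulk(counts, cell_subject):
    """Sum a gene-by-cell count matrix into a gene-by-subject matrix.

    `counts` has one row per gene and one column per cell;
    `cell_subject` gives the subject label of each cell (hypothetical inputs).
    """
    counts = np.asarray(counts)
    cell_subject = np.asarray(cell_subject)
    if counts.shape[1] != cell_subject.size:
        raise ValueError("need exactly one subject label per cell (column)")

    # Map each cell to the index of its subject among the sorted unique labels.
    subjects, inverse = np.unique(cell_subject, return_inverse=True)

    # Cell-by-subject indicator matrix: entry (c, s) is 1 if cell c is from subject s.
    indicator = np.zeros((cell_subject.size, subjects.size), dtype=counts.dtype)
    indicator[np.arange(cell_subject.size), inverse] = 1

    # Matrix product sums the counts of each subject's cells, gene by gene.
    return subjects, counts @ indicator


# Toy data (illustrative only): 3 genes, 5 cells, 2 subjects.
counts = np.array([[1, 0, 2, 3, 0],
                   [0, 4, 1, 0, 2],
                   [5, 1, 0, 0, 1]])
cell_subject = ["s1", "s1", "s2", "s2", "s2"]

subjects, agg = pseudobulk(counts, cell_subject)
print(subjects)  # subject labels, one per column of agg
print(agg)       # gene-by-subject pseudobulk counts
```

The resulting gene-by-subject matrix has the same shape as a bulk RNA-seq count matrix, which is why it could, in principle, be passed to differential expression methods designed for bulk data.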
Availability and implementation
A software package, aggregateBioVar, is freely available on Bioconductor (https://www.bioconductor.org/packages/release/bioc/html/aggregateBioVar.html) and is intended to be compatible with upstream and downstream methods in scRNA-seq data analysis pipelines.
Supplementary information
Raw gene-by-cell count matrices for pig scRNA-seq data are available as GEO accession GSE150211. Supplementary data are available at Bioinformatics online.
Details
- Title: Subtitle
- Differential gene expression analysis for multi-subject single-cell RNA-sequencing studies with aggregateBioVar
- Creators
- Andrew L Thurman - Roy J. and Lucille A. Carver College of Medicine
- Jason A Ratcliff - Roy J. and Lucille A. Carver College of Medicine
- Michael S Chimenti - University of Iowa
- Alejandro A Pezzulo - Roy J. and Lucille A. Carver College of Medicine
- Contributors
- Anthony Mathelier (Editor)
- Resource Type
- Journal article
- Publication Details
- Bioinformatics, Vol.37(19), pp.3243-3251
- DOI
- 10.1093/bioinformatics/btab337
- PMID
- 33970215
- PMCID
- PMC8504643
- NLM abbreviation
- Bioinformatics
- ISSN
- 1367-4803
- eISSN
- 1460-2059
- Grant note
- DOI: 10.13039/100000002, name: National Institutes of Health, award: NHLBI K01HL140261; DOI: 10.13039/100000002, name: NIH, award: NIDDK DK54759; DOI: 10.13039/100000002, name: NIH, award: NIEHS ES005605
- Language
- English
- Date published
- 10/11/2021
- Academic Unit
- Pulmonary, Critical Care, and Occupational Medicine; Iowa Neuroscience Institute; Internal Medicine; Iowa Institute of Human Genetics
- Record Identifier
- 9984359884602771