A two-step integrated approach to detect differentially expressed genes in RNA-Seq data

J Bioinform Comput Biol. 2016 Dec;14(6):1650034. doi: 10.1142/S0219720016500347. Epub 2016 Sep 15.

Abstract

One of the primary objectives of a ribonucleic acid (RNA) sequencing (RNA-Seq) experiment is to identify differentially expressed (DE) genes in two or more treatment conditions. It is common practice to assume that all read counts from RNA-Seq data follow an overdispersed (OD) Poisson or negative binomial (NB) distribution. This assumption is sometimes misleading because, within each condition, some genes may have unvarying transcription levels with no overdispersion. In such a case, it is more appropriate and logical to consider two sets of genes: OD and non-overdispersed (NOD). We propose a new two-step integrated approach to distinguish DE genes in RNA-Seq data using standard Poisson and NB models for NOD and OD genes, respectively. The approach is integrated in the sense that it can be merged with any other NB-based method for detecting DE genes. We design a simulation study and analyze two real RNA-Seq data sets to evaluate the proposed strategy. We compare the performance of the new method, combined with each of three R software packages (edgeR, DESeq2, and DSS), against these packages run with their default settings. For both the simulated and real data sets, the integrated approaches perform better than, or at least as well as, the regular methods embedded in these R packages.
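
The two-step idea can be illustrated with a minimal Python sketch. The abstract does not say how genes are classified as OD or NOD, nor which Poisson test is applied, so everything below is an assumption made for illustration: the variance-to-mean threshold, the exact conditional test for equal Poisson rates (which also assumes equal library sizes), and all function names are hypothetical. Genes flagged as OD are only collected here; in the integrated approach they would be passed to an NB-based method such as edgeR, DESeq2, or DSS.

```python
import math
from statistics import mean, variance


def dispersion_index(counts):
    """Variance-to-mean ratio; close to 1 or below suggests no overdispersion."""
    m = mean(counts)
    return variance(counts) / m if m > 0 else 0.0


def is_overdispersed(cond_a, cond_b, threshold=1.5):
    # Hypothetical rule (not from the paper): a gene is OD if either
    # condition has a variance-to-mean ratio above the threshold.
    return max(dispersion_index(cond_a), dispersion_index(cond_b)) > threshold


def log_binom_pmf(k, n, p):
    """Log of the Binomial(n, p) probability mass at k, computed in log space."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))


def poisson_two_sample_pvalue(cond_a, cond_b):
    """Exact conditional test for equal Poisson rates in two conditions.

    Given the total count, the count in condition A is binomial under the
    null hypothesis. Assumes equal library sizes across samples.
    """
    x, n = sum(cond_a), sum(cond_a) + sum(cond_b)
    if n == 0:
        return 1.0
    p = len(cond_a) / (len(cond_a) + len(cond_b))
    log_obs = log_binom_pmf(x, n, p)
    total = 0.0
    for k in range(n + 1):
        log_pk = log_binom_pmf(k, n, p)
        if log_pk <= log_obs + 1e-9:  # outcomes at least as extreme as observed
            total += math.exp(log_pk)
    return min(1.0, total)


def route_genes(counts_a, counts_b, threshold=1.5):
    """Step 1: split genes into OD and NOD. Step 2: Poisson test for NOD genes."""
    nod_pvalues, od_genes = {}, []
    for gene in counts_a:
        a, b = counts_a[gene], counts_b[gene]
        if is_overdispersed(a, b, threshold):
            od_genes.append(gene)  # would go to an NB-based tool
        else:
            nod_pvalues[gene] = poisson_two_sample_pvalue(a, b)
    return nod_pvalues, od_genes


# Made-up counts for three genes, three replicates per condition.
counts_a = {"g1": [10, 11, 9], "g2": [5, 50, 20], "g3": [10, 10, 11]}
counts_b = {"g1": [30, 29, 31], "g2": [6, 40, 100], "g3": [10, 11, 10]}
nod_pvalues, od_genes = route_genes(counts_a, counts_b)
print("NOD p-values:", nod_pvalues)
print("OD genes for NB analysis:", od_genes)
```

In this toy input, g2 varies far more than a Poisson model allows and is routed to the OD set, while g1 and g3 are tested with the Poisson model. A real analysis would also need library-size normalization and multiple-testing correction, which this sketch omits.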

Keywords: Next generation sequencing; RNA-Seq; differential expression; gene expression.

MeSH terms

  • Algorithms*
  • Binomial Distribution
  • Computer Simulation
  • Gene Expression Profiling / methods
  • Sequence Analysis, RNA / methods*
  • Software