LPEseq: Local-Pooled-Error Test for RNA Sequencing Experiments with a Small Number of Replicates

PLoS One. 2016 Aug 17;11(8):e0159182. doi: 10.1371/journal.pone.0159182. eCollection 2016.

Abstract

RNA sequencing (RNA-Seq) provides valuable information for characterizing the molecular nature of cells, in particular for identifying differentially expressed transcripts on a genome-wide scale. Unfortunately, cost and limited specimen availability often lead to studies with small sample sizes, and hypothesis testing of differential expression between classes is generally limited when the number of samples is small. The problem is especially challenging when only one sample per class exists. In this case, only a few of the many methods that have been developed are applicable for identifying differentially expressed transcripts. Thus, the aim of this study was to develop a method that can accurately test differential expression with a limited number of samples, in particular with non-replicated samples. We propose a local-pooled-error method for RNA-Seq data (LPEseq) to account for non-replicated samples in the analysis of differential expression. Our LPEseq method extends the existing LPE method, which was proposed for microarray data, to allow examination of non-replicated RNA-Seq experiments. We demonstrated the validity of the LPEseq method using both real and simulated datasets. By comparing the results obtained using the LPEseq method with those obtained from other methods, we found that the LPEseq method outperformed the others on non-replicated datasets and performed similarly to them on replicated datasets; LPEseq consistently showed a high true discovery rate without increasing the rate of false positives, regardless of the number of samples. Our proposed LPEseq method can be effectively used to conduct differential expression analysis as a preliminary design step or for investigation of a rare specimen, for which only a limited number of samples is available.

MeSH terms

  • Algorithms*
  • Base Sequence
  • Computer Simulation
  • Gene Expression Profiling / methods*
  • High-Throughput Nucleotide Sequencing / methods*
  • Models, Statistical*
  • RNA / genetics
  • Sample Size
  • Sequence Analysis, RNA / methods*

Substances

  • RNA

Grants and funding

This work is supported by the National Research Foundation of Korea (NRF) grants funded by the Korea government (MSIP) (no. 2012R1A3A2026438) and by the Bio-Synergy Research Project (2013M3A9C4078158) of the Ministry of Science, ICT and Future Planning through the NRF. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.