ChIP-R: Assembling reproducible sets of ChIP-seq and ATAC-seq peaks from multiple replicates

Genomics. 2021 Jul;113(4):1855-1866. doi: 10.1016/j.ygeno.2021.04.026. Epub 2021 Apr 18.

Abstract

Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is the primary protocol for detecting genome-wide DNA-protein interactions, and therefore a key tool for understanding transcriptional regulation. A number of factors, including low antibody specificity and cellular heterogeneity within a sample, may cause "peak" callers to output noise and experimental artefacts. Statistically combining multiple experimental replicates from the same condition could significantly enhance our ability to distinguish actual transcription factor binding events, even when peak caller accuracy and consistency of detection are compromised. We adapted the rank-product test to statistically evaluate reproducibility across any number of ChIP-seq experimental replicates. We demonstrate over a number of benchmarks that our adaptation, "ChIP-R" (pronounced "chipper"), performs as well as or better than comparable approaches at recovering transcription factor binding sites from ChIP-seq peak data. We also show that ChIP-R extends to ATAC-seq peaks, finding reproducible peak sets even at low sequencing depth. ChIP-R decomposes peaks across replicates into "fragments", each of which either forms part of a peak in a given replicate or does not. By re-analysing existing data sets, we show that ChIP-R reconstructs reproducible peaks from fragments with enhanced biological enrichment relative to current strategies.
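To make the two ideas in the abstract concrete (splitting peaks into fragments, then ranking fragments across replicates), here is a minimal Python sketch. It is not the ChIP-R implementation: the function names `fragments` and `rank_products`, the use of a per-peak score for ranking, the choice to give uncovered fragments the worst rank, and the toy coordinates are all assumptions made for illustration. It also stops at the raw rank product and does not compute the statistical significance that the rank-product test provides.

```python
from math import prod


def fragments(replicates):
    """Split the union of all peaks into fragments at every peak boundary.

    replicates: list of replicates, each a list of (start, end, score)
    half-open intervals (hypothetical input layout).
    Returns a list of (start, end, scores); scores[i] is the score of the
    covering peak in replicate i, or None if replicate i has no peak there.
    """
    bounds = sorted({b for rep in replicates for s, e, _ in rep for b in (s, e)})
    out = []
    for lo, hi in zip(bounds, bounds[1:]):
        scores = []
        for rep in replicates:
            hit = [sc for s, e, sc in rep if s <= lo and hi <= e]
            scores.append(max(hit) if hit else None)
        # Keep only fragments that belong to a peak in at least one replicate.
        if any(sc is not None for sc in scores):
            out.append((lo, hi, scores))
    return out


def rank_products(frags):
    """Rank fragments within each replicate (1 = strongest) and multiply.

    Assumption: a fragment not covered in a replicate receives the worst
    rank for that replicate. Ties are broken by genomic order.
    """
    n_rep = len(frags[0][2]) if frags else 0
    ranks = [[0] * n_rep for _ in frags]
    for r in range(n_rep):
        covered = sorted(
            (i for i, f in enumerate(frags) if f[2][r] is not None),
            key=lambda i: -frags[i][2][r],
        )
        for pos, i in enumerate(covered, start=1):
            ranks[i][r] = pos
        worst = len(covered) + 1
        for i in range(len(frags)):
            if ranks[i][r] == 0:
                ranks[i][r] = worst
    return [prod(row) for row in ranks]


# Toy data: two replicates with one overlapping region (150-200).
rep1 = [(100, 200, 9.0), (300, 400, 2.0)]
rep2 = [(150, 250, 8.0)]
frags = fragments([rep1, rep2])
rps = rank_products(frags)
best = frags[rps.index(min(rps))]
```

In this toy input the fragment spanning 150-200 is the only one covered by both replicates, and it receives the smallest rank product, which is the intuition behind using low rank products as evidence of reproducibility.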

Keywords: ATAC-seq; ChIP-seq; Reproducibility; Transcription factor; Transcriptional regulation.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Binding Sites
  • Chromatin Immunoprecipitation / methods
  • Chromatin Immunoprecipitation Sequencing*
  • High-Throughput Nucleotide Sequencing / methods
  • Reproducibility of Results
  • Sequence Analysis, DNA / methods