Large-scale identification of sequence variants influencing human transcription factor occupancy in vivo

Nat Genet. 2015 Dec;47(12):1393-401. doi: 10.1038/ng.3432. Epub 2015 Oct 26.

Abstract

The function of human regulatory regions depends exquisitely on their local genomic environment and on cellular context, complicating experimental analysis of common disease- and trait-associated variants that localize within regulatory DNA. We use allelically resolved genomic DNase I footprinting data encompassing 166 individuals and 114 cell types to identify >60,000 common variants that directly influence transcription factor occupancy and regulatory DNA accessibility in vivo. The unprecedented scale of these data enables systematic analysis of the impact of sequence variation on transcription factor occupancy in vivo. We leverage this analysis to develop accurate models of variation affecting the recognition sites for diverse transcription factors and apply these models to discriminate nearly 500,000 common regulatory variants likely to affect transcription factor occupancy across the human genome. The approach and results provide a new foundation for the analysis and interpretation of noncoding variation in complete human genomes and for systems-level investigation of disease-associated variants.
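As an illustration only: the abstract does not state which statistical test was used to call variants from the allelically resolved data. One common way to flag allelic imbalance at a heterozygous site is an exact two-sided binomial test of the reads per allele against a 50:50 expectation. The sketch below assumes that approach; the function name and the read counts are hypothetical and are not taken from the paper.

```python
from math import comb


def allelic_imbalance_pvalue(ref_reads: int, alt_reads: int) -> float:
    """Exact two-sided binomial p-value against a 50:50 allelic ratio.

    Hypothetical helper for illustration; not the published pipeline.
    """
    n = ref_reads + alt_reads
    if n == 0:
        # No reads cover the site, so there is no evidence of imbalance.
        return 1.0
    # Under p = 0.5 every outcome k has probability comb(n, k) / 2**n,
    # so outcomes can be compared by their binomial coefficients alone.
    observed = comb(n, ref_reads)
    # Sum all outcomes that are no more likely than the observed one.
    as_extreme = sum(
        comb(n, k) for k in range(n + 1) if comb(n, k) <= observed
    )
    return as_extreme / 2 ** n


# Hypothetical read counts at one heterozygous site within a footprint.
p_value = allelic_imbalance_pvalue(ref_reads=15, alt_reads=5)
print(f"two-sided p = {p_value:.4f}")
```

In a genome-wide setting such per-site p-values would still need multiple-testing correction, and reference-allele mapping bias would have to be controlled before the 50:50 null is reasonable.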

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Chromatin / metabolism*
  • Gene Expression Regulation*
  • Genetic Variation / genetics*
  • Genome, Human
  • Genomics / methods
  • Humans
  • Phenotype
  • Polymorphism, Single Nucleotide / genetics*
  • Promoter Regions, Genetic / genetics*
  • Protein Binding
  • Regulatory Elements, Transcriptional / genetics*
  • Transcription Factors / genetics
  • Transcription Factors / metabolism*

Substances

  • Chromatin
  • Transcription Factors

Associated data

  • GEO/GSE18927
  • GEO/GSE26328
  • GEO/GSE29692
  • GEO/GSE30263
  • GEO/GSE55579