BayesPI-BAR: a new biophysical model for characterization of regulatory sequence variations

Nucleic Acids Res. 2015 Dec 2;43(21):e147. doi: 10.1093/nar/gkv733. Epub 2015 Jul 21.

Abstract

Sequence variations in regulatory DNA regions can have functionally important consequences for gene expression. Such variations may play an essential role in determining phenotypes and may be linked to disease; however, identifying them through analysis of massive genome-wide sequencing data remains a great challenge. In this work, a new computational pipeline, a Bayesian method for protein-DNA interaction with binding affinity ranking (BayesPI-BAR), is proposed for quantifying the effect of sequence variations on protein binding. BayesPI-BAR uses biophysical modeling of protein-DNA interactions to predict single nucleotide polymorphisms (SNPs) that cause significant changes in the binding affinity of a regulatory region for transcription factors (TFs). The method includes two new parameters (TF chemical potentials, or protein concentrations, and direct TF binding targets) that are neglected by previous methods. The new method is verified on 67 known human regulatory SNPs; for 47 of them (70%), the true TF is ranked among the top 10 predictions. Importantly, BayesPI-BAR, which uses principal component analysis to integrate multiple predictions obtained at various TF chemical potentials, performs better than existing programs such as sTRAP and is-rSNP when evaluated on the same SNPs. BayesPI-BAR is publicly available and supports parallelized computation, which helps to investigate a large number of TFs or SNPs and to detect disease-associated regulatory sequence variations in the sea of genome-wide noncoding regions.
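
As an illustration only, the following Python sketch shows one way the ideas in the abstract could fit together: score a reference and a variant sequence against several TFs at more than one chemical potential, then combine the resulting affinity shifts with principal component analysis. It is not the BayesPI-BAR implementation. The Fermi-Dirac-style occupancy formula, the mismatch-penalty energy matrices, the log-ratio shift measure, the chemical potential values, the toy sequences, and every function name are assumptions chosen for the example.

```python
import numpy as np

BASES = "ACGT"

def toy_energy_matrix(consensus, penalty=2.0):
    """Hypothetical energy matrix: 0 for the consensus base, `penalty` otherwise."""
    m = np.full((len(consensus), 4), penalty)
    for pos, base in enumerate(consensus):
        m[pos, BASES.index(base)] = 0.0
    return m

def site_energies(seq, energy_matrix):
    """Sum per-position energies for every window of motif width in seq."""
    width = energy_matrix.shape[0]
    idx = np.array([BASES.index(b) for b in seq])
    return np.array([
        energy_matrix[np.arange(width), idx[i:i + width]].sum()
        for i in range(len(seq) - width + 1)
    ])

def occupancy(seq, energy_matrix, mu):
    """Mean binding probability over windows, assuming 1 / (1 + exp(E - mu))."""
    energies = site_energies(seq, energy_matrix)
    return float(np.mean(1.0 / (1.0 + np.exp(energies - mu))))

def affinity_shifts(ref, alt, matrices, mus):
    """Log ratio of alt to ref occupancy; rows are TFs, columns are mu values."""
    return np.array([
        [np.log(occupancy(alt, m, mu) / occupancy(ref, m, mu)) for mu in mus]
        for m in matrices
    ])

def first_pc_scores(shifts):
    """Project column-centered shifts onto the first principal component (SVD)."""
    centered = shifts - shifts.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    pc1 = vt[0]
    # The sign of a principal component is arbitrary; orient it so that
    # its loadings sum to a non-negative value.
    if pc1.sum() < 0:
        pc1 = -pc1
    return centered @ pc1

# Toy data (assumed): one SNP changes C to G in the middle of the sequence.
ref_seq = "TTACGTT"
alt_seq = "TTAGGTT"
tf_names = ["TF_ACG", "TF_AGG", "TF_TTT"]  # hypothetical TFs named by consensus
matrices = [toy_energy_matrix(c) for c in ("ACG", "AGG", "TTT")]
mus = [-2.0, 0.0]  # assumed chemical potentials

shifts = affinity_shifts(ref_seq, alt_seq, matrices, mus)
scores = first_pc_scores(shifts)

for name, score in zip(tf_names, scores):
    print(name, round(float(score), 3))
```

In this toy setup the TF whose motif is created by the variant receives the highest combined score, the TF whose motif is disrupted receives the lowest, and the TF whose windows are unchanged has a shift of zero at every chemical potential. Using the first principal component is one simple way to collapse several per-potential predictions into a single ranking value per TF.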

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bayes Theorem
  • Biophysical Phenomena
  • Cell Line
  • Genomics / methods
  • Humans
  • Models, Statistical*
  • Mutation
  • Polymorphism, Single Nucleotide*
  • Position-Specific Scoring Matrices
  • Principal Component Analysis
  • Protein Binding
  • Regulatory Elements, Transcriptional*
  • Transcription Factors / chemistry
  • Transcription Factors / metabolism

Substances

  • Transcription Factors