Using DNase digestion data to accurately identify transcription factor binding sites

Kaixuan Luo; Alexander J Hartemink

Pac Symp Biocomput. 2013:80-91.

Authors

Kaixuan Luo¹, Alexander J Hartemink

Affiliation

¹ Program in Computational Biology and Bioinformatics, Duke University, Durham, NC 27708, USA. kaixuan.luo@duke.edu

PMID: 23424114
PMCID: PMC3716004

Abstract

Identifying binding sites of transcription factors (TFs) is a key task in deciphering transcriptional regulation. ChIP-based methods survey the genomic locations of only a single TF in each experiment. Methods that combine DNase digestion data with TF binding specificity information could potentially survey the locations of many TFs in the same experiment, provided such methods achieve reasonable levels of sensitivity and specificity. Here, we present a simple method of this kind that outperforms a leading recent method, CENTIPEDE, marginally in human but dramatically in yeast (average auROC across 20 TFs increases from 74% to 94%). Our method is based on logistic regression and thus benefits from supervision, but we show that partially and completely unsupervised variants perform nearly as well. Because the number of parameters in our method is at least an order of magnitude smaller than that of CENTIPEDE, we dub it MILLIPEDE.
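The general idea of a supervised, logistic-regression classifier evaluated by auROC can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the features, so the four used here (a motif match score plus log-scaled DNase cut counts in a left flank, the motif core, and a right flank) are assumptions, the data are simulated, and every function name is hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_sites(n, rng):
    """Simulate candidate motif matches (assumed features, not from the paper)."""
    X, y = [], []
    for _ in range(n):
        bound = 1 if rng.random() < 0.5 else 0
        pwm = rng.gauss(8.0 if bound else 6.0, 1.5)              # motif match score
        left = max(0.0, rng.gauss(20.0 if bound else 5.0, 3.0))  # cuts in left flank
        core = max(0.0, rng.gauss(2.0 if bound else 5.0, 1.0))   # cuts inside motif
        right = max(0.0, rng.gauss(20.0 if bound else 5.0, 3.0)) # cuts in right flank
        X.append([pwm, math.log1p(left), math.log1p(core), math.log1p(right)])
        y.append(bound)
    return X, y


def standardize(train, test):
    """Z-score both sets using the training-set mean and standard deviation."""
    cols = list(zip(*train))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]

    def scale(rows):
        return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]

    return scale(train), scale(test)


def fit_logistic(X, y, lr=0.5, epochs=200):
    """Fit weights and bias by full-batch gradient descent on the log loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(X, w, b):
    """Return the predicted binding probability for each site."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def auroc(scores, labels):
    """Probability that a bound site outscores an unbound one (ties count half)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


rng = random.Random(0)
X_train, y_train = make_sites(400, rng)
X_test, y_test = make_sites(200, rng)
X_train, X_test = standardize(X_train, X_test)
w, b = fit_logistic(X_train, y_train)
held_out_auroc = auroc(predict(X_test, w, b), y_test)
print(f"held-out auROC on simulated sites: {held_out_auroc:.3f}")
```

A model of this form has one weight per feature plus a bias (five parameters in this sketch), which illustrates how such a classifier can stay far smaller than a model with a parameter per base-pair position.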

Publication types

Research Support, N.I.H., Extramural
Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Algorithms
Binding Sites
Chromatin Immunoprecipitation
Computational Biology
Databases, Genetic
Deoxyribonucleases*
Humans
Logistic Models
Models, Biological
Saccharomyces cerevisiae Proteins / chemistry
Saccharomyces cerevisiae Proteins / metabolism
Software
Transcription Factors / chemistry*
Transcription Factors / metabolism*

Substances

Saccharomyces cerevisiae Proteins
Transcription Factors
Deoxyribonucleases
