Target inference from collections of genomic intervals

Proc Natl Acad Sci U S A. 2013 Jun 18;110(25):E2271-8. doi: 10.1073/pnas.1306909110. Epub 2013 Jun 6.

Abstract

Finding regions of the genome that are significantly recurrent in noisy data is a common but difficult problem in present-day computational biology. Cores of recurrent events (CORE) is a computational approach to solving this problem that is based on a formalized notion by which "core" intervals explain the observed data, where the number of cores is the "depth" of the explanation. Given that formalization, we implement CORE as a combinatorial optimization procedure with depth chosen from considerations of statistical significance. An important feature of CORE is its ability to explain data with cores of widely varying lengths. We examine the performance of this system with synthetic data, and then provide two demonstrations of its utility with actual data. Applying CORE to a collection of DNA copy number profiles from single cells of a given tumor, we determine tumor population phylogeny and find the features that separate subpopulations. Applying CORE to comparative genomic hybridization data from a large set of tumor samples, we define regions of recurrent copy number aberration in breast cancer.
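
As a rough illustration of what "core intervals explaining observed intervals" could look like in practice, the Python sketch below selects a fixed number of cores greedily. The abstract does not state the scoring function or the optimization strategy, so the Jaccard overlap measure, the greedy selection, the candidate set built from observed endpoints, and the names `jaccard` and `greedy_cores` are all assumptions made for illustration, not the published method. The depth is simply passed in as a parameter; choosing it by statistical significance is not attempted here.

```python
from itertools import product


def jaccard(core, interval):
    """Overlap length divided by union length for two half-open intervals.

    Assumed association measure; the abstract does not specify one.
    """
    (cs, ce), (s, e) = core, interval
    inter = max(0, min(ce, e) - max(cs, s))
    union = (ce - cs) + (e - s) - inter
    return inter / union if union > 0 else 0.0


def greedy_cores(intervals, depth):
    """Greedily pick `depth` cores (hypothetical helper).

    Each observed interval starts with an unexplained weight of 1.0.
    At every step the candidate with the largest weighted overlap is
    chosen, and each interval's weight is reduced in proportion to how
    well the chosen core matches it.
    """
    # Candidate cores: every (start, end) pair drawn from observed endpoints.
    starts = sorted({s for s, _ in intervals})
    ends = sorted({e for _, e in intervals})
    candidates = [(s, e) for s, e in product(starts, ends) if s < e]

    unexplained = [1.0] * len(intervals)
    cores = []
    for _ in range(depth):
        def score(candidate):
            return sum(
                w * jaccard(candidate, iv)
                for w, iv in zip(unexplained, intervals)
            )

        best = max(candidates, key=score)
        cores.append((best, score(best)))
        # Down-weight intervals the new core already accounts for.
        unexplained = [
            w * (1.0 - jaccard(best, iv))
            for w, iv in zip(unexplained, intervals)
        ]
    return cores
```

Calling `greedy_cores(intervals, depth)` on a list of `(start, end)` tuples returns the chosen cores with their scores in selection order. Because candidates are scored by relative overlap rather than absolute length, short and long cores can both be selected, which loosely mirrors the abstract's point about cores of widely varying lengths.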

Keywords: genome analysis; interval data; statistical inference.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Breast Neoplasms / genetics*
  • Breast Neoplasms / secondary
  • Comparative Genomic Hybridization / methods
  • Computational Biology / methods
  • DNA Copy Number Variations / genetics
  • Databases, Genetic
  • Female
  • Gene Expression Regulation, Neoplastic*
  • Genomics / methods*
  • Humans
  • Models, Genetic*
  • Oligonucleotide Array Sequence Analysis / methods
  • Phylogeny
  • Software
  • Transcriptome