Improving the Power of Structural Variation Detection by Augmenting the Reference

PLoS One. 2015 Aug 31;10(8):e0136771. doi: 10.1371/journal.pone.0136771. eCollection 2015.

Abstract

The Genome Reference Consortium's human reference sequence serves three related but distinct purposes: as a representative species genome, as a coordinate system for identifying variants, and as an alignment reference for variation detection algorithms. However, using this reference sequence simultaneously as a representative species genome and as an alignment reference leads to unnecessary artifacts in structural variation detection and limits the accuracy of detection algorithms. We show how decoupling these two roles and developing a separate alignment reference can significantly improve the accuracy of structural variation detection, lead to improved genotyping of disease-related genes, and decrease the cost of studying polymorphism in a population.

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • Base Sequence
  • Genome, Human / genetics*
  • Genomic Structural Variation*
  • Genotype
  • Humans
  • Membrane Proteins / genetics
  • Nerve Tissue Proteins / genetics
  • Reference Values
  • Sequence Alignment / methods*

Substances

  • CNTNAP2 protein, human
  • Membrane Proteins
  • Nerve Tissue Proteins

Grants and funding

P.M. was funded in part by the National Science Foundation (nsf.gov), grants DBI-1356529 and IIS-1453527. J.S. and A.T.P. were supported by an NHMRC Program Grant (1054618) (www.nhmrc.gov.au/). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.