Building a pan-genome reference for a population

Ngan Nguyen; Glenn Hickey; Daniel R Zerbino; Brian Raney; Dent Earl; Joel Armstrong; W James Kent; David Haussler; Benedict Paten

doi:10.1089/cmb.2014.0146

J Comput Biol. 2015 May;22(5):387-401. doi: 10.1089/cmb.2014.0146. Epub 2015 Jan 7.

Authors

Ngan Nguyen¹, Glenn Hickey, Daniel R Zerbino, Brian Raney, Dent Earl, Joel Armstrong, W James Kent, David Haussler, Benedict Paten

Affiliation

¹ Center for Biomolecular Science and Engineering, University of California, Santa Cruz, California.

Abstract

A reference genome is a high-quality individual genome that is used as a coordinate system for the genomes of a population, or genomes of closely related subspecies. Given a set of genomes partitioned by homology into alignment blocks, we formalize the problem of ordering and orienting the blocks such that the resulting ordering maximally agrees with the underlying genomes' ordering and orientation, creating a pan-genome reference ordering. We show this problem is NP-hard, but also demonstrate, empirically and within simulations, the performance of heuristic algorithms based upon a cactus graph decomposition to find locally maximal solutions. We describe an extension of our Cactus software to create a pan-genome reference for whole-genome alignments, and demonstrate how it can be used to create novel genome browser visualizations using human variation data as a test. In addition, we test the use of a pan-genome for describing variations and as a reference for read mapping.
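To make the ordering problem concrete, the following is a minimal sketch under simplifying assumptions that are not taken from the paper. Each genome is modelled as a single linear list of oriented alignment blocks `(block_id, strand)`. The objective is assumed to be the number of adjacent block pairs in the input genomes that also appear adjacent, with matching orientation, in the reference ordering (read forwards or as its reverse complement). The heuristic is a plain first-improvement local search over block moves and flips, not the cactus-graph-based algorithm described in the abstract. Because the problem is NP-hard, such a search only guarantees a local maximum. The names `score`, `neighbours` and `local_search` are hypothetical.

```python
from typing import Iterator, List, Tuple

# An oriented block: (block_id, strand), with strand +1 (forward) or -1 (reverse).
Oriented = Tuple[int, int]


def score(ref: List[Oriented], genomes: List[List[Oriented]]) -> int:
    """Count genome adjacencies that also occur in `ref` (assumed, simplified objective)."""
    pos = {blk: i for i, (blk, _) in enumerate(ref)}
    strand = {blk: s for blk, s in ref}
    total = 0
    for g in genomes:
        for (a, sa), (b, sb) in zip(g, g[1:]):
            i, j = pos[a], pos[b]
            # Forward reading: a directly precedes b with the reference strands.
            if j == i + 1 and sa == strand[a] and sb == strand[b]:
                total += 1
            # Reverse-complement reading: the genome traverses the reference backwards.
            elif i == j + 1 and sa == -strand[a] and sb == -strand[b]:
                total += 1
    return total


def neighbours(ref: List[Oriented]) -> Iterator[List[Oriented]]:
    """Yield every ordering reachable by moving one block and/or flipping its strand."""
    n = len(ref)
    for i in range(n):
        blk, s = ref[i]
        rest = ref[:i] + ref[i + 1:]
        for j in range(n):
            for orient in (s, -s):
                cand = rest[:j] + [(blk, orient)] + rest[j:]
                if cand != ref:
                    yield cand


def local_search(ref: List[Oriented], genomes: List[List[Oriented]]) -> Tuple[List[Oriented], int]:
    """First-improvement hill climbing; stops at a local maximum of `score`."""
    ref = list(ref)
    best = score(ref, genomes)
    while True:
        for cand in neighbours(ref):
            sc = score(cand, genomes)
            if sc > best:
                ref, best = cand, sc
                break  # restart the neighbourhood scan from the improved ordering
        else:
            return ref, best  # no neighbour improves the score


# Toy input: three genomes sharing blocks 1, 2, 3; the third is the reverse complement.
genomes = [
    [(1, 1), (2, 1), (3, 1)],
    [(1, 1), (2, 1), (3, 1)],
    [(3, -1), (2, -1), (1, -1)],
]
start = [(3, 1), (1, -1), (2, 1)]
ordering, best = local_search(start, genomes)
print(ordering, best)
```

On this toy input every adjacency in all three genomes is consistent with the ordering 1+, 2+, 3+, so the search reaches the maximum possible score. On real alignments with rearrangements, different starting orderings can converge to different local maxima.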

Keywords: algorithms; computational molecular biology; genomics; molecular evolution; sequence analysis.

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Algorithms*
Computer Graphics
Evolution, Molecular
Genetics, Population / standards*
Genetics, Population / statistics & numerical data
Genome, Human*
Humans
Reference Standards
Sequence Alignment
Sequence Analysis, DNA
Software*
