Representing and decomposing genomic structural variants as balanced integer flows on sequence graphs

BMC Bioinformatics. 2016 Sep 29;17(1):400. doi: 10.1186/s12859-016-1258-4.

Abstract

Background: The study of genomic variation has provided key insights into the functional role of mutations. Predominantly, studies have focused on single nucleotide variants (SNV), which are relatively easy to detect and can be described with rich mathematical models. However, it has been observed that genomes are highly plastic, and that whole regions can be moved, removed or duplicated in bulk. These structural variants (SV) have been shown to have a significant impact on phenotype, but their study has been held back by the combinatorial complexity of the underlying models.

Results: We describe here a general model of structural variation that encompasses both balanced rearrangements and arbitrary copy-number variants (CNV).
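The abstract does not spell out the model, so the following is only a rough sketch of what a "balanced integer flow on a sequence graph" could look like. It assumes (this is not stated above) that each genomic segment has two extremities joined by a segment edge, that bond edges join the extremities of adjacent segments, and that a flow is balanced when, at every extremity, the copy number on the segment edge equals the total copy number on the incident bond edges. The function name `imbalance` and the `"tail"`/`"head"` labels are hypothetical.

```python
from collections import defaultdict

def imbalance(segment_flow, bond_flow):
    """Return {extremity: segment flow - bond flow} for every unbalanced extremity.

    segment_flow: {segment_name: integer copy number}
    bond_flow: {(extremity_u, extremity_v): integer copy number},
               where an extremity is a (segment_name, "tail" | "head") pair.
    Assumed balance condition: at each extremity, the flow on its segment
    edge equals the summed flow on its incident bond edges.
    """
    seg = defaultdict(int)
    bond = defaultdict(int)
    for name, copies in segment_flow.items():
        seg[(name, "tail")] += copies
        seg[(name, "head")] += copies
    for (u, v), copies in bond_flow.items():
        bond[u] += copies
        bond[v] += copies  # a bond joining an extremity to itself counts twice
    nodes = set(seg) | set(bond)
    return {n: seg[n] - bond[n] for n in nodes if seg[n] != bond[n]}

# Toy circular genome A-B in which B is duplicated in tandem (A-B-B).
segments = {"A": 1, "B": 2}
bonds = {
    (("A", "head"), ("B", "tail")): 1,
    (("B", "head"), ("B", "tail")): 1,  # junction between the two copies of B
    (("B", "head"), ("A", "tail")): 1,
}
print(imbalance(segments, bonds))  # empty dict when the flow is balanced
```

Under these assumptions, raising the copy number of B without adding a matching bond leaves both extremities of B unbalanced, which is the kind of inconsistency such a check would flag.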

Conclusions: In this model, we show that the space of possible evolutionary histories that explain the structural differences between any two genomes can be sampled ergodically.

Keywords: Copy-number variation; DCJ; Rearrangement; Structural variation.