GraphTyper2 enables population-scale genotyping of structural variation using pangenome graphs

Nat Commun. 2019 Nov 27;10(1):5402. doi: 10.1038/s41467-019-13341-9.

Abstract

Analysis of sequence diversity in the human genome is fundamental for genetic studies. Structural variants (SVs) are frequently omitted in sequence analysis studies, although each SV has a relatively large impact on the genome. Here, we present GraphTyper2, which uses pangenome graphs to genotype SVs and small variants using short reads. Comparison to the syndip benchmark dataset shows that our SV genotyping is sensitive, and variant segregation in families demonstrates the accuracy of our approach. We demonstrate that incorporating public assembly data into our pipeline greatly improves sensitivity, particularly for large insertions. We validate 6,812 SVs on average per genome using long-read data of 41 Icelanders. We show that GraphTyper2 can simultaneously genotype tens of thousands of whole genomes by characterizing 60 million small variants and half a million SVs in 49,962 Icelanders, including 80 thousand SVs with high confidence.
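The abstract uses variant segregation in families as evidence of genotyping accuracy. As a rough illustration of that idea, here is a minimal sketch of a Mendelian consistency check on biallelic genotypes in parent-offspring trios. It is not GraphTyper2's own evaluation code; the function names and the VCF-style genotype strings ("0/0", "0/1", "1/1") are assumptions made for the example, and missing calls such as "./." are not handled.

```python
from itertools import product


def alleles(gt: str) -> tuple[int, int]:
    """Parse a biallelic VCF-style genotype such as '0/1' or '0|1' (no missing calls)."""
    a, b = gt.replace("|", "/").split("/")
    return int(a), int(b)


def is_mendelian_consistent(child: str, mother: str, father: str) -> bool:
    """True if the child's genotype can be formed from one maternal and one paternal allele."""
    c = sorted(alleles(child))
    m, f = alleles(mother), alleles(father)
    return any(sorted((x, y)) == c for x, y in product(m, f))


def mendelian_error_rate(trios) -> float:
    """Fraction of (child, mother, father) genotype triples that violate Mendelian inheritance."""
    trios = list(trios)
    if not trios:
        return 0.0
    errors = sum(not is_mendelian_consistent(c, m, f) for c, m, f in trios)
    return errors / len(trios)


# Hypothetical genotype calls at one variant for three trios: (child, mother, father).
calls = [("0/1", "0/0", "1/1"), ("1/1", "0/1", "0/1"), ("1/1", "0/0", "0/1")]
print(mendelian_error_rate(calls))
```

In this toy input, only the third trio is inconsistent (a homozygous-alternate child cannot arise from a homozygous-reference mother), so the error rate is one in three. A lower Mendelian error rate across many trios is one common signal of more accurate genotype calls.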

MeSH terms

  • Computer Graphics
  • Databases, Genetic
  • Genetics, Population
  • Genome, Human*
  • Genomic Structural Variation*
  • Genotyping Techniques / methods*
  • Genotyping Techniques / statistics & numerical data
  • Humans
  • Iceland
  • Pedigree
  • Polymorphism, Single Nucleotide
  • Reproducibility of Results
  • Software*
  • Workflow