Accurate, scalable and integrative haplotype estimation

Nat Commun. 2019 Nov 28;10(1):5436. doi: 10.1038/s41467-019-13225-y.

Abstract

The number of human genomes being genotyped or sequenced is increasing exponentially, and efficient haplotype estimation methods able to handle this amount of data are now required. Here we present a method, SHAPEIT4, which substantially improves upon other methods for processing large genotype and high-coverage sequencing datasets. Notably, its running time scales sub-linearly with sample size, it provides highly accurate haplotypes, and it allows the integration of external phasing information such as large reference panels of haplotypes, collections of pre-phased variants and long sequencing reads. We provide SHAPEIT4 as open-source software and demonstrate its accuracy and running times on two gold-standard datasets: the UK Biobank data and the Genome in a Bottle data.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Biological Specimen Banks
  • Data Interpretation, Statistical*
  • Datasets as Topic
  • Genotype
  • Haplotypes*
  • High-Throughput Nucleotide Sequencing
  • Humans
  • Polymorphism, Single Nucleotide
  • Sample Size
  • Sequence Analysis, DNA
  • Software*