Improving accuracy of rare variant imputation with a two-step imputation approach

Eur J Hum Genet. 2015 Mar;23(3):395-400. doi: 10.1038/ejhg.2014.91. Epub 2014 Jun 18.

Abstract

Genotype imputation has been the pillar of the success of genome-wide association studies (GWAS) in identifying common variants associated with common diseases. However, most GWAS have been run using only 60 HapMap samples as the reference for imputation, meaning that less frequent and rare variants have not been comprehensively scrutinized. Next-generation arrays ensuring sufficient coverage, together with new reference panels such as the 1000 Genomes panel, are emerging to facilitate imputation of low-frequency single-nucleotide polymorphisms (minor allele frequency (MAF) <5%). In this study, we present a two-step imputation approach that improves the quality of 1000 Genomes imputation by genotyping only a subset of samples on a dense array with many low-frequency markers to create a local reference population. In this approach, the study sample, genotyped with a first-generation array, is imputed first to the local reference sample genotyped on the dense array and thereafter to the 1000 Genomes reference panel. We show that mean imputation quality, measured by r², increases by 28% with this approach for variants with a MAF between 1 and 5%, compared with direct imputation to the 1000 Genomes reference. Similarly, the concordance rate between imputed and true genotype calls was found to be significantly higher for heterozygote calls (P<1e-15) and rare homozygote calls (P<1e-15) in this low-frequency range. In our setting, the two-step approach improves imputation quality compared with traditional direct imputation, notably in the low-frequency spectrum, and is a cost-effective strategy for large epidemiological studies.
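The two evaluation measures named above can be illustrated with a minimal sketch. It assumes that r² is the squared Pearson correlation between true genotypes (coded 0/1/2 minor-allele copies) and imputed dosages, and that concordance is the fraction of correct best-guess calls within each true genotype class; the abstract does not give the exact definitions used in the study. The function names and the toy data are hypothetical and are not taken from the paper.

```python
def dosage_r2(true_genotypes, imputed_dosages):
    """Squared Pearson correlation between true genotypes and imputed dosages.

    Assumption: genotypes are coded as 0, 1 or 2 copies of the minor allele.
    """
    n = len(true_genotypes)
    if n == 0 or n != len(imputed_dosages):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_t = sum(true_genotypes) / n
    mean_d = sum(imputed_dosages) / n
    cov = sum((t - mean_t) * (d - mean_d)
              for t, d in zip(true_genotypes, imputed_dosages))
    var_t = sum((t - mean_t) ** 2 for t in true_genotypes)
    var_d = sum((d - mean_d) ** 2 for d in imputed_dosages)
    if var_t == 0 or var_d == 0:
        return float("nan")  # undefined for a monomorphic variant
    return cov * cov / (var_t * var_d)


def concordance_by_class(true_genotypes, called_genotypes):
    """Fraction of correct calls within each true genotype class (0, 1, 2)."""
    totals = {0: 0, 1: 0, 2: 0}
    matches = {0: 0, 1: 0, 2: 0}
    for t, c in zip(true_genotypes, called_genotypes):
        totals[t] += 1
        if t == c:
            matches[t] += 1
    # Report only the classes that actually occur in the data.
    return {g: matches[g] / totals[g] for g in totals if totals[g] > 0}


# Toy data for one low-frequency variant in eight samples (hypothetical).
true_gt = [0, 0, 0, 0, 1, 1, 0, 2]
dosage = [0.0, 0.1, 0.0, 0.2, 0.9, 0.4, 0.1, 1.8]
calls = [round(d) for d in dosage]  # best-guess genotype from the dosage

print(dosage_r2(true_gt, dosage))
print(concordance_by_class(true_gt, calls))
```

Reporting concordance per genotype class matters for low-frequency variants: because most samples are common homozygotes, an overall concordance rate can look high even when heterozygote and rare homozygote calls are poor.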

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Aged
  • Alleles
  • Gene Frequency
  • Genetic Variation*
  • Genome-Wide Association Study* / methods
  • Genome-Wide Association Study* / standards
  • Genotype*
  • Genotyping Techniques* / methods
  • Genotyping Techniques* / standards
  • Humans
  • Middle Aged
  • Polymorphism, Single Nucleotide
  • Reproducibility of Results