Practical Consideration of Genotype Imputation: Sample Size, Window Size, Reference Choice, and Untyped Rate

Stat Interface. 2011;4(3):339-352. doi: 10.4310/sii.2011.v4.n3.a8.

Abstract

Imputation offers a promising way to infer missing and/or untyped genotypes in genetic studies. In practice, however, many factors may affect the quality of imputation. In this study, we evaluated how the untyped rate, the sizes of the study sample and the reference sample, the window size, and the reference choice (for admixed populations) affect the quality of imputation. The results show that good imputation quality requires an untyped rate below 50%, a reference sample size greater than 50, and a window size greater than 500 SNPs (roughly 1 Mb). Compared with whole-region imputation, piecewise imputation with sufficiently large windows provides improved efficacy. For an admixed study sample, if only an external reference panel is used, it should include samples from the ancestral populations that represent the admixed population under investigation. Internal references are strongly recommended. When internal references are limited, however, augmentation with external references should be used carefully. More specifically, augmentation with samples from the major source populations of the admixture can lower the quality of imputation, whereas augmentation with seemingly genetically unrelated cohorts may improve it.
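Piecewise imputation amounts to splitting the SNPs of a region into consecutive windows and imputing each window independently. The sketch below illustrates only the partitioning step and a check against the thresholds reported above. It assumes non-overlapping windows over SNPs already sorted by position. The helper names `split_into_windows` and `check_design`, and the default window of 600 SNPs, are hypothetical illustrations, not the authors' procedure (for example, the abstract does not say whether their windows overlap).

```python
from typing import List, Sequence


def split_into_windows(snp_ids: Sequence[str], window_size: int = 600) -> List[List[str]]:
    """Hypothetical helper: partition SNPs (assumed sorted by position) into
    consecutive, non-overlapping windows of `window_size` SNPs.
    A trailing remainder shorter than `window_size` is merged into the
    previous window, so no window falls below the requested size unless
    the whole region is smaller than one window."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    windows = [list(snp_ids[i:i + window_size])
               for i in range(0, len(snp_ids), window_size)]
    if len(windows) > 1 and len(windows[-1]) < window_size:
        tail = windows.pop()
        windows[-1].extend(tail)
    return windows


def check_design(untyped_rate: float, reference_size: int, window_size: int) -> List[str]:
    """Return warnings for settings outside the thresholds stated in the abstract:
    untyped rate below 50%, reference sample size above 50, window above 500 SNPs."""
    warnings = []
    if untyped_rate >= 0.5:
        warnings.append("untyped rate should be below 50%")
    if reference_size <= 50:
        warnings.append("reference sample size should exceed 50")
    if window_size <= 500:
        warnings.append("window should contain more than 500 SNPs")
    return warnings


if __name__ == "__main__":
    snps = [f"rs{i}" for i in range(1500)]  # illustrative SNP identifiers
    windows = split_into_windows(snps, window_size=600)
    print([len(w) for w in windows])  # [600, 900]: the 300-SNP tail is merged
    print(check_design(untyped_rate=0.6, reference_size=40, window_size=600))
```

Merging the short tail, rather than keeping a small final window, is one way to keep every window above the size threshold; an implementation with overlapping flanking regions would be an equally reasonable alternative.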