Statistical inference with large-scale trait imputation

Stat Med. 2024 Feb 20;43(4):625-641. doi: 10.1002/sim.9975. Epub 2023 Dec 1.

Abstract

Recently, a nonparametric method called LS-imputation was proposed for large-scale trait imputation based on a GWAS summary dataset and a large set of genotyped individuals. The imputed trait values, along with the genotypes, can be treated as an individual-level dataset for downstream genetic analyses, including those that cannot be done with GWAS summary data. However, because the covariance matrix of the imputed trait values is often too large to calculate, the current method imposes a working assumption that the imputed trait values are independent and identically distributed, which is incorrect. Here we propose a "divide and conquer/combine" strategy to estimate and account for the covariance matrix of the imputed trait values via batches, thus relaxing the incorrect working assumption. Applications of the methods to the UK Biobank data for marginal association analysis showed some improvement by the new method in some cases, but overall the original method performed well, which was explained by the nearly constant variances of, and mostly weak correlations among, the imputed trait values.
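
As a rough illustration of what a "divide and conquer/combine" analysis could look like for a single-SNP marginal association test, the sketch below fits a generalized least squares (GLS) slope within each batch and then pools the per-batch estimates by inverse-variance weighting. The abstract does not specify how the batch covariance matrices are estimated or how batches are combined, so everything here is an assumption made for illustration: the function names are hypothetical, the per-batch covariance matrices are taken as given, batches are treated as mutually independent, and both the genotype and the imputed trait are assumed to be centered.

```python
import numpy as np

def batch_gls(x, y, cov):
    """GLS slope and its variance for one batch (no intercept; x, y assumed centered)."""
    w = np.linalg.solve(cov, x)   # cov^{-1} x, without forming the inverse
    var = 1.0 / (x @ w)           # (x' cov^{-1} x)^{-1}
    beta = var * (w @ y)          # (x' cov^{-1} x)^{-1} x' cov^{-1} y
    return beta, var

def combine(estimates):
    """Inverse-variance weighted pooling of per-batch (beta, var) pairs."""
    betas = np.array([b for b, _ in estimates])
    weights = np.array([1.0 / v for _, v in estimates])
    var = 1.0 / weights.sum()
    return var * (weights * betas).sum(), var

def divide_and_combine(x, y, covs):
    """Hypothetical driver: covs is a list of per-batch covariance matrices
    whose sizes partition the samples in order (an assumption of this sketch)."""
    estimates, start = [], 0
    for cov in covs:
        stop = start + cov.shape[0]
        estimates.append(batch_gls(x[start:stop], y[start:stop], cov))
        start = stop
    return combine(estimates)

# Toy usage with simulated data and identity batch covariances.
rng = np.random.default_rng(0)
x = rng.normal(size=12)
y = 2.0 * x + rng.normal(size=12)
beta, var = divide_and_combine(x, y, [np.eye(4)] * 3)
```

With identity covariance in every batch, the pooled estimate reduces to ordinary least squares through the origin on the full sample, which corresponds to the i.i.d. working assumption of the original method.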

Keywords: GWAS; LS-imputation; SNP; least squares; linear models.

MeSH terms

  • Genome-Wide Association Study* / methods
  • Genotype
  • Humans
  • Phenotype
  • Polymorphism, Single Nucleotide*