Streamlined analysis of pooled genotype data in SNP-based association studies

Genet Epidemiol. 2005 Apr;28(3):273-82. doi: 10.1002/gepi.20062.

Abstract

Several groups have developed methods for estimating allele frequencies in DNA pools as a fast, inexpensive way of detecting allelic association between genetic markers and disease. To obtain accurate allele frequency estimates, a correction factor, k, is generally applied to account for bias in the measurement of the allele-specific products. Factor k is usually obtained as the ratio of the two allele-specific signals in samples from heterozygous individuals, a step that can significantly impair throughput and increase cost. We have systematically investigated the properties of k using empirical and simulated data. We show that, for the dye-terminator primer-extension genotyping method we applied, k is substantially influenced not only by the dye terminators incorporated but also by the terminal 3' base of the extension primer. We also show that the variation in k is large enough to produce unacceptable error rates if association studies are conducted without regard to k. We show that the impact of ignoring k can be neutralized by applying an easily derived correction factor, k(max), although potentially at the cost of an increased type I error rate. Finally, based on the observed distributions of k, we derive a method for estimating the probability that pooled data reflect significant differences in allele frequencies between the subjects comprising the pools. By controlling error rates without knowledge of the appropriate SNP-specific correction factors, each approach improves the performance of DNA pooling while considerably streamlining the method, reducing both time and cost.
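The role of k can be illustrated with a minimal sketch. It assumes the commonly used pooled estimator p = A / (A + kB), where A and B are the two allele-specific signals and k is the mean A:B signal ratio in heterozygous samples (whose true allele ratio is 1:1). The function names, signal values and bias factor below are hypothetical illustrations, not data or code from this study, and the sketch does not implement the k(max) or probability-based approaches.

```python
from statistics import mean


def estimate_k(het_signals):
    """Correction factor k: mean ratio of allele-A to allele-B signal across
    heterozygous samples, where the true allele ratio is 1:1, so any
    departure from 1 reflects measurement bias."""
    return mean(a / b for a, b in het_signals)


def pooled_frequency(signal_a, signal_b, k=1.0):
    """Estimated frequency of allele A in a DNA pool: A / (A + k*B).
    The default k = 1.0 corresponds to ignoring the measurement bias."""
    return signal_a / (signal_a + k * signal_b)


# Hypothetical SNP whose allele-A product is measured about 1.5x more
# strongly than allele B (illustrative numbers only).
het_signals = [(1.5, 1.0), (1.45, 1.0), (1.55, 1.0)]
k = estimate_k(het_signals)  # about 1.5

# Hypothetical pool with a true allele-A frequency of 0.30, assuming each
# signal is proportional to (allele-specific bias) x (allele frequency).
true_freq = 0.30
signal_a = 1.5 * true_freq
signal_b = 1.0 * (1 - true_freq)

print(round(pooled_frequency(signal_a, signal_b, k), 3))  # 0.3 (corrected)
print(round(pooled_frequency(signal_a, signal_b), 3))     # 0.391 (k ignored)
```

Under these assumptions, ignoring a k of 1.5 inflates the apparent allele-A frequency from 0.30 to about 0.39, which shows why uncorrected SNP-specific variation in k can distort case-control comparisons of pooled frequencies.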

MeSH terms

  • Alleles
  • Case-Control Studies
  • Computer Simulation
  • DNA / genetics
  • Gene Frequency
  • Genetic Markers / genetics
  • Genetic Predisposition to Disease / genetics
  • Genotype*
  • Humans
  • Models, Genetic*
  • Polymorphism, Single Nucleotide / genetics*

Substances

  • Genetic Markers
  • DNA