Reconstituting the frequency spectrum of ascertained single-nucleotide polymorphism data

Genetics. 2004 Dec;168(4):2373-82. doi: 10.1534/genetics.104.031039. Epub 2004 Sep 15.

Abstract

Most of the available SNP data have eluded valid population genetic analysis because most population genetic methods do not correctly accommodate the special discovery process used to identify SNPs. As a result, the allele frequency distributions in these data are biased by the ascertainment protocol. We show here how this problem can be corrected by obtaining maximum-likelihood (ML) estimates of the true allele frequency distribution. In simple cases, the ML estimate of the true allele frequency distribution can be obtained analytically, but in other cases computational methods based on numerical optimization or the EM algorithm must be used. We illustrate the new correction method by analyzing some previously published SNP data from the SNP Consortium. Appropriate treatment of SNP ascertainment is vital to our ability to make correct inferences from the data of the International HapMap Project.
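
To make the idea of an analytical correction concrete, the following is a minimal sketch of one simple case. It is an illustration under stated assumptions, not necessarily the procedure used in the paper. The abstract does not specify the ascertainment scheme, so the sketch assumes that each SNP was discovered because it was variable in a random subsample of d chromosomes drawn from the same n chromosomes that were later genotyped. Under that assumption, the likelihood conditional on ascertainment is maximized by dividing each observed frequency class by its ascertainment probability and renormalizing. The function names `ascertainment_prob` and `correct_spectrum` are hypothetical.

```python
from math import comb


def ascertainment_prob(i, n, d):
    """Probability that a site with i copies of an allele among n chromosomes
    is variable in a random subsample of d chromosomes (assumed scheme)."""
    if d < 2 or d > n:
        raise ValueError("discovery depth d must satisfy 2 <= d <= n")
    # The site is missed when all d sampled chromosomes carry the same allele.
    # math.comb returns 0 when d exceeds i or n - i.
    return 1.0 - (comb(i, d) + comb(n - i, d)) / comb(n, d)


def correct_spectrum(observed_counts, n, d):
    """ML estimate of the frequency spectrum for segregating classes
    i = 1..n-1, given counts of ascertained SNPs in each class.

    observed_counts[i - 1] is the number of ascertained SNPs with i copies.
    """
    if len(observed_counts) != n - 1:
        raise ValueError("expected one count per class i = 1..n-1")
    # Reweight each class by the inverse of its ascertainment probability.
    weights = [
        count / ascertainment_prob(i, n, d)
        for i, count in enumerate(observed_counts, start=1)
    ]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Toy counts (made up for illustration): n = 4 chromosomes, d = 2.
    print(correct_spectrum([30, 40, 30], n=4, d=2))
```

In this toy example, intermediate-frequency SNPs are over-represented in the raw counts because they are more likely to be variable in a discovery panel of two chromosomes. After correction, each of the three classes is estimated at about 1/3. Schemes in which the discovery panel is not a subsample of the genotyped chromosomes, or varies across SNPs, do not reduce to this simple reweighting.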

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Alleles
  • Data Interpretation, Statistical
  • Gene Frequency*
  • Genetic Variation
  • Models, Genetic
  • Polymorphism, Single Nucleotide*