Measuring marker information content by the ambiguity of block boundaries observed in dense SNP data

Ann Hum Genet. 2007 Jan;71(Pt 1):127-40. doi: 10.1111/j.1469-1809.2006.00315.x. Epub 2006 Sep 8.

Abstract

Recent studies have noted that the boundaries of common haplotype blocks in hapmap constructions involve a certain degree of ambiguity, as do the resulting "tagSNPs". Here, we report how to address this issue at the level of individual SNP markers. We introduce a measure called the marker ambiguity score (MAS), and evaluate its utility by simulation studies based on a real dataset of 2949 SNPs spanning a region of 56.1 Mbp. We show that the MAS method can be used to assess the level of boundary ambiguity caused by varying ethnic background, sample size for hapmap construction, and disease aggregation. We find a striking difference in the overall patterns of block boundary distributions between two ethnic groups (blacks and whites), and subtle changes in block structures that agree with the evolutionary history of the two populations. Our analyses suggest that a sample size of 200 or more subjects is probably needed for "stable" hapmap constructions. In addition, we demonstrate that there are subtle changes in block boundaries in hapmaps constructed in disease populations versus normal controls. This approach can quantify the information content of individual markers in the context of highly dense SNP data, which may have important implications for designing efficient genome-wide association mapping projects.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Computer Simulation
  • Ethnicity
  • Gene Frequency
  • Genetic Markers*
  • Genome, Human*
  • Haplotypes
  • Humans
  • Linkage Disequilibrium
  • Monte Carlo Method
  • Polymorphism, Single Nucleotide*
  • Sample Size

Substances

  • Genetic Markers