Power-based, phase-informed selection of single nucleotide polymorphisms for disease association screens

Genet Epidemiol. 2006 Sep;30(6):459-70. doi: 10.1002/gepi.20159.

Abstract

Single nucleotide polymorphisms (SNPs) are becoming widely used as genotypic markers in genetic association studies of common, complex human diseases. For such association screens, a crucial part of study design is determining which SNPs to prioritize for genotyping. We present a novel power-based algorithm to select a subset of tag SNPs for genotyping from a map of available SNPs. Blocks of markers in strong linkage disequilibrium (LD) are identified, and SNPs are selected to represent each block such that power to detect disease association with an underlying disease allele in LD with block members is preserved; all markers outside of blocks are also included in the tagging subset. A key, novel element of this method is that it incorporates information about the phase of LD observed among marker pairs to retain markers likely to be in coupling phase with an underlying disease locus, thus increasing power compared to a phase-blind approach. Power calculations illustrate important issues regarding LD phase and make clear the advantages of our approach to SNP selection. We apply our algorithm to genotype data from the International HapMap Consortium and demonstrate that considerable reduction in SNP genotyping may be attained while retaining much of the available power for a disease association screen. We also demonstrate that these tag SNPs effectively represent underlying variants not included in the LD analysis and SNP selection, by using leave-one-out tests to show that most (approximately 90%) of the "untyped" variants lying in blocks are in coupling-phase LD with a tag SNP. Additional performance tests using the HapMap ENCyclopedia of DNA Elements (ENCODE) regions show that the method compares well with the popular r² bin tagging method. This work is a concrete example of how empirical LD phase may be used to benefit study design.
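
The distinction between phase-informed and phase-blind LD measures can be illustrated with a minimal sketch. It is not the authors' algorithm: it only computes the standard pairwise quantities D and r² from phased two-SNP haplotypes and reports the sign of D. The function name `pairwise_ld`, the 0/1 allele coding (1 = minor allele, assumed to be supplied that way by the caller), and the convention that D > 0 means "coupling" are assumptions made for this illustration.

```python
from collections import Counter


def pairwise_ld(haplotypes):
    """Return (D, r2, phase) for two biallelic SNPs.

    haplotypes: list of (a, b) tuples, one per phased chromosome, where
    a and b are 0 or 1. Assumption for this sketch: 1 codes the minor
    allele at each SNP.
    """
    n = len(haplotypes)
    counts = Counter(haplotypes)

    # Frequencies of allele 1 at each SNP and of the 1-1 haplotype.
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = counts[(1, 1)] / n

    # D is signed; r2 squares it and therefore discards the phase.
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0

    if d > 0:
        phase = "coupling"   # allele-1 copies tend to share a haplotype
    elif d < 0:
        phase = "repulsion"  # allele-1 copies tend to sit on different haplotypes
    else:
        phase = "none"
    return d, r2, phase


# Illustrative (made-up) haplotype samples of 10 chromosomes each.
coupled = [(1, 1)] * 3 + [(0, 0)] * 7
repelled = [(1, 0)] * 3 + [(0, 1)] * 3 + [(0, 0)] * 4

print(pairwise_ld(coupled))
print(pairwise_ld(repelled))
```

In the second sample r² is nonzero, so a phase-blind criterion would still register the two markers as correlated, while the negative sign of D shows that their minor alleles rarely travel together. A phase-informed selection rule can use that sign; r² alone cannot.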

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms
  • Chromosome Mapping
  • Chromosomes, Human, Pair 7 / genetics
  • Computer Simulation
  • Genetic Linkage*
  • Genetic Predisposition to Disease / epidemiology*
  • Genome, Human
  • Genotype
  • Haplotypes
  • Humans
  • Linkage Disequilibrium
  • Models, Statistical
  • Polymorphism, Single Nucleotide*
  • Risk Factors
  • Sample Size
  • White People / genetics*