Strategies for selecting subsets of single-nucleotide polymorphisms to genotype in association studies

BMC Genet. 2005 Dec 30;6 Suppl 1(Suppl 1):S72. doi: 10.1186/1471-2156-6-S1-S72.

Abstract

In genetic association studies, linkage disequilibrium (LD) within a region can be exploited to select a subset of single-nucleotide polymorphisms (SNPs) to genotype with minimal loss of information. A novel entropy-based method for selecting SNPs is proposed and compared to an existing method based on the coefficient of determination (R2) using simulated data from Genetic Analysis Workshop 14. The effect of the size of the sample used to investigate LD (by estimating haplotype frequencies) and hence select the SNPs is also investigated for both measures. It is found that the novel method and the established method select SNP subsets that do not differ greatly. The entropy-based measure may thus have value because it is easier to compute than R2. Increasing the sample size used to estimate haplotype frequencies improves the predictive power of the subset of SNPs selected. A smaller subset of SNPs chosen using a large initial sample to estimate LD can in some instances be more informative than a larger subset chosen based on poor estimates of LD (using a small initial sample). An initial sample size of 50 individuals is sufficient in most situations investigated, which involved selection from a set of 7 SNPs, although to select a larger number of SNPs, a larger initial sample size may be required.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Genetic Loci / genetics
  • Genome-Wide Association Study*
  • Genotype
  • Humans
  • Linkage Disequilibrium / genetics
  • Polymorphism, Single Nucleotide / genetics*
  • Sample Size