SNP500Cancer: a public resource for sequence validation and assay development for genetic variation in candidate genes

Nucleic Acids Res. 2004 Jan 1;32(Database issue):D528-32. doi: 10.1093/nar/gkh005.

Abstract

The SNP500Cancer Database provides sequence and genotype assay information for candidate single nucleotide polymorphisms (SNPs) useful in mapping complex diseases such as cancer. The database is an integral component of the NCI's Cancer Genome Anatomy Project. SNP500Cancer provides bi-directional sequencing information on a set of control DNA samples derived from anonymized subjects (102 Coriell samples representing four self-described ethnic groups: African/African-American, Caucasian, Hispanic and Pacific Rim). All SNPs are chosen from public databases and reports, and the selection is biased towards non-synonymous and promoter SNPs in genes that have been implicated in one or more cancers. The web site is searchable by gene, chromosome, gene ontology pathway and known dbSNP ID. As of July 2003, the database contains over 3400 SNPs, 2490 of which have been sequenced in the SNP500Cancer population. For each analyzed SNP, the database provides the gene location and over 200 bp of surrounding annotated sequence (including nearby SNPs), together with allele frequency information in total and per subpopulation and a Hardy-Weinberg equilibrium (HWE) calculation for each subpopulation. Sequence-validated SNPs with a minor allele frequency >5% are entered into a high-throughput genotyping pipeline to determine concordance with the sequencing results for the same 102 samples. The web site provides the conditions for validated genotyping assays. SNP500Cancer is an invaluable resource for investigators who wish to select SNPs for analysis, design genotyping assays using validated sequence data, choose assays already validated on one or more genotyping platforms, and select reference standards for genotyping assays. The SNP500Cancer Database is freely accessible at http://snp500cancer.nci.nih.gov/.
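The per-subpopulation frequency and HWE calculations mentioned above can be illustrated with a minimal Python sketch. The abstract does not state which HWE test the database uses, so the chi-square goodness-of-fit test with one degree of freedom shown here is an assumption, as are the function name `hwe_summary` and the genotype counts in the usage example, which are purely illustrative and not taken from the database. Only the 5% minor allele frequency cutoff comes from the text.

```python
import math


def hwe_summary(n_aa, n_ab, n_bb):
    """Summarize one biallelic SNP from genotype counts (hypothetical helper).

    n_aa, n_ab, n_bb are the numbers of subjects with genotypes AA, AB, BB.
    Assumes a chi-square goodness-of-fit test (1 degree of freedom) for HWE;
    the abstract does not specify the test actually used.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one genotyped subject is required")

    # Allele frequencies: each subject contributes two alleles.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    maf = min(p, q)

    # Genotype counts expected under Hardy-Weinberg proportions.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)

    # Skip classes with zero expectation (monomorphic SNPs).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))

    return {
        "freq_a": p,
        "freq_b": q,
        "maf": maf,
        "chi2": chi2,
        "p_value": p_value,
        # 5% cutoff for entering the genotyping pipeline, per the abstract.
        "maf_above_5_percent": maf > 0.05,
    }


if __name__ == "__main__":
    # Illustrative counts only, not data from SNP500Cancer.
    print(hwe_summary(20, 9, 2))
```

With subpopulations of only a few dozen subjects each, expected counts for the rare homozygote are often small, so an exact test may be preferable to the chi-square approximation in practice.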

MeSH terms

  • Computational Biology
  • Databases, Genetic*
  • Gene Frequency
  • Genome, Human
  • Genomics
  • Genotype
  • Humans
  • Information Storage and Retrieval
  • Internet
  • National Institutes of Health (U.S.)
  • Neoplasms / genetics*
  • Polymorphism, Single Nucleotide / genetics*
  • Racial Groups / genetics
  • Reproducibility of Results
  • Sequence Analysis, DNA
  • United States