An exhaustive DNA micro-satellite map of the human genome using high performance computing

Genomics. 2003 Jul;82(1):10-9. doi: 10.1016/s0888-7543(03)00076-4.

Abstract

The current pace of the generation of sequence data requires the development of software tools that can rapidly provide full annotation of the data. We have developed a new method for rapid sequence comparison using the exact match algorithm without repeat masking. As a demonstration, we have identified all perfect simple tandem repeats (STR) within the draft sequence of the human genome. The STR elements (chromosome, position, length and repeat subunit) have been placed into a relational database. Repeat flanking sequence is also publicly accessible at http://grid.abcc.ncifcrf.gov. To illustrate the utility of this complete set of STR elements, we documented the increased density of potentially polymorphic markers throughout the genome. The new STR markers may be useful in disease association studies because so many STR elements manifest multiallelic polymorphism. Also, because triplet repeat expansions are important for human disease etiology, we identified trinucleotide repeats that exist within exons of known genes. This resulted in a list that includes all 14 genes known to undergo polynucleotide expansion, and 48 additional candidates. Several of these are non-polyglutamine triplet repeats. Other examinations of the STR database demonstrated repeats spanning splice junctions and identified SNPs within repeat elements.
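
As an illustration of what identifying perfect STRs by exact matching involves, the following minimal Python sketch scans a sequence for maximal runs in which every base equals the base one repeat-unit length earlier, then stores the hits in a relational table. It is not the authors' high-performance implementation, which the abstract does not describe in detail. The function names, the 0-based positions, the thresholds (`max_unit=6`, `min_copies=3`), and the table and column names are all assumptions chosen for the example.

```python
import sqlite3


def is_primitive(unit):
    """True if unit is not a repetition of a shorter motif (e.g. 'ACAC' is not)."""
    n = len(unit)
    return not any(n % d == 0 and unit[:d] * (n // d) == unit for d in range(1, n))


def find_perfect_strs(seq, max_unit=6, min_copies=3):
    """Return sorted (start, length, unit) tuples for perfect tandem repeats.

    Positions are 0-based. Lengths are truncated to whole copies of the unit.
    Thresholds are illustrative assumptions, not values from the paper.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    for k in range(1, max_unit + 1):  # k = repeat unit length
        j = k
        while j < n:
            # An exact match with the base k positions back extends a period-k run.
            if seq[j] in "ACGT" and seq[j] == seq[j - k]:
                start = j - k
                while j < n and seq[j] in "ACGT" and seq[j] == seq[j - k]:
                    j += 1
                copies = (j - start) // k
                unit = seq[start:start + k]
                # Skip units such as 'AA' that are already reported at a shorter period.
                if copies >= min_copies and is_primitive(unit):
                    hits.append((start, copies * k, unit))
            else:
                j += 1
    return sorted(hits)


def store_strs(conn, chromosome, hits):
    """Insert hits into a hypothetical table mirroring the fields named in the abstract."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS str_element ("
        "chromosome TEXT, position INTEGER, length_bp INTEGER, repeat_unit TEXT)"
    )
    conn.executemany(
        "INSERT INTO str_element VALUES (?, ?, ?, ?)",
        [(chromosome, start, length, unit) for start, length, unit in hits],
    )
    conn.commit()


if __name__ == "__main__":
    toy = "GGT" + "CAG" * 5 + "TTA"  # toy sequence, not real genomic data
    conn = sqlite3.connect(":memory:")
    store_strs(conn, "chrToy", find_perfect_strs(toy))
    # Select trinucleotide repeats, the class the abstract examines within exons.
    for row in conn.execute(
        "SELECT * FROM str_element WHERE length(repeat_unit) = 3"
    ):
        print(row)
```

Because no repeat masking is applied, overlapping repeats with different units are reported independently. A genome-scale run would additionally need chunked input and the exon annotation that the authors used to find repeats within known genes.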

Publication types

  • Comparative Study
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Alleles
  • Chromosome Mapping
  • Computing Methodologies
  • DNA / genetics*
  • Databases, Genetic
  • Genetic Markers
  • Genome, Human*
  • Humans
  • Microsatellite Repeats*
  • Polymorphism, Single Nucleotide
  • Sequence Analysis, DNA / methods

Substances

  • Genetic Markers
  • DNA