Whole genome protein domain analysis using a new method for domain clustering

Comput Chem. 1999 Jun 15;23(3-4):333-40. doi: 10.1016/s0097-8485(99)00011-x.

Abstract

We present the outcome of a systematic analysis of protein domain shuffling in 17 completed microbial genomes. The analysis was performed using MKDOM Version 2, a completely new version of the domain clustering program MKDOM based on PSI-BLAST recursive homology searches. It makes it possible to delineate the most frequent protein domain building blocks, to identify which domains are found specifically in Bacteria, Archaea or yeast, and to determine which domains are shared between two or all three domains of life. The latter are good candidates for the basic protein building blocks underlying all forms of cellular life. Statistics of multi-domain proteins indicate that some organisms, such as Bacillus subtilis or Mycobacterium tuberculosis, contain an abnormally high number of large multi-domain proteins. We also provide examples of highly shuffled or circularly permuted domains. A WWW graphical interface has been made available to interactively browse domain arrangements of proteins in all 17 genomes, at http://www.toulouse.inra.fr/prodomCG.html.
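
The classification of domain families by the domains of life in which they occur can be illustrated with a minimal Python sketch. This is not MKDOM or its actual output format: the family identifiers, the genome-to-kingdom table, the family memberships and the function name classify_family are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Assumption: a small, hand-made genome -> kingdom table for illustration.
# The real study covered 17 genomes; only a few example organisms are used here.
GENOME_KINGDOM = {
    "B. subtilis": "Bacteria",
    "M. tuberculosis": "Bacteria",
    "M. jannaschii": "Archaea",
    "S. cerevisiae": "yeast",
}


def classify_family(genomes):
    """Label a domain family by the kingdoms of the genomes it occurs in."""
    if not genomes:
        raise ValueError("a domain family must occur in at least one genome")
    kingdoms = sorted({GENOME_KINGDOM[g] for g in genomes})
    if len(kingdoms) == 1:
        return kingdoms[0] + "-specific"
    if len(kingdoms) == len(set(GENOME_KINGDOM.values())):
        return "universal"
    return "shared: " + " + ".join(kingdoms)


# Hypothetical clustering result: domain family ID -> genomes containing it.
families = {
    "fam_A": ["B. subtilis", "M. tuberculosis"],
    "fam_B": ["M. jannaschii"],
    "fam_C": ["B. subtilis", "S. cerevisiae"],
    "fam_D": ["B. subtilis", "M. jannaschii", "S. cerevisiae"],
}

labels = {fam: classify_family(gs) for fam, gs in families.items()}
counts = Counter(labels.values())

for fam, label in labels.items():
    print(fam, label)
```

In this sketch, families labelled "universal" correspond to the candidates for basic building blocks shared by all three domains of life; how families are actually delineated is the job of the clustering program and is not modelled here.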

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Archaea / genetics
  • Cluster Analysis*
  • Database Management Systems
  • Genome, Bacterial
  • Genome, Fungal
  • Proteins / genetics*

Substances

  • Proteins