Proteome-wide analysis of protein function composition reveals the clustering and phylogenetic properties of organisms

Mol Phylogenet Evol. 2002 Oct;25(1):101-11. doi: 10.1016/s1055-7903(02)00354-8.

Abstract

A 17-dimensional vector, named the proteome vector, is defined to represent an organism. Its components reflect the relative contents of protein-encoding genes in each of the 17 functional classes of the Clusters of Orthologous Groups of proteins (COGs) across the whole genome of the organism. Using this proteome vector, fuzzy clustering of 36 completely sequenced organisms (8 archaea, 24 bacteria, and 4 eukarya) was performed and a proteome tree was constructed. Our results show that (1) all 36 organisms are correctly classified into three clusters corresponding to the three primary kingdoms, (2) our proteome tree is remarkably similar to the tree derived from 16S rRNA, and (3) chromosomes and/or plasmids belonging to the same organism have very similar gene composition. Based on these results, we argue that the 17-dimensional proteome vector could serve as a good criterion for clustering approaches and to a large extent reveals the phylogenetic properties of organisms. We also argue that the Three Primary Kingdoms Hypothesis is trustworthy, although lateral gene transfer (LGT) makes the construction of a "universal tree of life" controversial.
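
The abstract defines the proteome vector and the clustering step only in words. Below is a minimal Python sketch under explicit assumptions. Each component is taken to be the gene count of one COG class divided by a total gene count; by default this is the number of COG-assigned genes, and the paper may use a different denominator. Organisms are compared by Euclidean distance, and the fuzzy clustering is standard fuzzy c-means with fuzzifier m = 2. The abstract does not name the fuzzy clustering algorithm, the metric, or the initialization. Fuzzy c-means, farthest-first seeding, and every function name here (`proteome_vector`, `fuzzy_c_means`) are therefore hypothetical stand-ins, not the authors' method.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vector = List[float]
N_CLASSES = 17  # number of COG functional classes, as stated in the abstract


def proteome_vector(class_counts: Sequence[int], total_genes: Optional[int] = None) -> Vector:
    """Relative content of protein-encoding genes in each COG class.

    Assumption (hypothetical): component i = class_counts[i] / total_genes, where
    total_genes defaults to the number of COG-assigned genes (sum of class_counts).
    """
    if len(class_counts) != N_CLASSES:
        raise ValueError(f"expected {N_CLASSES} class counts, got {len(class_counts)}")
    total = sum(class_counts) if total_genes is None else total_genes
    if total <= 0:
        raise ValueError("total gene count must be positive")
    return [n / total for n in class_counts]


def _dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance (an assumed choice of metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _farthest_first(points: Sequence[Sequence[float]], c: int) -> List[Vector]:
    """Deterministic seeding: start at the first point, then repeatedly add
    the point farthest from all centres chosen so far."""
    centres = [list(points[0])]
    while len(centres) < c:
        nxt = max(points, key=lambda p: min(_dist(p, q) for q in centres))
        centres.append(list(nxt))
    return centres


def _memberships(points: Sequence[Sequence[float]], centres: List[Vector], m: float) -> List[Vector]:
    """FCM membership update: u_ik = 1 / sum_j (d_ik / d_ij) ** (2 / (m - 1))."""
    exp = 2.0 / (m - 1.0)
    U = []
    for p in points:
        d = [_dist(p, v) for v in centres]
        zeros = [k for k, dk in enumerate(d) if dk == 0.0]
        if zeros:  # point coincides with one or more centres: split membership among them
            U.append([1.0 / len(zeros) if k in zeros else 0.0 for k in range(len(d))])
        else:
            U.append([1.0 / sum((dk / dj) ** exp for dj in d) for dk in d])
    return U


def fuzzy_c_means(points: Sequence[Sequence[float]], c: int = 3, m: float = 2.0,
                  tol: float = 1e-6, max_iter: int = 300) -> Tuple[List[Vector], List[Vector]]:
    """Return (U, centres); U[i][k] is the membership of point i in cluster k.
    Assumes the data contain at least c distinct points."""
    if not 2 <= c <= len(points):
        raise ValueError("need 2 <= c <= number of points")
    if m <= 1.0:
        raise ValueError("fuzzifier m must be > 1")
    n, dim = len(points), len(points[0])
    centres = _farthest_first(points, c)
    U = _memberships(points, centres, m)
    for _ in range(max_iter):
        # Centre update: mean of all points weighted by u_ik ** m.
        centres = []
        for k in range(c):
            w = [U[i][k] ** m for i in range(n)]
            s = sum(w)
            centres.append([sum(w[i] * points[i][j] for i in range(n)) / s for j in range(dim)])
        new_U = _memberships(points, centres, m)
        delta = max(abs(a - b) for old, new in zip(U, new_U) for a, b in zip(old, new))
        U = new_U
        if delta < tol:  # memberships have stopped changing
            break
    return U, centres
```

Given the 36 proteome vectors as `points`, `fuzzy_c_means(points, c=3)` would return a 36 × 3 membership matrix. Taking the largest membership in each row gives a hard three-way split that can be compared against the archaea/bacteria/eukarya labels. The sketch uses farthest-first seeding instead of random initial memberships. This keeps the result deterministic and avoids starting every centre at the grand mean, which is a fixed point of the fuzzy c-means update.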

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Animals
  • Archaea / genetics
  • Bacteria / genetics
  • Caenorhabditis elegans / genetics
  • Drosophila melanogaster / genetics
  • Models, Genetic
  • Phylogeny*
  • Proteins / genetics*
  • Proteome / analysis
  • Saccharomyces cerevisiae / genetics

Substances

  • Proteins
  • Proteome