Sequence-related human proteins cluster by degree of evolutionary conservation

Phys Rev E Stat Nonlin Soft Matter Phys. 2004 Nov;70(5 Pt 1):051908. doi: 10.1103/PhysRevE.70.051908. Epub 2004 Nov 17.

Abstract

Gene duplication followed by adaptive evolution is thought to be a central mechanism for the emergence of novel genes. To illuminate the contribution of duplicated protein-coding sequences to the complexity of the human genome, we study the connectivity of pairwise sequence-related human proteins and construct a network (N) of linked protein sequences with shared similarities. We find that (i) the connectivity distribution P(k) for k sequence-related proteins decays as a power law, P(k) ∼ k^(-γ) with γ ≈ 1.2; (ii) the top rank of N consists of a single large cluster of proteins (≈70%), while bottom ranks consist of multiple isolated clusters; and (iii) structural characteristics of N show both a high degree of clustering and an intermediate connectivity ("small-world" features). We gain further insight into structural properties of N by studying the relationship between the connectivity distribution and the phylogenetic conservation of proteins in bacteria, plants, invertebrates, and vertebrates. We find that (iv) the proportion of sequence-related proteins increases with increasing extent of evolutionary conservation. Our results support the view that small-world network properties constitute a footprint of an evolutionary mechanism and extend the traditional interpretation of protein families.
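
The quantities named in findings (i) to (iii) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the abstract does not state the similarity criterion or the fitting procedure, so the sketch assumes that pairs of sequence-related proteins are already available as a list, uses hypothetical protein identifiers, estimates γ with a crude least-squares fit on log-log axes, and assigns a clustering coefficient of 0 to proteins with fewer than two neighbours. All function names are illustrative.

```python
from collections import Counter, defaultdict
import math


def build_network(pairs):
    """Undirected adjacency sets from (protein_a, protein_b) similarity pairs."""
    adj = defaultdict(set)
    for a, b in pairs:
        if a != b:  # ignore self-hits
            adj[a].add(b)
            adj[b].add(a)
    return dict(adj)


def degree_distribution(adj):
    """P(k): fraction of proteins that have exactly k sequence-related partners."""
    counts = Counter(len(nbrs) for nbrs in adj.values())
    n = len(adj)
    return {k: c / n for k, c in sorted(counts.items())}


def components(adj):
    """Connected clusters of the network, largest first (iterative DFS)."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            comp.add(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        comps.append(comp)
    return sorted(comps, key=len, reverse=True)


def average_clustering(adj):
    """Mean local clustering coefficient; degree < 2 counts as 0 (assumption)."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        # Count links among the neighbours, each unordered pair once.
        links = sum(1 for u in nbrs for v in nbrs if u < v and v in adj[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def fit_gamma(pk):
    """Crude estimate of gamma in P(k) ~ k^(-gamma): minus the log-log slope."""
    xs = [math.log(k) for k in pk]
    ys = [math.log(p) for p in pk.values()]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return -sxy / sxx


# Toy input with hypothetical identifiers; real input would be similarity hits.
pairs = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P3", "P4"), ("P5", "P6")]
adj = build_network(pairs)
pk = degree_distribution(adj)
clusters = components(adj)
largest_fraction = len(clusters[0]) / len(adj)
mean_clustering = average_clustering(adj)
gamma = fit_gamma(pk)
```

On real data a least-squares fit to a log-log histogram is known to give biased exponents, so a maximum-likelihood estimator would be the more careful choice; the simple fit is used here only to keep the sketch dependency-free.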

Publication types

  • Comparative Study
  • Evaluation Study
  • Research Support, Non-U.S. Gov't
  • Validation Study

MeSH terms

  • Chromosome Mapping / methods*
  • Cluster Analysis
  • Conserved Sequence
  • Evolution, Molecular*
  • Genetic Variation
  • Genome, Human
  • Humans
  • Models, Genetic*
  • Phylogeny
  • Proteins / chemistry*
  • Proteins / classification
  • Proteins / genetics*
  • Sequence Alignment / methods*
  • Sequence Analysis, Protein / methods*
  • Sequence Homology, Amino Acid

Substances

  • Proteins