Detection of orphan domains in Drosophila using "hydrophobic cluster analysis"

Biochimie. 2015 Dec;119:244-53. doi: 10.1016/j.biochi.2015.02.019. Epub 2015 Feb 28.

Abstract

Introduction: Comparative genomics has become an important strategy in life science research. While many genes, and the proteins they encode, can be well characterized by assigning orthologs, a significant number of proteins or domains remain obscure "orphans". Some orphans are overlooked by current computational methods because they diverged rapidly; others emerged relatively recently (de novo). Recent research has demonstrated the importance of orphans, and of de novo proteins and domains, for the development of new phenotypic traits and for adaptation. New approaches for detecting novel domains are therefore of paramount importance.

Results: The hydrophobic cluster analysis (HCA) method delineates globular-like domains from the information in a protein sequence alone, and thereby bypasses some limitations of established methods that rely on conserved sequence similarity. In this study, HCA is tested for orphan domain detection on 12 Drosophila genomes. After detection, the orphan domains are classified into two categories depending on their presence or absence in distantly related species. Both categories show significantly different physico-chemical properties when compared with previously characterized domains from the Pfam database: the newly detected domains have a higher degree of intrinsic disorder and a particular hydrophobic cluster composition. The older the domains are, the more similar their hydrophobic cluster content is to that of Pfam domains. The results suggest that, over time, newly created domains acquire a canonical set of hydrophobic clusters but retain some features of intrinsically disordered regions.
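The notion of "hydrophobic cluster content" can be illustrated with a simplified sketch. It assumes the strong-hydrophobic alphabet V, I, L, F, M, Y, W and a one-dimensional rule in which a cluster is broken by a proline or by four or more consecutive non-hydrophobic residues; real HCA works on a two-dimensional helical net, and this sketch does not reproduce the domain delineation step. The function names and the use of cosine similarity to compare cluster-code profiles are hypothetical choices for illustration, not the authors' pipeline.

```python
from collections import Counter
from math import sqrt
from typing import Dict, Iterable, List

# Assumed strong-hydrophobic alphabet (commonly cited for HCA).
HYDROPHOBIC = frozenset("VILFMYW")
# Assumed 1-D rule: 4 or more consecutive non-hydrophobic residues end a cluster.
MAX_GAP = 3


def hydrophobic_clusters(seq: str) -> List[str]:
    """Return the binary code of each hydrophobic cluster in `seq`.

    '1' marks a hydrophobic residue and '0' any other residue inside the
    cluster, so "LAV" gives "101". Proline always ends a cluster.
    This is a 1-D simplification, not the 2-D helical net of real HCA.
    """
    clusters: List[str] = []
    current = ""  # binary code of the cluster being built
    gap = 0       # non-hydrophobic residues since the last hydrophobic one
    for aa in seq.upper():
        if aa in HYDROPHOBIC:
            if current and gap <= MAX_GAP:
                current += "0" * gap + "1"  # extend the current cluster
            else:
                if current:
                    clusters.append(current)  # gap too long: close the cluster
                current = "1"
            gap = 0
        elif aa == "P":
            if current:
                clusters.append(current)  # proline breaks the cluster
            current, gap = "", 0
        else:
            gap += 1
    if current:
        clusters.append(current)
    return clusters


def cluster_profile(seqs: Iterable[str]) -> Dict[str, float]:
    """Relative frequency of each cluster code over a set of sequences."""
    counts = Counter(code for s in seqs for code in hydrophobic_clusters(s))
    total = sum(counts.values())
    return {code: n / total for code, n in counts.items()} if total else {}


def cosine_similarity(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Cosine similarity between two cluster-code profiles (0 if either is empty)."""
    dot = sum(v * q.get(k, 0.0) for k, v in p.items())
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    print(hydrophobic_clusters("LAVKKKKLPIW"))  # ['101', '1', '11']
```

Given two lists of domain sequences, for instance candidate orphan domains and a reference set of Pfam domain sequences extracted separately, `cosine_similarity(cluster_profile(orphans), cluster_profile(reference))` yields one possible score for how close their cluster content is. A faithful comparison would require the actual HCA helical-net clustering.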

Conclusion: Our results agree with previous findings on orphan domains and suggest that the physico-chemical properties of domains change over long evolutionary time scales. The presented HCA-based method is able to detect domains with unusual properties without relying on prior knowledge, such as the availability of homologs. The method therefore has large potential to complement existing genome annotation strategies and to improve our understanding of how molecular features emerge.

Keywords: Domain detection; Domain evolution; Intrinsically disordered domain; Protein domain.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Animals
  • Chemical Phenomena
  • Cluster Analysis
  • Databases, Genetic
  • Databases, Protein
  • Drosophila Proteins / chemistry*
  • Drosophila Proteins / classification
  • Drosophila Proteins / genetics
  • Drosophila Proteins / metabolism
  • Evolution, Molecular
  • Genome, Insect
  • Hydrophobic and Hydrophilic Interactions
  • Intrinsically Disordered Proteins / chemistry
  • Intrinsically Disordered Proteins / classification
  • Intrinsically Disordered Proteins / genetics
  • Intrinsically Disordered Proteins / metabolism
  • Models, Molecular*
  • Molecular Sequence Annotation
  • Origin Recognition Complex / chemistry
  • Origin Recognition Complex / classification
  • Origin Recognition Complex / genetics
  • Origin Recognition Complex / metabolism
  • Phylogeny
  • Protein Structure, Tertiary
  • Proteome / chemistry*
  • Proteome / classification
  • Proteome / genetics
  • Proteome / metabolism
  • Proteomics / methods*
  • Structural Homology, Protein

Substances

  • Drosophila Proteins
  • Intrinsically Disordered Proteins
  • Orc1 protein, Drosophila
  • Origin Recognition Complex
  • Proteome