Origins and structural properties of novel and de novo protein domains during insect evolution

FEBS J. 2018 Jul;285(14):2605-2625. doi: 10.1111/febs.14504. Epub 2018 Jun 29.

Abstract

Over long time scales, protein evolution is characterized by modular rearrangements of protein domains. Such rearrangements are mainly caused by gene duplication, fusion and terminal losses. To better understand the mechanisms of domain emergence, we investigated 32 insect genomes covering a speciation gradient ranging from ~2 to ~390 mya. We used established domain models and foldable domains delineated by hydrophobic cluster analysis (HCA), which does not require homologous sequences, to also identify domains that have likely arisen de novo, that is, from previously noncoding DNA. Our results indicate that most novel domains emerge terminally, as they originate from ORF extensions, while fewer arise in middle positions of arrangements, resulting from exonization of intronic or intergenic regions. Many novel domains rapidly migrate between terminal and middle positions and between single- and multidomain arrangements. Young domains, such as most HCA-defined domains, are under strong selection pressure, as they show signals of purifying selection. De novo domains, whether linked to ancient domains or defined by HCA, have higher degrees of intrinsic disorder and of disorder-to-order transition upon binding than ancient domains. However, the DNA sequences corresponding to novel domains of de novo origin could only rarely be found in sister genomes. We conclude that novel domains are often recruited by other proteins and undergo important structural modifications shortly after their emergence, but evolve too fast to be characterized by cross-species comparisons alone.
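
The distinction between terminal and middle emergence can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes each protein is already represented as an ordered list of domain identifiers (N- to C-terminus) and that the set of novel domains is known. The function names, the domain labels (PF_old1, HCA_new1, ...) and the split of "terminal" into N- and C-terminal are hypothetical choices made for illustration only.

```python
from collections import Counter
from typing import Dict, List, Set


def classify_position(arrangement: List[str], index: int) -> str:
    """Label the position of the domain at `index` within an arrangement.

    `arrangement` is an ordered list of domain identifiers from the
    N-terminus to the C-terminus (an assumed representation).
    """
    if not 0 <= index < len(arrangement):
        raise IndexError("index outside arrangement")
    if len(arrangement) == 1:
        return "single"        # single-domain arrangement
    if index == 0:
        return "N-terminal"
    if index == len(arrangement) - 1:
        return "C-terminal"
    return "middle"


def tally_novel_positions(proteins: Dict[str, List[str]],
                          novel: Set[str]) -> Counter:
    """Count how often novel domains occur at each position class."""
    counts: Counter = Counter()
    for arrangement in proteins.values():
        for i, domain in enumerate(arrangement):
            if domain in novel:
                counts[classify_position(arrangement, i)] += 1
    return counts


# Hypothetical toy data, not taken from the study.
example_proteins = {
    "protA": ["PF_old1", "HCA_new1"],             # novel domain at the C-terminus
    "protB": ["PF_old1", "HCA_new2", "PF_old2"],  # novel domain in the middle
    "protC": ["HCA_new1"],                        # novel domain on its own
}
example_novel = {"HCA_new1", "HCA_new2"}

if __name__ == "__main__":
    print(tally_novel_positions(example_proteins, example_novel))
```

Comparing such tallies across genomes of different divergence times is one simple way to ask whether terminal positions dominate among young domains; the study itself relies on domain models, HCA and phylogenetic comparison rather than on this toy counting.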

Keywords: de novo domains; domain evolution; novel domains; protein disorder.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Animals
  • Base Sequence*
  • Cluster Analysis
  • Computational Biology / methods
  • Evolution, Molecular*
  • Exons
  • Gene Duplication
  • Gene Expression
  • Gene Fusion
  • Genome, Insect*
  • Hydrophobic and Hydrophilic Interactions
  • Insect Proteins / chemistry*
  • Insect Proteins / genetics
  • Insect Proteins / metabolism
  • Insecta / classification
  • Insecta / genetics*
  • Introns
  • Phylogeny
  • Protein Domains
  • Selection, Genetic
  • Sequence Deletion*

Substances

  • Insect Proteins