Frequent appearance of novel protein-coding sequences by frameshift translation

Genomics. 2006 Dec;88(6):690-697. doi: 10.1016/j.ygeno.2006.06.009. Epub 2006 Aug 4.

Abstract

Genomic duplication, followed by divergence, contributes to organismal evolution. Several mechanisms, such as exon shuffling and alternative splicing, are responsible for novel gene functions, but they generate homologous domains and do not usually lead to drastic innovation. Major novelties can potentially be introduced by frameshift mutations, which could explain the creation of novel proteins. Here, we employ a strategy using simulated protein sequences and identify 470 human and 108 mouse frameshift events that gave rise to new gene segments. No obvious interspecies overlap was observed, suggesting that such events are acquired at a high rate during evolution. This inference is supported by a deficiency of TpA dinucleotides in the protein-coding sequences, which decreases the occurrence of translational termination, even on the complementary strand. Increased usage of the TGA codon as the termination signal in newer genes also supports our inference. Together, these results suggest that tolerated frameshift changes are a prevalent mechanism for the rapid emergence of new genes and that protein-coding sequences can be derived from existing or ancestral exons rather than from events that result in noncoding sequences becoming exons.
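The link between TpA dinucleotides and translational termination can be illustrated with a short sketch. This is not the authors' pipeline. It only shows that two of the three standard stop codons (TAA, TAG) contain TpA, as do their reverse complements (TTA, CTA), and it counts stop codons in all six reading frames of a sequence. The function names and the example sequence are assumptions introduced here for illustration.

```python
# Illustrative sketch only; names and the example sequence are hypothetical.
STOPS = ("TAA", "TAG", "TGA")              # standard genetic code stop codons
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def stops_by_frame(seq):
    """Count stop codons in each of the three frames of one strand."""
    seq = seq.upper()
    counts = []
    for frame in range(3):
        codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
        counts.append(sum(codon in STOPS for codon in codons))
    return counts

def six_frame_stops(seq):
    """Stop-codon counts for the given strand (+) and its complement (-)."""
    seq = seq.upper()
    return {"+": stops_by_frame(seq),
            "-": stops_by_frame(reverse_complement(seq))}

# Stop codons that contain the TpA dinucleotide, on each strand.
tpa_stops = [s for s in STOPS if "TA" in s]
tpa_stops_rc = [rc for rc in map(reverse_complement, STOPS) if "TA" in rc]

example = "ATGGCCTAAGG"                    # made-up sequence, not real data
result = six_frame_stops(example)
```

Because TGA and its reverse complement TCA lack TpA, a sequence depleted in TpA can still terminate at TGA while presenting fewer stops in the other frames and on the opposite strand.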

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence / genetics*
  • Animals
  • Codon
  • Codon, Terminator
  • CpG Islands / genetics
  • Frameshift Mutation*
  • Humans
  • Mice
  • Molecular Sequence Data
  • Open Reading Frames
  • Protein Biosynthesis*
  • Proteins / genetics*
  • Sequence Alignment
  • Species Specificity

Substances

  • Codon
  • Codon, Terminator
  • Proteins