Molecular mechanism and history of non-sense to sense evolution of antifreeze glycoprotein gene in northern gadids

Proc Natl Acad Sci U S A. 2019 Mar 5;116(10):4400-4405. doi: 10.1073/pnas.1817138116. Epub 2019 Feb 14.

Abstract

A fundamental question in evolutionary biology is how genetic novelty arises. De novo gene birth is a recently recognized mechanism, but the evolutionary process and function of putative de novo genes remain largely obscure. With a clear life-saving function, the diverse antifreeze proteins of polar fishes are exemplary adaptive innovations and models for investigating new gene evolution. Here, we report clear evidence and a detailed molecular mechanism for the de novo formation of the northern gadid (codfish) antifreeze glycoprotein (AFGP) gene from a minimal noncoding sequence. We constructed genomic DNA libraries for AFGP-bearing and AFGP-lacking species across the gadid phylogeny and performed fine-scale comparative analyses of the AFGP genomic loci and homologs. We identified the noncoding founder region and a nine-nucleotide (9-nt) element therein that supplied the codons for one Thr-Ala-Ala unit from which the extant repetitive AFGP-coding sequence (cds) arose through tandem duplications. The latent signal peptide (SP)-coding exons were fortuitous noncoding DNA sequence immediately upstream of the 9-nt element, which, when spliced, supplied a typical secretory signal. Through a 1-nt frameshift mutation, these two parts formed a single read-through open reading frame (ORF). It became functionalized when a putative translocation event conferred the essential cis promoter for transcriptional initiation. We experimentally proved that all genic components of the extant gadid AFGP originated from entirely nongenic DNA. The gadid AFGP evolutionary process also represents a rare example of the proto-ORF model of de novo gene birth where a fully formed ORF existed before the regulatory element to activate transcription was acquired.
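
The mechanism described above (a 9-nt element encoding one Thr-Ala-Ala unit, expansion by tandem duplication, and a 1-nt frameshift that joins the upstream signal-peptide sequence and the repeats into one read-through ORF) can be illustrated with a toy sketch. All sequences below are hypothetical placeholders, not the gadid sequences; the frameshift is modeled as a single-base deletion purely as an assumption, since the abstract does not state the type of mutation; and splicing of the SP exons is ignored.

```python
from itertools import product

# Standard genetic code, built with bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate from position 0 in frame, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical 9-nt element: one Thr (ACA) followed by two Ala (GCA) codons.
ELEMENT = "ACAGCAGCA"

# Tandem duplication: four copies of the element (copy number is arbitrary here).
REPEATS = ELEMENT * 4

# Hypothetical upstream segment standing in for the latent signal-peptide
# sequence. The trailing extra base puts the repeats one nucleotide out of frame.
UPSTREAM_ANCESTRAL = "ATGAAACTGCTG" + "G"

# Assumed 1-nt deletion: removing the extra base restores a single reading frame.
UPSTREAM_SHIFTED = UPSTREAM_ANCESTRAL[:-1]

before = translate(UPSTREAM_ANCESTRAL + REPEATS)  # repeats read out of frame
after = translate(UPSTREAM_SHIFTED + REPEATS)     # read-through ORF

if __name__ == "__main__":
    print("one unit:        ", translate(ELEMENT))
    print("before frameshift:", before)
    print("after frameshift: ", after)
```

In the one-letter output, each Thr-Ala-Ala unit appears as the letters `TAA`. Before the deletion the repeats are read in the wrong frame and yield an unrelated peptide; after it, the upstream segment and the repeats translate as one continuous product.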

Keywords: adaptive evolution; codfish AFGP; de novo gene; noncoding origin; proto-ORF.

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Amino Acid Sequence
  • Animals
  • Antifreeze Proteins / genetics*
  • Base Sequence
  • DNA / genetics
  • Evolution, Molecular*
  • Fish Proteins / genetics*
  • Gadiformes / classification
  • Gadiformes / genetics*
  • Open Reading Frames
  • Phylogeny
  • Promoter Regions, Genetic
  • Selection, Genetic

Substances

  • Antifreeze Proteins
  • Fish Proteins
  • DNA

Associated data

  • GENBANK/MK011258
  • GENBANK/MK011272
  • GENBANK/MH992395
  • GENBANK/MH992397
  • GENBANK/MK011291
  • GENBANK/MK011308