Diversification of CpG-Island Promoters Revealed by Comparative Analysis Between Human and Rhesus Monkey Genomes

Mamm Genome. 2020 Aug;31(7-8):240-251. doi: 10.1007/s00335-020-09844-2. Epub 2020 Jul 9.

Abstract

While CpG dinucleotides are significantly depleted relative to other dinucleotides in mammalian genomes, they can cluster to form CpG islands, which localize around the 5' regions of genes, where they function as promoters. CpG-island promoters are generally unmethylated and are often found in housekeeping genes. However, neither their nucleotide sequences nor their existence per se is conserved between humans and mice, which may be due to evolutionary gain and loss of the regulatory regions. In this study, the human and rhesus monkey genomes, whose sequences are moderately conserved, were compared at base resolution. Using transcription start site data, we first validated our method's ability to identify orthologous promoters and identified a limitation of using the 5' ends of curated gene models, such as NCBI RefSeq, as their transcription start sites. We found that, in addition to deamination mutations, insertions and deletions of bases, repeats, and long fragments contributed to the mutations of CpG dinucleotides. We also observed that G + C content tended to change in CpG-poor environments, while CpG content was altered in G + C-rich environments. While loss of CpG islands can be caused by gradual decreases in CpG sites, gain of these islands appears to require two distinct nucleotide-altering steps. Taken together, our findings provide novel insights into the process of acquisition and diversification of CpG-island promoters in vertebrates.
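
The two quantities the abstract contrasts, G + C content and CpG content, can be illustrated with a minimal sketch. The abstract does not state how the authors defined CpG islands, so the code below assumes the commonly cited Gardiner-Garden and Frommer criteria (length of at least 200 bp, G + C content of at least 0.5, and an observed/expected CpG ratio of at least 0.6). The function names and default thresholds are illustrative assumptions, not the authors' method.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CpG * length) / (#C * #G)."""
    seq = seq.upper()
    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        # No CpG is possible without at least one C and one G.
        return 0.0
    cpg_count = seq.count("CG")  # "CG" cannot overlap itself
    return cpg_count * len(seq) / (c_count * g_count)


def is_cpg_island_like(
    seq: str,
    min_len: int = 200,     # assumed threshold, not taken from the paper
    min_gc: float = 0.5,    # assumed threshold, not taken from the paper
    min_oe: float = 0.6,    # assumed threshold, not taken from the paper
) -> bool:
    """Return True if seq meets all three assumed island criteria."""
    return (
        len(seq) >= min_len
        and gc_content(seq) >= min_gc
        and cpg_obs_exp(seq) >= min_oe
    )


if __name__ == "__main__":
    cpg_rich = "CG" * 100   # toy 200-bp sequence, not real genomic data
    cpg_poor = "AT" * 100
    print(gc_content(cpg_rich), cpg_obs_exp(cpg_rich), is_cpg_island_like(cpg_rich))
    print(gc_content(cpg_poor), cpg_obs_exp(cpg_poor), is_cpg_island_like(cpg_poor))
```

Because the criteria are checked separately, a sequence can fail on G + C content alone, on the CpG ratio alone, or on both, which is consistent with the abstract's suggestion that island gain involves two distinct nucleotide-altering steps.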

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Base Sequence
  • CpG Islands*
  • DNA Methylation*
  • Epigenesis, Genetic*
  • Genetic Variation
  • Genome*
  • Genome, Human
  • Genomics / methods
  • Humans
  • INDEL Mutation
  • Macaca mulatta
  • Mammals / genetics
  • Mice
  • Mutation
  • Promoter Regions, Genetic*
  • Regulatory Sequences, Nucleic Acid
  • Transcription Initiation Site