Massive-scale genomic analysis reveals SARS-CoV-2 mutation characteristics and evolutionary trends

mLife. 2022 Sep;1(3):311-322. doi: 10.1002/mlf2.12040. Epub 2022 Sep 26.

Abstract

The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic has resulted in significant societal costs. Hence, an in-depth understanding of SARS-CoV-2 mutation and evolution will help determine the direction of the COVID-19 pandemic. In this study, we identified 296,728 de novo mutations in more than 2,800,000 high-quality SARS-CoV-2 genomes. All possible factors affecting the mutation frequency of SARS-CoV-2 in human hosts were analyzed, including zinc finger antiviral proteins, sequence context, amino acid change, and translation efficiency. As a result, we proposed that when adenine (A) and thymine (T) bases are in the context of an AM (M stands for adenine or cytosine) or TA motif, the A or T base has a lower mutation frequency. Furthermore, we hypothesized that translation efficiency can affect the mutation frequency of the third codon position through selection, which explains why SARS-CoV-2 prefers AT3 codon usage. In addition, we found a host-specific asymmetric dinucleotide mutation frequency in the SARS-CoV-2 genome, which provides a new basis for determining the origin of SARS-CoV-2. Finally, we summarize all possible factors affecting mutation frequency and provide insights into the mutation characteristics and evolutionary trends of SARS-CoV-2.
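
To make the AT3 notion concrete, the sketch below computes the fraction of codons in a coding sequence whose third base is A or T. It is only an illustration, not the authors' pipeline: it assumes AT3 means "A or T (U) at the third codon position", and the function name `at3_fraction` and the toy sequence are hypothetical, not taken from the study.

```python
def at3_fraction(cds: str) -> float:
    """Fraction of complete codons whose third base is A or T.

    Assumption: AT3 refers to A/T at the third codon position.
    U is treated as T so RNA sequences work too. Any other symbol
    (for example N) counts as a non-A/T third base.
    """
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3      # ignore a trailing partial codon
    third_bases = seq[2:usable:3]         # every third base, starting at index 2
    if not third_bases:
        raise ValueError("sequence has no complete codon")
    return sum(base in "AT" for base in third_bases) / len(third_bases)


# Toy in-frame sequence (hypothetical, not from any SARS-CoV-2 genome):
# codons ATG GCT AAA TTT GGA have third bases G, T, A, T, A.
toy_cds = "ATGGCTAAATTTGGA"
print(at3_fraction(toy_cds))  # 4 of 5 third bases are A or T
```

The function expects a sequence that is already in reading frame; applied to a whole genome without gene coordinates, the result would not correspond to codon positions.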

Keywords: SARS‐CoV‐2; de novo mutation; evolutionary trends; mutation characteristics; mutation frequency.

Associated data

  • figshare/10.6084/m9.figshare.19471571.v4