Low complexity regions (LCRs) contribute to the hypervariability of the HIV-1 gp120 protein

J Theor Biol. 2013 Dec 7;338:80-6. doi: 10.1016/j.jtbi.2013.08.039. Epub 2013 Sep 8.

Abstract

Low complexity regions (LCRs) are nucleic acid or protein sequences defined by a compositional bias. They have been found in sequences from all three cellular lineages (Bacteria, Archaea and Eucarya) and have also been reported in viral genomes. We present here the results of a detailed computer analysis of the LCRs present in the HIV-1 glycoprotein 120 (gp120), which is encoded by the viral gene env. The analysis was performed on a sample of 3637 Env polyprotein sequences derived from 4117 completely sequenced and translated HIV-1 genomes available in public databases as of December 2012. We have identified 1229 LCRs located in four different regions of the gp120 protein that correspond to four of the five regions identified as hypervariable (V1, V2, V4 and V5). The remaining 29 LCRs are found in the signal peptide and in the conserved regions C2, C3, C4 and C5. No LCR has been identified in the hypervariable region V3. The LCRs detected in the V1, V2, V4 and V5 hypervariable regions have a high asparagine (Asn) content, which very likely corresponds to glycosylation sites that may contribute to the ability of the retrovirus to evade the immune system. In sharp contrast with what is observed in gp120 proteins lacking LCRs, the glycosylation sites present in LCRs tend to cluster towards the center of the region, forming well-defined islands. These results suggest that LCRs represent a hitherto undescribed source of genomic variability in lentiviruses and that these repeats may be an important source of antigenic variation in HIV-1 populations. They may also exemplify the evolutionary processes by which primitive cellular RNA genomes could have increased in size, and the role of LCRs as a source of raw material during the evolutionary acquisition of new functions.
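The abstract does not say which detection program or parameters were used. As a hedged illustration of what "compositional bias" and "Asn-rich glycosylation sites" mean operationally, the sketch below scores sliding windows of a protein sequence by Shannon entropy (low entropy indicates a biased, low complexity composition) and scans for the commonly cited N-X-S/T sequon (X ≠ P) that marks potential N-linked glycosylation sites. This is a simplified stand-in, not necessarily the authors' method; the window length, entropy cutoff, function names and toy sequence are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical parameters chosen for illustration only; they are NOT the
# settings used in the study summarized above.
WINDOW = 12
ENTROPY_CUTOFF = 2.2  # bits; lower entropy means stronger compositional bias

# Potential N-linked glycosylation sequon: Asn, any residue except Pro, then
# Ser or Thr. The lookahead lets overlapping sequons (e.g. "NNST") all match.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def shannon_entropy(segment: str) -> float:
    """Shannon entropy (bits) of the residue composition of a segment."""
    counts = Counter(segment)
    n = len(segment)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_windows(seq: str, window: int = WINDOW,
                           cutoff: float = ENTROPY_CUTOFF) -> list[tuple[int, int]]:
    """Merge overlapping or touching low-entropy windows into intervals.

    Returns 0-based, end-exclusive (start, end) intervals.
    """
    intervals: list[tuple[int, int]] = []
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < cutoff:
            end = start + window
            if intervals and start <= intervals[-1][1]:
                # Extend the previous interval instead of opening a new one.
                intervals[-1] = (intervals[-1][0], end)
            else:
                intervals.append((start, end))
    return intervals


def sequon_positions(seq: str) -> list[int]:
    """0-based positions of Asn residues that start an N-X-S/T sequon (X != P)."""
    return [m.start() for m in SEQUON.finditer(seq)]


if __name__ == "__main__":
    # Toy sequence invented for illustration, not a real gp120 fragment.
    toy = "ACDEFGHIKLMQRVWY" + "NNTSNNTSNNTS" + "ACDEFGHIKLMQRVWY"
    print(low_complexity_windows(toy))
    print(sequon_positions(toy))
```

Counting sequons inside versus outside the reported intervals is one simple way to ask whether potential glycosylation sites concentrate in LCRs, as the abstract describes; a real analysis would use an established LCR detector and alignment-aware region boundaries (V1 to V5, C1 to C5).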

Keywords: Glycosylation sites; Human immunodeficiency virus; Hypervariable regions; LCRs; Low complexity regions.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Amino Acids / analysis
  • Databases, Protein
  • Evolution, Molecular
  • Genetic Variation / genetics
  • Genome, Viral
  • Glycosylation
  • HIV Envelope Protein gp120 / genetics*
  • HIV Envelope Protein gp120 / immunology
  • HIV-1 / genetics*
  • HIV-1 / immunology
  • Humans
  • Molecular Sequence Data
  • Protein Structure, Tertiary
  • Sequence Alignment

Substances

  • Amino Acids
  • HIV Envelope Protein gp120
  • gp120 protein, Human immunodeficiency virus 1