Validation of the reliability of computational O-GlcNAc prediction

Biochim Biophys Acta. 2014 Feb;1844(2):416-21. doi: 10.1016/j.bbapap.2013.12.002. Epub 2013 Dec 9.

Abstract

O-GlcNAcylation is an inducible, highly dynamic and reversible posttranslational modification that regulates numerous cellular processes, such as gene expression, translation, immune reactions, protein degradation, protein-protein interaction, apoptosis, and signal transduction. In contrast to N-linked glycosylation, O-GlcNAcylation does not display a strict amino acid consensus sequence, although serine or threonine residues flanked by proline and valine are preferred sites of O-GlcNAcylation. Based on this information, computational tools for predicting O-GlcNAc sites have been developed. Here, we retrospectively assessed the performance of two available O-GlcNAc prediction programs, the YinOYang 1.2 server and OGlcNAcScan, by comparing their predictions with recently discovered, experimentally validated O-GlcNAc sites. Both programs efficiently identified O-GlcNAc sites situated in an environment resembling the consensus sequence P-P-V-[ST]-T-A. However, both programs produced numerous false-negative predictions when the modified site was located in an amino acid sequence differing from the known consensus sequence. By searching for a common sequence motif, we found that O-GlcNAcylation of nucleocytoplasmic proteins preferably occurs at serine and threonine residues flanked downstream by proline and valine and upstream by one to two alanines followed by a stretch of serine and threonine residues. In contrast, O-GlcNAcylation of proteins located in the mitochondria or in the secretory lumen occurs at different sites and does not follow a distinct consensus sequence. Thus, our study indicates the limitations of the presently available computational prediction methods for O-GlcNAc sites and suggests that experimental validation is mandatory. Continuous updating and further development of the available databases will be key to improving the performance of O-GlcNAc site prediction.
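To make the consensus notation and the false-negative comparison concrete, here is a minimal illustrative sketch. It assumes a plain exact-pattern scan for P-P-V-[ST]-T-A. This is not the method used by the YinOYang 1.2 server or OGlcNAcScan. The function names (consensus_sites, false_negatives, sensitivity) and the toy sequence are hypothetical and chosen only for illustration. Positions are 1-based and refer to the modifiable [ST] residue.

```python
import re

# Hypothetical illustration: the consensus P-P-V-[ST]-T-A written as a regex.
# The lookahead (?=...) lets overlapping matches be reported.
CONSENSUS = re.compile(r"(?=PPV[ST]TA)")


def consensus_sites(sequence: str) -> list[int]:
    """Return 1-based positions of the [ST] residue in each P-P-V-[ST]-T-A match."""
    seq = sequence.upper()
    # m.start() is the 0-based index of the first P; the [ST] sits 3 residues later,
    # which is index start + 3 (0-based), i.e. position start + 4 (1-based).
    return [m.start() + 4 for m in CONSENSUS.finditer(seq)]


def false_negatives(predicted: set[int], validated: set[int]) -> set[int]:
    """Experimentally validated sites that the predictor did not report."""
    return validated - predicted


def sensitivity(predicted: set[int], validated: set[int]) -> float:
    """Fraction of validated sites that were predicted (true-positive rate)."""
    if not validated:
        raise ValueError("sensitivity is undefined without validated sites")
    return len(validated & predicted) / len(validated)


if __name__ == "__main__":
    toy_seq = "MKPPVSTAGGPPVTTAQS"  # hypothetical sequence, not from the study
    toy_validated = {6, 14, 18}     # hypothetical "validated" sites
    predicted = set(consensus_sites(toy_seq))
    print(sorted(predicted))
    print(false_negatives(predicted, toy_validated))
    print(sensitivity(predicted, toy_validated))
```

In this toy case the scan reports positions 6 and 14, and it misses the hypothetical site at position 18 because that serine lies outside any consensus-like context. A purely motif-based predictor cannot catch such a site, and it is this kind of non-consensus site that the study counts as a false negative.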

Keywords: O-GlcNAc; OGlcNAcScan; Prediction program; YinOYang server; dbOGAP database.

Publication types

  • Research Support, Non-U.S. Gov't
  • Validation Study

MeSH terms

  • Acetylglucosamine / metabolism*
  • Algorithms*
  • Animals
  • Binding Sites
  • Computational Biology / methods*
  • Consensus Sequence
  • Forecasting
  • Humans
  • Protein Processing, Post-Translational
  • Proteins / chemistry*
  • Proteins / metabolism*
  • Retrospective Studies
  • Sequence Analysis, Protein / methods*

Substances

  • Proteins
  • Acetylglucosamine