Probabilistic identification of saccharide moieties in biomolecules and their protein complexes

Sci Data. 2020 Jul 3;7(1):210. doi: 10.1038/s41597-020-0547-y.

Abstract

The chemical composition of saccharide complexes underlies their biomedical activities as biomarkers for cardiometabolic disease, various types of cancer, and other conditions. However, because these molecules may undergo major structural modifications, distinguishing between compounds of saccharide and non-saccharide origin becomes a challenging computational problem that hinders the aggregation of information about their bioactive moieties. We have developed an algorithm and software package called "Cheminformatics Tool for Probabilistic Identification of Carbohydrates" (CTPIC) that analyzes the covalent structure of a compound to yield a probabilistic measure for distinguishing saccharides and saccharide-derivatives from non-saccharides. CTPIC analysis of the RCSB Ligand Expo (database of small molecules found to bind proteins in the Protein Data Bank) led to a substantial increase in the number of ligands characterized as saccharides. CTPIC analysis of Protein Data Bank identified 7.7% of the proteins as saccharide-binding. CTPIC is freely available as a webservice at (http://ctpic.nmrfam.wisc.edu).

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Carbohydrates / chemistry*
  • Databases, Protein
  • Datasets as Topic
  • Ligands
  • Proteins / chemistry*
  • Software

Substances

  • Carbohydrates
  • Ligands
  • Proteins