Assigning new GO annotations to protein data bank sequences by combining structure and sequence homology

Proteins. 2005 Mar 1;58(4):855-65. doi: 10.1002/prot.20355.

Abstract

As the number of discovered proteins grows, so does the need for functional annotation that is both highly accurate and consistent. The Gene Ontology (GO) provides consistent annotation in a computer-readable and usable form; hence, GO annotation (GOA) has been assigned to a large number of protein sequences, both from direct experimental evidence and by inference from sequence homology. Here we show that this annotation can be extended and corrected in cases where protein structures are available. Specifically, using the Combinatorial Extension (CE) algorithm for structure comparison, we extend the protein annotation currently provided by GOA at the European Bioinformatics Institute (EBI) to further describe the contents of the Protein Data Bank (PDB). Specific cases of biologically interesting annotations derived by this method are given. Because the relationship between sequence, structure, and function is complicated, we explore the impact of this relationship on assigning GOA. We consider the effect of superfolds (folds with many functions) and, by comparison with the Structural Classification of Proteins (SCOP), the individual effects of family, superfamily, and fold.
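
The general idea of extending annotation through structural similarity can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the actual procedure, similarity thresholds, or filtering rules. The function name `transfer_go_terms`, the chain identifiers, the similarity scores, and the cutoff value are all hypothetical, chosen only to show how GO terms from an annotated chain might be proposed for a structurally similar chain that lacks them.

```python
from collections import defaultdict


def transfer_go_terms(annotations, similarities, score_cutoff):
    """Propose GO terms for chains based on structurally similar neighbours.

    annotations  : dict mapping chain ID -> set of GO term IDs already assigned
    similarities : iterable of (chain_a, chain_b, score) from a structure
                   comparison (for example, a CE-style similarity score)
    score_cutoff : hypothetical minimum score for trusting a pair; the
                   abstract does not specify a value
    """
    inferred = defaultdict(set)
    for chain_a, chain_b, score in similarities:
        if score < score_cutoff:
            continue  # structural similarity too weak to transfer annotation
        # Transfer in both directions: each chain may gain terms from the other.
        for source, target in ((chain_a, chain_b), (chain_b, chain_a)):
            known = annotations.get(target, set())
            inferred[target] |= annotations.get(source, set()) - known
    # Keep only chains that actually received new candidate terms.
    return {chain: terms for chain, terms in inferred.items() if terms}


# Illustrative input only: placeholder chain IDs and made-up scores.
annotations = {"chainA": {"GO:0005515"}, "chainB": set()}
similarities = [("chainA", "chainB", 5.2), ("chainA", "chainC", 2.1)]
candidates = transfer_go_terms(annotations, similarities, score_cutoff=4.0)
print(candidates)
```

Terms proposed this way are candidates only. As the abstract notes, superfolds and the different levels of structural relationship (family, superfamily, fold) affect how reliably function follows structure, so any real pipeline would need additional checks before accepting a transfer.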

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Algorithms
  • Antigens / chemistry
  • Cluster Analysis
  • Computational Biology / methods*
  • Databases as Topic
  • Databases, Factual
  • Databases, Protein
  • Imaging, Three-Dimensional
  • Information Storage and Retrieval
  • Models, Biological
  • Models, Statistical
  • Peptides / chemistry
  • Protein Binding
  • Protein Conformation
  • Protein Folding
  • Protein Structure, Tertiary
  • Proteins / chemistry*
  • Proteomics / methods*
  • Reproducibility of Results
  • Sequence Analysis, Protein
  • Sequence Homology
  • Software
  • Structure-Activity Relationship
  • Terminology as Topic

Substances

  • Antigens
  • Peptides
  • Proteins

Associated data

  • PDB/1CXWA