Consistency and variation of protein subcellular location annotations

Proteins. 2021 Feb;89(2):242-250. doi: 10.1002/prot.26010. Epub 2020 Sep 26.

Abstract

A major challenge for protein databases is reconciling information from diverse sources. This is especially difficult when some of the information consists of secondary, human-interpreted data rather than primary data. For example, the Swiss-Prot database contains curated annotations of subcellular location that are based on predictions from protein sequence, statements in scientific articles, and published experimental evidence. The Human Protein Atlas (HPA) consists of millions of high-resolution microscopic images that show protein spatial distribution at the cellular and subcellular level. These images are manually annotated with protein subcellular locations by trained experts. The image annotations in HPA can capture the variation of subcellular location across different cell lines, tissues, or tissue states. A systematic investigation of the consistency between HPA and Swiss-Prot assignments of subcellular location, which is important for understanding and utilizing protein location data from the two databases, has not been described previously. In this paper, we quantitatively evaluate the consistency of subcellular location annotations between HPA and Swiss-Prot at multiple levels, as well as the variation of protein locations across cell lines and tissues. Our results show that the annotations in these two databases differ significantly in many cases, which leads us to propose procedures for deriving and integrating protein subcellular location data. We also find that proteins with highly variable locations are more likely to be biomarkers of disease, providing support for incorporating analysis of subcellular location into protein biomarker identification and screening.
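As an illustration of what a per-protein consistency comparison between two annotation sources might look like, the following is a minimal sketch only. The abstract does not state which measure the authors use; the Jaccard index here is an assumption chosen for simplicity, the protein identifiers and location labels are hypothetical, and the sketch assumes both sources have already been mapped to a shared location vocabulary.

```python
def jaccard(a, b):
    """Jaccard index of two collections of location labels (1.0 if both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def consistency_by_protein(source_x, source_y):
    """Score every protein annotated in both sources; others are skipped."""
    shared = sorted(source_x.keys() & source_y.keys())
    return {protein: jaccard(source_x[protein], source_y[protein]) for protein in shared}


# Hypothetical annotations, NOT taken from HPA or Swiss-Prot.
hpa = {
    "P1": {"nucleus", "cytosol"},
    "P2": {"mitochondria"},
    "P3": {"plasma membrane"},
}
swissprot = {
    "P1": {"nucleus"},
    "P2": {"mitochondria"},
    "P3": {"golgi apparatus"},
}

scores = consistency_by_protein(hpa, swissprot)
mean_score = sum(scores.values()) / len(scores)
print(scores)      # per-protein agreement between the two sources
print(mean_score)  # overall agreement across shared proteins
```

A score of 1.0 means the two sources list identical location sets for a protein, and 0.0 means they share no location. A set-overlap measure like this treats all locations as equally distinct, so it would not capture agreement at coarser levels (for example, two different nuclear sub-compartments).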

Keywords: Swiss-Prot database; annotation consistency; Human Protein Atlas; location biomarker; protein subcellular location.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Atlases as Topic
  • Cell Compartmentation
  • Cell Line
  • Databases, Protein / standards*
  • Eukaryotic Cells / metabolism
  • Eukaryotic Cells / ultrastructure
  • Humans
  • Molecular Sequence Annotation / standards*
  • Observer Variation
  • Proteins / chemistry
  • Proteins / genetics
  • Proteins / metabolism*
  • Reproducibility of Results
  • Uncertainty

Substances

  • Proteins