Measuring pathway database coverage of the phosphoproteome

PeerJ. 2021 May 25;9:e11298. doi: 10.7717/peerj.11298. eCollection 2021.

Abstract

Protein phosphorylation is one of the best-known post-translational mechanisms and plays a key role in the regulation of cellular processes. Over 100,000 distinct phosphorylation sites have been discovered over the last decade through continual improvement of mass spectrometry-based phosphoproteomics. However, data saturation is occurring, and the bottleneck of assigning biologically relevant functionality to phosphosites needs to be addressed. Data-driven approaches have had limited success in revealing phosphosite functionality because of a range of limitations. An alternative, more suitable approach is to use prior knowledge from literature-derived databases. Here, we analysed seven widely used databases to shed light on their suitability for providing functional insights into phosphoproteomics data. We first determined the global coverage of each database at both the protein and phosphosite level. We also determined how consistent each database was in its phosphorylation annotations compared to a global standard. Finally, we looked in detail at the coverage of each database over six experimental datasets. Our analysis highlights the relative strengths and weaknesses of each database, providing a guide to how each can best be used to identify biological mechanisms in phosphoproteomic data.
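As an illustration of what coverage at the protein and phosphosite level can mean in practice, the following is a minimal sketch and not the authors' actual pipeline. It assumes a phosphosite is represented as a (protein accession, residue, position) tuple and that coverage is the fraction of a reference set also annotated in a database; the function names, accessions and example data are hypothetical.

```python
from typing import Iterable, Set, Tuple

# Assumed representation: (protein accession, residue, position).
Site = Tuple[str, str, int]


def coverage(annotated: Iterable, reference: Iterable) -> float:
    """Fraction of the reference set that also appears in the annotated set."""
    reference_set = set(reference)
    if not reference_set:
        return 0.0
    return len(set(annotated) & reference_set) / len(reference_set)


def proteins_of(sites: Iterable[Site]) -> Set[str]:
    """Collapse phosphosites to the proteins that carry them."""
    return {protein for protein, _residue, _position in sites}


# Hypothetical reference phosphoproteome and database annotations.
reference_sites = {
    ("P1", "S", 10),
    ("P1", "T", 20),
    ("P2", "Y", 5),
    ("P3", "S", 7),
}
database_sites = {
    ("P1", "S", 10),
    ("P2", "Y", 99),  # same protein as in the reference, different site
}

# Site-level coverage requires an exact match of protein, residue and position.
site_coverage = coverage(database_sites, reference_sites)

# Protein-level coverage only requires that the protein is annotated at all.
protein_coverage = coverage(
    proteins_of(database_sites), proteins_of(reference_sites)
)

print(f"site-level coverage:    {site_coverage:.2f}")
print(f"protein-level coverage: {protein_coverage:.2f}")
```

In this toy example the protein-level figure is higher than the site-level figure, because a database can mention a protein without annotating the specific site observed in the reference.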

Keywords: Bioinformatics; Databases; Phosphoproteomics; Proteomics.

Grants and funding

This work was supported by the National Health and Medical Research Council funding (Project Grant 1128609 to Melissa J. Davis), Cancer Council Victoria funding (Project grant 1187825 to MJD), and the National Breast Cancer Foundation and Cure Brain Cancer Foundation funding (Project Grant CBCNBCF-19-009 to MJD). Melissa J. Davis is the recipient of the Betty Smyth Centenary Fellowship in Bioinformatics. HAH was supported by the Peter Hall Scholarship. This study was made possible through Victorian State Government Operational Infrastructure Support and Australian Government NHMRC Independent Research Institute Infrastructure Support scheme. Liam G. Fearnley was supported by the DHB Foundation Centenary Postdoctoral Fellowship in Neurogenetic Systems Biology and philanthropic funding provided through the Walter and Eliza Hall Institute. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.