Exploring the dark foldable proteome by considering hydrophobic amino acids topology

Tristan Bitard-Feildel; Isabelle Callebaut

doi:10.1038/srep41425

Sci Rep. 2017 Jan 30;7:41425. doi: 10.1038/srep41425.

Authors

Tristan Bitard-Feildel¹, Isabelle Callebaut¹

Affiliation

¹ CNRS UMR7590, Sorbonne Universités, Université Pierre et Marie Curie - Paris 6 - MNHN - IRD - IUC, Paris, France.

Abstract

The protein universe corresponds to the set of all proteins found in all organisms. One way to explore it is to take into account the domain content of the proteins. However, some parts of sequences and many entire sequences remain un-annotated despite a converging number of domain families. The un-annotated part of the protein universe is referred to as the dark proteome and remains poorly characterized. In this study, we quantify the amount of foldable domains within the dark proteome using the hydrophobic cluster analysis methodology. These un-annotated foldable domains were grouped using a combination of remote homology searches and domain annotations, which led us to define different levels of darkness. The dark foldable domains were analyzed to understand what makes them different from domains stored in databases and thus difficult to annotate. The un-annotated domains of the dark proteome universe display specific features relative to database domains: shorter length, non-canonical content and particular topology of hydrophobic residues, higher propensity for disorder, and higher energy. These features make them hard to relate to known families. Based on these observations, we emphasize that domain annotation methodologies can still be improved to fully apprehend and decipher the molecular evolution of the protein universe.
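
For orientation, the idea behind hydrophobic cluster analysis can be sketched in one dimension. This is an illustrative simplification and not the authors' pipeline: the hydrophobic alphabet (V, I, L, F, M, Y, W), the gap threshold of four residues, and the use of proline as a cluster breaker are assumptions drawn from common descriptions of the method, and the function name `hydrophobic_clusters` is hypothetical.

```python
# Assumed "strong hydrophobic" alphabet; not stated in the abstract.
HYDROPHOBIC = set("VILFMYW")


def hydrophobic_clusters(seq, min_gap=4):
    """Split a protein sequence into hydrophobic clusters.

    A cluster runs from one hydrophobic residue to the last hydrophobic
    residue reachable without crossing a proline or a run of at least
    `min_gap` non-hydrophobic residues (both rules are assumptions).
    Returns a list of (start, end, pattern) with `end` exclusive and
    `pattern` a binary string (1 = hydrophobic, 0 = other).
    """
    seq = seq.upper()
    clusters = []
    start = last = None  # bounds of the currently open cluster

    def close():
        # Store the open cluster, if any, and reset the bounds.
        nonlocal start, last
        if start is not None:
            pattern = "".join(
                "1" if c in HYDROPHOBIC else "0" for c in seq[start:last + 1]
            )
            clusters.append((start, last + 1, pattern))
        start = last = None

    for i, aa in enumerate(seq):
        if aa == "P":
            close()  # proline always ends the open cluster
        elif aa in HYDROPHOBIC:
            # Too many non-hydrophobic residues since the last one: new cluster.
            if last is not None and i - last - 1 >= min_gap:
                close()
            if start is None:
                start = i
            last = i
    close()
    return clusters


if __name__ == "__main__":
    for cluster in hydrophobic_clusters("AVLIAGSGSGFWPLL"):
        print(cluster)
```

In such a sketch, regions with a high density of clusters would be candidates for foldable domains, while the binary patterns give a crude view of the hydrophobic topology that the abstract refers to.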

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Amino Acid Sequence
Amino Acids / chemistry*
Cluster Analysis
Hydrophobic and Hydrophilic Interactions*
Molecular Sequence Annotation
Proteome / metabolism*
Thermodynamics

Substances

Amino Acids
Proteome