Identification of hidden relationships from the coupling of hydrophobic cluster analysis and domain architecture information

Bioinformatics. 2013 Jul 15;29(14):1726-33. doi: 10.1093/bioinformatics/btt271. Epub 2013 May 14.

Abstract

Motivation: Describing domain architecture is a critical step in the functional characterization of proteins. However, some orphan domains do not match any profile stored in dedicated domain databases and are therefore difficult to analyze.

Results: We present here a novel approach, called TREMOLO-HCA, for the analysis of orphan domain sequences, inspired by our experience in the use of Hydrophobic Cluster Analysis (HCA). Hidden relationships between protein sequences can be more easily identified from PSI-BLAST results by using information on domain architecture, HCA plots and the degree of conservation of amino acids that may participate in the protein core. This can reveal remote relationships with known families of domains, as illustrated here with the identification of a hidden Tudor tandem in the human BAHCC1 protein and a hidden ET domain in the Saccharomyces cerevisiae Taf14p and human AF9 proteins. The results obtained in this way are consistent with those provided by HHPRED, which is based on pairwise comparisons of HMMs. Our approach can, however, be applied even in the absence of domain profiles or known 3D structures, for the identification of novel families of domains. It can also be used in reverse to refine domain profiles, by starting from known protein domain families and identifying highly divergent members hitherto considered orphans.
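
To make the notion of a hydrophobic cluster concrete, the following is a minimal one-dimensional sketch in Python. It is not the TREMOLO-HCA implementation. It assumes the hydrophobic alphabet V, I, L, F, M, Y, W, treats proline as a cluster breaker, and splits clusters wherever at least four consecutive non-hydrophobic residues occur. The function name hydrophobic_clusters and the constants are hypothetical, chosen for illustration only, and the two-dimensional helical net used for HCA plots is deliberately left out.

```python
# Illustrative sketch only: a simplified 1D view of hydrophobic clusters.
# Assumptions (not taken from the abstract): the hydrophobic alphabet
# VILFMYW, proline as a cluster breaker, and a minimum gap of four
# non-hydrophobic residues between two distinct clusters.
HYDROPHOBIC = set("VILFMYW")
BREAKER = "P"
MIN_GAP = 4


def hydrophobic_clusters(seq, min_gap=MIN_GAP):
    """Return a list of (start, end, code) tuples, one per cluster.

    start is inclusive and end is exclusive (0-based positions).
    code is a binary string: '1' for hydrophobic, '0' otherwise.
    """
    seq = seq.upper()
    spans = []
    start = None  # first hydrophobic position of the open cluster
    last = None   # most recent hydrophobic position seen
    for i, aa in enumerate(seq):
        if aa == BREAKER:
            # A proline always closes the cluster currently open.
            if start is not None:
                spans.append((start, last + 1))
                start = last = None
        elif aa in HYDROPHOBIC:
            # Close the open cluster if the gap since the last
            # hydrophobic residue is long enough.
            if start is not None and i - last - 1 >= min_gap:
                spans.append((start, last + 1))
                start = None
            if start is None:
                start = i
            last = i
    if start is not None:
        spans.append((start, last + 1))
    return [
        (s, e, "".join("1" if c in HYDROPHOBIC else "0" for c in seq[s:e]))
        for s, e in spans
    ]


if __name__ == "__main__":
    for cluster in hydrophobic_clusters("AVLKDIGSSSSFWPMA"):
        print(cluster)
```

On the toy sequence AVLKDIGSSSSFWPMA, this sketch reports three clusters with binary codes 11001, 11 and 1. Comparing such codes position by position across aligned sequences is one simple way to inspect whether putative core positions are conserved.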

Availability: We provide a possible integration of this approach in an open TREMOLO-HCA package, which is fully implemented in Python v2.7 and is available on request. Instructions are available at http://www.impmc.upmc.fr/~callebau/tremolohca.html.

Contact: isabelle.callebaut@impmc.upmc.fr

Supplementary information: Supplementary Data are available at Bioinformatics online.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Cluster Analysis
  • Humans
  • Hydrophobic and Hydrophilic Interactions
  • Molecular Sequence Data
  • Protein Structure, Tertiary*
  • Proteins / chemistry
  • Saccharomyces cerevisiae Proteins / chemistry
  • Sequence Alignment*
  • Sequence Analysis, Protein / methods*
  • Transcription Factor TFIID / chemistry

Substances

  • BAHCC1 protein, human
  • Proteins
  • Saccharomyces cerevisiae Proteins
  • TAF14 protein, S cerevisiae
  • Transcription Factor TFIID