TransPTM: a transformer-based model for non-histone acetylation site prediction

Brief Bioinform. 2024 Mar 27;25(3):bbae219. doi: 10.1093/bib/bbae219.

Abstract

Protein acetylation is one of the most extensively studied post-translational modifications (PTMs) owing to its significant roles across a myriad of biological processes. Although many computational tools for acetylation site identification have been developed, there is a lack of benchmark datasets and bespoke predictors for non-histone acetylation site prediction. To address these problems, we contribute both a dataset and a predictor benchmark in this study. First, we construct a non-histone acetylation site benchmark dataset, namely NHAC, which comprises 11 subsets organized by sequence length, ranging from 11 to 61 amino acids. Each sequence length has 886 positive samples and 4707 negative samples. Second, we propose TransPTM, a transformer-based neural network model for non-histone acetylation site prediction. During the data representation phase, per-residue contextualized embeddings are extracted using ProtT5 (an existing pre-trained protein language model). This is followed by a graph neural network framework, which consists of three TransformerConv layers for feature extraction and a multilayer perceptron module for classification. The benchmark results show that TransPTM achieves competitive performance for non-histone acetylation site prediction compared with three state-of-the-art tools. It improves our understanding of the PTM mechanism and provides a theoretical basis for developing drug targets for diseases. Moreover, the created PTM dataset fills the gap in non-histone acetylation site datasets and is beneficial to the related communities. The related source code and data utilized by TransPTM are accessible at https://www.github.com/TransPTM/TransPTM.
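
The fixed-length subsets described above can be illustrated with a minimal sketch of how sequence windows might be cut around candidate sites. This is not the authors' code. It assumes that candidate sites are lysine (K) residues, that each window is centered on the candidate site, and that positions beyond the protein termini are padded with a placeholder character; the function name extract_windows and the padding symbol "X" are hypothetical.

```python
from typing import List, Tuple


def extract_windows(sequence: str, window: int, pad: str = "X") -> List[Tuple[int, str]]:
    """Return (1-based position, window) for every lysine in `sequence`.

    Assumption: the candidate site sits at the center of the window, so
    `window` must be odd (11 and 61 both qualify).
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    # Pad both termini so that sites near the ends still yield full-length windows.
    padded = pad * half + sequence + pad * half
    windows = []
    for i, residue in enumerate(sequence):
        if residue == "K":
            # Residue i of the original sequence sits at index i + half in `padded`,
            # so the slice starting at i places it exactly in the middle.
            windows.append((i + 1, padded[i : i + window]))
    return windows


# Usage on a made-up 10-residue peptide (not taken from NHAC).
for position, fragment in extract_windows("MKTAYIAKQR", window=11):
    print(position, fragment)
```

Each fragment produced this way would then be passed to the embedding step; labeling fragments as positive or negative requires experimental annotations that this sketch does not model.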

Keywords: Non-histone acetylation; deep learning; protein language model; transformer.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Acetylation
  • Algorithms
  • Computational Biology / methods
  • Databases, Protein
  • Humans
  • Neural Networks, Computer*
  • Protein Processing, Post-Translational*
  • Proteins / chemistry
  • Proteins / metabolism
  • Software

Substances

  • Proteins