ProtTrans and multi-window scanning convolutional neural networks for the prediction of protein-peptide interaction sites

J Mol Graph Model. 2024 Apr 17:130:108777. doi: 10.1016/j.jmgm.2024.108777. Online ahead of print.

Abstract

This study addresses the prediction of protein-peptide interactions with machine learning, comparing sequence-based models, standard CNNs, and traditional classifiers. Our approach combines pre-trained protein language models with multi-view window scanning CNNs and yields significant improvements, with ProtTrans, which was pre-trained on 2.1 billion protein sequences comprising 393 billion amino acids, standing out. The integrated model achieves an AUC of 0.856 and 0.823 on the PepBCL Set_1 and Set_2 datasets, respectively, and a Precision of 0.564 on Set_1 and 0.527 on Set_2, surpassing previous methods. We also explore applications of the model in cancer therapy, particularly in identifying peptide interactions for selective targeting of cancer cells, and in other fields. These findings contribute to bioinformatics and offer insights for drug discovery and therapeutic development.
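
The abstract does not describe the network architecture, so the following is only a minimal sketch of one plausible reading of "multi-window scanning": several 1D convolution filters of different widths slide over per-residue embeddings, and their activations are concatenated for each residue. The function names (`conv1d_same`, `multi_window_scan`), the window sizes (3, 5, 7), the filter count, the random untrained weights, and the toy 8-dimensional embeddings standing in for ProtTrans output are all assumptions made for illustration, not details taken from the paper.

```python
import random


def conv1d_same(seq, kernel, bias=0.0):
    """Slide one filter over a sequence of feature vectors with zero padding.

    seq:    list of L vectors of length D (per-residue embeddings).
    kernel: list of W vectors of length D (W is an odd window size).
    Returns L ReLU activations, one per residue.
    """
    length, width = len(seq), len(kernel)
    half = width // 2
    out = []
    for i in range(length):
        total = bias
        for k in range(width):
            j = i + k - half
            if 0 <= j < length:  # positions outside the sequence act as zeros
                total += sum(x * w for x, w in zip(seq[j], kernel[k]))
        out.append(max(0.0, total))  # ReLU
    return out


def multi_window_scan(seq, window_sizes=(3, 5, 7), filters_per_window=2, seed=0):
    """Concatenate, per residue, the activations of filters of several widths.

    Weights are random placeholders (hypothetical, untrained); a real model
    would learn them and feed the result to a per-residue classifier.
    """
    rng = random.Random(seed)
    dim = len(seq[0])
    features = [[] for _ in seq]
    for width in window_sizes:
        for _ in range(filters_per_window):
            kernel = [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
                      for _ in range(width)]
            for i, value in enumerate(conv1d_same(seq, kernel)):
                features[i].append(value)
    return features


# Toy stand-in for language-model embeddings: 10 residues, 8 dimensions each.
toy_rng = random.Random(1)
embeddings = [[toy_rng.gauss(0, 1) for _ in range(8)] for _ in range(10)]
features = multi_window_scan(embeddings)
print(len(features), len(features[0]))  # 10 residues, 3 widths x 2 filters = 6 features
```

Each residue ends up with one feature per filter, so narrow windows capture its immediate neighbours while wider windows capture more distant sequence context. In a trained model these features would pass through further layers to produce a binding or non-binding score for every residue.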

Keywords: Bioinformatics; Machine learning; Multi-view window scanning CNNs; Pre-trained language models; Protein-peptide interactions.