AlphaCutter: Efficient removal of non-globular regions from predicted protein structures

Proteomics. 2023 Aug;23(16):e2300176. doi: 10.1002/pmic.202300176. Epub 2023 Jun 13.

Abstract

A huge number of high-quality predicted protein structures are now publicly available. However, many of these structures contain non-globular regions, which diminish the performance of downstream structural bioinformatic applications. In this study, we develop AlphaCutter for the removal of non-globular regions from predicted protein structures. A large-scale cleaning of 542,380 predicted SwissProt structures shows that AlphaCutter is able to (1) remove non-globular regions that are undetectable using pLDDT scores and (2) preserve high integrity of the cleaned domain regions. As example applications, cleaning with AlphaCutter improved the folding energy scores and sequence recovery rates in the re-design of domain regions. On average, AlphaCutter takes less than 3 s to clean a protein structure, enabling efficient cleaning of the rapidly growing number of predicted protein structures. AlphaCutter is available at https://github.com/johnnytam100/AlphaCutter. AlphaCutter-cleaned SwissProt structures are available for download at https://doi.org/10.5281/zenodo.7944483.
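
For context, the pLDDT-only filtering that the abstract contrasts AlphaCutter against can be sketched as below. This is not AlphaCutter's algorithm, which the abstract does not describe; it is a minimal illustration of the baseline. It assumes an AlphaFold-style PDB file, where the per-residue pLDDT is stored in the B-factor field (columns 61-66 of each ATOM record). The function name `filter_by_plddt` and the cutoff of 70 are hypothetical choices made for this example.

```python
def filter_by_plddt(pdb_text: str, threshold: float = 70.0) -> str:
    """Return only the ATOM records whose pLDDT meets the threshold.

    Assumption: the input follows the fixed-column PDB format and stores
    pLDDT in the B-factor field, as AlphaFold model files do.
    """
    kept = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue  # skip headers, HETATM, TER, and other records
        plddt = float(line[60:66])  # B-factor field, columns 61-66
        if plddt >= threshold:
            kept.append(line)
    return "\n".join(kept)
```

A filter of this kind discards only low-confidence residues, so a non-globular region predicted with high pLDDT would pass through unchanged. That is the case the abstract reports as undetectable using pLDDT scores alone.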

Keywords: AlphaFold; non-globular regions; protein structure validation.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Databases, Protein
  • Proteins* / metabolism

Substances

  • Proteins