A simple algorithm for detecting circular permutations in proteins

Bioinformatics. 1999 Nov;15(11):930-6. doi: 10.1093/bioinformatics/15.11.930.

Abstract

Motivation: Circular permutation of a protein is a genetic operation in which part of the C-terminus of the protein is moved to its N-terminus. Recently, it has been shown that proteins that undergo engineered circular permutations generally maintain their three-dimensional structure and biological function. This observation raises the possibility that circular permutation has occurred in nature during evolution. In this scenario, a protein underwent circular permutation into another protein, and thereafter both proteins diverged further through standard genetic operations. To study this possibility, one needs an efficient algorithm that, for a given pair of proteins, can detect an underlying circular permutation event. A possible formal description of the question is: given two sequences, find a circular permutation of one of them under which the edit distance between the proteins is minimal. A naive algorithm might take time proportional to N^3 or even N^4, which is prohibitively slow for a large-scale survey. A sophisticated algorithm that runs in asymptotic time of N^2 was recently suggested, but it is not practical for a large-scale survey.
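To make the formal statement concrete, the following is a minimal sketch of the naive approach: try every rotation of one sequence and keep the rotation with the smallest edit distance. It assumes unit costs for insertion, deletion, and substitution (the abstract does not specify a scoring scheme), and the function names are hypothetical. With sequences of length N, it performs N edit-distance computations of cost N^2 each, which gives the N^3 behaviour mentioned above.

```python
from typing import Tuple


def edit_distance(a: str, b: str) -> int:
    """Standard edit (Levenshtein) distance with unit costs (an assumption)."""
    prev = list(range(len(b) + 1))  # distance from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # deleting the first i characters of a
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # match or substitute
            ))
        prev = curr
    return prev[-1]


def naive_circular_distance(a: str, b: str) -> Tuple[int, int]:
    """Return (best distance, rotation offset k), where b is rotated to b[k:] + b[:k]."""
    if not b:
        return len(a), 0
    # One full edit-distance computation per rotation: N * N^2 work overall.
    return min((edit_distance(a, b[k:] + b[:k]), k) for k in range(len(b)))
```

For example, `naive_circular_distance("ABCDE", "CDEAB")` finds that rotating the second sequence by three positions reproduces the first one exactly.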

Results: A simple and efficient algorithm that runs in time proportional to N^2 is presented. The algorithm is based on duplicating one of the two sequences and then performing a modified version of the standard dynamic programming algorithm. While the algorithm is not guaranteed to find the optimal result, we present data indicating that in practice it performs very well.
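The abstract does not say how the dynamic programming is modified, so the following is only an illustrative sketch of the duplication idea, not the authors' program. It assumes unit edit costs and one plausible modification: the first sequence is aligned against the doubled second sequence, with the alignment allowed to start and end anywhere in the doubled sequence at no cost. Every rotation of `b` is a substring of `b + b`, so a single N by 2N table covers all rotations at once. The matched substring is not forced to be a full rotation, so the result is a lower bound on the true circular edit distance, which is one way such a method can be fast without being guaranteed optimal. The function name is hypothetical.

```python
from typing import Tuple


def doubled_sequence_distance(a: str, b: str) -> Tuple[int, int]:
    """Align a against any substring of b + b (assumed modification, unit costs).

    Returns (distance, end position of the matched substring in b + b).
    """
    bb = b + b
    # Row 0 is all zeros: the alignment may start anywhere in bb for free.
    prev = [0] * (len(bb) + 1)
    for i, ca in enumerate(a, start=1):
        curr = [i]  # first i characters of a aligned against nothing
        for j, cb in enumerate(bb, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # match or substitute
            ))
        prev = curr
    # Free end: take the cheapest cell in the last row.
    best_j = min(range(len(bb) + 1), key=prev.__getitem__)
    return prev[best_j], best_j
```

Calling `doubled_sequence_distance("ABCDE", "CDEAB")` locates an exact copy of the first sequence inside the doubled second sequence. The table has N rows and 2N columns, so the running time stays proportional to N^2.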

Availability: A Fortran program that calculates the optimal edit distance under circular permutation is available upon request from the authors.

Contact: ron@biocom1.ls.biu.ac.il.

MeSH terms

  • Algorithms*
  • Computational Biology / methods
  • Computer Simulation
  • Data Display
  • Databases, Factual
  • Evaluation Studies as Topic
  • Evolution, Molecular*
  • Lectins / genetics
  • Models, Genetic*
  • Proteins / genetics*
  • Sequence Alignment / methods*
  • Sequence Analysis, Protein / methods*
  • Time Factors

Substances

  • Lectins
  • Proteins