The signature molecular descriptor. 4. Canonizing molecules using extended valence sequences

J Chem Inf Comput Sci. 2004 Mar-Apr;44(2):427-36. doi: 10.1021/ci0341823.

Abstract

We present a new algorithm to canonize molecular graphs using the signature molecular descriptor introduced in the previous papers of this series. While developed specifically for molecular structures, the algorithm can be used for any graph and is not limited to the classes for which polynomial-time algorithms exist: acyclic graphs, planar graphs, bounded-valence graphs, and bounded-genus graphs. The algorithm is tested with benzenoid hydrocarbons and a database of 126,705 organic compounds. Its performance is compared against Brendan McKay's Nauty algorithm, which is believed to be the fastest graph canonization algorithm for general graphs, on five series of graphs, each comprising up to 30,000 vertices: 2D meshes (pericondensed benzenoids), 3D cages (fullerenes and nanotubes), 3D meshes (crystal lattices), 4D cages, and power law graphs (protein and gene networks). The algorithm can be downloaded as open-source code at http://www.cs.sandia.gov/~jfaulon/QSAR.
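
To illustrate what canonizing a molecular graph means, the following minimal Python sketch computes a canonical form by brute force over all vertex orderings. It is not the signature-based algorithm of this paper, nor Nauty; it runs in factorial time and is only meant to show the goal, namely that two different atom numberings of the same molecule yield one identical representation. The function name `canonical_form` and the (labels, edges) input format are assumptions made for this example.

```python
from itertools import permutations


def canonical_form(labels, edges):
    """Return the lexicographically smallest (labels, edges) relabeling.

    Hypothetical helper for illustration only: it tries every vertex
    ordering, so it is usable only on very small graphs.
    """
    n = len(labels)
    best = None
    for perm in permutations(range(n)):
        # perm[old] is the new index assigned to vertex `old`.
        new_labels = [None] * n
        for old, new in enumerate(perm):
            new_labels[new] = labels[old]
        # Renumber each edge, normalizing endpoint order and edge order.
        new_edges = sorted(tuple(sorted((perm[u], perm[v]))) for u, v in edges)
        candidate = (tuple(new_labels), tuple(new_edges))
        if best is None or candidate < best:
            best = candidate
    return best


# Two numberings of the same heavy-atom skeleton (C-C-O, as in ethanol).
a = canonical_form(["C", "C", "O"], [(0, 1), (1, 2)])
b = canonical_form(["O", "C", "C"], [(0, 1), (1, 2)])
# A different skeleton with the same atoms (C-O-C, as in dimethyl ether).
c = canonical_form(["C", "O", "C"], [(0, 1), (1, 2)])

print(a == b)  # same molecule, same canonical form
print(a == c)  # different molecule, different canonical form
```

Comparing canonical forms in this way reduces an isomorphism test to an equality check, which is why a fast canonizer is useful for deduplicating large compound databases.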