In silico prediction of buffer solubility based on quantum-mechanical and HQSAR- and topology-based descriptors

J Chem Inf Model. 2006 Mar-Apr;46(2):648-58. doi: 10.1021/ci0503210.

Abstract

We present an artificial neural network (ANN) model for predicting the solubility of organic compounds in buffer at pH 6.5, a medium that mimics the human gastrointestinal tract. The model was derived from consistently performed solubility measurements of about 5000 compounds. Descriptors derived from the semiempirical VAMP/AM1 quantum-chemical wave function, HQSAR-derived logP, and topology-based descriptors were employed after preselection of significant contributors by statistical and data mining approaches. Ten ANNs were trained, each with 90% of the data as a training set and 10% as a test set, and deterministic analysis of prediction quality was used iteratively to optimize the ANN architecture and descriptor space, based on Corina 3D molecular structures and AM1/COSMO single-point wave functions. In production mode, the mean prediction of the 10 ANNs is reported together with a standard-deviation-based quality parameter. The production ANN based on Corina geometries and the AM1/COSMO wave function gives a cross-validated r² (r²cv) of 0.50 and a root-mean-square error of 0.71 log units, with 87% and 96% of the compounds having an error of less than 1 and 1.5 log units, respectively. The model is able to predict permanently charged species, e.g., zwitterions or quaternary amines, and problematic structures such as tautomers and unresolved diastereomers almost as well as neutral compounds.
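
The production-mode aggregation and the reported error statistics can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `ensemble_predict` and `error_summary` are hypothetical, the use of the sample standard deviation as the quality parameter is an assumption (the abstract does not define it), and all numbers in the usage lines are made up for illustration.

```python
import math
import statistics


def ensemble_predict(predictions):
    """Combine the per-network predictions (log solubility) for one compound.

    Returns the mean prediction and the spread across networks.
    Assumption: the spread is the sample standard deviation.
    """
    mean = statistics.fmean(predictions)
    spread = statistics.stdev(predictions)
    return mean, spread


def error_summary(predicted, observed):
    """Return RMSE and the fractions of compounds with an absolute
    error below 1.0 and below 1.5 log units."""
    errors = [abs(p - o) for p, o in zip(predicted, observed)]
    n = len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    within_1 = sum(e < 1.0 for e in errors) / n
    within_1_5 = sum(e < 1.5 for e in errors) / n
    return rmse, within_1, within_1_5


# Illustrative values only: ten hypothetical network outputs for one compound.
outputs = [-3.9, -4.1, -4.0, -4.2, -3.8, -4.0, -4.1, -3.9, -4.0, -4.0]
mean, spread = ensemble_predict(outputs)

# Illustrative values only: predicted vs. measured log solubility.
rmse, frac_1, frac_1_5 = error_summary(
    predicted=[-4.0, -2.5, -5.1, -3.0],
    observed=[-3.6, -3.8, -5.0, -5.2],
)
```

A large spread across the ten networks would flag a prediction as less reliable; where to set such a threshold is not stated in the abstract.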

MeSH terms

  • Buffers
  • Computer Simulation*
  • Drug Design*
  • Hydrogen-Ion Concentration
  • Neural Networks, Computer*
  • Predictive Value of Tests
  • Quantitative Structure-Activity Relationship*
  • Quantum Theory*
  • Solubility
  • Solvents / chemistry

Substances

  • Buffers
  • Solvents