Probabilistic Substructure Mining From Small-Molecule Screens

Mol Inform. 2011 Sep;30(9):809-15. doi: 10.1002/minf.201100058. Epub 2011 Aug 4.

Abstract

Identifying the substructures that are overrepresented in a set of molecules with similar activity is a common task in chemical informatics. Existing substructure miners are deterministic: they require the activity of every mined molecule to be known with high confidence. In contrast, we introduce pGraphSig, a probabilistic structure miner that effectively mines structures from noisy data in which many molecules are labeled only with their probability of being active. We benchmark pGraphSig on data from several small-molecule high-throughput screens and find that it identifies overrepresented structures more effectively than a deterministic structure miner.
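
To make the idea of mining from probability-labeled molecules concrete, the following is a minimal sketch of a probability-weighted enrichment score. It is not pGraphSig's algorithm, which the abstract does not describe; it only illustrates how a probability of activity can replace a hard active/inactive label when counting substructure occurrences. The function name `expected_enrichment`, the substructure names, and the example probabilities are all hypothetical.

```python
# Illustrative only: NOT the pGraphSig algorithm. Each molecule is assumed to be
# a (set_of_substructure_names, probability_of_being_active) pair.

def expected_enrichment(molecules, substructure):
    """Ratio of a substructure's expected frequency among actives
    to its frequency in the whole screen (1.0 means no enrichment)."""
    total = len(molecules)
    # Expected number of actives: each molecule contributes its probability.
    expected_actives = sum(p for _, p in molecules)
    containing = [(feats, p) for feats, p in molecules if substructure in feats]
    if total == 0 or expected_actives == 0 or not containing:
        return 0.0
    # Expected number of actives that contain the substructure.
    expected_actives_with = sum(p for _, p in containing)
    freq_in_actives = expected_actives_with / expected_actives
    freq_overall = len(containing) / total
    return freq_in_actives / freq_overall


# Hypothetical screen: substructure names and probabilities are made up.
screen = [
    ({"benzene", "amide"}, 0.9),
    ({"benzene"}, 0.8),
    ({"amide"}, 0.1),
    ({"ester"}, 0.2),
]

for name in ("benzene", "amide", "ester"):
    print(name, round(expected_enrichment(screen, name), 3))
```

A deterministic miner corresponds to the special case in which every probability is exactly 0 or 1; the weighted version lets uncertain molecules contribute partially instead of being discarded or forced into one class.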

Keywords: Chemoinformatics; Drug discovery; High throughput screening; Virtual screening.