Probabilistic Substructure Mining From Small-Molecule Screens

Sayan Ranu; Bradley T Calhoun; Ambuj K Singh; S Joshua Swamidass

doi:10.1002/minf.201100058

Mol Inform. 2011 Sep;30(9):809-15. doi: 10.1002/minf.201100058. Epub 2011 Aug 4.

Authors

Sayan Ranu¹, Bradley T Calhoun², Ambuj K Singh¹, S Joshua Swamidass³

Affiliations

¹ Department of Computer Science, University of California Santa Barbara, Santa Barbara, CA, USA.
² Division of Laboratory and Genomic Medicine, Department of Pathology and Immunology, Washington University, School of Medicine, St. Louis, MO, USA.
³ Division of Laboratory and Genomic Medicine, Department of Pathology and Immunology, Washington University, School of Medicine, St. Louis, MO, USA. swamidass@gmail.com.

PMID: 27467413

Abstract

Identifying the overrepresented substructures in a set of molecules with similar activity is a common task in chemical informatics. Existing substructure miners are deterministic, requiring the activity of all mined molecules to be known with high confidence. In contrast, we introduce pGraphSig, a probabilistic structure miner that effectively mines structures from noisy data in which many molecules are labeled with their probability of being active. We benchmark pGraphSig on data from several small-molecule high-throughput screens and find that it identifies overrepresented structures more effectively than a deterministic structure miner.
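
To illustrate what probabilistic activity labels can look like in practice, the following is a minimal sketch. It is not the pGraphSig algorithm, which the abstract does not specify. It assumes each molecule has already been reduced to a set of substructure labels, which is a hypothetical simplification, and it scores each substructure by comparing the mean activity probability of the molecules that contain it with the mean over the whole screen. The function name `soft_enrichment`, the toy data, and the scoring ratio are all assumptions made for illustration.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical representation: (substructure labels present, probability of being active).
Molecule = Tuple[FrozenSet[str], float]


def soft_enrichment(molecules: List[Molecule]) -> Dict[str, float]:
    """Score each substructure by a soft enrichment ratio.

    The ratio is the mean activity probability among molecules containing the
    substructure divided by the mean activity probability over all molecules.
    Values above 1 suggest overrepresentation among likely actives.
    """
    if not molecules:
        return {}
    baseline = sum(p for _, p in molecules) / len(molecules)
    if baseline == 0:
        return {}  # no molecule has any chance of being active

    prob_totals: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for features, p_active in molecules:
        for feature in features:
            # Each molecule contributes its probability, not a hard 0/1 label.
            prob_totals[feature] = prob_totals.get(feature, 0.0) + p_active
            counts[feature] = counts.get(feature, 0) + 1

    return {f: (prob_totals[f] / counts[f]) / baseline for f in prob_totals}


# Toy screen with made-up substructure labels and activity probabilities.
screen: List[Molecule] = [
    (frozenset({"nitro", "phenyl"}), 0.9),
    (frozenset({"nitro"}), 0.7),
    (frozenset({"phenyl"}), 0.2),
    (frozenset({"amide"}), 0.2),
]
scores = soft_enrichment(screen)
for name in sorted(scores, key=scores.get, reverse=True):
    print(f"{name}: {scores[name]:.2f}")
```

A deterministic miner would first have to threshold the probabilities into active and inactive classes, discarding the uncertainty that the soft counts retain. A real miner would also enumerate subgraphs and assess statistical significance, both of which this sketch omits.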

Keywords: Chemoinformatics; Drug discovery; High throughput screening; Virtual screening.