Managing missing measurements in small-molecule screens

J Comput Aided Mol Des. 2013 May;27(5):469-78. doi: 10.1007/s10822-013-9642-x. Epub 2013 Apr 13.

Abstract

In a typical high-throughput screening (HTS) campaign, less than 1% of the small-molecule library is characterized by confirmatory experiments. As much as 99% of the library's molecules are set aside and excluded from downstream analysis, although some of these molecules would prove active if they were sent for confirmatory testing. These missing experimental measurements prevent screeners from identifying active molecules. In this study, we propose managing missing measurements using imputation, a powerful technique from the machine learning community, to fill in accurate guesses where measurements are missing. We then use these imputed measurements to construct an imputed visualization of HTS results, based on the scaffold tree visualization from the literature. This imputed visualization identifies almost all groups of active molecules from an HTS, even those that would otherwise be missed. We validate our methodology by simulating HTS experiments using the data from eight quantitative HTS campaigns, and we discuss the implications for drug discovery. In particular, this method can rapidly and economically identify novel active molecules, each of which could have novel function in either binding or selectivity in addition to representing new intellectual property.
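
As a rough illustration of what imputing a missing confirmatory measurement can look like, the sketch below fills in untested molecules with a similarity-weighted average of their most similar tested neighbors. This is not the imputation model used in the study, which the abstract does not specify. The nearest-neighbor scheme, the fingerprint representation (sets of integer feature IDs), the function names, and all data values are hypothetical and chosen only to make the idea concrete.

```python
from typing import Dict, FrozenSet

Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two feature sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def impute_activity(
    fingerprints: Dict[str, Fingerprint],
    measured: Dict[str, float],
    k: int = 2,
) -> Dict[str, float]:
    """Return an activity for every molecule.

    Measured molecules keep their value. Unmeasured molecules get a
    similarity-weighted mean over their k most similar measured neighbors
    (an assumed scheme, not the paper's method).
    """
    if not measured:
        raise ValueError("at least one measured molecule is required")
    fallback = sum(measured.values()) / len(measured)

    result: Dict[str, float] = {}
    for mol, fp in fingerprints.items():
        if mol in measured:
            result[mol] = measured[mol]
            continue
        # Rank measured molecules by similarity to the unmeasured one.
        neighbors = sorted(
            ((tanimoto(fp, fingerprints[m]), measured[m]) for m in measured),
            key=lambda pair: pair[0],
            reverse=True,
        )[:k]
        total = sum(sim for sim, _ in neighbors)
        # With no similar neighbor at all, fall back to the measured mean.
        result[mol] = (
            sum(sim * val for sim, val in neighbors) / total if total else fallback
        )
    return result


# Hypothetical toy library: A, B, C were sent for confirmation; D and E were not.
fingerprints = {
    "A": frozenset({1, 2, 3, 4}),
    "B": frozenset({1, 2, 3, 5}),
    "C": frozenset({10, 11, 12}),
    "D": frozenset({1, 2, 3, 6}),
    "E": frozenset({10, 11, 13}),
}
measured = {"A": 0.9, "B": 0.8, "C": 0.1}

result = impute_activity(fingerprints, measured)
for mol in ("D", "E"):
    print(mol, round(result[mol], 3))
```

In this toy data, D shares most of its features with the two active molecules and so receives a high imputed activity, while E resembles only the inactive molecule and receives a low one. A real application would presumably use proper chemical fingerprints and a validated model rather than this simple neighbor average.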

MeSH terms

  • Artificial Intelligence
  • Drug Discovery
  • High-Throughput Screening Assays*
  • Humans
  • Small Molecule Libraries*
  • Software

Substances

  • Small Molecule Libraries