Active machine learning-driven experimentation to determine compound effects on protein patterns

Armaghan W Naik; Joshua D Kangas; Devin P Sullivan; Robert F Murphy

doi:10.7554/eLife.10047

eLife. 2016 Feb 3;5:e10047. doi: 10.7554/eLife.10047.

Authors

Armaghan W Naik^{1,2}, Joshua D Kangas^{1,2}, Devin P Sullivan^{1,2}, Robert F Murphy^{1,2,3,4,5,6,7}

Affiliations

¹ Computational Biology Department, Carnegie Mellon University, Pittsburgh, United States.
² Center for Bioimage Informatics, Carnegie Mellon University, Pittsburgh, United States.
³ Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, United States.
⁴ Department of Biomedical Engineering, Carnegie Mellon University, Pittsburgh, United States.
⁵ Machine Learning Department, Carnegie Mellon University, Pittsburgh, United States.
⁶ Freiburg Institute for Advanced Studies, Albert Ludwig University of Freiburg, Freiburg, Germany.
⁷ Faculty of Biology, Albert Ludwig University of Freiburg, Freiburg, Germany.

Abstract

High-throughput screening determines the effects of many conditions on a given biological target. Currently, estimating the effects of those conditions on other targets requires either strong modeling assumptions (e.g., similarities among targets) or separate screens. Ideally, data-driven experimentation could be used to learn accurate models for many conditions and targets without performing all possible experiments. We have previously described an active machine learning algorithm that can iteratively choose small sets of experiments to learn models of multiple effects. We now show that, with no prior knowledge and with liquid handling robotics and automated microscopy under its control, this learner accurately learned the effects of 48 chemical compounds on the subcellular localization of 48 proteins while performing only 29% of all possible experiments. The results represent the first practical demonstration of the utility of active learning-driven biological experimentation in which the set of possible phenotypes is unknown in advance.
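
To make the experimental budget concrete: 48 compounds by 48 proteins gives 2304 possible experiments, and 29% of that is about 668. The sketch below is a generic uncertainty-sampling loop on simulated data, not the authors' algorithm. The toy ground truth, the similarity-vote predictor, the batch size of 96, and all function names are assumptions made for illustration only.

```python
import random

N_COMPOUNDS = 48
N_PROTEINS = 48
TOTAL = N_COMPOUNDS * N_PROTEINS   # 2304 possible experiments
BUDGET = round(0.29 * TOTAL)       # about 29% of the full grid


def make_toy_truth(rng, n_groups=4, n_phenotypes=3):
    """Hypothetical ground truth: the phenotype depends only on hidden groups."""
    c_group = [rng.randrange(n_groups) for _ in range(N_COMPOUNDS)]
    p_group = [rng.randrange(n_groups) for _ in range(N_PROTEINS)]
    table = [[rng.randrange(n_phenotypes) for _ in range(n_groups)]
             for _ in range(n_groups)]
    return [[table[c_group[c]][p_group[p]] for p in range(N_PROTEINS)]
            for c in range(N_COMPOUNDS)]


def compound_similarity(obs):
    """Agreements minus disagreements over proteins observed for both compounds."""
    sim = [[0] * N_COMPOUNDS for _ in range(N_COMPOUNDS)]
    for a in range(N_COMPOUNDS):
        for b in range(N_COMPOUNDS):
            if a != b:
                sim[a][b] = sum((1 if x == y else -1)
                                for x, y in zip(obs[a], obs[b])
                                if x is not None and y is not None)
    return sim


def predict_all(obs):
    """Map each unobserved (compound, protein) cell to (label, confidence)."""
    sim = compound_similarity(obs)
    preds = {}
    for c in range(N_COMPOUNDS):
        for p in range(N_PROTEINS):
            if obs[c][p] is not None:
                continue
            votes = {}
            for c2 in range(N_COMPOUNDS):
                label = obs[c2][p]
                if label is not None and sim[c][c2] > 0:
                    votes[label] = votes.get(label, 0) + sim[c][c2]
            if votes:
                best = max(votes, key=votes.get)
                # Confidence: winning vote mass minus all competing vote mass.
                preds[(c, p)] = (best, 2 * votes[best] - sum(votes.values()))
            else:
                preds[(c, p)] = (None, 0)
    return preds


def run_active_learning(truth, budget=BUDGET, batch_size=96, seed=0):
    """Start with a random batch, then repeatedly query the least confident cells."""
    rng = random.Random(seed)
    obs = [[None] * N_PROTEINS for _ in range(N_COMPOUNDS)]
    cells = [(c, p) for c in range(N_COMPOUNDS) for p in range(N_PROTEINS)]
    batch = rng.sample(cells, min(batch_size, budget))
    n_done = 0
    while True:
        for c, p in batch:
            obs[c][p] = truth[c][p]   # stands in for running the real experiment
        n_done += len(batch)
        preds = predict_all(obs)
        remaining = budget - n_done
        if remaining <= 0 or not preds:
            return obs, preds, n_done
        # Lowest confidence first; ties are broken at random.
        ranked = sorted(preds, key=lambda cell: (preds[cell][1], rng.random()))
        batch = ranked[:min(batch_size, remaining)]


def held_out_accuracy(truth, preds):
    """Fraction of never-measured cells whose predicted label matches the truth."""
    if not preds:
        return 1.0
    hits = sum(truth[c][p] == label for (c, p), (label, _) in preds.items())
    return hits / len(preds)


if __name__ == "__main__":
    truth = make_toy_truth(random.Random(1))
    obs, preds, n_done = run_active_learning(truth)
    print(n_done, "of", TOTAL, "experiments run;",
          "held-out accuracy:", held_out_accuracy(truth, preds))
```

In this toy setting the phenotype labels are known and fixed in advance, which is a simplification: the abstract stresses that in the real study the set of possible phenotypes was not known beforehand.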

Keywords: active learning; automation of research; cell biology; computational biology; high content screening; laboratory automation; machine learning; mouse; protein subcellular location; systems biology.

Publication types

Research Support, N.I.H., Extramural

MeSH terms

Automation, Laboratory
Cell Physiological Phenomena / drug effects*
Cytosol / chemistry*
Drug Evaluation, Preclinical / methods*
High-Throughput Screening Assays
Microscopy
Optical Imaging
Proteins / analysis*
Supervised Machine Learning*

Substances

Proteins
