Deep exploration of random forest model boosts the interpretability of machine learning studies of complicated immune responses and lung burden of nanoparticles

Sci Adv. 2021 May 26;7(22):eabf4130. doi: 10.1126/sciadv.abf4130. Print 2021 May.

Abstract

The development of machine learning provides solutions for predicting the complicated immune responses and pharmacokinetics of nanoparticles (NPs) in vivo. However, highly heterogeneous data in NP studies remain challenging because of the low interpretability of machine learning. Here, we propose a tree-based random forest feature importance and feature interaction network analysis framework (TBRFA) and accurately predict the pulmonary immune responses and lung burden of NPs, with the correlation coefficient of all training sets >0.9 and half of the test sets >0.75. This framework overcomes the feature importance bias brought by small datasets through a multiway importance analysis. TBRFA also builds feature interaction networks, boosts model interpretability, and reveals hidden interactional factors (e.g., various NP properties and exposure conditions). TBRFA provides guidance for the design and application of ideal NPs and discovers the feature interaction networks that contribute to complex systems with small-size data in various fields.

Publication types

  • Research Support, Non-U.S. Gov't