In Silico Prediction of Skin Sensitization for Compounds via Flexible Evidence Combination Based on Machine Learning and Dempster-Shafer Theory

Chem Res Toxicol. 2024 May 16. doi: 10.1021/acs.chemrestox.3c00396. Online ahead of print.

Abstract

Skin sensitization is an increasingly significant concern in drug and cosmetic development because of consumer safety and occupational health issues. In silico methods have emerged as alternatives to traditional in vivo animal testing for ethical and economic reasons. In this study, machine learning methods were used to build quantitative structure-activity relationship (QSAR) models on five skin sensitization data sets (GPMT, LLNA, DPRA, KeratinoSens, and h-CLAT), achieving correct classification rates (CCRs) of 0.688-0.764 on the test sets. To address the complex mechanisms of human skin sensitization, Dempster-Shafer theory was applied to fuse multiple QSAR models into an evidence-based integrated decision model. Various evidence combinations and combination rules were explored, and the self-defined Q3 rule showed the best balance. For example, combining evidence from GPMT, KeratinoSens, and h-CLAT achieved a CCR of 0.880 with a coverage of 0.893, while other combinations remained competitive. Additionally, the Shapley additive explanations (SHAP) method was used to interpret important features and substructures related to skin sensitization. A comparative analysis on an external human test set demonstrated the superior performance of the proposed method. Finally, to enhance accessibility, the workflow was implemented in a user-friendly software tool named HSkinSensDS.
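The abstract does not describe how each QSAR model's output is turned into evidence, nor does it define the self-defined Q3 rule. The sketch below therefore illustrates only the classical Dempster's rule of combination on a two-hypothesis frame {S, N} (S = sensitizer, N = non-sensitizer). It is not the authors' method. Several pieces are assumptions: the names `Mass`, `from_model`, `dempster`, and `combine` are hypothetical. `from_model` also uses a simple reliability discounting, in which a model's probability is scaled by a reliability weight and the remainder goes to the whole frame as "ignorance". The model probabilities and reliabilities in the usage section are made-up values.

```python
from functools import reduce
from typing import Iterable, NamedTuple


class Mass(NamedTuple):
    """Basic belief assignment on the frame {S, N} (S = sensitizer, N = non-sensitizer)."""
    s: float      # m({S})
    n: float      # m({N})
    theta: float  # m({S, N}): mass left uncommitted (ignorance)


def from_model(p_sensitizer: float, reliability: float) -> Mass:
    """Hypothetical evidence mapping: discount a model's probability by its reliability.

    The committed mass is scaled by `reliability` (an assumed weight, e.g. a
    validation CCR); the remaining 1 - reliability is assigned to the whole frame.
    """
    return Mass(reliability * p_sensitizer,
                reliability * (1.0 - p_sensitizer),
                1.0 - reliability)


def dempster(m1: Mass, m2: Mass) -> Mass:
    """Classical Dempster's rule of combination for two mass functions on {S, N}."""
    # Conflict: product mass that falls on the empty set ({S} intersected with {N}).
    conflict = m1.s * m2.n + m1.n * m2.s
    if conflict >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    norm = 1.0 - conflict
    # Each focal element collects every product whose intersection equals it.
    s = (m1.s * m2.s + m1.s * m2.theta + m1.theta * m2.s) / norm
    n = (m1.n * m2.n + m1.n * m2.theta + m1.theta * m2.n) / norm
    theta = (m1.theta * m2.theta) / norm
    return Mass(s, n, theta)


def combine(masses: Iterable[Mass]) -> Mass:
    """Fuse any number of evidence sources (Dempster's rule is associative)."""
    return reduce(dempster, masses)


if __name__ == "__main__":
    # Made-up outputs of three assay-specific models: (probability, reliability).
    evidence = [from_model(0.70, 0.76), from_model(0.55, 0.70), from_model(0.80, 0.72)]
    fused = combine(evidence)
    label = "sensitizer" if fused.s > fused.n else "non-sensitizer"
    print(fused, label)
```

Dempster's rule is commutative and associative, so the order in which the assay models are fused does not matter. A source with all of its mass on {S, N} (total ignorance) leaves the other evidence unchanged. Alternative combination rules, such as the Q3 rule mentioned in the abstract, typically differ in how they redistribute the conflicting mass.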