Integration of the Butina algorithm and ensemble learning strategies for the advancement of a pharmacophore ligand-based model: an in silico investigation of apelin agonists

Xuan-Truc Dinh Tran; Tieu-Long Phan; Van-Thinh To; Ngoc-Vi Nguyen Tran; Nhu-Ngoc Song Nguyen; Dong-Nghi Hoang Nguyen; Ngoc-Tam Nguyen Tran; Tuyen Ngoc Truong

doi:10.3389/fchem.2024.1382319

Front Chem. 2024 Apr 16:12:1382319. doi: 10.3389/fchem.2024.1382319. eCollection 2024.

Authors

Xuan-Truc Dinh Tran¹, Tieu-Long Phan² ³, Van-Thinh To¹, Ngoc-Vi Nguyen Tran⁴, Nhu-Ngoc Song Nguyen¹, Dong-Nghi Hoang Nguyen¹, Ngoc-Tam Nguyen Tran¹, Tuyen Ngoc Truong¹

Affiliations

¹ Faculty of Pharmacy, University of Medicine and Pharmacy at Ho Chi Minh City, Ho Chi Minh City, Vietnam.
² Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Leipzig, Germany.
³ Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark.
⁴ Faculty of Pharmacy, Uppsala University, Uppsala, Sweden.

Abstract

Introduction: 3D pharmacophore models describe a ligand's chemical interactions in its bioactive conformation. They offer a simple but sophisticated approach to deciphering the chemically encoded ligand information, making them a valuable tool in drug design.

Methods: Our research summarized the key studies on applying 3D pharmacophore models in the virtual screening of 6,944 compounds of APJ receptor agonists. Recent advances in clustering algorithms and ensemble methods have enabled classical pharmacophore modeling to evolve into more flexible and knowledge-driven techniques. Butina clustering groups molecules by structural similarity (measured by the Tanimoto coefficient) to create a structurally diverse training dataset. The ensemble learning method combines individual pharmacophore models into a set of pharmacophore models to optimize the pharmacophore space in virtual screening.

Results: This approach was evaluated on apelin datasets and afforded good screening performance, as shown by the area under the receiver operating characteristic curve (AUC of 0.994 ± 0.007), enrichment factor (EF1% of 50.07 ± 0.211), Güner-Henry score of 0.956 ± 0.015, and F-measure of 0.911 ± 0.031.

Discussion: Although one of the high-scoring models achieved statistically superior results in each dataset (AUC of 0.82; EF1% of 19.466; GH of 0.131; F1-score of 0.071), the ensemble learning method, including the voting and stacking methods, balanced the shortcomings of each model and achieved close performance measures.
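The Butina clustering step described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives neither the fingerprint type nor the similarity cutoff, so the set-based toy fingerprints, the 0.5 threshold, and the function names `tanimoto`, `butina_clusters`, and `pick_centroids` are all assumptions made for illustration. The sketch also builds the neighbour ranking once and does not re-rank after each cluster is removed, which is a simplification.

```python
from typing import List, Sequence, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def butina_clusters(
    fps: Sequence[Set[int]], sim_threshold: float
) -> List[Tuple[int, ...]]:
    """Cluster fingerprints; the first index of each cluster is its centroid."""
    n = len(fps)
    # Neighbour list: every other molecule at or above the similarity threshold.
    neighbours = [
        [j for j in range(n) if j != i and tanimoto(fps[i], fps[j]) >= sim_threshold]
        for i in range(n)
    ]
    # Visit candidates from most to fewest neighbours (ties broken by index).
    order = sorted(range(n), key=lambda i: (-len(neighbours[i]), i))
    assigned = [False] * n
    clusters: List[Tuple[int, ...]] = []
    for i in order:
        if assigned[i]:
            continue
        # The centroid claims all of its neighbours that are still unassigned.
        members = [i] + [j for j in neighbours[i] if not assigned[j]]
        for j in members:
            assigned[j] = True
        clusters.append(tuple(members))
    return clusters


def pick_centroids(clusters: Sequence[Tuple[int, ...]]) -> List[int]:
    """One representative per cluster, giving a structurally diverse subset."""
    return [cluster[0] for cluster in clusters]


# Hypothetical toy fingerprints (sets of on-bit indices), not data from the study.
toy_fps = [
    {1, 2, 3, 4},
    {1, 2, 3, 5},
    {1, 2, 3, 4, 5},
    {10, 11, 12, 13},
    {10, 11, 12, 14},
    {20, 21},
]
toy_clusters = butina_clusters(toy_fps, sim_threshold=0.5)
toy_centroids = pick_centroids(toy_clusters)
```

On the toy input, the three similar fingerprints form one cluster, the next two form a second, and the last is a singleton, so `toy_centroids` holds one index per structural family. Selecting centroids (or sampling across clusters) is one plausible way to obtain the structurally diverse training set the abstract mentions.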

Keywords: 3D pharmacophore model; APJ receptor agonist; Butina clustering algorithm; drug discovery; ensemble learning method.

Grants and funding

The author(s) declare that no financial support was received for the research, authorship, and/or publication of this article.