Machine learning outperformed logistic regression classification even with limit sample size: A model to predict pediatric HIV mortality and clinical progression to AIDS

Sara Domínguez-Rodríguez; Miquel Serna-Pascual; Andrea Oletto; Shaun Barnabas; Peter Zuidewind; Els Dobbels; Siva Danaviah; Osee Behuhuma; Maria Grazia Lain; Paula Vaz; Sheila Fernández-Luis; Tacilta Nhampossa; Elisa Lopez-Varela; Kennedy Otwombe; Afaaf Liberty; Avy Violari; Almoustapha Issiaka Maiga; Paolo Rossi; Carlo Giaquinto; Louise Kuhn; Pablo Rojo; Alfredo Tagarro; EPIICAL Consortium

doi:10.1371/journal.pone.0276116


PLoS One. 2022 Oct 14;17(10):e0276116. doi: 10.1371/journal.pone.0276116. eCollection 2022.

Authors

Sara Domínguez-Rodríguez¹, Miquel Serna-Pascual¹, Andrea Oletto², Shaun Barnabas³, Peter Zuidewind³, Els Dobbels³, Siva Danaviah⁴, Osee Behuhuma⁴, Maria Grazia Lain⁵, Paula Vaz⁵, Sheila Fernández-Luis⁶,⁷, Tacilta Nhampossa⁷, Elisa Lopez-Varela⁷, Kennedy Otwombe⁸, Afaaf Liberty⁸, Avy Violari⁸, Almoustapha Issiaka Maiga⁹, Paolo Rossi¹⁰, Carlo Giaquinto¹¹, Louise Kuhn¹², Pablo Rojo¹, Alfredo Tagarro¹,¹³,¹⁴; EPIICAL Consortium

Affiliations

¹ Pediatric Infectious Diseases Unit, Fundación para la Investigación Biomédica del Hospital 12 de Octubre, Madrid, Spain.
² PENTA Foundation, Padova, Italy.
³ Family Centre For Research With Ubuntu (FAMCRU), Stellenbosch University, Cape Town, South Africa.
⁴ Africa Health Research Institute (AHRI), Durban, South Africa.
⁵ Fundação Ariel Glaser contra o SIDA Pediátrico, Maputo, Mozambique.
⁶ Centro de Investigação em Saúde de Manhiça (CISM), Maputo, Mozambique.
⁷ Barcelona Institute for Global Health (ISGLOBAL), Barcelona, Spain.
⁸ Perinatal HIV Research Unit, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa.
⁹ Gabriel Toure University Hospital, Bamako, Mali.
¹⁰ Division of Immune and Infectious Diseases, Istituto di Ricovero e Cura a Carattere Scientifico, Ospedale Pediatrico Bambino Gesu, Rome, Italy.
¹¹ Department of Surgery, Oncology and Gastroenterology, Section of Oncology and Immunology, University of Padova, Padova, Italy.
¹² Gertrude H. Sergievsky Center, Vagelos College of Physicians and Surgeons, Columbia University Irving Medical Center, New York, NY, United States of America.
¹³ Universidad Europea de Madrid, Madrid, Spain.
¹⁴ Fundación para la Investigación e Innovación Biomédica del Hospital Universitario Infanta Sofía, Hospital Universitario Infanta Sofía, San Sebastián de los Reyes, Madrid, Spain.

Abstract

Logistic regression (LR) is the most common prediction model in medicine. In recent years, supervised machine learning (ML) methods have gained popularity, but there are many concerns about the utility of ML with small sample sizes. In this study, we aimed to compare the performance of 7 algorithms in predicting 1-year mortality and clinical progression to AIDS in a small cohort of infants living with HIV from South Africa and Mozambique. The data set (n = 100) was randomly split into a 70% training set and a 30% validation set. Seven algorithms were compared: LR, Random Forest (RF), Support Vector Machine (SVM), K-Nearest Neighbor (KNN), Naïve Bayes (NB), Artificial Neural Network (ANN), and Elastic Net. All models used the same predictors, including sociodemographic, virologic, immunologic, and maternal status features. For each model, hyperparameters were tuned by 10-fold cross-validation repeated 5 times, and the best-performing values were selected. A confusion matrix was built to assess accuracy, sensitivity, and specificity. RF ranked as the best algorithm in terms of accuracy (82.8%), sensitivity (78%), and area under the receiver operating characteristic curve (AUC, 0.73). RF also showed better sensitivity and specificity than the other algorithms in the external validation, together with the highest AUC. LR performed worse than RF, SVM, and KNN. The outcome of children living with perinatally acquired HIV can be predicted with considerable accuracy using ML algorithms. Better models could help less specialized staff in resource-limited countries promptly refer children at high risk of clinical progression.
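The abstract names the evaluation protocol but not the software used to run it. As a minimal illustration, assuming a binary outcome coded 1 for the event (death or progression to AIDS) and 0 otherwise, the stdlib-only Python sketch below mirrors the protocol's bookkeeping: a random 70/30 split, 10-fold cross-validation repeated 5 times on the training part, accuracy, sensitivity, and specificity derived from a confusion matrix, and AUC in its pairwise (Mann-Whitney) form. All function names are hypothetical. This is not the authors' code, it fits no models, and it does not stratify folds by outcome, because the abstract does not say whether stratification was used.

```python
"""Hypothetical sketch of the evaluation bookkeeping described in the abstract.

Assumed outcome coding: 1 = event (death or progression to AIDS), 0 = no event.
No model is fitted here; only data splitting and metric computation are shown.
"""
import random
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    tp: int  # events correctly predicted as events
    fp: int  # non-events wrongly predicted as events
    tn: int  # non-events correctly predicted as non-events
    fn: int  # events the model missed

    @property
    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)

    @property
    def sensitivity(self) -> float:
        # True positive rate: share of actual events that were flagged.
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        # True negative rate: share of actual non-events correctly cleared.
        return self.tn / (self.tn + self.fp)


def confusion_matrix(y_true, y_pred) -> ConfusionMatrix:
    """Count the four cells of a 2x2 confusion matrix from 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    return ConfusionMatrix(
        tp=sum(t == 1 and p == 1 for t, p in pairs),
        fp=sum(t == 0 and p == 1 for t, p in pairs),
        tn=sum(t == 0 and p == 0 for t, p in pairs),
        fn=sum(t == 1 and p == 0 for t, p in pairs),
    )


def train_validation_split(n, train_frac=0.7, seed=0):
    """Randomly assign row indices 0..n-1 to a training and a validation set."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = round(n * train_frac)
    return idx[:cut], idx[cut:]


def repeated_kfold(indices, k=10, repeats=5, seed=0):
    """Yield (fit, held_out) index lists: k folds per repeat, reshuffled each repeat."""
    rng = random.Random(seed)
    for _ in range(repeats):
        shuffled = list(indices)
        rng.shuffle(shuffled)
        folds = [shuffled[i::k] for i in range(k)]
        for i, held_out in enumerate(folds):
            fit = [j for f, fold in enumerate(folds) if f != i for j in fold]
            yield fit, held_out


def auc(y_true, scores) -> float:
    """ROC AUC as the probability that a random event outscores a random non-event (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    train, valid = train_validation_split(100)
    print(len(train), len(valid))  # 70 30
    splits = list(repeated_kfold(train))
    print(len(splits), len(splits[0][1]))  # 50 7
    cm = confusion_matrix([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0, 1, 0])
    print(cm.accuracy, cm.sensitivity, cm.specificity)  # 0.75 0.6666666666666666 0.8
    print(auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

With n = 100, the 70/30 split leaves only 30 children for computing validation metrics. Repeating the 10-fold partition 5 times means each candidate hyperparameter setting is scored on 50 held-out folds instead of 10, which is the usual motivation for repeated cross-validation when a single partition of a small training set would give a noisy estimate.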

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Acquired Immunodeficiency Syndrome*
Bayes Theorem
Child
Humans
Logistic Models
Machine Learning
Neural Networks, Computer

Grants and funding

This work has been supported within EPIICAL project through an independent ViiV grant to the PENTA (Paediatric European Network for Treatment of AIDS) Foundation. The funders had no role in study design, data collection, analysis, interpretation, or manuscript preparation.