Diagnostic test accuracy of artificial intelligence in screening for referable diabetic retinopathy in real-world settings: A systematic review and meta-analysis

Holijah Uy; Christopher Fielding; Ameer Hohlfeld; Eleanor Ochodo; Abraham Opare; Elton Mukonda; Deon Minnies; Mark E Engel

doi:10.1371/journal.pgph.0002160

PLOS Glob Public Health. 2023 Sep 20;3(9):e0002160. doi: 10.1371/journal.pgph.0002160. eCollection 2023.

Authors

Holijah Uy¹, Christopher Fielding², Ameer Hohlfeld³, Eleanor Ochodo⁴,⁵, Abraham Opare¹, Elton Mukonda², Deon Minnies¹, Mark E Engel³,⁶

Affiliations

¹ Community Eye Health Institute, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa.
² Division of Epidemiology and Biostatistics, School of Public Health, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa.
³ South African Medical Research Council, Cape Town, South Africa.
⁴ Centre for Global Health Research, Kenya Medical Research Institute, Nairobi, Kenya.
⁵ Centre for Evidence-Based Health Care, Department of Global Health, Faculty of Medicine and Health Sciences, Stellenbosch University, Stellenbosch, South Africa.
⁶ Department of Medicine, University of Cape Town, Cape Town, South Africa.

Abstract

Retrospective studies on artificial intelligence (AI) in screening for diabetic retinopathy (DR) have shown promising results in addressing the mismatch between the capacity to implement DR screening and the increasing incidence of DR. This review sought to evaluate the diagnostic test accuracy (DTA) of AI in screening for referable diabetic retinopathy (RDR) in real-world settings. We searched CENTRAL, PubMed, CINAHL, Scopus, and Web of Science on 9 February 2023. We included prospective DTA studies assessing AI against trained human graders (HGs) in screening for RDR in patients with diabetes. Two reviewers independently extracted data and assessed methodological quality against QUADAS-2 criteria. We used the hierarchical summary receiver operating characteristic (HSROC) model to pool estimates of sensitivity and specificity, and forest plots and SROC plots to visually examine heterogeneity in accuracy estimates. From our initial search results of 3,899 studies, we included 15 studies comprising 17 datasets. Meta-analyses revealed a sensitivity of 95.33% (95% CI: 90.60-100%) and a specificity of 92.01% (95% CI: 87.61-96.42%) for the patient-level analysis (10 datasets, N = 45,785), while for the eye-level analysis sensitivity was 91.24% (95% CI: 79.15-100%) and specificity was 93.90% (95% CI: 90.63-97.16%) (7 datasets, N = 15,390). Subgroup analyses showed no variation in diagnostic accuracy by country classification or DR classification criteria. However, a moderate increase in diagnostic accuracy was observed in primary-level healthcare settings (sensitivity 99.35%, 95% CI: 96.85-100%; specificity 93.72%, 95% CI: 88.83-98.61%) and a minimal decrease in tertiary-level healthcare settings (sensitivity 94.71%, 95% CI: 89.00-100%; specificity 90.88%, 95% CI: 83.22-98.53%). Sensitivity analyses showed no variation for studies that included diabetic macular edema in the RDR definition or for studies with ≥3 HGs.
This review provides evidence, for the first time from prospective studies, for the effectiveness of AI in screening for RDR in real-world settings. The results may serve to strengthen existing guidelines to improve current practices.
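
For readers unfamiliar with the accuracy measures reported above, the following is a minimal sketch of how sensitivity and specificity are derived from one study's 2x2 table of AI results against the human-grader reference. It is not the HSROC pooling model used in the review, which jointly models all studies and needs dedicated meta-analysis software. The function name `sens_spec`, the normal-approximation (Wald) interval truncated to the 0-1 range, and the counts are all assumptions made for illustration; none of them come from the included studies.

```python
import math


def sens_spec(tp, fp, fn, tn, z=1.96):
    """Return (sensitivity, specificity), each as (estimate, lower, upper).

    tp/fn: reference-positive cases flagged / missed by the index test.
    tn/fp: reference-negative cases cleared / flagged by the index test.
    Uses a Wald (normal-approximation) interval, truncated to [0, 1].
    This interval choice is an illustrative assumption.
    """
    def proportion_with_ci(successes, total):
        p = successes / total
        half_width = z * math.sqrt(p * (1 - p) / total)
        return p, max(0.0, p - half_width), min(1.0, p + half_width)

    sensitivity = proportion_with_ci(tp, tp + fn)  # TP / (TP + FN)
    specificity = proportion_with_ci(tn, tn + fp)  # TN / (TN + FP)
    return sensitivity, specificity


# Hypothetical counts for a single dataset (not taken from the review).
sens, spec = sens_spec(tp=90, fp=20, fn=10, tn=180)
print(f"sensitivity {sens[0]:.2%} (95% CI {sens[1]:.2%}-{sens[2]:.2%})")
print(f"specificity {spec[0]:.2%} (95% CI {spec[1]:.2%}-{spec[2]:.2%})")
```

A per-study calculation like this yields the estimates that a forest plot displays. Pooling them across studies needs a bivariate or HSROC model, because sensitivity and specificity are correlated through each study's positivity threshold.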

Copyright: © 2023 Uy et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Grants and funding

MR/T008768/1/MRC_/Medical Research Council/United Kingdom