DeepCOVID-XR: An Artificial Intelligence Algorithm to Detect COVID-19 on Chest Radiographs Trained and Tested on a Large U.S. Clinical Data Set

Radiology. 2021 Apr;299(1):E167-E176. doi: 10.1148/radiol.2020203511. Epub 2020 Nov 24.

Abstract

Background: There are characteristic findings of coronavirus disease 2019 (COVID-19) on chest images. An artificial intelligence (AI) algorithm to detect COVID-19 on chest radiographs might be useful for triage or infection control within a hospital setting, but prior reports have been limited by small data sets, poor data quality, or both.

Purpose: To present DeepCOVID-XR, a deep learning AI algorithm to detect COVID-19 on chest radiographs, that was trained and tested on a large clinical data set.

Materials and Methods: DeepCOVID-XR is an ensemble of convolutional neural networks developed to detect COVID-19 on frontal chest radiographs, with reverse-transcription polymerase chain reaction test results as the reference standard. The algorithm was trained and validated on 14 788 images (4253 positive for COVID-19) from sites across the Northwestern Memorial Health Care System from February 2020 to April 2020 and was then tested on 2214 images (1192 positive for COVID-19) from a single hold-out institution. Performance of the algorithm was compared with interpretations from five experienced thoracic radiologists on 300 random test images using the McNemar test for sensitivity and specificity and the DeLong test for the area under the receiver operating characteristic curve (AUC).

Results: A total of 5853 patients (mean age, 58 years ± 19 [standard deviation]; 3101 women) were evaluated across data sets. For the entire test set, accuracy of DeepCOVID-XR was 83%, with an AUC of 0.90. For 300 random test images (134 positive for COVID-19), accuracy of DeepCOVID-XR was 82%, compared with that of individual radiologists (range, 76%-81%) and the consensus of all five radiologists (81%). DeepCOVID-XR had a significantly higher sensitivity (71%) than one radiologist (60%, P < .001) and significantly higher specificity (92%) than two radiologists (75%, P < .001; 84%, P = .009). AUC of DeepCOVID-XR was 0.88 compared with the consensus AUC of 0.85 (P = .13 for comparison). With consensus interpretation as the reference standard, the AUC of DeepCOVID-XR was 0.95 (95% CI: 0.92, 0.98).

Conclusion: DeepCOVID-XR, an artificial intelligence algorithm, detected coronavirus disease 2019 on chest radiographs with a performance similar to that of experienced thoracic radiologists in consensus.

© RSNA, 2020. Supplemental material is available for this article. See also the editorial by van Ginneken in this issue.
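
As a rough illustration of how the reported sensitivity, specificity, and accuracy relate to one another, and of how a McNemar comparison of paired readings works, the following Python sketch may help. It is not the authors' code. The confusion-matrix counts (95, 39, 152, 14) are not stated in the abstract; they are back-calculated here from the reported 300 images, 134 positives, 71% sensitivity, and 92% specificity, and should be read as an approximation. The discordant-pair counts passed to `mcnemar_exact` are purely hypothetical, and the exact binomial form of the test is an assumption, since the abstract does not say which variant was used. The function names are illustrative.

```python
from math import comb


def diagnostic_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # fraction of RT-PCR positives flagged
    specificity = tn / (tn + fp)  # fraction of RT-PCR negatives cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value for paired binary readings.

    b: cases reader A classified correctly and reader B incorrectly
    c: cases reader A classified incorrectly and reader B correctly
    Only these discordant pairs carry information about the difference.
    To compare sensitivity, count pairs among positive cases only;
    to compare specificity, count pairs among negative cases only.
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of a difference
    # Under the null hypothesis, min(b, c) follows Binomial(n, 0.5).
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Counts back-calculated from the abstract's percentages (an assumption,
# not reported data): 134 positives and 166 negatives among 300 images.
sens, spec, acc = diagnostic_metrics(tp=95, fn=39, tn=152, fp=14)
print(f"sensitivity={sens:.0%} specificity={spec:.0%} accuracy={acc:.0%}")

# Hypothetical discordant-pair counts, for illustration only.
p_value = mcnemar_exact(b=10, c=2)
print(f"McNemar exact p-value: {p_value:.4f}")
```

Because the McNemar test uses only discordant pairs, two readers with similar overall accuracy can still differ significantly in sensitivity or specificity if their errors fall on different cases.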

MeSH terms

  • Algorithms
  • Artificial Intelligence*
  • COVID-19 / diagnostic imaging*
  • Datasets as Topic
  • Female
  • Humans
  • Lung / diagnostic imaging*
  • Male
  • Middle Aged
  • Radiographic Image Interpretation, Computer-Assisted / methods*
  • Radiography, Thoracic / methods*
  • SARS-CoV-2
  • Sensitivity and Specificity
  • United States