Efficient and Effective Training of COVID-19 Classification Networks With Self-Supervised Dual-Track Learning to Rank

IEEE J Biomed Health Inform. 2020 Oct;24(10):2787-2797. doi: 10.1109/JBHI.2020.3018181. Epub 2020 Aug 20.

Abstract

Coronavirus Disease 2019 (COVID-19) has spread rapidly worldwide since it was first reported. Timely diagnosis of COVID-19 is crucial for both disease control and patient care. Non-contrast thoracic computed tomography (CT) has been identified as an effective diagnostic tool, yet the outbreak has placed tremendous pressure on the radiologists who read the exams and may lead to fatigue-related misdiagnosis. Reliable automatic classification algorithms can help; however, they usually require a considerable number of COVID-19 cases for training, which are difficult to acquire in a timely manner. Meanwhile, how to effectively utilize the existing archive of non-COVID-19 data (the negative samples) in the presence of severe class imbalance is another challenge. In addition, the sudden outbreak necessitates fast algorithm development. In this work, we propose a novel approach for effective and efficient training of COVID-19 classification networks using a small number of COVID-19 CT exams and an archive of negative samples. Concretely, a novel self-supervised learning method is proposed to extract features from the COVID-19 and negative samples. Then, two kinds of soft labels ('difficulty' and 'diversity') are generated for the negative samples by computing the earth mover's distances between the features of the negative and COVID-19 samples, from which the data 'values' of the negative samples can be assessed. A preset number of negative samples are then selected according to these values and fed to the neural network for training. Experimental results show that our approach can achieve superior performance using about half of the negative samples, substantially reducing model training time.
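The selection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes each sample is already represented by a fixed-length feature vector, treats that vector as a one-dimensional empirical distribution so the earth mover's distance has a simple closed form, and uses stand-in definitions for the two soft labels. The function names, the `alpha` weighting, and the formulas for 'difficulty' and 'diversity' are all hypothetical; the abstract does not specify them.

```python
from statistics import mean


def emd_1d(a, b):
    """Earth mover's distance between two equal-length 1-D samples
    with uniform weights: mean absolute difference of sorted values."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-length feature vectors")
    return mean(abs(x - y) for x, y in zip(sorted(a), sorted(b)))


def score_negatives(neg_feats, pos_feats):
    """Return a (difficulty, diversity) pair for every negative sample.

    Assumed definitions (illustrative only, not taken from the paper):
      difficulty = -(distance to the closest positive), so negatives
                   that resemble COVID-19 samples score higher
      diversity  = spread (max - min) of the distances to all positives
    """
    scores = []
    for neg in neg_feats:
        dists = [emd_1d(neg, pos) for pos in pos_feats]
        scores.append((-min(dists), max(dists) - min(dists)))
    return scores


def select_negatives(neg_feats, pos_feats, k, alpha=0.5):
    """Pick the indices of the k negatives with the highest combined value.

    alpha is a hypothetical weight trading difficulty against diversity.
    """
    scores = score_negatives(neg_feats, pos_feats)
    values = [alpha * diff + (1 - alpha) * div for diff, div in scores]
    ranked = sorted(range(len(neg_feats)), key=lambda i: values[i], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    positives = [[0.0, 1.0, 2.0]]
    negatives = [[0.0, 1.0, 2.0], [10.0, 11.0, 12.0], [1.0, 2.0, 3.0]]
    # Keep the two negatives judged most valuable under the assumed scoring.
    print(select_negatives(negatives, positives, k=2))
```

In practice the features would come from the self-supervised network and would likely be compared with a multi-dimensional earth mover's distance; the one-dimensional form is used here only to keep the sketch free of external dependencies.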

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Betacoronavirus*
  • COVID-19
  • COVID-19 Testing
  • Clinical Laboratory Techniques / statistics & numerical data*
  • Cohort Studies
  • Computational Biology
  • Coronavirus Infections / classification
  • Coronavirus Infections / diagnosis*
  • Coronavirus Infections / diagnostic imaging*
  • Deep Learning
  • Diagnostic Errors / statistics & numerical data
  • Humans
  • Neural Networks, Computer
  • Pandemics* / classification
  • Pneumonia, Viral / classification
  • Pneumonia, Viral / diagnosis*
  • Pneumonia, Viral / diagnostic imaging*
  • Radiographic Image Interpretation, Computer-Assisted / statistics & numerical data*
  • Retrospective Studies
  • SARS-CoV-2
  • Supervised Machine Learning*
  • Tomography, X-Ray Computed / statistics & numerical data*

Grants and funding

This work was supported in part by the Key Area Research and Development Program of Guangdong Province, China, under Grant 2018B010111001, in part by the National Key Research and Development Project under Grant 2018YFC2000702, in part by the Natural Science Foundation of China under Grant 61702339, in part by the Science and Technology Program of Shenzhen, China, under Grant ZDSYS201802021814180, and in part by the Emergency Science and Technology Project on COVID-19 Pneumonia, supported by the Department of Science and Technology of Hubei Province, under Grant 2020FCA016.