Comparison of evaluation metrics of deep learning for imbalanced imaging data in osteoarthritis studies

Shen Liu; Frank Roemer; Yong Ge; Edward J Bedrick; Zong-Ming Li; Ali Guermazi; Leena Sharma; Charles Eaton; Marc C Hochberg; David J Hunter; Michael C Nevitt; Wolfgang Wirth; C Kent Kwoh; Xiaoxiao Sun

doi:10.1016/j.joca.2023.05.006

Osteoarthritis Cartilage. 2023 Sep;31(9):1242-1248. doi: 10.1016/j.joca.2023.05.006. Epub 2023 May 19.

Authors

Shen Liu¹, Frank Roemer², Yong Ge³, Edward J Bedrick⁴, Zong-Ming Li⁵, Ali Guermazi⁶, Leena Sharma⁷, Charles Eaton⁸, Marc C Hochberg⁹, David J Hunter¹⁰, Michael C Nevitt¹¹, Wolfgang Wirth¹², C Kent Kwoh¹³, Xiaoxiao Sun¹⁴

Affiliations

¹ Department of Epidemiology and Biostatistics, University of Arizona, 1295 N. Martin Ave., Tucson, AZ 85724, USA. Electronic address: shenliu@arizona.edu.
² Department of Radiology, University of Erlangen-Nuremberg, Erlangen, Germany; Department of Radiology, Boston University School of Medicine, MA, USA. Electronic address: frank.roemer@uk-erlangen.de.
³ Department of Management Information Systems, University of Arizona, AZ, USA. Electronic address: yongge@arizona.edu.
⁴ Department of Epidemiology and Biostatistics, University of Arizona, 1295 N. Martin Ave., Tucson, AZ 85724, USA. Electronic address: edwardjbedrick@arizona.edu.
⁵ University of Arizona Arthritis Center, University of Arizona College of Medicine, Tucson, AZ, USA. Electronic address: lizongming@ortho.arizona.edu.
⁶ Department of Radiology, Boston University School of Medicine, MA, USA. Electronic address: guermazi@bu.edu.
⁷ Feinberg School of Medicine, Northwestern University, IL, USA. Electronic address: l-sharma@northwestern.edu.
⁸ Kent Memorial Hospital, and Department of Family Medicine, Warren Alpert Medical School, and Department of Epidemiology, School of Public Health, Brown University, RI, USA. Electronic address: Charles_Eaton@brown.edu.
⁹ School of Medicine, University of Maryland, and Medical Care Clinical Center, VA Maryland Health Care System, Baltimore, MD, USA. Electronic address: mhochber@som.umaryland.edu.
¹⁰ Sydney Musculoskeletal Health, Kolling Institute, Faculty of Medicine and Health, The University of Sydney, Sydney, 2065 NSW, Australia, and Rheumatology Department, Royal North Shore Hospital, St Leonards, NSW 2065 Australia. Electronic address: david.hunter@sydney.edu.au.
¹¹ Department of Epidemiology and Biostatistics, University of California San Francisco, CA, USA. Electronic address: Michael.Nevitt@ucsf.edu.
¹² Department of Imaging & Functional Musculoskeletal Research, Institute of Anatomy & Cell Biology, Paracelsus Medical University Salzburg & Nuremberg, Salzburg, Austria, and Ludwig Boltzmann Inst. for Arthritis and Rehabilitation, Paracelsus Medical University Salzburg & Nuremberg, Salzburg, Austria, and Chondrometrics GmbH, Ainring, Germany. Electronic address: wirth@chondrometrics.de.
¹³ University of Arizona Arthritis Center, University of Arizona College of Medicine, Tucson, AZ, USA. Electronic address: CKwoh@arthritis.arizona.edu.
¹⁴ Department of Epidemiology and Biostatistics, University of Arizona, 1295 N. Martin Ave., Tucson, AZ 85724, USA. Electronic address: xiaosun@arizona.edu.

PMID: 37209993
PMCID: PMC10524686 (available on 2024-09-01)
DOI: 10.1016/j.joca.2023.05.006

Abstract

Purpose: To compare the evaluation metrics for deep learning methods that were developed using imbalanced imaging data in osteoarthritis studies.

Materials and methods: This retrospective study utilized 2996 sagittal intermediate-weighted fat-suppressed knee MRIs with MRI Osteoarthritis Knee Score readings from 2467 participants in the Osteoarthritis Initiative study. We obtained probabilities of the presence of bone marrow lesions (BMLs) from MRIs in the testing dataset at the sub-region (15 sub-regions), compartment, and whole-knee levels based on the trained deep learning models. We compared different evaluation metrics (e.g., receiver operating characteristic (ROC) and precision-recall (PR) curves) in the testing dataset with various class ratios (presence of BMLs vs. absence of BMLs) at these three data levels to assess the model's performance.

Results: In a sub-region with an extremely high imbalance ratio, the model achieved a ROC-AUC of 0.84, a PR-AUC of 0.10, a sensitivity of 0, and a specificity of 1.
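The pattern reported in the Results (high ROC-AUC alongside low PR-AUC, zero sensitivity, and perfect specificity) can be reproduced on toy data. The sketch below is illustrative only and does not use the study's data or models. The synthetic scores, the 1% prevalence, the 0.5 decision threshold, and the use of average precision as the PR-AUC estimate are all assumptions made for this example.

```python
# Toy illustration, NOT the study's data: 990 negatives and 10 positives (1% prevalence).
# All scores are synthetic and deliberately kept below the 0.5 threshold.
neg_scores = [0.4 * i / 990 for i in range(990)]   # negatives spread over [0, 0.4)
pos_scores = [0.30 + 0.01 * j for j in range(10)]  # positives rank high but stay < 0.5
labels = [0] * len(neg_scores) + [1] * len(pos_scores)
scores = neg_scores + pos_scores


def roc_auc(labels, scores):
    """ROC-AUC as the probability that a random positive outscores a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Average precision, one common estimate of the area under the PR curve."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at the rank of each positive
    return total / hits


def sensitivity_specificity(labels, scores, threshold=0.5):
    """Sensitivity and specificity after thresholding the scores."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


auc = roc_auc(labels, scores)
ap = average_precision(labels, scores)
sens, spec = sensitivity_specificity(labels, scores)
print(f"ROC-AUC={auc:.2f}  PR-AUC(AP)={ap:.2f}  sensitivity={sens:.2f}  specificity={spec:.2f}")
```

Because the negatives vastly outnumber the positives, ranking most negatives below the positives is enough for a high ROC-AUC, even though no case crosses the threshold and precision among the top-ranked cases stays low.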

Conclusion: The commonly used ROC curve is not sufficiently informative, especially in the case of imbalanced data. We provide the following practical suggestions based on our data analysis: 1) ROC-AUC is recommended for balanced data, 2) PR-AUC should be used for moderately imbalanced data (i.e., when the proportion of the minor class is above 5% and less than 50%), and 3) for severely imbalanced data (i.e., when the proportion of the minor class is below 5%), it is not practical to apply a deep learning model, even with the application of techniques addressing imbalanced data issues.
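The three suggestions in the Conclusion can be written as a simple decision rule. This is a minimal sketch: the function name `recommend_metric`, the returned labels, and the `balance_tolerance` parameter are hypothetical and not from the paper. The abstract does not say how to treat a minority proportion of exactly 5%, so treating it as moderately imbalanced is an assumption here.

```python
def recommend_metric(n_positive, n_negative, balance_tolerance=0.0):
    """Map class counts to the metric suggested in the abstract's conclusion.

    balance_tolerance is a hypothetical knob (not from the paper): a minority
    proportion within this distance of 50% is treated as balanced.
    """
    total = n_positive + n_negative
    if total <= 0:
        raise ValueError("class counts must sum to a positive number")
    minority = min(n_positive, n_negative) / total
    if minority < 0.05:
        # Suggestion 3: severely imbalanced, applying a deep learning model is not practical.
        return "severely imbalanced"
    if minority < 0.5 - balance_tolerance:
        # Suggestion 2: moderately imbalanced (exactly 5% is placed here by assumption).
        return "PR-AUC"
    # Suggestion 1: balanced data.
    return "ROC-AUC"


print(recommend_metric(500, 500))  # balanced
print(recommend_metric(200, 800))  # 20% minority class
print(recommend_metric(10, 990))   # 1% minority class
```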

Keywords: Bone marrow lesion; Deep learning; Imbalanced data; Osteoarthritis; Precision recall curve; Receiver operating characteristic.

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Benchmarking
Cartilage Diseases* / pathology
Deep Learning*
Humans
Knee Joint / pathology
Magnetic Resonance Imaging / methods
Osteoarthritis, Knee* / diagnostic imaging
Osteoarthritis, Knee* / pathology
Retrospective Studies
