Use of Response Permutation to Measure an Imaging Dataset's Susceptibility to Overfitting by Selected Standard Analysis Pipelines

Acad Radiol. 2024 Apr 12:S1076-6332(24)00097-7. doi: 10.1016/j.acra.2024.02.028. Online ahead of print.

Abstract

Rationale and objectives: This study demonstrates a method for quantifying the impact of overfitting on the area under the receiver operating characteristic curve (AUC) when standard analysis pipelines are used to develop imaging biomarkers. We illustrate the approach using two publicly available repositories of radiology and pathology images for breast cancer diagnosis.

Materials and methods: For each dataset, we permuted the outcome (cancer diagnosis) values to eliminate any true association between imaging features and outcome. Seven types of classification models (logistic regression, linear discriminant analysis, Naïve Bayes, linear support vector machine, nonlinear support vector machine, random forest, and multi-layer perceptron) were fitted to each scrambled dataset and evaluated by each of four techniques (all data, hold-out, 10-fold cross-validation, and bootstrapping). After repeating this process for a total of 50 outcome permutations, we averaged the resulting AUCs; any increase over the null AUC of 0.5 can be attributed to overfitting.
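The procedure lends itself to a short implementation. Below is a minimal Python sketch using scikit-learn: the seven model types, the 50 permutations, and 10-fold cross-validation follow the description above, but the helper name permutation_auc, the hyperparameters, the random seeds, and the choice of scikit-learn itself are illustrative assumptions rather than the authors' code, and the all-data, hold-out, and bootstrap evaluations are omitted for brevity.

```python
# A minimal sketch of the permutation procedure, assuming scikit-learn;
# names, hyperparameters, and seeds are illustrative, not the authors' code.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# The seven classifier types named in the Methods.
MODELS = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "lda": LinearDiscriminantAnalysis(),
    "naive_bayes": GaussianNB(),
    "svm_linear": SVC(kernel="linear"),
    "svm_nonlinear": SVC(kernel="rbf"),
    "random_forest": RandomForestClassifier(),
    "mlp": MLPClassifier(max_iter=1000),
}

def permutation_auc(X, y, n_permutations=50, seed=0):
    """Mean 10-fold cross-validated AUC over outcome permutations.

    Permuting y destroys any true feature-outcome association, so any
    excess of the mean AUC over 0.5 is attributable to overfitting.
    """
    rng = np.random.default_rng(seed)
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    results = {}
    for name, model in MODELS.items():
        aucs = []
        for _ in range(n_permutations):
            y_perm = rng.permutation(y)  # scramble the outcome labels
            scores = cross_val_score(model, X, y_perm, cv=cv,
                                     scoring="roc_auc")
            aucs.append(scores.mean())
        results[name] = float(np.mean(aucs))
    return results
```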

Results: Applying this approach while varying sample size and the number of imaging features, we found that failing to control for overfitting could yield near-perfect prediction (AUC near 1.0) on outcome-scrambled data. Cross-validation offered greater protection against overfitting than the other evaluation techniques, and for most classification algorithms a sample size of at least 200 was required to keep the AUC inflation attributable to overfitting below 0.05 when assessing as few as 10 features.
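The sample-size and feature-count sweep can be expressed as a loop over the same helper. The sketch below is hypothetical: the grid values, the random subsampling of cases, and the use of the first p feature columns are our assumptions for illustration, not the study's protocol.

```python
# Hypothetical sweep over sample size (n) and feature count (p), reusing
# permutation_auc() above; grid values and subsampling scheme are assumed.
def overfitting_grid(X, y, sample_sizes=(50, 100, 200, 400),
                     feature_counts=(5, 10, 20), seed=1):
    rng = np.random.default_rng(seed)
    inflation = {}
    for n in sample_sizes:          # requires n <= len(y)
        idx = rng.choice(len(y), size=n, replace=False)  # subsample cases
        for p in feature_counts:    # take the first p feature columns
            mean_aucs = permutation_auc(X[idx, :p], y[idx])
            # Excess over the null AUC of 0.5 quantifies overfitting.
            inflation[(n, p)] = {m: auc - 0.5 for m, auc in mean_aucs.items()}
    return inflation
```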

Conclusion: This approach can be applied to any curated dataset to suggest how many features, and which analysis approaches, will limit overfitting.

Keywords: AUC; Bias; Classifier performance; Machine learning; Overfitting; Radiomics.