Ranks underlie outcome of combining classifiers: Quantitative roles for diversity and accuracy

Patterns (N Y). 2021 Dec 22;3(2):100415. doi: 10.1016/j.patter.2021.100415. eCollection 2022 Feb 11.

Abstract

Combining classifier systems potentially improves predictive accuracy, but outcomes have proven impossible to predict. Classification most commonly improves when the classifiers are "sufficiently good" (generalized as "accuracy") and "sufficiently different" (generalized as "diversity"), but the individual and joint quantitative influence of these factors on the final outcome remains unknown. We resolve these issues. Beginning with simulated data, we develop the DIRAC framework (DIversity of Ranks and ACcuracy), which accurately predicts the outcome of both score-based fusions originating from exponentially modified Gaussian distributions and rank-based fusions, which are inherently distribution independent. DIRAC was validated using biological dual-energy X-ray absorption and magnetic resonance imaging data. The DIRAC framework is domain independent and has expected utility in far-ranging areas such as clinical biomarker development/personalized medicine, clinical trial enrollment, insurance pricing, portfolio management, and sensor optimization.
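
To make the distinction between score-based and rank-based fusion concrete, the following is a minimal Python sketch and not the DIRAC framework itself. It assumes fusion by simple averaging, measures accuracy as the area under the ROC curve, and uses the Spearman correlation of ranks as one possible stand-in for rank diversity; the abstract does not specify any of these choices. All function names and the toy data are hypothetical.

```python
from statistics import mean

def to_ranks(scores):
    """Return 1-based ranks (1 = lowest score); tied scores share the average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of items tied with the item at position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def auroc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney rank-sum identity."""
    ranks = to_ranks(scores)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def score_fusion(a, b):
    """Assumed fusion rule: average the raw scores of two classifiers."""
    return [(x + y) / 2 for x, y in zip(a, b)]

def rank_fusion(a, b):
    """Assumed fusion rule: average the ranks, ignoring the score distributions."""
    return score_fusion(to_ranks(a), to_ranks(b))

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((p - mx) * (q - my) for p, q in zip(x, y))
    var_x = sum((p - mx) ** 2 for p in x)
    var_y = sum((q - my) ** 2 for q in y)
    return cov / (var_x * var_y) ** 0.5

def rank_correlation(a, b):
    """Spearman correlation; lower values suggest more diverse rankings."""
    return pearson(to_ranks(a), to_ranks(b))

# Invented toy data: 1 = positive class, 0 = negative class.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
scores_a = [0.10, 0.40, 0.35, 0.80, 0.45, 0.60, 0.70, 0.90]
scores_b = [0.20, 0.70, 0.10, 0.30, 0.80, 0.25, 0.60, 0.90]

print("AUROC classifier A:", auroc(scores_a, labels))
print("AUROC classifier B:", auroc(scores_b, labels))
print("AUROC score fusion:", auroc(score_fusion(scores_a, scores_b), labels))
print("AUROC rank fusion: ", auroc(rank_fusion(scores_a, scores_b), labels))
print("Rank correlation:  ", rank_correlation(scores_a, scores_b))
```

Because rank fusion discards the raw score values, its result depends only on the ordering each classifier produces, which is why the abstract calls rank-based fusions distribution independent.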

Keywords: accuracy; correlation; data fusion; decision fusion; diversity; information fusion; model fusion; ranks; scores; system fusion.