Ranks underlie outcome of combining classifiers: Quantitative roles for diversity and accuracy

Matthew J Sniatynski; John A Shepherd; Thomas Ernst; Lynne R Wilkens; D Frank Hsu; Bruce S Kristal

doi:10.1016/j.patter.2021.100415

Patterns (N Y). 2021 Dec 22;3(2):100415. doi: 10.1016/j.patter.2021.100415. eCollection 2022 Feb 11.

Authors

Matthew J Sniatynski¹,², John A Shepherd³, Thomas Ernst⁴, Lynne R Wilkens⁵, D Frank Hsu⁶, Bruce S Kristal¹,²

Affiliations

¹ Division of Sleep and Circadian Disorders, Department of Medicine, Brigham and Women's Hospital, 221 Longwood Avenue, LM322B, Boston, MA 02115, USA.
² Division of Sleep Medicine, Harvard Medical School, Boston, MA 02115, USA.
³ School of Medicine, University of California San Francisco, San Francisco, CA 94143, USA.
⁴ John A. Burns School of Medicine, University of Hawaii at Mānoa, Honolulu, HI 96813, USA.
⁵ University of Hawaii Cancer Center, University of Hawaii at Mānoa, Honolulu, HI 96813, USA.
⁶ Department of Computer and Information Science, Fordham University, LL813, 113 West 60th Street, New York, NY 10023, USA.

Abstract

Combining classifier systems potentially improves predictive accuracy, but outcomes have proven impossible to predict. Classification most commonly improves when the classifiers are "sufficiently good" (generalized as "accuracy") and "sufficiently different" (generalized as "diversity"), but the individual and joint quantitative influence of these factors on the final outcome remains unknown. We resolve these issues. Beginning with simulated data, we develop the DIRAC framework (DIversity of Ranks and ACcuracy), which accurately predicts the outcome of both score-based fusions originating from exponentially modified Gaussian distributions and rank-based fusions, which are inherently distribution independent. DIRAC was validated using biological dual-energy X-ray absorptiometry and magnetic resonance imaging data. The DIRAC framework is domain independent and has expected utility in far-ranging areas such as clinical biomarker development/personalized medicine, clinical trial enrollment, insurance pricing, portfolio management, and sensor optimization.
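The abstract contrasts score-based and rank-based fusion and names accuracy and diversity as the factors that govern the outcome. The minimal Python sketch below illustrates those ingredients for two classifiers under stated assumptions: score fusion is taken as the mean of min-max normalized scores, rank fusion as the mean of ranks, accuracy is measured by AUC, and diversity is approximated by Spearman rank correlation (lower correlation = more diverse). These particular rules and measures, the function names, and the toy data are all illustrative assumptions; they are not the DIRAC formulas, which the abstract does not state.

```python
from statistics import mean

# Illustrative sketch only: the fusion rules, the AUC accuracy measure and the
# Spearman-based diversity proxy are assumptions, not the DIRAC formulas.

def ranks(scores):
    """1-based ranks (1 = highest score); tied scores share their average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    r = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def score_fusion(a, b):
    """Mean of min-max normalized scores (higher = more likely positive)."""
    def norm(s):
        lo, hi = min(s), max(s)
        return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in s]
    return [(x + y) / 2 for x, y in zip(norm(a), norm(b))]

def rank_fusion(a, b):
    """Mean rank, negated so that a higher fused value still means 'better'."""
    return [-(x + y) / 2 for x, y in zip(ranks(a), ranks(b))]

def auc(scores, labels):
    """Probability a random positive outscores a random negative (ties = 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def rank_correlation(a, b):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    ra, rb = ranks(a), ranks(b)
    ma, mb = mean(ra), mean(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    return cov / (va * vb) ** 0.5

# Hypothetical toy data: 3 positives, 3 negatives, two classifiers A and B.
labels = [1, 1, 1, 0, 0, 0]
a = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1]
b = [0.6, 0.2, 0.9, 0.1, 0.5, 0.3]

print("AUC A:", round(auc(a, labels), 3))                          # 0.889
print("AUC B:", round(auc(b, labels), 3))                          # 0.778
print("AUC score fusion:", round(auc(score_fusion(a, b), labels), 3))  # 1.0
print("AUC rank fusion:", round(auc(rank_fusion(a, b), labels), 3))    # 1.0
print("Rank correlation A/B:", round(rank_correlation(a, b), 3))   # -0.029
```

In this toy case the two inputs are individually fairly accurate and their rankings are nearly uncorrelated, and both fusions separate the classes better than either input alone. This matches the qualitative "sufficiently good and sufficiently different" intuition described above, but it does not reproduce the quantitative predictions of DIRAC.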

Keywords: accuracy; correlation; data fusion; decision fusion; diversity; information fusion; model fusion; ranks; scores; system fusion.
