The DIRAC framework: Geometric structure underlies roles of diversity and accuracy in combining classifiers

Matthew J Sniatynski; John A Shepherd; Lynne R Wilkens; D Frank Hsu; Bruce S Kristal

doi:10.1016/j.patter.2024.100924

Patterns (N Y). 2024 Feb 5;5(3):100924. doi: 10.1016/j.patter.2024.100924. eCollection 2024 Mar 8.

Authors

Matthew J Sniatynski¹,², John A Shepherd³, Lynne R Wilkens⁴, D Frank Hsu⁵, Bruce S Kristal¹

Affiliations

¹ Division of Sleep and Circadian Disorders, Department of Medicine, Brigham and Women's Hospital, Boston, MA 02115, USA.
² Division of Sleep Medicine, Department of Medicine, Harvard Medical School, Boston, MA 02115, USA.
³ School of Medicine, University of California, San Francisco, San Francisco, CA, USA.
⁴ University of Hawaii Cancer Center, University of Hawaii at Mānoa, Honolulu, HI, USA.
⁵ Department of Computer and Information Science, Fordham University, New York, NY 10023, USA.

Abstract

Combining classification systems potentially improves predictive accuracy, but outcomes have proven impossible to predict. Similar to improving binary classification with fusion, fusing ranking systems most commonly increases Pearson or Spearman correlations with a target when the input classifiers are "sufficiently good" (generalized as "accuracy") and "sufficiently different" (generalized as "diversity"), but the individual and joint quantitative influence of these factors on the final outcome remains unknown. We resolve these issues. Building on our previous empirical work establishing the DIRAC (DIversity of Ranks and ACcuracy) framework, which accurately predicts the outcome of fusing binary classifiers, we demonstrate that the DIRAC framework similarly explains the outcome of fusing ranking systems. Specifically, precise geometric representation of diversity and accuracy as angle-based distances within rank-based combinatorial structures (permutahedra) fully captures their synergistic roles in rank approximation, uncouples them from the specific metrics of a given problem, and represents them as generally as possible.

Keywords: DIRAC; accuracy; correlation; diversity; information fusion; permutahedron; ranks; system fusion.
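
The angle-based view of accuracy and diversity described in the abstract can be illustrated with a small sketch. It relies on one standard fact: for rankings without ties, the Spearman correlation equals the cosine of the angle between the two rank vectors once both are centered at the centroid of the permutahedron. Everything else here is an assumption made for illustration and is not taken from the paper: the function names (`ranks`, `angle`, `fuse`), the toy data, the use of rank averaging as the fusion rule, and the reading of "accuracy" as the angle to the target and "diversity" as the angle between the two inputs.

```python
import numpy as np

def ranks(scores):
    """Convert scores to ranks 1..n; ties are broken by position (hypothetical helper)."""
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores, kind="stable")
    r = np.empty(len(scores), dtype=float)
    r[order] = np.arange(1, len(scores) + 1)
    return r

def angle(r1, r2):
    """Angle in radians between two rank vectors, measured from the permutahedron centroid."""
    a = r1 - r1.mean()
    b = r2 - r2.mean()
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    # Clip guards against floating-point values slightly outside [-1, 1].
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

def fuse(r1, r2):
    """Assumed fusion rule: average the two rank vectors, then re-rank the result."""
    return ranks((r1 + r2) / 2.0)

# Toy data (made up): a target ranking and two imperfect ranking systems.
target = np.array([1, 2, 3, 4, 5, 6], dtype=float)
sys_a = np.array([2, 1, 3, 4, 6, 5], dtype=float)
sys_b = np.array([1, 3, 2, 5, 4, 6], dtype=float)

acc_a = angle(sys_a, target)      # smaller angle = more accurate
acc_b = angle(sys_b, target)
diversity = angle(sys_a, sys_b)   # larger angle = more diverse
fused = fuse(sys_a, sys_b)
acc_fused = angle(fused, target)

print("angle A to target:", acc_a)
print("angle B to target:", acc_b)
print("angle A to B:", diversity)
print("angle fused to target:", acc_fused)
```

In this toy case the two systems make errors in different positions, so the fused ranking lies at a smaller angle to the target than either input. Whether fusion helps in general depends on how the accuracy and diversity angles relate, which is the question the DIRAC framework addresses.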