The interplay of machine learning-based resonant anomaly detection methods

Tobias Golling; Gregor Kasieczka; Claudius Krause; Radha Mastandrea; Benjamin Nachman; John Andrew Raine; Debajyoti Sengupta; David Shih; Manuel Sommerhalder

doi:10.1140/epjc/s10052-024-12607-x

Eur Phys J C Part Fields. 2024;84(3):241. doi: 10.1140/epjc/s10052-024-12607-x. Epub 2024 Mar 8.

Affiliations

¹ Département de physique nucléaire et corpusculaire, Université de Genève, 1211 Geneva, Switzerland.
² Institut für Experimentalphysik, Universität Hamburg, 22761 Hamburg, Germany.
³ Institut für Theoretische Physik, Universität Heidelberg, 69120 Heidelberg, Germany.
⁴ Department of Physics, University of California, Berkeley, CA 94720 USA.
⁵ Physics Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720 USA.
⁶ Berkeley Institute for Data Science, University of California, Berkeley, CA 94720 USA.
⁷ NHETC, Department of Physics and Astronomy, Rutgers University, Piscataway, NJ 08854 USA.

Abstract

Machine learning-based anomaly detection (AD) methods are promising tools for extending the coverage of searches for physics beyond the Standard Model (BSM). One class of AD methods that has received significant attention is resonant anomaly detection, where the BSM physics is assumed to be localized in at least one known variable. Although many methods have been proposed to identify such a BSM signal, making use of simulated or detected data in different ways, the methods' complementarity has not yet been studied. To this end, we address two questions. First, in the absence of any signal, do different methods pick the same events as signal-like? If not, we can significantly reduce the false-positive rate by comparing different methods on the same dataset. Second, if there is a signal, are different methods fully correlated? Even if their maximum performance is the same, since we do not know how much signal is present, it may be beneficial to combine approaches. Using the Large Hadron Collider (LHC) Olympics dataset, we provide quantitative answers to these questions. We find that there are significant gains possible by combining multiple methods, which will strengthen the search program at the LHC and beyond.
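
The first question can be made concrete with a small sketch. It is not the procedure used in the paper: the two score lists are synthetic, independent random numbers standing in for the per-event anomaly scores of two hypothetical methods on background-only data, and the helper names `top_fraction` and `overlap_fraction` are invented for illustration. The sketch selects the most signal-like events from each method at a fixed fraction and measures how many events both methods select.

```python
import random


def top_fraction(scores, fraction):
    """Return the set of event indices with the highest anomaly scores."""
    n_keep = max(1, int(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:n_keep])


def overlap_fraction(scores_a, scores_b, fraction):
    """Fraction of the events selected by method A that method B also selects."""
    selected_a = top_fraction(scores_a, fraction)
    selected_b = top_fraction(scores_b, fraction)
    return len(selected_a & selected_b) / len(selected_a)


# Assumption: synthetic background-only scores from two unrelated methods.
rng = random.Random(0)
n_events = 10_000
scores_a = [rng.random() for _ in range(n_events)]
scores_b = [rng.random() for _ in range(n_events)]

fraction = 0.01  # each method tags its top 1% of events as signal-like
overlap = overlap_fraction(scores_a, scores_b, fraction)
print(f"Shared fraction of selected events: {overlap:.3f}")
```

If the two selections were fully correlated, the shared fraction would be 1. For independent scores it is expected to be close to the selection fraction itself, so requiring both methods to tag an event keeps far fewer background events than either method alone.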