Core Concepts
A data-driven method for efficiently identifying signal-rich regions in high-dimensional feature spaces to enable the discovery of new physics beyond the Standard Model.
Abstract
The paper presents a novel approach to detecting new physics signals in high-energy physics experiments, demonstrated on the search for the production of two Higgs bosons decaying into four b-jets (HH→4b).
Key highlights:
- The authors address the challenge of setting up signal and control regions when there is no prior knowledge about the expected signal, as is the case for completely new types of particles.
- They propose a method that leverages the assumption that signal events are localized in the high-dimensional feature space, without relying heavily on domain knowledge.
- The approach employs the notion of a low-pass filter to extract low-frequency components of the density ratio between 4b and 3b events, allowing the identification of high-frequency features that may correspond to the signal.
- By training a classifier to distinguish between 3b events with added noise and 4b events with added noise, the authors efficiently estimate the smoothed density ratio without directly computing the convolution operation.
- The method is demonstrated on simulated HH→4b events, where it identifies a data-driven signal region that contains a disproportionately large share of the signal events relative to its size.
- The authors discuss the importance of choosing an appropriate noise scale for the convolution kernel to balance the preservation of low-frequency features and the suppression of high-frequency features.
- Future work includes extending the method to estimate the background distribution and perform hypothesis testing to determine the presence of new physics signals.
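The low-pass-filter idea in the highlights above can be sketched on a one-dimensional toy problem. This is a minimal illustration under stated assumptions, not the authors' implementation: the uniform background, the narrow Gaussian bump standing in for a localized signal, the Gaussian noise kernel, and every function name are hypothetical. A histogram "classifier" (per-bin class frequencies) replaces the trained classifier so the example needs only the standard library, and using the ratio of raw to smoothed density ratio as a score is an illustrative choice.

```python
import random

def sample_3b(rng, n):
    # Hypothetical smooth background: uniform on [0, 1).
    return [rng.random() for _ in range(n)]

def sample_4b(rng, n, signal_frac=0.05):
    # Same background plus a narrow, localized bump at 0.5 (toy "signal").
    return [rng.gauss(0.5, 0.01) if rng.random() < signal_frac else rng.random()
            for _ in range(n)]

def binned_ratio(z3, z4, n_bins=50):
    # Histogram stand-in for a classifier: with per-bin output
    # h = n4 / (n3 + n4), the odds h / (1 - h) equal n4 / n3.
    def counts(zs):
        c = [1] * n_bins  # add-one smoothing avoids division by zero
        for z in zs:
            if 0.0 <= z < 1.0:
                c[min(int(z * n_bins), n_bins - 1)] += 1
        return c
    c3, c4 = counts(z3), counts(z4)
    return [b / a for a, b in zip(c3, c4)]

rng = random.Random(0)
n = 100_000
eta = 0.1  # noise scale as a fraction of the feature range (here the range is 1)

z3, z4 = sample_3b(rng, n), sample_4b(rng, n)

# Raw density ratio: contains both smooth and sharply localized structure.
gamma = binned_ratio(z3, z4)

# Smoothed ratio: the same estimator applied to noise-perturbed samples,
# so no convolution is evaluated explicitly.
gamma_smooth = binned_ratio([z + rng.gauss(0.0, eta) for z in z3],
                            [z + rng.gauss(0.0, eta) for z in z4])

# Bins where the raw ratio exceeds its smoothed version carry
# high-frequency structure, i.e. candidate signal-rich regions.
score = [g / s for g, s in zip(gamma, gamma_smooth)]
peak = max(range(len(score)), key=score.__getitem__)
print(f"peak bin centre: {(peak + 0.5) / len(score):.2f}, score: {score[peak]:.2f}")
```

In this toy setting the highest-scoring bins sit on the injected bump, because the noise washes the bump out of the smoothed ratio while leaving the flat background ratio essentially unchanged. A larger `eta` suppresses more structure, and a much smaller one leaves the bump in the smoothed ratio as well, which is the trade-off behind the choice of noise scale.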
Stats
The following sentences contain key metrics or figures used to support the authors' main arguments:
The authors used 3b event samples of size n ∈ {10⁵, 10⁶} and 4b event samples of the same size.
75% and 6.25% of all the samples were used to estimate γ and γ̃, respectively.
The noise scale in each dimension for generating the training dataset for the smoothed density ratio γ̃ was set to η ∈ {0.01, 0.1, 1} times the length of the range of the corresponding representation.
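The per-dimension noise scale described above can be sketched as follows. This is a hedged illustration only: the Gaussian form of the noise, the use of the empirical min-max range, and the helper name `add_scaled_noise` are assumptions, not details confirmed by the source.

```python
import random

def add_scaled_noise(points, eta, rng):
    """Add zero-mean Gaussian noise whose standard deviation in each
    dimension is eta times that dimension's empirical range."""
    dims = range(len(points[0]))
    # Length of the range of each coordinate (max minus min over the sample).
    spans = [max(p[d] for p in points) - min(p[d] for p in points) for d in dims]
    noisy = [[p[d] + rng.gauss(0.0, eta * spans[d]) for d in dims] for p in points]
    return noisy, spans

rng = random.Random(0)
# Two hypothetical representation coordinates with very different ranges.
points = [[rng.uniform(0.0, 1.0), rng.uniform(-50.0, 50.0)] for _ in range(2000)]
noisy, spans = add_scaled_noise(points, eta=0.1, rng=rng)
```

Scaling by the range keeps η comparable across coordinates, so a single value such as 0.1 perturbs every dimension by the same fraction of its extent.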
Quotes
"Remarkably, γ̃ can be efficiently estimated by learning a classifier without directly evaluating the convolution operation. In particular, we can estimate it by training a classifier to distinguish (Z3b + E, 0) and (Z4b + E, 1), where Z3b and Z4b are the representations of 3b and 4b events, respectively, and E ∼ K is random noise."
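The reasoning behind the quoted claim can be written out with the standard classifier-based density-ratio argument. The notation below is introduced here for illustration and is not taken from the source: p_{3b} and p_{4b} denote the densities of the 3b and 4b representations, h^* is the Bayes-optimal classifier, the two classes are assumed to have equal sample sizes, and E is assumed independent of the representations. The source summary does not spell out the definition of the smoothed ratio, so reading it as the ratio of noise-convolved densities is an assumption.

```latex
\begin{align*}
Z_{3b} + E &\sim p_{3b} * K, \qquad Z_{4b} + E \sim p_{4b} * K, \\
h^*(z) &= P(Y = 1 \mid z)
        = \frac{(p_{4b} * K)(z)}{(p_{3b} * K)(z) + (p_{4b} * K)(z)}, \\
\frac{h^*(z)}{1 - h^*(z)} &= \frac{(p_{4b} * K)(z)}{(p_{3b} * K)(z)}.
\end{align*}
```

Under these assumptions the odds of a well-trained classifier on the noisy samples recover a ratio of convolved densities, even though the convolution with K is never evaluated: adding E to each sample performs it implicitly.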