SciPost Submission Page

CURTAINs Flows For Flows: Constructing Unobserved Regions with Maximum Likelihood Estimation

by Debajyoti Sengupta, Sam Klein, John Andrew Raine, Tobias Golling

Submission summary

Authors (as registered SciPost users): John Raine · Debajyoti Sengupta
Submission information
Preprint Link: scipost_202305_00009v1  (pdf)
Data repository: https://zenodo.org/record/4536377
Date submitted: 2023-05-08 13:15
Submitted by: Raine, John
Submitted to: SciPost Physics
Ontological classification
Academic field: Physics
Specialties:
  • High-Energy Physics - Experiment
  • High-Energy Physics - Phenomenology
Approaches: Experimental, Phenomenological

Abstract

Model independent techniques for constructing background data templates using generative models have shown great promise for use in searches for new physics processes at the LHC. We introduce a major improvement to the CURTAINs method by training the conditional normalizing flow between two side-band regions using maximum likelihood estimation instead of an optimal transport loss. The new training objective improves the robustness and fidelity of the transformed data and is much faster and easier to train. We compare the performance against the previous approach and the current state of the art using the LHC Olympics anomaly detection dataset, where we see a significant improvement in sensitivity over the original CURTAINs method. Furthermore, CURTAINsF4F requires substantially less computational resources to cover a large number of signal regions than other fully data driven approaches. When using an efficient configuration, an order of magnitude more models can be trained in the same time required for ten signal regions, without a significant drop in performance.

Current status:
In refereeing

Reports on this Submission

Anonymous Report 2 on 2024-3-1 (Invited Report)

Report

I recommend the paper "CURTAINs Flows For Flows: Constructing Unobserved Regions with Maximum Likelihood Estimation" for publication in SciPost Physics after a (very) minor revision. The paper is a valuable addition to the literature on weakly supervised anomaly searches. CURTAINSF4F improves on the original CURTAINS idea, makes it competitive with other state-of-the-art methods in terms of performance, and has an edge in terms of run time, as argued by the authors. Hence, CURTAINSF4F is becoming one of the standard methods for weakly supervised anomaly detection in bump hunt searches at the LHC, and publication in SciPost Physics is clearly warranted. In the meantime, it has been well received by the community and has, for example, been used in a comparison of weakly supervised methods in 2307.11157. The paper is well written in general, and the work and the corresponding results are convincingly presented.

Requested changes

A few points to be considered in a minor revision are listed in the following:

1. The central reference for the "Flows for Flows" technique in the paper draft is 2211.02487. However, this paper seems to be no longer supported by the authors. It seems it has been replaced by the later publication 2309.06472. The situation should be clarified in the paper draft by referring to the relevant publication. The publication 2309.06472 contains several options for the generic "Flows for Flows" technique. The authors should make it clear which one has been used here.

2. In section 2 you write "Events are required to have exactly two large radius jets". Maybe I am mistaken, but I do not think that there is a jet veto against a third hard jet in the data. I guess that only the two leading-$p_T$ jets are selected for further analysis and that those are ordered in mass, correct?

3. In the second paragraph of section 3.2 you write "Data are drawn from both SBs and target $m_{JJ}$ values (m_target) are randomly assigned to each data point using all $m_{JJ}$ values in the batch." Can you comment on whether this assignment is strictly necessary or whether it would also be possible to draw target values from the $m_{JJ}$ distribution?

4. In the third paragraph of section 3.2 you describe the averaging procedure for the loss and then refer to figure 2 for visualization. However, I do not think that the averaging is included in figure 2, is it? Otherwise I might have misunderstood the averaging procedure and some clarification might be helpful.

5. At the end of section 3.4 you write "For each SR, the network would only be evaluated for values in SB1 and SB2, and no bias would be introduced from data in the SR." I fully believe that there is no noticeable effect in your analysis. Nevertheless, I find this statement too strong. I do not think it is guaranteed that the fit of the conditional density estimator to the data in the side bands cannot be influenced by a potential signal in the signal region. After all, the density is estimated as a whole, not specifically for a given $m_{JJ}$ value or region. If you agree, you could make this statement a bit less strict.

6. In the second paragraph of section 4 you write "We define a SR centred on the signal process with a width of 400 GeV, which contains nearly all of the signal events." Later on you state that it contains 2214 of 3000 events, i.e. about 74%. So, "nearly all" is a slightly misleading statement.

7. The name CURTAINSF4F appears in the abstract without being properly introduced. This could for example be easily done in the third line of the abstract: "We introduce CURTAINSF4F, a major... "

8. You use "top flow" for the first time at the bottom of page 3 as if it were a well-known phrase. I think this phrase should be properly introduced. On page 4 you also call it "transformer flow".

9. There are a few typos:
better significant improvement -> better significance improvement (section 4.1)
performs equally as well as CATHODE -> performs equally well as CATHODE (section 4.1)
the require time -> the required time (section 4.3)
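
As an illustration of the two options contrasted in point 3, the following is a minimal sketch, not the authors' implementation. It assumes that "randomly assigned ... using all $m_{JJ}$ values in the batch" means shuffling the batch's own masses; the function names and the mass values are hypothetical.

```python
import numpy as np

def targets_by_shuffling(m_jj, rng):
    # Assumed reading of the quoted sentence: each event receives the m_JJ
    # value of a randomly chosen event from the same batch (a permutation).
    return rng.permutation(m_jj)

def targets_by_sampling(m_jj_pool, n_events, rng):
    # Alternative raised by the referee: draw targets independently from an
    # empirical m_JJ distribution, here by resampling a pool with replacement.
    return rng.choice(m_jj_pool, size=n_events, replace=True)

rng = np.random.default_rng(0)
# Made-up side-band masses in GeV, for illustration only.
batch_m_jj = np.array([3050.0, 3120.0, 3810.0, 3900.0, 4010.0])

shuffled = targets_by_shuffling(batch_m_jj, rng)
sampled = targets_by_sampling(batch_m_jj, len(batch_m_jj), rng)
```

Under these assumptions, shuffling reproduces the batch's mass spectrum exactly in every batch, whereas independent sampling reproduces it only on average.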

  • validity: -
  • significance: -
  • originality: -
  • clarity: -
  • formatting: -
  • grammar: -

Report 1 by Ramon Winterhalder on 2024-2-27 (Invited Report)

Strengths

1. The CurtainsF4F approach is a significant improvement compared to the standard Curtains approach, both in terms of performance and elegance, due to its simplified loss and training.
2. The paper nicely compares its method with other existing approaches to date (keeping in mind that the paper's submission date was May 2023!).
3. It is nicely demonstrated that the method can be further improved in terms of computational efficiency with very simple, small adjustments.

Weaknesses

1. There are only minor points in terms of description and notation, which would improve the readability and make the paper even more accessible. Details are given in the requested changes.

Report

I very much appreciate the work and effort put into this manuscript. It is an excellent extension of previous work and is very much worth publishing. Before doing so, I would only like to ask for a minor revision as indicated in the requested changes.

Requested changes

1. In the section around Eq. (1), it is unclear what the indices $\theta$ and $\phi$ indicate. Maybe this can be clarified with a sentence. Also, it is not clear to me why the Gaussian prior has an index $\theta$. Maybe I just do not fully understand the notation here.
2. Along the same lines, in Figure 1, it should be clarified what the index $\gamma$ stands for.
3. In general, all Figures 3-8 would be easier to read if the plot labels were slightly larger. However, this is only a personal preference and should not be considered a critical point for not publishing.
4. In section 4.3, while it has been mentioned before that Cathode is trained on wider SBs, it might be worth mentioning again that the resulting difference in training set size is why Cathode has slightly longer training times than Curtains.
5. In section 4.4, in contrast to the "default" setup, the base flow is trained on the entire spectrum, potentially including the relevant SR for later evaluation. When only reading this sentence, it is not directly obvious why this is ok. While the validity of this approach has become clear after rereading the approach and looking at the formulas, this might cause similar confusion to other readers. It would be worth clarifying this difference again and emphasizing why this is not a problem.
6. Another point that is not directly clear to me is why your top flow in the "efficiency" configuration, with only 2 coupling blocks, works in this scenario, where it acts on 6 or, more precisely, 5 + 1 (condition) dimensions. In general, as shown in the first i-flow paper, capturing all correlations in a 5-6 dimensional space requires at least 6 coupling blocks (or 3 if a single coupling block modifies both partitions), assuming optimal permutations between them. I assume you did some hyperparameter optimization on the number of coupling blocks? If so, it would be great to mention this. The authors might also comment on what it means if only 2 coupling blocks are sufficient to describe all relevant features.

  • validity: top
  • significance: top
  • originality: high
  • clarity: high
  • formatting: excellent
  • grammar: perfect