SciPost Submission Page
Ephemeral Learning -- Augmenting Triggers with Online-Trained Normalizing Flows
by Anja Butter, Sascha Diefenbacher, Gregor Kasieczka, Benjamin Nachman, Tilman Plehn, David Shih, Ramon Winterhalder
|Authors (as registered SciPost users):||Sascha Diefenbacher · Tilman Plehn · Ramon Winterhalder|
|Preprint Link:||https://arxiv.org/abs/2202.09375v2 (pdf)|
|Date submitted:||2022-06-29 16:18|
|Submitted by:||Diefenbacher, Sascha|
|Submitted to:||SciPost Physics|
The large data rates at the LHC require an online trigger system to select relevant collisions. Rather than compressing individual events, we propose to compress an entire data set at once. We use a normalizing flow as a deep generative model to learn the probability density of the data online. The events are then represented by the generative neural network and can be inspected offline for anomalies or used for other analysis purposes. We demonstrate our new approach for a toy model and a correlation-enhanced bump hunt.
Published as SciPost Phys. 13, 087 (2022)
Author comments upon resubmission
We are grateful to the referees for their thoughtful comments and suggestions. We have integrated these suggestions into the text, with all modifications indicated in the List of Changes. Additionally, we would like to address some of the raised points in more detail:
1- The impact the signal contamination has on the training could be made clearer. For example, one might naively guess that the signal is tiny in the beginning and that each update will make the signal significance larger. Herein lies the confusion. If the signal is larger after the update, won’t the subsequent update absorb the signal into the background model? Some clarification may alleviate such concerns.
-> Thank you for raising this point; however, we are unsure about the source of the confusion. The signal fraction remains constant throughout a single training, as it is defined purely by the underlying physics. The update indicated in Fig. 2 is not intended as an automated, continuous process that takes effect during training, but rather as an after-the-fact cross-check in case any hints of new physics are found in the OnlineFlow scheme.
2- As a follow-up to the previous point, can/does OnlineFlow account for changing pileup conditions? Assuming that it does this well, to what order-of-magnitude level of S/B can this tool find new physics? Presumably given enough training cycles and updates it can squeeze out the significance given any S/B.
-> We agree that the exact behavior of OnlineFlow for various signal rates would be interesting to explore; however, we feel this is beyond the scope of this work. We added a remark about the behavior for various S/B ratios and how this is an interesting regime to explore. As for pileup, the objects that go into the flow would be pileup-corrected, so there should be no difference compared to an offline search.
3- The physics case considered is the LHC Olympics dataset containing a W’ decay, filtering on having a jet with pT > 1.2 TeV. This feels like an odd choice to showcase, as the threshold for the lowest unprescaled single-jet trigger at CMS and ATLAS was much lower, at around 500 GeV, during Run 2. One can imagine an argument that the specifics of this problem do not matter, as it is demonstrating the capability, and that for deployment lower-pT jets would be targeted. Is that true? Assuming that it is, the reader is left to wonder whether the results of Figure 9 still hold. Some clarification would be appreciated.
-> Thank you for noting this. While this dataset features high-mass resonances that are not perfectly in line with the intended application range of OnlineFlow, we feel that the proven and well-known nature of the LHCO data, as well as its availability, makes up for this.
4- The role, rationale, and impact of dummy variables is vague. The appearance of dummy variables in the parametric example seems less controversial, since standard optimization tools may not work for 1d inputs for technical reasons. There is a mystery here, but not of big concern. However, for the LHCO problem the inputs are doubled with dummy variables. What is going on?
-> The origin of the three dummy variables stems from doubling the input twice, from 1 to 2 to 4 in total. The addition of the dummy variables in the LHCO setting is indeed not required; however, we saw an improvement in the training behavior through their inclusion. This was, however, not systematically explored. We further added more clarification on the origin and use of the dummy variables to the paper.
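For illustration, the padding scheme described in this answer can be sketched as follows. This is a minimal sketch using numpy; the helper names are hypothetical and not from the paper, and the flow training itself is omitted. The physical feature is padded with Gaussian noise columns before training, and the noise columns are simply discarded from generated samples afterwards.

```python
import numpy as np

def add_dummy_dims(x, n_dummy=3, rng=None):
    """Pad 1D samples with Gaussian dummy dimensions (hypothetical helper).

    A flow would then be trained on the (1 + n_dummy)-dimensional data;
    after sampling, the dummy columns are simply discarded.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float).reshape(-1, 1)   # physical feature, shape (N, 1)
    noise = rng.standard_normal((x.shape[0], n_dummy))
    return np.concatenate([x, noise], axis=1)       # shape (N, 1 + n_dummy)

def strip_dummy_dims(samples, n_physical=1):
    """Keep only the physical columns of generated samples."""
    return np.asarray(samples)[:, :n_physical]
```

With `n_dummy=3`, a one-dimensional input becomes four-dimensional, matching the "1 to 2 to 4" counting above; the padding only changes the training space, not the physics content of the samples.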
5- The introductory sentence gives the impression that the collision rate might be going up, rather than the trigger rate at the same collision rate. Perhaps there is a way to reword.
-> Thank you for pointing this out; however, the purpose of the first sentence is to say that the collision rate is too high to record all events, which has nothing to do with the trigger. Therefore, we prefer to keep these sentences as they are. The logical flow here is: (1) the collision rate is too high to save everything; (2) we need to save partial information, either by throwing out events, throwing out parts of events, or both. Each method has advantages and disadvantages.
6- The last sentence of the penultimate paragraph is confusing. Isn’t the network determination based on training data?
-> While the network is based on the training data, those data are not saved. What we mean by this sentence is that the generative model trained online could be equivalent to saving all of the training data for offline analysis.
7- Would a sketch or example of the idea of “generative ML” be too pedagogical? Perhaps even a simple definition would help.
-> While we understand the point, we feel that an introduction to generative models would not be the best fit for this paper.
8- The bulleted steps is helpful. One part that triggered a question is step 3, where it mentions “indication of new physics”. How can the system distinguish new physics compared to a detector flaw? The authors may know of a paper by A. Pol et al (https://arxiv.org/abs/1808.00911) that discusses detector monitoring using anomaly detection. It would be interesting to know what the authors think of the new physics vs. detector issues ambiguity, even if it is beyond the scope of the paper. If appropriate, please add the citation.
-> We understand how this question arises; however, the OnlineFlow setup itself does not aim to distinguish between new and known physics. The principal idea is to store a representation of the measured data while using less storage space. How this representation of the data is later treated is beyond the scope of this work; the two anomaly detection methods used (BumpHunt and CWoLa) serve as a non-exhaustive example. The update to the trigger system is not intended to happen in an automated online setting, but rather involves modifying trigger menus after analyzing the representation of the data in an offline setting.
9- The presentation of where and how OnlineFlow could be deployed could be made more concrete. Only after reading the paper does the reader understand that the tool is to be deployed at HLT-like environments where all of the reconstructed inputs, including sophisticated variables that require tracking information, are made available as inputs. This preprocessing is taken for granted—which may be somewhat reasonable for HLT, although that would depend whether full tracking is needed—and this would not be a given for FPGA/GPU-like setups.
-> We do mention the deployment on HLT as the first proposed deployment level quite early in this section, and we feel that an even earlier mention would disrupt the text flow too much.
10- Section 3.1, after referencing figure 4, states that the OnlineFlow reproduces the peak. OnlineFlow seems to have a weak bump above background, but it is much flatter than the S+B. Is that what you mean? This feels like it did not reproduce the peak. Please clarify.
-> As the goal is finding evidence of new physics in a subsequent offline analysis of the OnlineFlow data, any abundance, even one that does not perfectly model the signal peak, may still be sufficient to find new physics. We have added a sentence to this section detailing this and further elaborating on the signal-peak agreement.
11- In section 3.3, it’s not clear to the reader that the parameters given in equation 4 are important enough to document it in the paper. Perhaps it is sufficient to state them in the figure? Also, the relevant figure should be referenced somewhere earlier in section 3.3.
-> We do feel the parameters are important, as they allow for a comparison between the fit result on the OnlineFlow data and the true underlying function parameters. Stating them in the figure unfortunately resulted in a difficult-to-read figure. Unfortunately, we are not sure which figure is referred to here, as in our view the fit parameters would not seem appropriate or relevant for the significance comparison in Fig. 6. Further, Fig. 5, which does show a fit, does not use OnlineFlow data but rather the analysis performed on the training data, so adding the parameters there does not seem intuitive either.
12- Is there significance of prescale factor ~ 4? It’s not obvious to the reader why 4, if significant. Perhaps it is empirical to the problem at hand. Please clarify.
-> The prescale factor of 4 is not inherently significant; however, it is the prescale factor at which the OnlineFlow significance and the data significance cross over. To further illustrate this point, we specifically indicated the crossover point in the relevant figure.
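The crossover logic can be made concrete with a back-of-the-envelope sketch. Assuming a naive S/sqrt(B) significance, recording only a 1/p fraction of events scales both S and B by 1/p, so the significance shrinks by sqrt(1/p); the prescale at which it drops to a fixed OnlineFlow significance follows directly. The function names and numbers below are illustrative, not taken from the paper.

```python
import math

def prescaled_significance(s, b, prescale):
    """Naive S/sqrt(B) after keeping a 1/prescale fraction of events."""
    frac = 1.0 / prescale
    # Both signal and background counts shrink by the recorded fraction,
    # so the significance scales as sqrt(frac).
    return (s * frac) / math.sqrt(b * frac)

def crossover_prescale(s, b, flow_significance):
    """Prescale at which the prescaled-data significance falls to an
    (assumed constant) OnlineFlow significance: p = (Z_full / Z_flow)**2."""
    z_full = s / math.sqrt(b)
    return (z_full / flow_significance) ** 2
```

For instance, if the full-data significance were twice the (assumed prescale-independent) OnlineFlow significance, the crossover would sit at a prescale of 2 squared = 4; whether the empirical factor of 4 in Fig. 6 follows this naive scaling is exactly the kind of question the indicated crossover point makes visible.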
List of changes
- Section 1, Paragraph 2: removed superfluous citation
- Section 1, Paragraph 2: changed “the training of sophisticated networks on such devices is still an active area of research” to “the training of graph-based networks on such devices is still an active area of research”
- Section 1, Paragraph 2: changed “On the other hand, the all-GPU first trigger stage of LHCb might allow the ready deployment of this idea” to “At the same time, the available resources limit the size and therefore complexity of possible ML models.”
- Section 1, Paragraph 3: added footnote: “as training (as opposed to inference) models on FPGA hardware deployed at earlier trigger stages is currently not possible”
- Section 1, Paragraph 3: changed “However, a sufficiently optimized version of this approach could transform data taking by removing the need for triggers altogether.” to “However, a sufficiently optimized version of this approach could transform data taking by instead learning the overall distribution of data without the need for triggers altogether.”
- Section 2, Paragraph 1: changed “they are hardware-based and do not perform complex reconstruction.” to “they are hardware-based and at most perform low-complexity reconstruction.”
- Section 2, Itemized list: changed “analyse” to “analyze”
- Section 2, Figure 2: separated L1T and HLT into separate boxes to clarify the input the OnlineFlow receives
- Section 3.1, Paragraph 2: changed “we add three dummy dimensions drawn from a normal distribution, resulting in a total of four input and output dimensions.” to “we add three dummy dimensions drawn from a normal distribution, quadrupling the total number of input and output dimensions to four.”
- Section 3.1, Paragraph 3: changed “In Fig. 4 we show how the generated OnlineFlow events, trained on signal plus background” to “In Fig. 4 we show how OnlineFlow, trained on signal plus background”
- Section 3.1, Paragraph 3: added “While the width of the peak is not correctly described, we do see a noticeable abundance. As the goal is finding evidence of new physics in a subsequent offline analysis of the OnlineFlow data, this abundance may still be sufficient for our purpose, even if the peak width is mismodeled.”
- Section 3.3, Paragraph 8 (end of section): changed “We see that the OnlineFlow fake rate for the flow varies more than for the training data, but stays well below the signal significance we achieve for the 0.5% signal contamination.” to “We see that there is a larger error margin for the OnlineFlow significance, owing to the larger fluctuations between individual OnlineFlow trainings; however, the average fake rate for the flow stays well below the signal significance we achieve for the 0.5% signal contamination.”
- Section 3.3, Paragraph 8 (end of section): changed “Throughout, the fake rate is at the level of the classical bump hunt.” to “The average also stays consistent with the fake rate of the classical bump hunt.”
- Section 3.3, Paragraph 8 (end of section): added “The precise behavior for intermediate signal rates presents an interesting question that exceeds the scope of this work, but would warrant further investigation.”
- Section 3.3, Figure 6: added grey line indicating the position of the crossover point
- Section 3.3, Figure 6: added in caption “The dotted grey line indicates the crossover point at a data fraction of 1/4, which corresponds to a prescale factor of 4.”
- Section 4.1, Paragraph 3: added “While this dataset features high mass resonances that are not perfectly in line with the intended application range of ONLINEFLOW, we feel that the proven and well known nature of the LHCO data, as well as its availability make up for this shortcoming.”
- Section 4.1, Paragraph 6: changed “..., five features and five additional noise dimensions, which we find improves the performance.” to “... These comprise five features and five additional noise dimensions; the additional noise was found to improve the performance, although no systematic scan over this hyperparameter was performed.”
- Section 4.2, Figure 9: added in caption “Vertical order of the Data lines corresponds to their order in the legend.”
- Section 4.2, Figure 9: increased thickness of OnlineFlow line
- Section 5, Paragraph 4: changed “We believe these examples serve as a prove of concept for the proposed OnlineFlow, warranting further investigation into ways to optimize the setup and into applying it to current trigger systems” to “Implementing OnlineFlow into an existing trigger system will require further work to scale up the network's input dimensionality as well as its expressiveness to handle the more complex data structures of real LHC events. Further, integrating the model training into the infrastructure of a real experiment will require additional work and exploration. However, regardless of these challenges, we believe the examples demonstrated here serve as a proof of concept for the proposed OnlineFlow, warranting further investigation.”
- Acknowledgments: changed “Quantum Universe” to Quantum Universe (removed quotation marks)
- References: numerous style changes.