SciPost Submission Page
High-dimensional and Permutation Invariant Anomaly Detection
by Vinicius Mikuni, Benjamin Nachman
This Submission thread is now published as
Submission summary
Authors (as registered SciPost users):  Vinicius Mikuni 
Submission information  

Preprint Link:  https://arxiv.org/abs/2306.03933v5 (pdf) 
Date accepted:  2024-02-22 
Date submitted:  2024-02-09 19:51 
Submitted by:  Mikuni, Vinicius 
Submitted to:  SciPost Physics 
Ontological classification  

Academic field:  Physics 
Specialties: 

Approaches:  Experimental, Computational, Phenomenological 
Abstract
Methods for anomaly detection of new physics processes are often limited to low-dimensional spaces due to the difficulty of learning high-dimensional probability densities. Particularly at the constituent level, incorporating desirable properties such as permutation invariance and variable-length inputs becomes difficult within popular density estimation methods. In this work, we introduce a permutation-invariant density estimator for particle physics data based on diffusion models, specifically designed to handle variable-length inputs. We demonstrate the efficacy of our methodology by utilizing the learned density as a permutation-invariant anomaly detection score, effectively identifying jets with low likelihood under the background-only hypothesis. To validate our density estimation method, we investigate the ratio of learned densities and compare to those obtained by a supervised classification algorithm.
Author comments upon resubmission
The authors would like to thank the referees for the insightful comments made during the review process. We incorporated the feedback in the text and give detailed answers below to comments received.
The paper presents a seriously interesting application of modern neural networks to a key task at the LHC. It is novel, relevant, and well written. Sorry for being late with my report. I only have a few questions/remarks for the authors to consider, largely voluntarily:
 At the top of p.2 you say that there are no applications of diffusion models for density estimation. I view generative diffusion models over phase space as density estimation, but I could be easily convinced to agree with a more specific statement…
Even though the generation step of a diffusion model implicitly samples from the density, one normally does not have direct access to the value of the density from the trained model unless the associated ODE is solved. We state that our application is the first explicit method for density estimation with low-level inputs to make this distinction clearer. We also added the following to the text to make the point clearer: Similarly, generative models are capable to implicitly learn the density through data generation. In this case, high quality samples can be created but the exact value of the density is not explicitly known.
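To illustrate what "solving the associated ODE" can look like, here is a minimal one-dimensional sketch. It assumes that the ODE in question is the probability-flow ODE of a variance-preserving diffusion with constant noise rate, and it replaces the trained network by the analytic score of a toy Gaussian dataset. The names BETA, S_DATA, T_END and log_density are hypothetical and are not taken from the paper.

```python
import numpy as np

BETA = 1.0     # constant noise rate of the toy diffusion (assumption)
S_DATA = 0.5   # standard deviation of the toy Gaussian "data" (assumption)
T_END = 20.0   # long enough that the perturbed data are ~ N(0, 1)

def sigma2(t):
    """Variance of the perturbed data at time t for Gaussian data."""
    decay = np.exp(-BETA * t)
    return S_DATA**2 * decay + (1.0 - decay)

def drift(x, t):
    """Probability-flow drift: -0.5*beta*x - 0.5*beta*score, score = -x/sigma2(t)."""
    return -0.5 * BETA * x * (1.0 - 1.0 / sigma2(t))

def divergence(t):
    """d(drift)/dx; a real model would estimate this from the network."""
    return -0.5 * BETA * (1.0 - 1.0 / sigma2(t))

def log_density(x0, n_steps=4000):
    """log p_0(x0) = log p_T(x_T) + integral_0^T div(drift) dt."""
    dt = T_END / n_steps
    x, div_integral = float(x0), 0.0
    for i in range(n_steps):
        t = i * dt
        # Heun (second-order) step for the trajectory.
        k1 = drift(x, t)
        k2 = drift(x + dt * k1, t + dt)
        x += 0.5 * dt * (k1 + k2)
        # Trapezoidal rule for the accumulated divergence.
        div_integral += 0.5 * dt * (divergence(t) + divergence(t + dt))
    log_prior = -0.5 * x**2 - 0.5 * np.log(2.0 * np.pi)  # standard normal prior
    return log_prior + div_integral

# Usage: compare with the exact Gaussian log-density of the toy data.
exact = -0.5 * (0.3 / S_DATA) ** 2 - np.log(S_DATA) - 0.5 * np.log(2.0 * np.pi)
difference = log_density(0.3) - exact
```

Sampling only needs the trajectory x(t); the density value additionally needs the integrated divergence, which is the extra step referred to in the reply.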
 As I have said for other papers, any chance you could include some kind of graphic representation of your network setup? So people can use it during talks?
Our model uses the same architecture as the one we introduced in the fast point cloud generative model published in Phys. Rev. D 108, 036025. The network setup is shown in Fig. 2 of that paper.

Why do you modify our two dark-jet datasets? I know it is extra work, but I think it would be very useful to show results for our `Aachen' and `Heidelberg' datasets. Any chance you could add that, to see what your network does when challenged more seriously?
In our initial studies we used settings similar to those of the samples in the dark-jets paper. However, at the time, we observed some performance issues on these samples, which were fixed when we improved the likelihood estimation of our model. By then, we had already changed the settings of the dataset we were using to evaluate the model performance. Since there are no public repositories containing the dark-jets dataset, we have, to the best of our ability, tried to reproduce the dataset from the dark-jets publication and provide an appendix with the ROC and maximum SIC curves for this dataset.

I am sorry, but I do not understand the argument before Eq.(7). The number of constituents is also one of the most useful observables to find semi-visible jets. What is it exactly that you want to achieve with this anomaly score in relation to N? And N = N_part? Sorry for not getting your point.
Indeed, the number of constituents is an important feature. The problem we faced is that comparing jets with different multiplicities in terms of the density is tricky, since the densities are defined over spaces of different dimension (note that a density has dimensions!). Our solution was to use the particle multiplicity to normalize the particle density model, p(part|jet), such that the difference in dimensionality, irrespective of the other properties of the particles, is not the deciding factor when comparing the densities. We still include the particle multiplicity as one of the features of the jet density model, p(jet), so we remain sensitive to it.
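The effect of this normalization can be sketched as follows. Eq. (7) is not reproduced in this thread, so the snippet assumes that "normalize" means dividing the particle log-density by the multiplicity N before combining it with the jet log-density; the function name anomaly_score and the example numbers are illustrative only.

```python
import numpy as np

def anomaly_score(log_p_jet, log_p_part_given_jet, n_part):
    """Score = -log p(jet) - log p(part|jet) / N (assumed form).

    Dividing by N turns the summed particle log-density into a
    per-particle average, so jets with many constituents are not
    flagged merely because their density lives in a larger space.
    """
    log_p_jet = np.asarray(log_p_jet, dtype=float)
    log_p_part = np.asarray(log_p_part_given_jet, dtype=float)
    n_part = np.asarray(n_part, dtype=float)
    return -log_p_jet - log_p_part / n_part

# Two toy jets with the same per-particle log-density (-3 each) but
# different multiplicities (10 and 20 constituents).
scores = anomaly_score([-2.0, -2.0], [-30.0, -60.0], [10, 20])
unnormalized = -np.array([-2.0, -2.0]) - np.array([-30.0, -60.0])
```

Without the division, the second jet would look far more anomalous purely because it has more constituents; with it, the two toy jets receive the same score, and the multiplicity can still enter through p(jet).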

Could you maybe add some more information to Tab.1, for instance on the inverse, Top background vs QCD signal performance? As far as I can see this table is the only way to compare your results to, for instance, Fig.4 in `QCD or What' or Fig.4 in the NAE paper.
We have also included the values of the maximum SIC curve and AUC for Top quarks vs. QCD and the Z’ sample.
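For readers who want to reproduce such table entries from their own anomaly scores, the following is a minimal sketch of how the AUC and the maximum of the SIC curve can be computed, using the conventional definition SIC = TPR / sqrt(FPR). The function names and the toy scores are hypothetical, and tied scores are not treated specially.

```python
import numpy as np

def roc_points(scores_bkg, scores_sig):
    """Return (fpr, tpr) obtained by sweeping a threshold from high to low score."""
    scores = np.concatenate([scores_bkg, scores_sig])
    labels = np.concatenate([np.zeros(len(scores_bkg)), np.ones(len(scores_sig))])
    order = np.argsort(-scores, kind="stable")  # most anomalous first
    labels = labels[order]
    tpr = np.concatenate([[0.0], np.cumsum(labels) / labels.sum()])
    fpr = np.concatenate([[0.0], np.cumsum(1.0 - labels) / (1.0 - labels).sum()])
    return fpr, tpr

def auc(fpr, tpr):
    """Area under the ROC curve via the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * 0.5 * (tpr[1:] + tpr[:-1])))

def max_sic(fpr, tpr, min_fpr=0.0):
    """Maximum of TPR / sqrt(FPR) over points with FPR > min_fpr."""
    mask = fpr > min_fpr
    return float(np.max(tpr[mask] / np.sqrt(fpr[mask])))

# Toy scores: higher means more anomalous.
bkg = np.array([0.1, 0.2, 0.3, 0.4])
sig = np.array([0.35, 0.6, 0.7, 0.8])
fpr, tpr = roc_points(bkg, sig)
toy_auc = auc(fpr, tpr)
toy_max_sic = max_sic(fpr, tpr)
```

Swapping the roles of the two samples (for example treating top jets as background and QCD jets as signal) only requires exchanging the two arguments of roc_points. With limited statistics, a non-zero min_fpr is a common choice to keep the maximum from being driven by a handful of background events.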

Similarly, any chance you could show the inverse QCD signal performance in Fig.5? Do the ROC curves look the same?
We have now added the additional ROC curves for top vs QCD and top vs Z’ to Figure 5 (left).
 I am a little lost in your conclusions. In the beginning, your paper seems to be about the unsupervised density estimation, but the conclusion then ends with the positive note on the density ratio?
In this paper we also introduced the comparison of the likelihood ratio, estimated from the ratio of learned densities, with the estimate derived from the output of a trained classifier. Figure 5 (right) shows that, currently, the density estimator produces a worse ROC curve than the classifier. This comparison is important because, without knowledge of the true likelihood, determining how good the density estimation is becomes nontrivial.
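The two routes to the likelihood ratio mentioned in this reply can be sketched on a toy problem. The snippet assumes balanced training classes, for which an ideal classifier output f(x) relates to the likelihood ratio through f / (1 - f); the Gaussian toy densities and all function names are illustrative and are not taken from the paper.

```python
import numpy as np

def ratio_from_densities(log_p_sig, log_p_bkg):
    """Likelihood ratio from two separately estimated log-densities."""
    return np.exp(np.asarray(log_p_sig) - np.asarray(log_p_bkg))

def ratio_from_classifier(f, eps=1e-12):
    """Likelihood ratio from a classifier output f = p_sig / (p_sig + p_bkg).

    Valid for an ideal classifier trained on balanced classes (assumption).
    """
    f = np.clip(np.asarray(f, dtype=float), eps, 1.0 - eps)
    return f / (1.0 - f)

def gaussian_log_pdf(x, mean, std):
    """Log-density of a 1D Gaussian, used here as a stand-in for a learned density."""
    return -0.5 * ((x - mean) / std) ** 2 - np.log(std) - 0.5 * np.log(2.0 * np.pi)

# Toy check: with exact densities and the ideal classifier, both routes agree.
x = np.linspace(-2.0, 2.0, 9)
log_p_bkg = gaussian_log_pdf(x, 0.0, 1.0)
log_p_sig = gaussian_log_pdf(x, 1.0, 1.0)
ideal_f = 1.0 / (1.0 + np.exp(log_p_bkg - log_p_sig))

lr_density = ratio_from_densities(log_p_sig, log_p_bkg)
lr_classifier = ratio_from_classifier(ideal_f)
```

In the toy case the two estimates coincide by construction. With learned models they generally differ, and the size of that difference is one practical handle on the quality of the density estimate when the true likelihood is unknown.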
List of changes
Textual changes to address the referee's comments:
Added: Similarly, generative models are capable to implicitly learn the density through data generation. In this case, high quality samples can be created but the exact value of the density is not explicitly known.
Added a new appendix comparing the results of our model on the previously used dark-jets datasets.
Added the maximum SIC curve and AUC for Top quarks vs. QCD and the Z’ sample as a table.
Added the additional ROC curves for top vs QCD and top vs Z’ to Figure 5 (left).
Published as SciPost Phys. 16, 062 (2024)