How to Understand Limitations of Generative Networks

Ranit Das; Luigi Favaro; Theo Heimel; Claudius Krause; Tilman Plehn; David Shih

SciPost Submission Page

This Submission thread is now published as

SciPost Phys. 16, 031 (2024)

Submission summary

Authors (as registered SciPost users):

Luigi Favaro · Claudius Krause · Tilman Plehn

Submission information
Preprint Link:	https://arxiv.org/abs/2305.16774v2
Code repository:	https://github.com/heidelberg-hepml/discriminator-metric
Data repository:	https://zenodo.org/records/10277550
Date accepted:	2024-01-04
Date submitted:	2023-12-11 18:07
Submitted by:	Favaro, Luigi
Submitted to:	SciPost Physics

Ontological classification
Academic field:	Physics
Specialties:	High-Energy Physics - Phenomenology

Abstract

Well-trained classifiers and their complete weight distributions provide us with a well-motivated and practicable method to test generative networks in particle physics. We illustrate their benefits for distribution-shifted jets, calorimeter showers, and reconstruction-level events. In all cases, the classifier weights make for a powerful test of the generative network, identify potential problems in the density estimation, relate them to the underlying physics, and tie in with a comprehensive precision and uncertainty treatment for generative networks.

Author comments upon resubmission

We thank the referees for their feedback on the manuscript.

A major point raised by both reports is the application of classifier knowledge to improve the generative network. Our studies cover examples where the generators are well understood, so we can easily pin down failure modes from the weight distributions. In general this is not the case, and the classifier can highlight biases in the generative model that are not visible in histograms of 1-d observables. This knowledge can be used directly during training (we include a discussion of different methods in the manuscript), or a posteriori as a diagnostic tool to highlight failure modes, for instance in the tails of the weight distribution. We demonstrate its effectiveness in the paper by matching features of the weight distribution to failure modes in the high-level features. Improving the generator afterwards is not the point of our paper. Addressing the identified issues has to be done in a problem-dependent way, for instance by changing the input parameterization or the network architecture. Generally, the workflow would still follow our discussion:
- look for tails or structure in the weight distributions;
- match these failure modes to physics aspects of the data and/or failures in the generator with manual or automated clustering methods.
We added a simplified example of a possible improvement direction in Section 4.2, but we leave the details for future work.
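The first step of this workflow can be illustrated with a minimal sketch. It is not taken from our code repository. It assumes the common convention that a classifier output D(x), trained to return 1 for data and 0 for generated events, is converted into a weight w(x) = D(x) / (1 - D(x)). The function names, the toy scores, and the tail window [0.5, 2.0] are hypothetical choices made for illustration only.

```python
import random

def classifier_weights(scores, eps=1e-7):
    """Convert classifier outputs D(x) in (0, 1) into weights w = D / (1 - D)."""
    weights = []
    for d in scores:
        d = min(max(d, eps), 1.0 - eps)  # clip to avoid division by zero
        weights.append(d / (1.0 - d))
    return weights

def tail_indices(weights, low=0.5, high=2.0):
    """Indices of samples whose weight lies outside the window [low, high]."""
    return [i for i, w in enumerate(weights) if w < low or w > high]

# Toy scores (assumption): the bulk sits near D = 0.5, where the classifier
# cannot tell generated events from data, so the weights are close to 1.
random.seed(0)
scores = [0.5 + random.uniform(-0.05, 0.05) for _ in range(1000)]
# A few hypothetical events where the classifier is confident.
scores += [0.95, 0.97, 0.02]

weights = classifier_weights(scores)
tails = tail_indices(weights)
print(len(tails), "events in the tails of the weight distribution")
```

The events selected by `tail_indices` are the ones to inspect in the second step, for instance by comparing their physics observables to those of the bulk.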

List of changes

Reply to Report 2:

1) We added a comment on the ML literature on discriminative versus generative classifiers.

2-3) We added a discussion of the similarities and differences of using a classifier for reweighting, in the training of a GAN, for directly improving the generative network and as a diagnostic tool.

4) We believe that an analysis based on a projection onto a lower-dimensional space, e.g. PCA or t-SNE, is not more interpretable than studying physically motivated observables, especially for problems where complex features are known, for example the sharp $\Delta R$ cut in event generation.

5) We included the GitHub repository in the manuscript; it links to the datasets used to train the classifiers.

Reply to Report 1:

Requested change:
Fig. 1 is an example where a large correction found by the classifier is visible in the ROC curve (as you describe, background rejection is orders of magnitude higher), but it has almost no impact on the numerical value of the AUC, which is close to 0.5. Therefore, it confirms our statement that the AUC (i.e. the single number, not the full ROC curve) as a performance metric is insensitive to this failure mode. We rephrased the second statement to make the distinction between the sensitivity of the AUC and that of the ROC curve clearer.
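The distinction can be made concrete with a toy example. It is a sketch with made-up scores, not the data behind Fig. 1. We assume that the bulk of data and generated events receive indistinguishable classifier scores, while 1% of the data populates a region the generator misses and receives a score of 0.99. All names and numbers are hypothetical.

```python
def auc(pos_scores, neg_scores):
    """AUC as the probability that a positive outscores a negative (ties count 1/2)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

def roc_point(pos_scores, neg_scores, threshold):
    """(false-positive rate, true-positive rate) at a given score threshold."""
    tpr = sum(p > threshold for p in pos_scores) / len(pos_scores)
    fpr = sum(n > threshold for n in neg_scores) / len(neg_scores)
    return fpr, tpr

# Toy scores (assumption): the bulk of both samples is spread evenly over [0, 0.9) ...
generated = [0.9 * j / 1000 for j in range(1000)]
data = [0.9 * i / 990 for i in range(990)]
# ... but 1% of the data sits in a region the generator never populates.
data += [0.99] * 10

area = auc(data, generated)
fpr, tpr = roc_point(data, generated, threshold=0.95)
print(f"AUC = {area:.3f}, ROC point at threshold 0.95: FPR = {fpr}, TPR = {tpr}")
```

In this toy setup the AUC stays within about one percent of 0.5, while the ROC point at the high threshold keeps 1% of the data with no generated events at all. The failure mode is therefore obvious in the ROC curve and nearly invisible in the single number.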

Published as SciPost Phys. 16, 031 (2024)
