SciPost Submission Page
GANplifying Event Samples
by Anja Butter, Sascha Diefenbacher, Gregor Kasieczka, Benjamin Nachman, Tilman Plehn
This is not the latest submitted version.
This Submission thread is now published as
Authors (as registered SciPost users):
Sascha Diefenbacher · Tilman Plehn
A critical question concerning generative networks applied to event generation in particle physics is whether the generated events add statistical precision beyond the training sample. We show for a simple example with increasing dimensionality how generative networks indeed amplify the training statistics. We quantify their impact through an amplification factor or equivalent numbers of sampled events.
Reports on this Submission
Report 2 by Veronica Sanz on 2021-2-2 (Invited Report)
- Cite as: Veronica Sanz, Report on arXiv:2008.06545v2, delivered 2021-02-02, doi: 10.21468/SciPost.Report.2496
This work explores the idea that event generation by generative networks encodes more statistical knowledge than one would naively expect.
This is an extremely interesting idea, and the paper goes some way to quantify the power of amplification when using GANs in event generation.
The only weakness I find in this paper is the level it is aimed at. Despite working on related subjects, as a referee I had a bit of a hard time understanding all the statements in the paper, and was left hungry for more explanation of some of the statements made.
Below I suggest some possible improvements to the write-up to improve readability (physics-wise, not English).
I do strongly recommend its publication.
1. In the Introduction, the authors state that NNs go beyond naive interpolation due to the structure of the network, which e.g. introduces some level of smoothness in the functions it tries to represent. That statement seems intuitive, but one is left with the question of to what extent it holds for different NN architectures and datasets. For example, could the authors comment on why they chose this particular camel-back example, and whether they have tested their procedure with other functional forms?
2. In Sec 2, discussion on Fig 2, the authors state 'the agreement between the sample or the fit and the truth generally improves with more quantiles', namely as we go left to right in Fig. 2. Could the authors clarify this point? It isn't straightforward when inspecting the figure.
3. In Sec 2, could the authors explain in a bit more detail why mode collapse is expected for this architecture and dataset, and how the tweak of per-batch statistics helps? Is this a generic problem and solution for this kind of analysis?
4. When moving from 1D/2D (Figs 2 and 4) to higher dimensional examples (Figs. 6 and 7), the number of datapoints goes from 100 to 500. How much do these choices impact the conclusions reached by the authors? In the conclusions we read a bit of discussion on this, but I would like to see something more quantitative.
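As a rough illustration of point 4, the sketch below estimates how the purely statistical baseline changes between 100 and 500 data points. Everything in it is an assumption made for illustration and is not taken from the manuscript: the truth is modelled as an equal-weight mixture of two unit-width Gaussians at -4 and +4, and the metric is a quantile-based squared error, the sum over equal-probability quantiles of (observed fraction - 1/N_quant)^2. The function names are hypothetical.

```python
import bisect
import random
from statistics import NormalDist

# Assumed camel-back truth: equal-weight mixture of two unit-width Gaussians.
# These parameters are illustrative, not taken from the manuscript.
COMPONENTS = (NormalDist(-4.0, 1.0), NormalDist(4.0, 1.0))


def truth_cdf(x):
    """CDF of the assumed two-Gaussian mixture."""
    return sum(c.cdf(x) for c in COMPONENTS) / len(COMPONENTS)


def quantile_edges(n_quant, lo=-20.0, hi=20.0, iters=60):
    """Interior edges splitting the truth into n_quant equal-probability bins."""
    edges = []
    for j in range(1, n_quant):
        target, a, b = j / n_quant, lo, hi
        for _ in range(iters):  # bisection on the monotonic CDF
            mid = 0.5 * (a + b)
            if truth_cdf(mid) < target:
                a = mid
            else:
                b = mid
        edges.append(0.5 * (a + b))
    return edges


def quantile_mse(sample, edges):
    """Sum over quantiles of (observed fraction - 1/n_quant)^2 (assumed metric)."""
    counts = [0] * (len(edges) + 1)
    for x in sample:
        counts[bisect.bisect_right(edges, x)] += 1
    n, n_quant = len(sample), len(counts)
    return sum((c / n - 1.0 / n_quant) ** 2 for c in counts)


def mean_quantile_mse(n_points, n_quant, n_trials=200, seed=0):
    """Average the quantile error over repeated pseudo-experiments."""
    rng = random.Random(seed)
    edges = quantile_edges(n_quant)
    total = 0.0
    for _ in range(n_trials):
        picks = rng.choices(COMPONENTS, k=n_points)
        sample = [rng.gauss(c.mean, c.stdev) for c in picks]
        total += quantile_mse(sample, edges)
    return total / n_trials


if __name__ == "__main__":
    for n_points in (100, 500):
        print(n_points, mean_quantile_mse(n_points, n_quant=10))
```

Under this assumed metric, the expectation for a sample drawn directly from the truth is (1 - 1/N_quant)/n, so the baseline falls roughly as 1/n. Any amplification factor quoted relative to the training sample would therefore be measured against a baseline five times lower at 500 points than at 100.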
Report 1 by Anonymous on 2020-12-15
- Cite as: Anonymous, Report on arXiv:2008.06545v2, delivered 2020-12-15, doi: 10.21468/SciPost.Report.2294
This interesting study comes at the right time, answering the point raised by Matchev and Shyamsundar (Ref.  in the manuscript) on the uncertainties associated with using Generative Adversarial Networks as a dataset amplifier tool.
This paper demonstrates that a GAN-based strategy for simulation allows one to obtain a dataset larger than the training dataset size, making the whole process computationally advantageous.
No major weak point was identified.
I believe that the manuscript is easy to read and basically ready to be accepted for publication. I just ask the authors to consider a few remarks (see detailed list) and submit a minor revision.
1- I think that the factor x25 of expected statistics increase should actually be 19. My math is: LHC data amount to 160 fb⁻¹. HL-LHC should deliver 3000 fb⁻¹. So 3000/160 ~ 19. To get 25, one would need LHC data to amount to 120 fb⁻¹.
2- I would add a reference for "challenging the current simulation tools". For instance the CMS or ATLAS upgrade TDRs.
3- At a few points (e.g., end of section 2), the paper refers to the fact that using a GAN is a trade-off between the statistical and the systematic precision of the interpolation applied by the GAN. This trade-off is the essential point of this paper, and it determines the saturation observed in the experiments carried out. I think that this aspect should be expanded in the introduction and spelled out more clearly, for the benefit of non-expert readers.
4- The next-to-last paragraph on page 2 is a key point to explain why GANs work. I would supplement this paragraph with references, if any exist. You do refer to related aspects in the next paragraph (e.g., video games). On the other hand, these are qualitative statements. I am not sure whether something more quantitative exists. If not, it should be stated.
5- I am concerned by the fact that 100 points are not a lot, and the fit in Fig. 1 shows some bias (within statistical uncertainties, of course). Did you verify that the fit is unbiased on average? Did you try more options than your 100-point experiments? I am worried that the results of Fig. 1 might be biased by the low statistics of the fit sample.
6- The discussion in section 2 should be supplemented with more details, defining clearly what sample, fit and GAN are. The information is there, but scattered across the paper, and this makes the reading and understanding more difficult.
7- Could you add a legend to Fig. 1 and explain (e.g., in the caption) the meaning of the horizontal lines at 200 and 300, detailing how you computed them? Could you add the 1000 line as well?
8- You say on page 4 that your uncertainty evaluation for the fit is equivalent to using the full covariance matrix returned by the fit. Did you verify that? Why not just use the covariance matrix? I would be interested to see the comparison. I agree with you that this is true in the absence of fit biases, for perfectly chi-squared distributions. But are you in that situation? Your likelihood should have a two-fold ambiguity, and the two minima might overlap if the parameter determination is not precise enough [here I am assuming that the fit parameters are the two mu and sigma + the relative fraction].
9- How was the GAN architecture chosen? Was any optimization performed? Did you try alternative architectures?
10- In the 2D example, why did you centre one of the two Gaussians at a negative r value? Doing so, you reduce this contribution to a tail below the other Gaussian, so the sense of the camel back shape is gone. I was puzzled by this choice. I would have used some other positive value of mu for the second Gaussian. The same comment applies to the N-dim case.
11- In the outlook section, I think that "a neural network ... they represent" should be "it represents".
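The arithmetic in remark 1 can be checked in a few lines. This is a minimal sketch that uses only the luminosity values quoted in that remark (160 fb⁻¹ for current LHC data, 3000 fb⁻¹ for HL-LHC); the function name is hypothetical.

```python
def luminosity_ratio(target_fb, reference_fb):
    """Ratio of two integrated luminosities given in inverse femtobarns."""
    return target_fb / reference_fb


# Values quoted in remark 1 of the report.
HL_LHC_FB = 3000.0
LHC_FB = 160.0

ratio = luminosity_ratio(HL_LHC_FB, LHC_FB)
# Reference luminosity that would be needed for a factor of 25.
reference_for_25 = HL_LHC_FB / 25.0

if __name__ == "__main__":
    print(f"expected increase: {ratio:.2f}")
    print(f"reference needed for x25: {reference_for_25:.0f} fb^-1")
```

With these inputs the ratio is 18.75, which rounds to the factor of 19 stated in the remark, and a factor of 25 corresponds to a reference of 120 fb⁻¹.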