SciPost Submission Page
How to GAN away Detector Effects
by Marco Bellagente, Anja Butter, Gregor Kasieczka, Tilman Plehn, Ramon Winterhalder
This is not the latest submitted version.
This Submission thread is now published as
|Authors (as registered SciPost users):||Tilman Plehn · Ramon Winterhalder|
|Preprint Link:||https://arxiv.org/abs/1912.00477v2 (pdf)|
|Date submitted:||2020-01-07 01:00|
|Submitted by:||Winterhalder, Ramon|
|Submitted to:||SciPost Physics|
LHC analyses directly comparing data and simulated events bear the danger of using first-principle predictions only as a black-box part of event simulation. We show how simulations, for instance, of detector effects can instead be inverted using generative networks. This allows us to reconstruct parton level information from measured events. Our results illustrate how, in general, fully conditional generative networks can statistically invert Monte Carlo simulations. As a technical by-product we show how a maximum mean discrepancy loss can be staggered or cooled.
Submission & Refereeing History
You are currently on this page
- Report 3 submitted on 2020-02-07 15:43 by Anonymous
- Report 2 submitted on 2020-02-05 00:24 by Anonymous
- Report 1 submitted on 2020-02-01 22:41 by Anonymous
Reports on this Submission
- Cite as: Anonymous, Report on arXiv:1912.00477v2, delivered 2020-02-07, doi: 10.21468/SciPost.Report.1482
The manuscript "How to GAN away Detector Effects" presents a novel method to unfold detector effects without significantly biasing the corrected distributions. It is highly valuable, and generally well written.
I have a few comments I would like to be addressed though.
1) The authors write they generate the events with MadGraph5 and detector simulate it with Delphes. It should also be stated which tools they use to simulate the parton shower evolution, multiple interactions, hadronisation, hadron decay and QED corrections that lie in between these two steps in a full event simulation chain, and which they surely include in their simulation, as detector responses to partons are not a sensible concept.
2) The authors generally refer to distributions at parton level and detector level. Unless, they are really referring to particle level (ie. particles that are stable and propagate freely on scales of c*tau about 1cm), the proposed unfolding procedure conflates two physically very separate effects: the above transition from short-distance partons into long-distance somewhat stable leptons, photons and hadrons, and the interaction of the latter with the detector material. Please comment and rename if appropriate.
3) In Section 2, how dependent are the results on the precise setup of batch size, the number of epochs and iterations per epochs. The authors did probably check this in detail, to my experience, the number of epochs vs the number of points per epoch seems ill-balanced. I would have expected significantly less epochs with significantly more points per epoch to not be affected by statistical effects for each epoch.
4) The authors focus on mostly on hadronic observables which should receive the largest detector corrections. Besides the lepton transverse momentum, which the authors show, it would be interesting to know how well the dilepton invariant mass is reproduced by the proposed unfolding techniques. It would give a glimpse into how well the method could perform for precision observables.
5) The authors correctly note that their method, when applied to only a moderately small subset (a fraction of 40%) can fail for selected distributions. While I want to strongly commend the authors to show the limitations of their method, I would like to ask them to comment on whether it can be anticipated under which circumstances the unfolding can or will fail. This is especially important as once it is applied to real data, the truth is of course unknown.
6) The fact that it unfolds the dijet-invariant mass distribution so differently under cut III and cut IV, does that imply that good unfolding of the same distribution under the more inclusive cuts I and III is largely accidental? In the sense that subsets that enter the distribution in the more inclusive case are unfolded so much worse when applied to only the subset.
7) I would like to encourage the authors to not use nouns as verbs, eg. to GAN something is ill-defined. It is akin to stating "pocket-calculator an equation". Please, in each such case, use verbs to describes what one is supposed to do with the GAN (or else) to the respective object.
- Cite as: Anonymous, Report on arXiv:1912.00477v2, delivered 2020-02-04, doi: 10.21468/SciPost.Report.1478
The article "How to GAN away Detector Effects" describes a method to unfold detector effects on measured differential distributions using generative networks trained on Monte Carlo simulation.
In particular they show how using event-by-event information at parton-level and after detector simulation a bias due to the acceptance can be avoided.
This is, to my knowledge, a novel technique which certainly deserves to be published in this journal.
Below you can find some questions/comments on a few points that I found more difficult to read and understand, and some mistypings. As a general commeny I can add that my main difficulty is understanding the meaning and usage of MMD in the context of this work, but I will come to that point while going through the text.
page 2, just before (1): in “(1.FCGAN)” remove “1.”
page 2, last line: “external masses”: what is the meaning of “external” here?
page 3: Just before eq. (5) you write, about MMD you write “It allows us to compare pre-defined distributions, for instance the one-dimensional invariant mass of an intermediate particle”. So, the variables x and y in eq. (5) are masses or they can be anything? And if they can be anything, which ones did you choose? I think an example here would be useful. Maybe I am confused by the choice to use Gaussian or Breit-Wigner kernel functions, which made me think of a resonance. But the MMD is a general statistical test and therefore can be applied to anything, if I understood it correctly. Some further clarifications here would be very useful.
page 4: why \lambda_D << \lambda_G? Is there a clear motivation?
page 4: which MC parton shower did you use with Madgraph5?
page 4: “statistically independent, but otherwise identical sets of detector-level events” means an independent sample drawn from the same population of the one used for the training? If that is the case, I think the sentence is not very clear. I could understand it only after I reached the point where the unfolding is applied to a sample with different acceptance cuts.
page 4, last paragraph: concerning the batch size, were you limited to 512 by the memory size? Some additional information on the hardware setup you used would be an added value to the paper.
What do you mean by “the matching requirement”? You just wrote that batches are “independently chosen”, so I understood that there is no matching requirement.
page 5: the cuts in (7) and (8) are applied at detector level: in figure 4, the corresponding original parton-level distribution correspond to this acceptance at detector level. Even if it is interesting to check if GAN-inverted distributions reproduce the parton-level ones, from a practical point of vue it would be more useful to unfold measurements to a well defined phase space at parton-level, i.e. a fiducial region for the measurement. Maybe it is a different problem and requires a different approach, but have you considered it?
page 6: “a classification network could be improved through a variational feature in latent space”: can you add a reference? “a standard solution”: why “standard”?
page 7, eq. (9): “P_T” has not been defined. Should it be “P_p”?
page 7: “we do not build a conditional MMD loss”: does it mean that MMD is unchanged with respect to section 2 GAN or that it is not used (and then should be removed from figure 5)?
page 8: what does it mean “at the 90% level”? do you mean that differences are < 10%?
page 10: “the MMD loss is is not actually conditional”: remove an “is”.
page 10: “the standard implementations are somewhat inefficient”: can you give a reference for the “standard implementations”?
page 14: “implementations of such a the kernel”: remove “the”
page 15, first line: “A subtlety is is that”: remove an “is”
page 15: can you clarify better, with an example, what does it mean: “we augment the batches of true and generated invariant masses with one of conditional invariant masses, computed from the same detector information used to condition the generator and the discriminator”?
page 17: in ref 22 there is a mis-typing.
- Cite as: Anonymous, Report on arXiv:1912.00477v2, delivered 2020-02-01, doi: 10.21468/SciPost.Report.1474
This paper discusses the use of generative networks to deal with certain detector effects and argues that an event-level matching before and after detector simulation helps to reduce the bias in a number of observables, as illustrated for the case of semileptonic $WZ$ decays. The methodology is interesting and the presented case study serves as a nice proof of principle. However, the paper could really benefit from a revised introductory discussion, which fails to outline major assumptions and simplifications made by the authors. I will elaborate my concerns below and hope that a revised discussion can further improve the quality of the paper.
1. Unfolding methods are routinely used by experimenters to correct the data for detector effects in a way that aims to be minimally model dependent in order to allow for a direct comparison of the data to current and future theory calculations, without the need for computationally expensive detector simulation. To this end, the data are usually unfolded back to the *particle* level, constructed from physically meaningful colourless (i.e. fully hadronised) final-state particles. Correcting the data back to the quantum-mechanically ambiguous *parton* level would require a correction for non-perturbative effects in addition to the effects of the detector, thereby extrapolating over essentially everything we don’t fully understand about (soft) QCD. In order to ease the comparison of the particle-level data with more state-of-the-art fixed-order or resummed calculations, which are both at the parton level, the experiments might also publish additional non-perturbative correction factors, but they are (if at all) provided on a "use at your own risk" basis.
Presumably, this must be known to the authors and the decision to choose the extrapolation back to the parton level was just a pragmatic one. Nevertheless, I would advise that the associated caveats be acknowledged properly to avoid further criticism from unfolding experts in the community.
That said, the second-to-last paragraph of the introductory section seems to advocate an unfolding to the parton level as the preferred way to report measurement results. I can only hope the authors are not actually serious about this suggestion, which of course would severely reduce the usefulness of the data compared to current LHC cross-section measurements.
On the other hand, if the argument is meant to be directed at the LHC searches who typically report their results at the detector level, then I think that's a fair appeal to the experimental search groups, but also not entirely obvious from the current wording. Nevertheless, search results reported at the parton level would similarly be of limited use for reinterpretation purposes due to their intrinsic model dependence and so my criticism regarding parton vs particle level still applies.
2. Training a machine-learning algorithm to map between the detector-level and parton-level of a specific process is a technically interesting problem, but the setup presented in this paper is very simplified, brushing aside a lot of the difficulty encountered in a realistic unfolding problem.
The data are comprised of all sorts of processes contributing to a given final state and in a typical analysis most of the time is actually spent on understanding the reducible and irreducible background processes in order to correct the data for them. It is important to realise that these "background subtractions" can only be applied to the data on a stochastic basis. The data are effectively just scaled, but they remain a potpourri of different event topologies originating from different processes, all giving rise to the same final state of interest. The experimenter won’t have the luxury of a pure sample of $Z(\to\ell\ell)W(\to jj)$ events where a simple migration matrix — constructed from a single process — is sufficient to perform an unfolding. In reality, a lot of effort is also going into assessing the robustness of the chosen unfolding method against the unknown true process composition in the data and the analysis is optimised to reduce potential biases as much as possible.
Now, the methodology presented in this paper is mainly concerned with providing an alternative approach to correct for the bin-to-bin and out-of-acceptance migrations of the events in a given distribution, without really addressing the more laborious parts of the unfolding problem. There is much room for future studies in this direction of course, and so it would be good to at least acknowledge the simplifications made for this study to put things a little more into perspective.
3. If an experimental collaboration were to publish a paper saying they "generated events using Madgraph5", giving no additional information on the type of setup used whatsoever, they would be guaranteed to receive a deluge of abuse from their theory colleagues, well justified one might argue. In the interest of reproducibility at least, may I suggest you elaborate a little: What is the perturbative accuracy of your event sample? Is it a multi-leg setup? What parton shower was used? etc.
4. The event selection on page 4 is far too flimsy for my taste: Fiducial selection criteria are only specified for jets. What about the leptons? Presumably there would have to be an ATLAS-like lepton selection if you’re using the Delphes ATLAS card? Can you please state which charged lepton flavours you even consider? Can you also clarify whether the same fiducial selection is applied at both parton level and detector level?
5. On page 2, the acronym "GAN" is being introduced as "generative network", which then makes it a noun and so grammar aficionados will find the sentences where the acronym is being used as a verb somewhat painful to read ("we [generative network] only part of the phase space" for instance).