SciPost Submission Page
MadNIS - Neural Multi-Channel Importance Sampling
by Theo Heimel, Ramon Winterhalder, Anja Butter, Joshua Isaacson, Claudius Krause, Fabio Maltoni, Olivier Mattelaer, Tilman Plehn
Submission summary
Authors (as registered SciPost users):  Theo Heimel · Joshua Isaacson · Claudius Krause · Tilman Plehn · Ramon Winterhalder 
Submission information  

Preprint Link:  scipost_202303_00033v1 (pdf) 
Date accepted:  2023-09-04 
Date submitted:  2023-03-25 10:10 
Submitted by:  Winterhalder, Ramon 
Submitted to:  SciPost Physics 
Ontological classification  

Academic field:  Physics 
Specialties: 

Approaches:  Computational, Phenomenological 
Abstract
Theory predictions for the LHC require precise numerical phase-space integration and generation of unweighted events. We combine machine-learned multi-channel weights with a normalizing flow for importance sampling, to improve classical methods for numerical integration. We develop an efficient bi-directional setup based on an invertible network, combining online and buffered training for potentially expensive integrands. We illustrate our method for the Drell-Yan process with an additional narrow resonance.
Author comments upon resubmission
We are grateful to the referees for their time and thoughtful feedback. We have integrated several suggestions into the text and fixed small typos. Below we give a detailed overview of all changes, including our responses to the referees' comments and questions:
Referee 3:

We added the definition of the unit hypercube where it appears first on page 3.

We have added these references and now cite them in the text accordingly.

As the raw network output is unconstrained before normalization, the right-hand side of Eq.(19) allows for negative channel weights. While this is mathematically allowed and fulfills all necessary requirements for the channel splittings, the weights indeed lose their interpretation as probabilities. We also found that this normalization is numerically unstable and hence used the normalization on the left-hand side of Eq.(19), which includes a softmax activation function. We added two more lines to the text to further clarify this.
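As a minimal numpy sketch of this kind of normalization (function and argument names are illustrative, not the MadNIS API), a softmax maps the unconstrained network output to strictly positive channel weights that sum to one:

```python
import numpy as np

def channel_weights(raw_output):
    """Normalize unconstrained network outputs into channel weights via
    a softmax: all weights are strictly positive and sum to one, so
    they keep their interpretation as probabilities.
    (`raw_output` is an illustrative name, not the MadNIS API.)
    """
    z = raw_output - np.max(raw_output, axis=-1, keepdims=True)  # for numerical stability
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)
```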

There was indeed a typo in Eq. (23) and we fixed this in the new version.

As mentioned in the text, the inclusion of channels with zero events during training causes errors in the optimization. To be precise, as each channel is covered by its own normalizing flow, each flow expects an input tensor of length > 0 to be able to calculate gradients. If no events are sampled into a certain channel, the input tensor of the corresponding normalizing flow is None and hence causes a runtime error. We could simply skip a channel that receives no events, but then this channel might never be populated again, even though its depletion may have been a mere fluctuation during optimization. Hence, we decided to ensure that each channel is always populated by at least some fraction of events, preventing such instabilities. After optimization, in contrast, where we expect these fluctuations to average away, we can safely ignore unpopulated channels during integration, so no runtime error occurs.
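A minimal sketch of one way to enforce such a floor (the helper and its parameters are hypothetical, not the actual MadNIS implementation): each channel is deterministically assigned a minimal number of events, and the remainder is distributed according to the learned channel probabilities:

```python
import numpy as np

def allocate_counts(alphas, n_events, min_frac=0.01):
    """Deterministically assign per-channel event counts so that every
    channel receives at least a minimal number of events and no flow
    sees an empty input tensor during training.

    `alphas`   : learned channel probabilities (assumed to sum to one)
    `min_frac` : illustrative floor fraction; assumes that
                 min_frac * len(alphas) stays well below one
    """
    alphas = np.asarray(alphas, dtype=float)
    floor = max(1, int(min_frac * n_events))       # guaranteed minimum per channel
    remaining = n_events - len(alphas) * floor
    extra = np.floor(remaining * alphas).astype(int)
    counts = floor + extra
    # hand any rounding leftover to the largest channel
    counts[np.argmax(alphas)] += n_events - counts.sum()
    return counts
```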

The rotations we implemented are proper rotations in R^n. Since they are described by elements of SO(n), they do not distort features on this space (the transformations are angle-preserving), and they are defined for any angle, not only close to the identity. Note that these rotations are only properly defined on R^n and not on the unit hypercube, where they would in general be ill-defined. In order to combine such a soft permutation with a flow acting on a compact space, for instance [0,1]^n, the features first have to be mapped onto the full R^n. A possible combination would be a Logit-SoftPermutation-Sigmoid transformation. In this case, however, rotations by angles that are not multiples of pi/2 will indeed cause distortions near the boundaries of the unit hypercube.
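This composition can be sketched in numpy for n = 2 (names and the angle parameter are illustrative; it assumes points away from the exact boundary so that the logit stays finite):

```python
import numpy as np

def logit(x):
    """Map (0,1) to R; inverse of the sigmoid."""
    return np.log(x) - np.log1p(-x)

def sigmoid(x):
    """Map R back to (0,1)."""
    return 1.0 / (1.0 + np.exp(-x))

def soft_permutation_2d(u, theta):
    """Apply a 2D rotation (an element of SO(2)) to points on the unit
    square: map to R^2 with a logit, rotate, map back with a sigmoid.
    For angles that are not multiples of pi/2, the composed map
    distorts densities near the boundary of the unit square.
    """
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    return sigmoid(logit(u) @ R.T)
```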

This is indeed an idea we pursue in a follow-up project, where we investigate possible symmetries between different channels which could potentially decrease the computational overhead.

As each channel is handled individually and governs its own phase-space mapping as well as its own normalizing flow, we do not expect severe problems in scenarios where phase-space cuts affect different channels in different ways.

At this stage, the paper was not meant to serve as a benchmark of standard MG5aMC against an ML-supplemented setup, but to introduce new concepts which are relevant to improve phase-space integration and which have not been covered in previous applications. A proper and more detailed comparison between MadNIS and standard MadGraph is the subject of a current follow-up project.

This is indeed a typo. We have fixed this in the new version.
Referee 2:
Conceptually, this is similar to two channel mappings when in fact three channel mappings would be required to cover all features. This is a scenario we have tested, illustrated in the right panel of Fig. 11. Indeed, interferences usually come with negative contributions which might alter the total integrand differently. However, we do not see any limitations to our approach in this case. In addition, the introduction of the overflow channel captures all structures that have not been learned properly by the other channels, as shown in the ring example. We are currently working on a MadGraph-specific follow-up in which we will further investigate this and many other questions on several standard processes. If the referee has any other concerns or ideas that would be interesting to look at, please let us know and we can investigate them further in the follow-up paper. We would also like to stress again that this manuscript is meant to be conceptual and serves as a proof of concept for several new ideas which go well beyond what has been done before in the literature. Hence, we think the manuscript fully serves this purpose and does not require additional examples to illustrate our advances.
List of changes
In addition to the changes mentioned above, we further updated and fixed the following:
1. There was a typo in the definition of stratified sampling in Eq.(17) which has been fixed now.
2. There was a small error in the ring mapping, both in Eq.(48) and in the code. We fixed this and re-ran our code with the corrected mapping. We updated the right-hand side of Table 2 and Figure 9 accordingly.
Published as SciPost Phys. 15, 141 (2023)