SciPost Submission Page
Simulation-Prior Independent Neural Unfolding Procedure
by Anja Butter, Theo Heimel, Nathan Huetsch, Michael Kagan, Tilman Plehn
Submission summary
| Submission information | |
|---|---|
| Authors (as registered SciPost users): | Theo Heimel · Nathan Huetsch · Tilman Plehn |
| Preprint Link: | https://arxiv.org/abs/2507.15084v1 (pdf) |
| Date submitted: | Oct. 14, 2025, 4:27 p.m. |
| Submitted by: | Nathan Huetsch |
| Submitted to: | SciPost Physics |
| Ontological classification | |
|---|---|
| Academic field: | Physics |
| Specialties: | |
| Approaches: | Computational, Phenomenological |
Abstract
Machine learning allows unfolding high-dimensional spaces without binning at the LHC. The new SPINUP method extracts the unfolded distribution based on a neural network encoding the forward mapping, making it independent of the prior from the simulated training data. It is made efficient through neural importance sampling, and ensembling can be used to estimate the effect of information loss in the forward process. We showcase SPINUP for unfolding detector effects on jet substructure observables and for unfolding to parton level of associated Higgs and single-top production.
Author indications on fulfilling journal expectations
- Provide a novel and synergetic link between different research areas.
- Open a new pathway in an existing or a new research direction, with clear potential for multi-pronged follow-up work
- Detail a groundbreaking theoretical/experimental/computational discovery
- Present a breakthrough on a previously-identified and long-standing research stumbling block
Current status:
Awaiting resubmission
Reports on this Submission
Strengths
- The paper is well written, and the need for unfolding and for a new solution is made clear
- Benefits of generative unfolding are clear
Weaknesses
- Practical implementation details are lacking, such as a discussion of what is being mitigated with the ensemble (a key point of the paper).
Report
Unfolding is definitely an important topic in the field, and the paper gives a nice description of the benefits of generative unfolding methods. The idea and description of the method itself are clear; however, how it is implemented and used in practice could do with more explanation. In particular, the ensembling methodology seems to be missing some crucial pieces of information. More detailed comments can be found below.
In the second paragraph of the introduction it is written that we are “removing the detector or other, low-energy aspects of the forward simulation”. While in general I would not call it ‘removing the detector’, this is a detail. I do not understand what is meant by “low-energy aspects of the forward simulation”. Does this refer to the detector now being able to measure these low-energy aspects? Because then it is related to the detector itself again, and I am not sure why it is singled out.
In the second paragraph you give an overview of traditional matrix-based methods, but miss out quite a few. Instead of referencing them all individually, I would reference https://inspirehep.net/literature/1762378
In the fourth paragraph you write that “Traditional unfolding techniques mitigate this prior dependence through Iterative Bayesian Unfolding”. This is just not right: just because there are two methods that do this, it does not mean it is generally used. Actually, at the LHC (especially in CMS) it is hardly (never) used; Tikhonov regularisation (of which singular value decomposition is also a form) or unregularised methods are used far more widely. By referencing the paper I mentioned above you mitigate some of the discussion here, but I would rewrite this sentence so as not to single out any of these types of regularisation as THE “traditional” way.
In the introduction of section 2 it is written that “we assume that p(xreco|xpart) is correctly described by the simulators.” It is true that this is in general done for all unfolding methods; however, I would like to stress that your further claims that you cannot be biased by the simulation are therefore not fully correct. This is not a problem, and it is still true that this method minimises the bias, but you should not oversell it either and rather say this explicitly.
On page 5 I really appreciate figure 1, it really helped me understand the way the networks are related.
On page 5, below equation (11), it is explained that we have an efficient importance sampling distribution as long as pθ(xpart) is not too far from its simulation counterpart. I do not quite understand the importance of this statement. Either this means your unfolding only works if your simulation is accurate, and that, once again, a bias is introduced by the use of the simulation (again not a problem, but it should not be denied either), or it needs to be specified why this is not a problem for the algorithm.
Just below that it is written “unlike for generative unfolding”, which confuses me, since this is a generative unfolding method, no?
On page 6, below equation (13), it is written that “we have cross-checked that”. How was this checked? You need a few more sentences to explain this issue.
On page 7 the ensembling method is introduced. In general I think a reference is missing here. Moreover, I am confused by the description of the lower bound: are you not mostly interested in the limit and hence the upper bound? Above all, you need a clearer description of how YOU are using ensembling and what uncertainty you are trying to estimate with it. What is varied/sampled? Which parameter in your formulas does it affect?
This comes back, for example, in your Gaussian example (page 8, below equation (20)), where you explain that for Fig. 2 you used an ensemble of 32 networks. What is sampled here? How do these networks differ? Why did you pick 32?
Again this comes back in section 5.1 on page 13, where you mention training an ensemble of networks. What are they? What does it test? What is varied in this case?
On page 9, under figure 3, you explain that reco- and part-level should be the same. From the formulas above one can see that this should be the case by construction. This makes me wonder: what are you testing then? Is this a closure test to check you did not screw up the implementation?
On page 10, below equation (26), you explain for the first time that SPINUP does not recover the correct truth. Is this using the ensembled step or only one step? Also, did you leave out the last iteration of your algorithm? This is why I think you should mention more explicitly above that your data need to be close to the truth, which here they are not (as is clear in fig 4). This shows exactly the limitations of your model, and making that clearer will mean people might use the method in the correct way rather than finding it too scary and not trying it at all.
On page 16, in the Results part, you write “αtrue ± 15”. Do you mean in the region [−15, 15]?
On page 17, above figure 11, you say that “the uncertainty is dominated by the spread of the minima positions”. I believe this deserves a bit more discussion. Non-overlapping parabolas are not a feature you want, so it is worth discussing why you are not worried about this (because you take the ensemble later, I presume). Otherwise this really scares off anyone reading it.
On page 18, in section 5.3, it reads “measured reco-level events”. Do you mean measured data or reco-level MC?
Finally, in your outlook and discussion section on pages 18-19, I think you can be a bit more honest and careful with your statements in general. I think the method would be more likely to be used if you could more clearly indicate its limits. For example, in the first paragraph of page 19 it is written that “we find that the ensemble maps out the space of possible part-level distributions that map to the same reco-level distributions, again confirming the effectiveness of our ensembling procedure.”
However, if I did not misunderstand, it did not: there was no gap between the solutions in the space that was clearly incorrect; instead you just have a massive uncertainty, which rather negates the point you are trying to make.
Similarly, in the paragraph below you write “violating a fundamental assumption of unfolding”. Again, this is not technically true; it only violates an assumption of your unfolding method.
Requested changes
- Fix introduction in accordance with comments in report.
- Fix implementation information in accordance with comments in report.
- Fix discussion on limitations of the methods in accordance with comments in report.
Recommendation
Ask for major revision
Strengths
- The paper is well written, clearly elucidating the problem statement and the method adapted to solve that.
- The SPINUP method appears robust, and the details of how potential pitfalls were fixed are appreciated.
- Multiple examples, covering different physics processes have been presented to validate the method.
Weaknesses
- While the method itself seems well constructed, I have a concern about its applicability; please see the report.
Report
The paper addresses one of the classical problems in particle physics, the good old unfolding, by using machine learning. While there is already a substantial body of work, the authors propose a new approach. While the approach seems technically sound, it relies on minimising the difference between the detector-level data and MC. This inherently assumes that the detector-level MC is an accurate representation of the data. In an ideal world, where MC modelling is perfect, this is fine, but we know MC mis-modelling is a real feature for many common processes and extreme phase spaces. Therefore this method will certainly introduce a bias, and since for many measurements (where unfolding is the most relevant) we want to probe MC mis-modelling, this bias is rather undesirable.
This drawback, unless I severely misunderstood the method, limits the usefulness of the paper. However, I feel the method itself can be of use in other cases. Therefore I feel it is worth publishing once the authors clarify the above point, but SciPost Physics Core may be a better choice.
Requested changes
- Please address the above concern about the usefulness of the method, as it seems to be based on minimising the difference with the detector-level MC.
- Abstract: the method is not just limited to the LHC; it can be useful for any collider experiment?
- I feel “truth level” or “reco level” (and “sim” in the figures) reads like slang/shorthand; can the authors consistently use detector level, particle level, or parton level, as the case may be?
- In the same vein, can the authors please not use “part-level”?
- "While NIS significantly eases its computational cost", can this be quantified?
- For the JSS dataset (section 4), it is known that the tune does affect the distributions. While for this proof-of-principle demonstration it does not matter whether the latest and greatest tune has been used, please cite tune 26 (what tune is it?) and clarify what the standard tune means for H7.
Recommendation
Accept in alternative Journal (see Report)
