SciPost Submission Page
Accurate Surrogate Amplitudes with Calibrated Uncertainties
by Henning Bahl, Nina Elmer, Luigi Favaro, Manuel Haußmann, Tilman Plehn, Ramon Winterhalder
This is not the latest submitted version.
Submission summary
| Submission information | |
|---|---|
| Authors (as registered SciPost users): | Henning Bahl · Nina Elmer · Luigi Favaro · Tilman Plehn · Ramon Winterhalder |
| Preprint Link: | scipost_202412_00037v1 (pdf) |
| Date submitted: | Dec. 19, 2024, 12:17 p.m. |
| Submitted by: | Ramon Winterhalder |
| Submitted to: | SciPost Physics |

| Ontological classification | |
|---|---|
| Academic field: | Physics |
| Specialties: | |
| Approaches: | Theoretical, Computational |
Abstract
Neural networks for LHC physics have to be accurate, reliable, and controlled. Using surrogate loop amplitudes as a use case, we first show how activation functions can be systematically tested with KANs. For reliability and control, we learn uncertainties together with the target amplitude over phase space. Systematic uncertainties can be learned by a heteroscedastic loss, but a comprehensive learned uncertainty requires Bayesian networks or repulsive ensembles. We compute pull distributions to show to what level learned uncertainties are calibrated correctly for cutting-edge precision surrogates.
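As a minimal illustration of the heteroscedastic setup mentioned in the abstract (a sketch only, assuming a PyTorch-style workflow; the architecture, names, and sizes below are illustrative and not taken from the paper): the network outputs a mean amplitude and a log-variance per phase-space point, the Gaussian negative log-likelihood lets the learned width absorb the systematic uncertainty, and the pull tests its calibration.

```python
# Minimal sketch of a heteroscedastic amplitude surrogate (illustrative only,
# not the paper's implementation; architecture, names, and sizes are made up).
import torch
import torch.nn as nn

class HeteroscedasticSurrogate(nn.Module):
    """Predicts a mean amplitude mu(x) and a log-variance logvar(x) per phase-space point x."""
    def __init__(self, n_in: int, n_hidden: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(n_in, n_hidden), nn.GELU(),
            nn.Linear(n_hidden, n_hidden), nn.GELU(),
        )
        self.mu = nn.Linear(n_hidden, 1)      # predicted amplitude
        self.logvar = nn.Linear(n_hidden, 1)  # predicted log sigma^2

    def forward(self, x):
        h = self.body(x)
        return self.mu(h), self.logvar(h)

def heteroscedastic_nll(mu, logvar, y):
    # Gaussian negative log-likelihood (up to a constant):
    # 0.5 * [ log sigma^2 + (y - mu)^2 / sigma^2 ]
    return 0.5 * (logvar + (y - mu) ** 2 / logvar.exp()).mean()

def pull(mu, logvar, y):
    # pull = (prediction - truth) / learned sigma; well-calibrated uncertainties
    # give a pull distribution close to a unit-width Gaussian
    return (mu - y) / (0.5 * logvar).exp()
```

Bayesian networks or repulsive ensembles, as referenced in the abstract, would then add the statistical component on top of such a learned systematic term by sampling or ensembling the network weights.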
Author indications on fulfilling journal expectations
- Provide a novel and synergetic link between different research areas.
- Open a new pathway in an existing or a new research direction, with clear potential for multi-pronged follow-up work
- Detail a groundbreaking theoretical/experimental/computational discovery
- Present a breakthrough on a previously-identified and long-standing research stumbling block
Current status:
Reports on this Submission
Report #2 by Anonymous (Referee 2) on 2025-6-9 (Invited Report)
- Cite as: Anonymous, Report on arXiv:scipost_202412_00037v1, delivered 2025-06-09, doi: 10.21468/SciPost.Report.11369
Report
The manuscript studies neural-network surrogate models for scattering amplitudes. As an example, the loop-induced scattering amplitude $g g \rightarrow \gamma \gamma g$ is considered.
The authors distinguish between statistical and systematic uncertainties. By definition, statistical uncertainties vanish in the limit of infinite training data, while systematic uncertainties are related, for example, to the network architecture.
The main part of the manuscript is very well written: section 2 gives a very good introduction to learned uncertainties, section 3 defines the concrete scattering amplitude and the neural networks considered, and sections 5 and 6 present the results for the systematic and statistical uncertainties, respectively.
The only section that appears rather unmotivated and breaks the flow of reading is section 4 on KAN amplitudes. It consists of two parts: an introductory part on Kolmogorov-Arnold networks and a second part on the study of activation functions.
The manuscript could profit if the authors included, between sections 2 and 3, a further section (in the style of section 2) that briefly introduces multi-layer perceptron networks, deep-sets networks, geometric algebra transformer networks, and the concept of activation functions.
The introductory part on Kolmogorov-Arnold networks could also become part of this section, and the authors could motivate in more detail why they want to study the difference between fixed and learned activation functions.
Afterwards they could continue with the previous section 3, "Amplitude data and network architectures", and the remaining part, "Activation functions", of the previous section 4.
This would have the advantage of clearly dividing the manuscript into a general introductory part, which does not depend on scattering amplitudes, and the concrete study of the $g g \rightarrow \gamma \gamma g$ scattering amplitude.
Minor comments:
- Page 2, second paragraph: one reference appears as "?".
- Section 3: the full names for the abbreviations MLP and GATr could be given as well.
- Section 4: ReLU, GELU, and LeakyReLU are standard functions in machine learning, but for readers from particle physics the full names would be helpful. The authors might also consider providing the definitions in an appendix (the standard definitions are recalled below).
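For reference, the standard definitions (recalled here for convenience, not quoted from the manuscript) are $\mathrm{ReLU}(x) = \max(0, x)$, $\mathrm{LeakyReLU}(x) = \max(\alpha x, x)$ with a small fixed slope $\alpha > 0$, and $\mathrm{GELU}(x) = x\,\Phi(x)$, where $\Phi$ denotes the cumulative distribution function of the standard normal distribution.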
Requested changes
See above.
Recommendation
Ask for minor revision
Report #1 by Anonymous (Referee 1) on 2025-5-27 (Invited Report)
- Cite as: Anonymous, Report on arXiv:scipost_202412_00037v1, delivered 2025-05-27, doi: 10.21468/SciPost.Report.11273
Strengths
- The mathematical derivations are precise and pedagogical, quite in contrast to much of the ML literature in the field, and will benefit a wider audience.
- Because HEP is unique in the depth of its mathematical modelling, new non-HEP ML techniques rarely satisfy its requirements on uncertainty quantification. This work is an exception: it fills several gaps in this regard and opens a door to making future algorithms uncertainty-aware.
- The studies are comprehensive and probably generalize far beyond the considered examples.
Weaknesses
- The language of the main body of the text should be improved; it often does not sufficiently carry the reader through the rather intricate reasoning. This can be seen implicitly in the various small comments attached below and in the general remarks.
- A simple way to phrase my main comment would be that the current text requires the reader to have already digested the ideas and the main developments of the paper. This cannot be expected. The text should address the reader at a much earlier level by providing relevant context and guidance at each step. This does NOT mean inflating the jargon (of which there is enough) or merely the word count (which is good), but using precise and lean language that carries meaning in a more condensed, logical, and structured way.
Report
Requested changes
General remarks:
The reader is not sufficiently prepared in the abstract and introductory sections to follow the reasoning. Because I believe a paper should be comprehensible on the first pass, I advise the authors to carefully re-read the manuscript and, in each paragraph, ask whether it is clear to the reader where we are going, what to expect, and what the next steps are. In particular, there should be carefully written introductory paragraphs that explain in words what the developments are.
The long derivations are a strength, but I believe they are distracting to the narrative. The general derivations should go to an appendix, and the main body of the text should discuss the specifics, adopting precise language free from jargon. For example, most of the derivations of Secs. 2.2, 2.3, 2.4, and 4 are not necessary to understand the results.
As an exemplary comment:
The types of uncertainties should be defined more precisely. There is a section on p4, but the reader is not prepared for the (in hindsight) very general derivation. On the other hand, it is left unclear what the exact relation is to the statistical and experimental systematic uncertainties in HEP language, as well as to what the ML domain refers to as aleatoric and epistemic uncertainties. While the derivation is not too long, I do not think that much is gained from the rule of total uncertainty. Instead, the reader should be able to understand from the text what each component means.
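(For context, the "rule of total uncertainty" presumably refers to the law of total variance: for a network with parameters $\theta$ predicting a mean $\mu(x,\theta)$ and a variance $\sigma^2(x,\theta)$ per phase-space point, it reads
$$\sigma_{\mathrm{tot}}^2(x) = \big\langle \sigma^2(x,\theta) \big\rangle_\theta + \mathrm{Var}_\theta\big[\mu(x,\theta)\big],$$
with the first term commonly identified with the aleatoric, or systematic, component and the second with the epistemic, or statistical, component.)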
Detailed comments:
- Abstract (general): I believe the abstract should describe what the goal of the paper is, what tools are used, and what the result is. Currently, the sentences are not logically connected unless the reader has already read the paper.
- Abstract, L3: KANs were not introduced.
- Abstract, L5: comprehensive"ly".
- Abstract: "precision surrogates" is unclear.
- p2, paragraph 3: "amplifying" and "essentially interpretable" are jargon and unclear.
- p2, paragraph 5: Where does the percent-level requirement come from? This is not true in general and is highly application specific.
- p2, paragraph 6: "tested assumption": what does this mean?
- p5: After Eq. (13) the sentence is not finished.
- p7: "In reality" is too colloquial and does not add information.
- p8: "Realistically, ...": same comment as above.
- p9: "define the underlying problem": a problem is set or solved, but not defined.
- Sec. 4: The choice of KANs is not sufficiently explained, here or in the introduction.
- p13, second-to-last paragraph: This is a typical example of my comments. It is not clear to the reader or explained (as far as I can see) why a flatter activation function is desirable. This leaves the reader with the impression that they are not on the same page as the authors.
- Fig. 4: The imperfections at low $f_\mathrm{smear}$ are interesting. Please add general remarks on the implications or limitations. Clearly, this hints at a ceiling for any such method and is of general interest.
- p18: "Bayesianize" is entirely unclear.
Recommendation
Ask for major revision
