
SciPost Submission Page

Forecasting Generative Amplification

by Henning Bahl, Sascha Diefenbacher, Nina Elmer, Tilman Plehn, Jonas Spinner

Submission summary

Authors (as registered SciPost users): Nina Elmer · Tilman Plehn · Jonas Spinner
Submission information
Preprint Link: https://arxiv.org/abs/2509.08048v3  (pdf)
Code repository: https://github.com/heidelberg-hepml/gan_estimate
Date submitted: Oct. 17, 2025, 1:58 p.m.
Submitted by: Jonas Spinner
Submitted to: SciPost Physics
Ontological classification
Academic field: Physics
Specialties:
  • High-Energy Physics - Phenomenology
Approaches: Computational, Phenomenological

Abstract

Generative networks are perfect tools to enhance the speed and precision of LHC simulations. It is important to understand their statistical precision, especially when generating events beyond the size of the training dataset. We present two complementary methods to estimate the amplification factor without large holdout datasets. Averaging amplification uses Bayesian networks or ensembling to estimate amplification from the precision of integrals over given phase-space volumes. Differential amplification uses hypothesis testing to quantify amplification without any resolution loss. Applied to state-of-the-art event generators, both methods indicate that amplification is possible in specific regions of phase space, but not yet across the entire distribution.
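As a rough illustration of the amplification factor described above, the following toy sketch (an assumed Gaussian setup in Python, not the authors' implementation) reads off the effective sample size of a bin integral when a simple parametric fit stands in for the generative network; any gain over n_train comes entirely from the parametric assumption:

import numpy as np

rng = np.random.default_rng(0)
n_train, n_gen, n_trials = 100, 10_000, 2_000
lo, hi = 0.0, 1.0                      # phase-space "bin" to integrate over

# truth: standard normal; reference bin probability from a large sample
ref = rng.standard_normal(10**7)
p_true = np.mean((ref > lo) & (ref < hi))

sq_err = []
for _ in range(n_trials):
    train = rng.standard_normal(n_train)
    # "generator": Gaussian fitted to the training data (strong inductive bias)
    gen = rng.normal(train.mean(), train.std(ddof=1), size=n_gen)
    p_hat = np.mean((gen > lo) & (gen < hi))
    sq_err.append((p_hat - p_true) ** 2)

# effective i.i.d. sample size giving the same mean squared error on the bin
n_eff = p_true * (1 - p_true) / np.mean(sq_err)
print(f"n_eff ~ {n_eff:.0f}, amplification G ~ {n_eff / n_train:.2f}")

Here G > 1 arises only because the fitted family contains the truth; it is a statement about this bin and this estimator, not about the full distribution.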

Author indications on fulfilling journal expectations

  • Provide a novel and synergetic link between different research areas.
  • Open a new pathway in an existing or a new research direction, with clear potential for multi-pronged follow-up work
  • Detail a groundbreaking theoretical/experimental/computational discovery
  • Present a breakthrough on a previously-identified and long-standing research stumbling block
Current status:
In refereeing

Reports on this Submission

Report #1 by Humberto Reyes-González (Referee 1) on 2025-12-20 (Invited Report)

Disclosure of Generative AI use

The referee discloses that the following generative AI tools have been used in the preparation of this report:

ChatGPT based on GPT-5.2 was used to:
  • help organize notes into a clear text
  • assist with minor discussions on statistical interpretations

Strengths

The paper presents a novel strategy to quantify generative amplification when the ground truth is not known. The construction is mathematically well defined and illustrated with toy models and physics examples.

Weaknesses

Clarifications should be added, in particular on the exact interpretation of 'amplification'.

Report

The paper introduces a strategy to quantify generative amplification when the ground truth is not known, defined via an effective sample size derived from two complementary approaches: average amplification and differential amplification. As the field of high-energy physics moves toward heavier use of generative networks, quantifying generative precision is increasingly important. The work is relevant in this context.

The construction is mathematically well defined and illustrated with toy models and physics examples. The paper highlights the role of inductive bias and smoothing in generative models and provides a potentially useful performance-based diagnostic. However, several conceptual clarifications are needed to avoid misinterpretation, particularly from a statistical perspective. In this regard, I kindly ask the authors to address the following comments:

• What is meant by 'amplification' should be made explicit. From a purely frequentist point of view, amplification is not possible without introducing additional assumptions, and there is no amplification in the Fisher-information sense. The reported amplification arises entirely from inductive bias (priors, smoothness assumptions, architectural constraints, uncertainty modeling), not from additional information extracted from the data. This is implicit in the manuscript, but the distinction should be stated explicitly to avoid misinterpretation.

• It should also be made explicit that the factor G measures performance under a specific, task-dependent metric and within a particular region of phase space. It quantifies variance reduction under a chosen discrepancy measure induced by model assumptions, rather than a general improvement of the learned distribution.

• Section 3.2 effectively probes a single parameter μ; this should be stated explicitly in the interpretation of the results. A simple counterexample where amplification is not obtained would be useful. As a stress test, the authors could repeat this example with very small training samples (e.g. n_train = 10); a toy version of such a test is sketched after this list.

• In this context, the interpretation of G would benefit from a discussion of the small-n_train regime, where uncertainty estimates may be prior-dominated and large values of G may arise even when the data weakly constrain the model.

• The statement that the likelihood ratio is the most powerful statistic according to the Neyman–Pearson lemma is strictly true only for known probability density functions. A classifier provides an approximation to a monotonic function of the optimal likelihood ratio, with optimality holding only asymptotically. This distinction should be clarified. Moreover, compressing multivariate data into a one-dimensional score can hide discrepancies present in the full space, and potential dimensionality issues should be mentioned; a minimal classifier-score sketch is included after this list.

• The use of KS tests on classifier outputs has been explored previously (see, e.g., Fig. 14 of arXiv:2305.14137), where more powerful alternatives are discussed. While asymptotic distributions are only known for univariate test statistics, multivariate tests can be calibrated via Neyman constructions (see arXiv:2511.09118, arXiv:2508.02275, arXiv:2409.16336); a permutation-based stand-in for such a calibration is sketched after this list. In particular, arXiv:2511.09118 discusses generative limits due to model mismodeling and is directly relevant here.

• All of the above-mentioned references should be cited.

• Exact values of G should be quoted in the text, including cases where G < 1.
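
For concreteness, a toy version of the stress test suggested in the Section 3.2 comment above; the bimodal truth and misspecified Gaussian generator are illustrative assumptions, chosen so that no amplification is obtained:

import numpy as np

rng = np.random.default_rng(1)
n_train, n_gen, n_trials = 10, 10_000, 2_000
lo, hi = -0.5, 0.5                     # bin where the bimodal truth has a dip

def truth(n):                          # equal mixture of N(-2, 1) and N(+2, 1)
    return rng.standard_normal(n) + rng.choice([-2.0, 2.0], size=n)

ref = truth(10**7)
p_true = np.mean((ref > lo) & (ref < hi))

sq_err = []
for _ in range(n_trials):
    train = truth(n_train)
    # misspecified "generator": single Gaussian fitted to 10 training events
    gen = rng.normal(train.mean(), train.std(ddof=1), size=n_gen)
    p_hat = np.mean((gen > lo) & (gen < hi))
    sq_err.append((p_hat - p_true) ** 2)

n_eff = p_true * (1 - p_true) / np.mean(sq_err)
print(f"G ~ {n_eff / n_train:.2f}")    # well below 1: bias dominates the error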
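
Similarly, a minimal sketch of the classifier-score construction discussed above, with the KS test applied on held-out data (illustrative two-Gaussian setup and a plain logistic regression; the paper's classifier will differ):

import numpy as np
from scipy.stats import ks_2samp
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 20_000

# "truth" vs "generated": 2D Gaussians differing slightly in one mean
x_true = rng.normal([0.0, 0.0], 1.0, size=(n, 2))
x_gen = rng.normal([0.1, 0.0], 1.0, size=(n, 2))

X = np.vstack([x_true, x_gen])
y = np.concatenate([np.ones(n), np.zeros(n)])

# train/test split; the score approximates a monotone function of the
# likelihood ratio only in the well-trained, large-sample limit
idx = rng.permutation(2 * n)
tr, te = idx[:n], idx[n:]
clf = LogisticRegression().fit(X[tr], y[tr])
s = clf.predict_proba(X[te])[:, 1]

# two-sample KS test between held-out score distributions; valid because the
# classifier is a fixed function on the test data, but the 1D compression can
# hide discrepancies the classifier did not learn
stat, pval = ks_2samp(s[y[te] == 1], s[y[te] == 0])
print(f"KS statistic = {stat:.3f}, p-value = {pval:.2g}")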
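
Finally, a simple permutation-based stand-in for the Neyman-construction calibration mentioned above (perm_pvalue is a hypothetical helper; any multivariate two-sample statistic, e.g. an MMD or a classifier AUC, can be plugged in as stat_fn):

import numpy as np

def perm_pvalue(stat_fn, a, b, n_perm=1000, rng=None):
    """Empirical p-value of stat_fn(a, b) under the exchangeable null."""
    if rng is None:
        rng = np.random.default_rng()
    observed = stat_fn(a, b)
    pooled = np.concatenate([a, b])
    null = []
    for _ in range(n_perm):
        perm = rng.permutation(pooled)          # shuffles along the first axis
        null.append(stat_fn(perm[:len(a)], perm[len(a):]))
    # add-one correction keeps the empirical p-value strictly positive
    return (1 + np.sum(np.array(null) >= observed)) / (n_perm + 1)

# example: calibrate the two-sample KS statistic under the null
from scipy.stats import ks_2samp
a, b = np.random.default_rng(3).normal(size=(2, 500))
print(perm_pvalue(lambda u, v: ks_2samp(u, v).statistic, a, b))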

Recommendation

Ask for minor revision

  • validity: high
  • significance: high
  • originality: high
  • clarity: good
  • formatting: excellent
  • grammar: excellent
