SciPost Submission Page

Generative Unfolding of Jets and Their Substructure

by Antoine Petitjean, Anja Butter, Kevin Greif, Sofia Palacios Schweitzer, Tilman Plehn, Jonas Spinner, Daniel Whiteson

Submission summary

Authors (as registered SciPost users): Sofia Palacios Schweitzer · Antoine Petitjean · Tilman Plehn · Jonas Spinner
Submission information
Preprint Link: https://arxiv.org/abs/2510.19906v2
Code repository: http://github.com/heidelberg-hepml/high-dim-unfolding
Date submitted: Nov. 10, 2025, 5:55 p.m.
Submitted by: Antoine Petitjean
Submitted to: SciPost Physics
Ontological classification
Academic field: Physics
Specialties:
  • High-Energy Physics - Experiment
  • High-Energy Physics - Phenomenology
Approaches: Computational, Phenomenological

Abstract

Unfolding, for example of distortions imparted by detectors, provides suitable and publishable representations of LHC data. Many methods for unbinned and high-dimensional unfolding using machine learning have been proposed, but no generative method scales to the several hundred dimensions necessary to fully characterize LHC collisions. This paper proposes a 3-stage generative unfolding framework that is capable of unfolding several hundred dimensions. It is effective to unfold the jet-level kinematics as well as the full substructure of light-flavor jets and of top jets, and is the first generative unfolding study to achieve high precision on high-dimensional jet substructure.

Author indications on fulfilling journal expectations

  • Provide a novel and synergetic link between different research areas.
  • Open a new pathway in an existing or a new research direction, with clear potential for multi-pronged follow-up work
  • Detail a groundbreaking theoretical/experimental/computational discovery
  • Present a breakthrough on a previously-identified and long-standing research stumbling block
Current status:
Awaiting resubmission

Reports on this Submission

Report #1 by Anonymous (Referee 1) on 2026-1-14 (Invited Report)

Strengths

  1. The paper is written very clearly and concisely; the introductory sections, the main content, and the summary are very well balanced.
  2. The problem setup (variable-length and high-dimensional unfolding) is an important challenge in the field.
  3. The staging "multiplicity->kinematics->constituents" is pragmatic and, as far as I know, a novel approach.
  4. The physics-motivated pre-processing is state of the art.
  5. The architecture comparison with the various flavors of equivariance is highly interesting.

Weaknesses

Please see the requested changes.

Report

The paper is well suited for the journal and I recommend publication after the following two items have been addressed.

Requested changes

  1. A possible prior dependence is noted (Eq. 5), but apparently no attempt is made to even quantify it (let alone correct it, which may not be a requirement). I believe this should be done. All these methods are only as strong as their weakest element, and more often than not we encounter showstoppers when moving from demonstrators (like this work) to real-world applications.
  2. The performance evaluation should be more elaborate and go beyond 1D comparisons. In particular, the separation of multiplicity from kinematics warrants more detailed studies. What happens if the former is off but not the latter, and/or vice versa? This could be probed with a C2ST test on the unfolded quantities. Can an independent classifier tell apart the unfolded data from the training data when the two should be very close? And if so, based on which aspects? I don't mean ML-based performance evaluation as a binary metric (success or failure of the method), since there will always be tails where the algorithm doesn't work, but as a tool to quantify how much of the bulk phase-space is accessible to unfolding and in which regions more work is needed. I appreciate the tau_21 remark, but please follow up on this.
  3. Consider making the top quark data set available on Zenodo or reference it otherwise. "Upon request from the authors" probably does not satisfy the journal's criteria: "Provide (directly in appendices, or via links to external repositories) all reproducibility-enabling resources: explicit details of experimental protocols, datasets and processing methods, or processed data and code snippets used to produce figures, etc."
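
For illustration, the C2ST check suggested in item 2 could be sketched as follows. This is a minimal sketch under stated assumptions, not the referee's or the authors' procedure: the two samples are toy 2D Gaussians standing in for unfolded and reference events, the classifier is a hand-written logistic regression chosen only to keep the example dependency-free, and the names `c2st_accuracy` and `toy_sample` are hypothetical. A held-out accuracy near 0.5 means the classifier cannot separate the samples; values well above 0.5 mean it can.

```python
import math
import random


def c2st_accuracy(sample_a, sample_b, seed=0, epochs=100, lr=0.5):
    """Classifier two-sample test (C2ST) with a logistic-regression classifier.

    Returns the held-out accuracy. Values near 0.5 mean the classifier cannot
    separate the two samples; values well above 0.5 mean it can.
    """
    rng = random.Random(seed)
    # Label the first sample 0 and the second 1, then shuffle and split in half.
    data = [(list(x), 0.0) for x in sample_a] + [(list(x), 1.0) for x in sample_b]
    rng.shuffle(data)
    half = len(data) // 2
    train, test = data[:half], data[half:]
    dim = len(train[0][0])

    # Standardise each feature using training-set statistics only.
    means = [sum(x[j] for x, _ in train) / len(train) for j in range(dim)]
    stds = [
        math.sqrt(sum((x[j] - means[j]) ** 2 for x, _ in train) / len(train)) or 1.0
        for j in range(dim)
    ]

    def scale(x):
        return [(x[j] - means[j]) / stds[j] for j in range(dim)]

    def logit(w, b, x):
        return b + sum(wj * xj for wj, xj in zip(w, x))

    # Full-batch gradient descent on the logistic loss.
    xs = [scale(x) for x, _ in train]
    ys = [y for _, y in train]
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(xs, ys):
            z = max(-30.0, min(30.0, logit(w, b, x)))  # clamp to keep exp() finite
            err = 1.0 / (1.0 + math.exp(-z)) - y
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wj - lr * g / len(xs) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(xs)

    # Accuracy on the half that was never used for fitting.
    correct = sum((logit(w, b, scale(x)) > 0.0) == (y == 1.0) for x, y in test)
    return correct / len(test)


def toy_sample(rng, n, shift=0.0):
    """Hypothetical stand-in for events: 2D Gaussian, first feature shifted."""
    return [(rng.gauss(shift, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]


rng = random.Random(1)
reference = toy_sample(rng, 1000)
acc_same = c2st_accuracy(reference, toy_sample(rng, 1000))
acc_shifted = c2st_accuracy(reference, toy_sample(rng, 1000, shift=2.0))
print(f"same distribution: {acc_same:.2f}, shifted: {acc_shifted:.2f}")
```

A linear classifier like this one is only sensitive to shifts in the feature means. A study along the lines the report asks for would presumably use a more flexible classifier and inspect where in phase space its output departs from 0.5.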

Recommendation

Ask for major revision

  • validity: top
  • significance: top
  • originality: top
  • clarity: top
  • formatting: perfect
  • grammar: perfect