SciPost Submission Page

Jet substructure observables for jet quenching in Quark Gluon Plasma: a Machine Learning driven analysis

by Miguel Crispim Romão, José Guilherme Milhano, Marco van Leeuwen

Submission summary

Authors (as registered SciPost users): Miguel Crispim Romao
Submission information
Preprint Link: https://arxiv.org/abs/2304.07196v1
Code repository: https://gitlab.com/lip_ml/jet-substructure-observables-ml-analysis
Data repository: https://zenodo.org/record/7808000
Date submitted: 2023-05-02 16:38
Submitted by: Crispim Romao, Miguel
Submitted to: SciPost Physics
Ontological classification
Academic field: Physics
Specialties:
  • High-Energy Physics - Phenomenology
  • Nuclear Physics - Experiment
  • Nuclear Physics - Theory
Approach: Phenomenological

Abstract

We present a survey of a comprehensive set of jet substructure observables commonly used to study the modifications of jets resulting from interactions with the Quark Gluon Plasma in Heavy Ion Collisions. The JEWEL event generator is used to produce simulated samples of quenched and unquenched jets. Three distinct analyses using Machine Learning techniques on the jet substructure observables have been performed to identify both linear and non-linear relations between the observables, and to distinguish the Quenched and Unquenched jet samples. We find that most of the observables are highly correlated, and that their information content can be captured by a small set of observables. We also find that the correlations between observables are robust to quenching effects and that specific pairs of observables exhaust the full sensitivity to quenching effects. The code, the datasets, and instructions on how to reproduce this work are also provided.

Current status:
Awaiting resubmission


Reports on this Submission

Anonymous Report 2 on 2023-10-6 (Invited Report)

Strengths

1 - The presented work responds well to a clear need for an ML study of high-level jet substructure observables and their relative strengths. It contributes to the discussion on efficiently capturing, through targeted use of observables, the nature of the interactions of jets with the quark-gluon plasma created in high-energy nuclear collisions. The conclusions of the paper may help to steer experimental programs by providing the remarkably concise conclusion that measurements "of more than a select few observables has a very limited added value".

2 - The manuscript constitutes a useful report since it presents a rather complete study of a variety of observables, both IRC-safe and IRC-unsafe (relevant for completeness, as the information on jet-QGP medium interactions and in-medium jet modifications, also referred to as jet quenching, spans both regimes), with applications of jet grooming techniques that cover the span of current directions in the field and expose sensitivity to domains accessible and not accessible with perturbative techniques.

3 - The paper employs techniques that attempt to capture linear relations between observables (Principal Component Analysis) but also addresses their non-linear relations (via an autoencoder implementation). It also incorporates a boosted decision tree technique to isolate the most discriminating observables for jet quenching and presents an informative "heat map" as a concise summary that can guide experimental research (within the limits of the model dependence, the experimental difficulties in performing jet measurements in high-energy heavy-ion collisions, and the expected precision of the measurements, which can be observable-dependent).

4 - The paper is well written with clear goals in mind. It presents useful conceptual conclusions regarding the correlations of jet quenching phenomena. The authors employed a thoughtful way of representing the numerical conclusions of the study.

5 - The authors have done a good job enabling the reproducibility of their analysis and the results.

Weaknesses

1 - While the paper focuses on the conceptual development to systematize the information regarding jet quenching based on high-level observables, it does so using only a single model (JEWEL). Some of the conclusions might be model dependent. In particular, this may be of importance for observables that are sensitive to the details of the energy/particle flux excited from the medium by the passage of an energetic jet (coined as QGP response) and, related to it, the quasi-particle (if assumed) implementation of the medium itself (driving the magnitude of quenching effects on a jet-by-jet basis).

2 - As the authors themselves note, understanding the experimental applicability of the presented methodology and conclusions is beyond the scope of the paper. However, it remains somewhat important to ask what part of the presented work will, in the end, be conclusive or applicable at all within the experiments. That said, without dwelling too long on this point, the work remains useful as it may provide some qualitative guidance for experiments.

3 - The conclusions from the PCA analysis lack a reflection on how the variance relates to the information relevant for jet quenching. Also, there is little discussion of the explainability of the findings. For example, in the case of the conclusion that dynamical grooming observables provide information not captured by angularity-type observables, the paper lacks a discussion or follow-up on what this information would be. This is likely a consequence of the general difficulty of interpretability in the methodology used. We are presented with an observation of an effect without an explanation of its cause. Nonetheless, the resulting conclusion that a very limited number of observables can capture all medium-induced effects is valuable.
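
To make the first part of weakness 3 concrete: a direction of large variance need not be a direction that is sensitive to quenching. The following is a minimal Python sketch under stated assumptions, not the manuscript's analysis. The two toy features, their Gaussian distributions, the sample size, and the helper names (`covariance_2d`, `principal_axes`, `project`, `separation`) are all hypothetical and chosen only for illustration. One feature is broad but identical in both samples; the other is narrow but shifted between them.

```python
import math
import random


def covariance_2d(points):
    """Sample covariance entries (sxx, sxy, syy) of a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mean_x) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - mean_y) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / (n - 1)
    return sxx, sxy, syy


def principal_axes(sxx, sxy, syy):
    """Eigenvalues (descending) and unit eigenvectors of a symmetric 2x2 matrix."""
    centre = 0.5 * (sxx + syy)
    half_gap = math.hypot(0.5 * (sxx - syy), sxy)
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)  # angle of the leading axis
    eigenvalues = (centre + half_gap, centre - half_gap)
    axes = ((math.cos(theta), math.sin(theta)),
            (-math.sin(theta), math.cos(theta)))
    return eigenvalues, axes


def project(points, axis):
    """Coordinate of each point along a unit axis."""
    return [p[0] * axis[0] + p[1] * axis[1] for p in points]


def separation(a, b):
    """Absolute difference of means in units of the pooled standard deviation."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    var_a = sum((v - mean_a) ** 2 for v in a) / (len(a) - 1)
    var_b = sum((v - mean_b) ** 2 for v in b) / (len(b) - 1)
    return abs(mean_a - mean_b) / math.sqrt(0.5 * (var_a + var_b))


# Hypothetical toy samples: feature 0 is broad and identical in both classes,
# feature 1 is narrow and shifted in the "quenched" class.
rng = random.Random(1)
n_jets = 2000
unquenched = [(rng.gauss(0.0, 3.0), rng.gauss(0.0, 0.5)) for _ in range(n_jets)]
quenched = [(rng.gauss(0.0, 3.0), rng.gauss(0.5, 0.5)) for _ in range(n_jets)]

eigenvalues, axes = principal_axes(*covariance_2d(unquenched + quenched))
explained = [ev / sum(eigenvalues) for ev in eigenvalues]
sep = [separation(project(unquenched, ax), project(quenched, ax)) for ax in axes]

for k in range(2):
    print(f"PC{k + 1}: explained variance {explained[k]:.2f}, "
          f"class separation {sep[k]:.2f}")
```

In this toy setup the first principal component carries almost all of the variance yet barely separates the two samples, while the second carries little variance and most of the separation. Whether anything similar happens for the observables in the manuscript is exactly what the requested discussion would need to address.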

Report

In general, the paper satisfies the criteria for publication. In particular, I believe the presented work touches on a previously identified and long-standing research stumbling block: it responds to a quite obvious need within the field for a rigorous analysis of the usefulness of high-level observables for the particular purpose of jet quenching measurements. Moreover, it has the potential to open a new pathway in the existing research direction with enough promise for phenomenological, theoretical, and experimental follow-up work.

I have no need to go into details - I believe the paper fulfils *all* of the six general acceptance criteria of SciPost Physics.

Requested changes

1 - The paper is well written and I do not request major changes. However, I'd urge the authors to add more discussion of point #3 raised under Weaknesses.

2 - One minor point is that I have difficulty fully grasping the discussion of "correlations between observables are mostly robust to quenching effects". What does this actually mean? The text in this section does not really clarify the point of robustness to quenching. Perhaps the authors could expand on this point. Perhaps framing the point around the word "resilient" (close in meaning to "robust") would be better.

  • validity: good
  • significance: good
  • originality: good
  • clarity: good
  • formatting: excellent
  • grammar: excellent

Anonymous Report 1 on 2023-9-28 (Contributed Report)

Strengths

1-Well-organized code and dataset, high reproducibility
2-Clear analysis with good interpretability
3-Clear representation, free of grammar errors

Weaknesses

1-Relatively elementary analysis
2-Little connection to current studies
3-Need to better motivate why more complicated methods are needed
4-Some results not fully included (e.g., accuracy)

Report

The manuscript aims to cover high-level jet sub-structure through machine learning analysis. Section 2 introduces the observables, or features used in the study. Section 3 covers the simulation details used to generate the jet dataset. Sections 4-6 present three different machine learning studies including Principal Component Analysis (PCA), neural network, and boosted tree. These lead to analysis in Quenched/Unquenched discrimination and the impact of quark-gluon plasma (QGP).

Overall, this is a "proof-of-principle" analysis. As the data is high-level, the conclusion is not surprising. While there is some good insight in the draft, the study feels simplistic and serves as a good starting point for further studies. Nonetheless, this is a well-written draft on a new topic. I would be happy to recommend it for publication with a few revisions.

Requested changes

Major:

1-In the introduction, why is a high-level data study relevant? What are the challenges and drawbacks of this consideration?

2-In Section 3, what's the motivation behind splitting the dataset equally?

3-In Section 6, briefly explain ROC and AUC to readers with less background in ML. Why were they chosen instead of just accuracy? What does a score of 0.7 mean?

4-In Sections 6 or 8, a comparison versus traditional methods for Quenched/Unquenched discrimination would be helpful to validate the motivation of the study.
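
On major point 3, a minimal Python sketch of what an AUC value means and why it differs from accuracy. Everything here is an assumption for illustration: the classifier scores are toy Gaussians (not the manuscript's BDT outputs), the shift of 0.74 between the two samples is chosen so that the AUC comes out near 0.7, and the helper names `roc_auc` and `accuracy` are hypothetical. The sketch uses the standard interpretation of the AUC as the probability that a randomly chosen positive (here "quenched") example receives a higher score than a randomly chosen negative ("unquenched") one.

```python
import random


def roc_auc(scores_pos, scores_neg):
    """Area under the ROC curve via pairwise comparison.

    Equals the fraction of (positive, negative) pairs in which the positive
    example has the higher score; ties count as one half.
    """
    wins = 0.0
    for s_pos in scores_pos:
        for s_neg in scores_neg:
            if s_pos > s_neg:
                wins += 1.0
            elif s_pos == s_neg:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def accuracy(scores_pos, scores_neg, threshold):
    """Fraction of correct labels when 'score > threshold' means positive."""
    correct = sum(s > threshold for s in scores_pos)
    correct += sum(s <= threshold for s in scores_neg)
    return correct / (len(scores_pos) + len(scores_neg))


# Hypothetical toy scores: unit-width Gaussians separated by 0.74.
rng = random.Random(0)
n_jets = 1000
quenched_scores = [rng.gauss(0.74, 1.0) for _ in range(n_jets)]
unquenched_scores = [rng.gauss(0.0, 1.0) for _ in range(n_jets)]

auc = roc_auc(quenched_scores, unquenched_scores)
print(f"AUC = {auc:.3f}")

# Accuracy requires choosing a working point; the AUC does not.
for threshold in (-1.0, 0.37, 2.0):
    acc = accuracy(quenched_scores, unquenched_scores, threshold)
    print(f"threshold {threshold:+.2f}: accuracy = {acc:.3f}")
```

Read this way, an AUC of 0.7 means that in roughly 70% of quenched/unquenched pairs the classifier ranks the quenched jet higher, with 0.5 corresponding to random guessing and 1.0 to perfect ordering. Accuracy, by contrast, changes with the chosen threshold, which is one common reason for quoting the AUC.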

Minor:

- Section 2, first paragraph. What are the assumptions behind the distributions when omitting information?
- Section 2.2, IRC never introduced.
- Section 3, "FastJet Contrib packages..." reformat with proper spacing.
- Section 3, "between in the different dynamical grooming"... remove "in".
- Add commas when necessary, e.g., "that is, how they change between..."
- "For each, Unquenched and Quenched, sample we compute...," better write "For each sample, Unquenched and Quenched, we compute..."
- Section 4, "The principal component analysis only captures linear relations between observables, which can hide further non-linear relations, ...", using the word "hide" here is incorrect. PCA simply cannot detect non-linear effects.
- Section 5, "This bottleneck layer is usually referred to as the latent space ...", "bottleneck" has a different meaning here, better to just use "hidden".
- "For this reason, we developed a hyperparameter optimisation loop using the python package optuna", can just say "we optimised the model hyperparameters using the Python package optuna"
- Section 6, "we start by creating a strong baseline by training a BDT using all observables...", rephrase "we create a strong baseline by training ..."

  • validity: high
  • significance: good
  • originality: high
  • clarity: high
  • formatting: perfect
  • grammar: excellent
