Synthetic Data Generation with Lorenzetti for Time Series Anomaly Detection in High-Energy Physics Calorimeters

Laura Boggia; Bogdan Malaescu

SciPost Submission Page

This is not the latest submitted version.

Submission summary

Authors (as registered SciPost users):

Laura Boggia

Submission information
Preprint Link:	https://arxiv.org/abs/2509.07451v1 (pdf)
Date submitted:	Sept. 10, 2025, 3:42 p.m.
Submitted by:	Laura Boggia
Submitted to:	SciPost Physics Proceedings
Proceedings issue:	The 2nd European AI for Fundamental Physics Conference (EuCAIFCon2025)

Ontological classification
Academic field:	Physics
Specialties:	High-Energy Physics - Experiment
Approaches:	Experimental, Computational

Disclosure of Generative AI use

The author(s) disclose that the following generative AI tools have been used in the preparation of this submission:

ChatGPT (GPT-5 and GPT-4, free version) for language suggestions during the writing of the article.

Abstract

Anomaly detection in multivariate time series is crucial to ensure the quality of data coming from a physics experiment. Accurately identifying the moments when unexpected errors or defects occur is essential, yet challenging due to scarce labels, unknown anomaly types, and complex correlations across dimensions. To address the scarcity and unreliability of labelled data, we use the Lorenzetti Simulator to generate synthetic events with injected calorimeter anomalies. We then assess the sensitivity of several time series anomaly detection methods, including transformer-based and other deep learning models. The approach employed here is generic and applicable to different detector designs and defects.
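As a rough illustration of the workflow the abstract describes (generate synthetic data, inject anomalies, score a detector), the following toy sketch may help. It does not use the Lorenzetti Simulator and is not the authors' pipeline: the Gaussian toy data, the per-channel z-score detector, and all function names and parameter values are assumptions made for illustration. It also assumes that "MCC" in the referee report denotes the Matthews correlation coefficient.

```python
import numpy as np


def make_series(n_steps=2000, n_channels=4, seed=0):
    """Toy stand-in for detector readings: correlated Gaussian noise (assumption)."""
    rng = np.random.default_rng(seed)
    common = rng.normal(size=(n_steps, 1))  # fluctuation shared by all channels
    return 0.5 * common + rng.normal(size=(n_steps, n_channels))


def inject_anomaly(x, start, length, channel, shift):
    """Add a constant offset to one channel over a window; return data and labels."""
    y = x.copy()
    labels = np.zeros(len(x), dtype=bool)
    y[start:start + length, channel] += shift
    labels[start:start + length] = True
    return y, labels


def zscore_detector(train, test, threshold=5.0):
    """Flag a time step if any channel deviates by more than `threshold` sigma."""
    mu = train.mean(axis=0)
    sigma = train.std(axis=0)
    z = np.abs((test - mu) / sigma)
    return z.max(axis=1) > threshold


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for boolean label arrays."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = float(np.sum(y_true & y_pred))
    tn = float(np.sum(~y_true & ~y_pred))
    fp = float(np.sum(~y_true & y_pred))
    fn = float(np.sum(y_true & ~y_pred))
    denom = np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0 when the coefficient is undefined.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def run_demo():
    """Train on clean data, test on data with one injected offset anomaly."""
    train = make_series(seed=0)
    test, labels = inject_anomaly(make_series(seed=1), start=500, length=50,
                                  channel=2, shift=10.0)
    pred = zscore_detector(train, test)
    return mcc(labels, pred)


if __name__ == "__main__":
    print(f"MCC of the toy baseline: {run_demo():.3f}")
```

A large, uncorrelated offset like the one injected here is easy for a threshold detector. Anomalies that are correlated with the underlying signal would need a different injection function and would likely be much harder to detect.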

Current status:

Has been resubmitted

Reports on this Submission

Report #1 by Anonymous (Referee 1) on 2025-10-31 (Invited Report)

Disclosure of Generative AI use

The referee discloses that the following generative AI tools have been used in the preparation of this report:

GPT-5 (mini) used to help with formatting and polishing the report and the requested changes

Report

The submission motivates the application of time-series anomaly detection methods on synthetic high-energy physics (HEP) data. Such techniques could significantly aid data quality monitoring in particle physics experiments by identifying new types of detector issues and reducing the human expert workload.

Different deep learning approaches are investigated and compared to a simple baseline. Various anomaly labeling strategies are explored to handle anomalies in multidimensional outputs. The study also examines the impact of different numbers of overlapping proton-proton collisions (pileup) on model performance.

The manuscript is transparent about its limitations, particularly that some anomalies correlated with the physics signal remain challenging for the tested methods. These challenges are primarily due to constraints of the underlying simulation framework rather than the methods themselves. The study should therefore be seen as a proof-of-concept.

Overall, it is a well-written and timely submission.

Requested changes

Content:
1- Introduction: cite other relevant recent literature, e.g., arXiv:2501.13789.
2- Section 2.1 (Lorenzetti showers): include a citation for the ATLAS experiment; clarify how the limitations in the framework affect the final time series data (e.g., for the truth-level simulation in the penultimate sentence).
3- Section 2.3: clarify the dimensionality (N) of the final time series.
4- Results: briefly discuss or speculate why the deep learning–based approaches struggle compared to the simple baseline.

Minor / Formatting:
5- Ensure the abstract is a single block of text without line breaks.
6- Explain technical terms where possible (pileup, ESD, TranAD, USAD, MCC), or otherwise minimize their use.
7- Dataset sizes: clarify the mismatch (64k training jets + 35k testing jets ≠ 100k).
8- References: provide a report number for Ref. 4 and DOI/arXiv information for Ref. 8.
9- Conclusion formatting: ensure the “Lorenzetti” text fits within the page margins.

Recommendation

Ask for minor revision

validity: high
significance: good
originality: good
clarity: high
formatting: excellent
grammar: excellent

Author: Laura Boggia on 2025-11-01 [id 5980]

(in reply to Report 1 on 2025-10-31)

Category:

remark

Hello,
thank you very much for your valuable comments!
They were very useful, and I have now implemented these changes as well as possible while respecting the length limitations. The revised version will appear on arXiv in the next few days, and I'll then resubmit it here on SciPost.
Best regards,
Laura
