SciPost Submission Page
Synthetic Data Generation with Lorenzetti for Time Series Anomaly Detection in High-Energy Physics Calorimeters
by Laura Boggia, Bogdan Malaescu
This is not the latest submitted version.
Submission summary
| Authors (as registered SciPost users): | Laura Boggia |
| Submission information | |
|---|---|
| Preprint Link: | https://arxiv.org/abs/2509.07451v1 (pdf) |
| Date submitted: | Sept. 10, 2025, 3:42 p.m. |
| Submitted by: | Laura Boggia |
| Submitted to: | SciPost Physics Proceedings |
| Proceedings issue: | The 2nd European AI for Fundamental Physics Conference (EuCAIFCon2025) |
| Ontological classification | |
|---|---|
| Academic field: | Physics |
| Specialties: | |
| Approaches: | Experimental, Computational |
The author(s) disclose that the following generative AI tools have been used in the preparation of this submission:
ChatGPT (GPT-5 and GPT-4, free version) for suggestions regarding language for writing of the article.
Abstract
Anomaly detection in multivariate time series is crucial to ensure the quality of data coming from a physics experiment. Accurately identifying the moments when unexpected errors or defects occur is essential, yet challenging due to scarce labels, unknown anomaly types, and complex correlations across dimensions. To address the scarcity and unreliability of labelled data, we use the Lorenzetti Simulator to generate synthetic events with injected calorimeter anomalies. We then assess the sensitivity of several time series anomaly detection methods, including transformer-based and other deep learning models. The approach employed here is generic and applicable to different detector designs and defects.
Reports on this Submission
Report #1 by Anonymous (Referee 1) on 2025-10-31 (Invited Report)
The referee discloses that the following generative AI tools have been used in the preparation of this report:
GPT-5 (mini) used to help with formatting and polishing the report and the requested changes
Report
Different deep learning approaches are investigated and compared to a simple baseline. Various anomaly labeling strategies are explored to handle anomalies in multidimensional outputs. The study also examines the impact of different numbers of overlapping proton-proton collisions (pileup) on model performance.
The manuscript is transparent about its limitations, particularly that some anomalies correlated with the physics signal remain challenging for the tested methods. These challenges are primarily due to constraints of the underlying simulation framework rather than the methods themselves. The study should therefore be seen as a proof-of-concept.
Overall, it is a well-written and timely submission.
Requested changes
Content:
1- Introduction: cite other relevant recent literature, e.g., arXiv:2501.13789.
2- Section 2.1 (Lorenzetti showers): include a citation for the ATLAS experiment; clarify how the limitations in the framework affect the final time series data (e.g., for the truth-level simulation in the penultimate sentence).
3- Section 2.3: clarify the dimensionality (N) of the final time series.
4- Results: briefly discuss or speculate why the deep learning–based approaches struggle compared to the simple baseline.
Minor / Formatting:
5- Ensure the abstract is a single block of text without line breaks.
6- Explain technical terms where possible (pileup, ESD, TranAD, USAD, MCC), or otherwise minimize their use.
7- Dataset sizes: clarify the mismatch (64k training jets + 35k testing jets ≠ 100k).
8- References: provide a report number for Ref. 4 and DOI/arXiv information for Ref. 8.
9- Conclusion formatting: ensure the “Lorenzetti” text fits within the page margins.
Recommendation
Ask for minor revision

Author: Laura Boggia on 2025-11-01 [id 5980]
(in reply to Report 1 on 2025-10-31)
Hello,
thank you very much for your valuable comments!
They were very useful, and I have now implemented the changes as well as possible while respecting the length limitations. The revised version will appear on arXiv in the next few days, and I will then resubmit it here on SciPost.
Best regards,
Laura