
SciPost Submission Page

Lecture Notes on Normalizing Flows for Lattice Quantum Field Theories

by Miranda C. N. Cheng, Niki Stratikopoulou

Submission summary

Authors (as registered SciPost users): Miranda C. N. Cheng
Submission information
Preprint Link: scipost_202509_00049v1  (pdf)
Date accepted: Oct. 15, 2025
Date submitted: Sept. 28, 2025, 6:15 a.m.
Submitted by: Miranda C. N. Cheng
Submitted to: SciPost Physics Lecture Notes
Ontological classification
Academic field: Physics
Specialties:
  • High-Energy Physics - Theory
Approaches: Theoretical, Computational

Abstract

Numerical simulations of quantum field theories on lattices serve as a fundamental tool for studying the non-perturbative regime of the theories, where analytic tools often fall short. Challenges arise when one takes the continuum limit or as the system approaches a critical point, especially in the presence of non-trivial topological structures in the theory. Rapid recent advances in machine learning provide a promising avenue for progress in this area. These lecture notes aim to give a brief account of lattice field theories, normalizing flows, and how the latter can be applied to study the former. The notes are based on the lectures given by the first author in various recent research schools.

List of changes

First, we would like to thank the referees for their careful reading and helpful comments.

Adjustments Based on Comments from Referee 1

  • The table on pg. 5 is not referenced from the text. It contains a reference to the gauge field, which has not been introduced at that point. We have added a reference.

  • Figure on pg.6 is not referenced in the text. We have added a reference.

  • Footnote 2 should appear after the full stop, not before. Corrected.

  • Pg.10, 6th line: "a topological observables A" → "a topological observable A". Corrected.

  • Pg.12, 4th line after Eq.(33): "The corresponding classical solution is" → "The corresponding classical solutions are". Corrected.

  • Section 3.1, 3rd line: Ref. [35] is repeated. Corrected.

  • Pg.15, 1st line of last paragraph: "by a set trainable parameters" → "by a set of trainable parameters". Corrected.

  • Pg.18, 2nd paragraph: "This property makes reverse KL is a convenient choice" → "This property makes reverse KL a convenient choice". Corrected.

  • Figure 3 on pg.18 is not referenced in the text. We have added a reference.

  • Section 3.3, 1st sentence: why "Another"? There were no other families mentioned. Changed to "A family of bijective transformations is the coupling flows…".

  • Figure 4 on pg.19 is not referenced in the text. Reference added.

  • Pg.22, 2nd-to-last paragraph, 1st sentence: "Neural ODEs" → "neural ODEs". Corrected.

  • Eq.(83): remove the full stop. Done.

  • Before eq.(93): "with $\hat\mu$ denotes" → "with $\hat\mu$ denoting". Corrected.

  • Pg.30, 2nd-to-last line: the parameters $\omega_{\mu\nu}$ are not defined. Reference added.

  • Pg.34: Figure 12 is not referenced in the text. In addition, the figure is not compatible with eq.(128), which it should probably illustrate? Reference added and role clarified.

  • Pg.40, after eq.(40): "vanishes at the continuum limit" → "vanishes in the continuum limit". Done.

  • Three lines below: "in the continuous limit" → "in the continuum limit". Corrected.

  • Pg.45, last line: remove the "and", otherwise this is not a complete sentence. Removed.

  • Section 5.3.2: the 2nd-to-last sentence sounds strange. Adjusted.

  • Pg.49, 1st line: "to be doubler-free Dirac-Wilson operator" → "to be the doubler-free Dirac-Wilson operator". Corrected.

  • Pg.49, footnote 9: "satisfy" → "satisfies". Corrected.

  • Pg.50: "estimattion" → "estimation". Corrected.

  • Pg.50, after eq.(210): repetition of "which". Corrected.

  • Pg.53, line above eq.(220): "transofmration" → "transformation". Corrected.

  • Section 6.2.1: the acronym "NVP" is not introduced. Footnote 10 has been added to explain the acronym.

  • Pg.55, section 6.2.2, 3rd line of 3rd paragraph: repetition of "both". Corrected.

  • Pg.59, 2nd line below eq.(246): "detaills" → "details". Corrected.

  • Refs. [59] and [65] are identical. Corrected.

  • Refs. [131] and [133] are identical. Corrected.

  • Refs. [122] and [134] are identical. Corrected.

Adjustments Based on Comments from Referee 2

Section 3: This section introduces NF. Overall, it provides a useful and enlightening overview. I have a few comments:

Sec 3.1
1) the eq below eq 50: it is not clear whether the normalisation, i.e. the partition function Z, is to be included in the approximate equality between q() and p(). It is important to make this explicit, given the usual intractability of computing Z. A footnote has been added to emphasize this point.

Sec 3.2
2) eq 51 is a bit unclear: phi appears on the LHS and z on the RHS, without any relation between them. In fig 1 and further down, this is clarified as: phi=f(z(0)), or phi=z(T), or phi=f(z). It would be good to make this precise early on, with unambiguous notation. (Eqs 53-55 help to clarify this.) We have added "where $\phi=f(z)$".

3) from eq 52 we may deduce that Z, see comment 1), is not necessarily matched, since only log p and log q are matched, up to a constant. Can this be clarified? We believe the added footnote has addressed this.
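For concreteness, a short remark of ours (not text from the notes, and the notation may differ slightly from theirs): writing $p(\phi)=e^{-S(\phi)}/Z$, the reverse KL divergence reads

$$
D_{\mathrm{KL}}(q_\theta \,\|\, p) \;=\; \mathbb{E}_{\phi\sim q_\theta}\!\big[\log q_\theta(\phi) + S(\phi)\big] \;+\; \log Z\,,
$$

so the unknown normalisation enters only as an additive constant independent of $\theta$: it drops out of the gradient, and $q_\theta$ is matched to $p$ only up to this constant offset in the loss value.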

4) eq 57: using this KL divergence requires sampling from the target distribution, which is the problem we are trying to resolve. Although this is mentioned further down, I would raise this already here as a possible concern. The phrase "as opposed to the KL divergence" has been added to stress this.

5) figs 3,4,5,6 (and maybe more) are not referred to in the text. Please refer to all figures and explain what is in them. References added.

6) fig 3: it is clear what mode collapse is and why it occurs for the reverse KL. Why the forward KL leads to the poor representation as in fig 3 is not so clear. Can this be explained better, or is fig 3 exaggerated?

Explanation added: When approximating a multi-modal distribution with forward KL optimization, underestimation of the probability mass in one of the modes is heavily penalized, while overestimation in areas between modes is tolerated. As a result, when the parametrization of the proposal probability is not expressive enough to properly represent multi-modality, forward KL optimization may place excessive probability mass in the low-probability valleys between modes, as shown in \cref{fig:kldiv}.
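For illustration, a minimal sketch of the two training objectives (not taken from the lecture notes; the flow interface names `log_prob` and `sample_with_log_prob` are hypothetical): the forward KL is estimated from target samples and is mass-covering, while the reverse KL is estimated from model samples and is mode-seeking.

```python
import torch

def forward_kl_loss(flow, target_samples):
    # Estimates KL(p || q_theta) up to a theta-independent constant, using
    # samples phi ~ p (e.g. existing Monte Carlo configurations); missing
    # probability mass at a mode of p is heavily penalised (mass-covering).
    return -flow.log_prob(target_samples).mean()

def reverse_kl_loss(flow, log_p_unnormalised, n_samples=1024):
    # Estimates KL(q_theta || p) up to the constant log Z, using samples from
    # the model itself; it can ignore modes of p entirely (mode-seeking,
    # hence the mode-collapse behaviour discussed above).
    phi, log_q = flow.sample_with_log_prob(n_samples)
    return (log_q - log_p_unnormalised(phi)).mean()
```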

Note that Fig. 3 has become Fig. 4, now that a caption has been added to a previous illustration.

7) figs 4, 5, 14 all show a building block of a transformation. But in figs 4, 14 'time' goes down and in fig 5 'time' goes up. Also the choices of boxes and actions are not the same. It would be helpful, especially in lecture notes (!), if a uniform graphical presentation is followed. The time direction in Fig. 5 (now Fig. 6) has been changed.

8) fig 4 and the eqs next to it: I can see that the RHS is supposed to be the inverse of the LHS. But what about s_b and t_b? The eqs suggest that s_b(z_a) on the LHS and s_b(phi_a) are the inverses, and the same for t_b. Is this meant to be the case, or is the notation a bit sloppy? The equation is meant to depict the inverse of the map $z\mapsto\phi$ for the same given functions $s_b$ and $t_b$: since the passive components are unchanged, $\phi_a = z_a$, so $s_b(\phi_a) = s_b(z_a)$ (and likewise for $t_b$), and no separate inverse networks are needed.
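To make the point explicit, a minimal sketch of an affine coupling layer (assuming the standard RealNVP-style form; not copied from the notes): because the passive half is left unchanged, the same networks $s_b$ and $t_b$ appear in both directions and the inverse is available in closed form.

```python
import torch

def coupling_forward(z_a, z_b, s_b, t_b):
    s, t = s_b(z_a), t_b(z_a)
    phi_a = z_a                          # passive half: copied through unchanged
    phi_b = z_b * torch.exp(s) + t       # active half: affine transformation
    log_det_jacobian = s.sum(dim=-1)     # log |det J| of the forward map
    return phi_a, phi_b, log_det_jacobian

def coupling_inverse(phi_a, phi_b, s_b, t_b):
    # Same networks as in the forward pass: s_b(phi_a) = s_b(z_a) since phi_a = z_a.
    s, t = s_b(phi_a), t_b(phi_a)
    z_a = phi_a
    z_b = (phi_b - t) * torch.exp(-s)
    return z_a, z_b
```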

9) below eq 62: active and passive sub lattices. This is common in LFT with checkerboard-style algorithms, so I am not sure whether this is worse for the NF-based algorithms. Can this be clarified?

Comment added: Another drawback is that, compared to some other normalizing-flow architectures such as continuous normalizing flows, coupling flows can have more limited expressivity due to the division of the degrees of freedom into passive and active subsets at each step.

10) eq 68: I am not sure whether I understood why this is a loss function. Is the meaning that the loss function is a function of z(t_f), whose form has not yet been given? Explanation: Indeed, the loss function is a function of $z(t_f)$, which is defined as the solution of the differential equation (66) parametrized by the neural network through the "speed" $v$. In other words, it is obtained by integrating the NN-parametrized ODE.
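As an illustration of this point (a minimal sketch of ours, not the notes' implementation; a fixed-step Euler integrator and an illustrative loss are used), $z(t_f)$ is the output of integrating the NN-parametrized ODE, so any loss built from it is a differentiable function of the network parameters:

```python
import torch
import torch.nn as nn

# Velocity field v_theta(z, t); the architecture is illustrative only.
v_theta = nn.Sequential(nn.Linear(3, 64), nn.Tanh(), nn.Linear(64, 2))

def integrate(z0, t0=0.0, tf=1.0, n_steps=100):
    # Fixed-step Euler integration of dz/dt = v_theta(z, t) from t0 to tf.
    z, dt = z0, (tf - t0) / n_steps
    for i in range(n_steps):
        t = torch.full_like(z[..., :1], t0 + i * dt)
        z = z + dt * v_theta(torch.cat([z, t], dim=-1))
    return z   # z(t_f): a differentiable function of the network parameters

def loss(z0, neg_log_p_target):
    # The loss depends on z0 only through z(t_f); a flow-based KL objective
    # would additionally accumulate the divergence of v along the trajectory,
    # omitted here for brevity.
    return neg_log_p_target(integrate(z0)).mean()
```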

11) eq 69: it may be worth stating that a(t) is independent of theta, which is used in eq 73 (?) Or is it not, looking at eqs 70 and 71 (?). Please clarify. Explanation: As stated, $a(t)$ is a Lagrange multiplier introduced to solve a constrained optimization problem, so it is a quantity of its own and, in particular, independent of the network parameters, as in (68). The equations you refer to appear in the context of Proposition 1, in which the time evolution of $a(t)$, and hence $a(t)$ itself, depends on the network parameters through the ODE (69), which is parametrized by the neural network. We believe this confusion stems from the same origin as point (10).

12) fig 6: 'a single call of the ODE solver', but still iterated during training, right? For the generation, a single forward call is needed for each new configuration? The sentence "From the proposition, we see that the gradient of the loss function \eqref{eqn:mainODE2} in terms of the solution to the ODE \eqref{eqn:mainODE} can be obtained by performing the following integration … (eqn 73) which can be done by one single call of the ODE solver." refers to the computation of the gradient of the loss. As usual, training requires multiple gradient steps and hence multiple evaluations of this gradient. For generation, each new batch of configurations is indeed obtained by a single forward call of the solver.
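To separate the two points (per-gradient cost versus training iterations), a minimal sketch of ours, assuming the torchdiffeq package and a placeholder loss: each gradient evaluation involves one forward solve plus one backward (adjoint) solve, while training iterates many such steps.

```python
import torch
from torchdiffeq import odeint_adjoint as odeint  # assumes torchdiffeq is installed

class Velocity(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(2, 64), torch.nn.Tanh(), torch.nn.Linear(64, 2))
    def forward(self, t, z):
        return self.net(z)

v = Velocity()
opt = torch.optim.Adam(v.parameters(), lr=1e-3)
times = torch.tensor([0.0, 1.0])

for step in range(1000):                 # training: many gradient steps
    z0 = torch.randn(128, 2)             # e.g. samples from the prior
    z_tf = odeint(v, z0, times)[-1]      # one forward solve
    loss = (z_tf ** 2).mean()            # placeholder loss on z(t_f)
    opt.zero_grad()
    loss.backward()                      # one backward (adjoint) solve for the gradient
    opt.step()
```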

13) eq 81: the implementation of the symmetry leads to a specific form of weight sharing. Is this correct? Indeed, and this is completely analogous to the weight sharing in a standard CNN.
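As an illustration of the analogy (a sketch of ours for a 2D scalar field on a periodic lattice, not the notes' code): translation symmetry is implemented by building the coupling-layer functions from convolutions, so the same kernel weights are shared across all lattice sites.

```python
import torch.nn as nn

class ConvCouplingNet(nn.Module):
    """Produces s and t from the frozen field; kernel weights are shared over sites."""
    def __init__(self, hidden=8, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        self.net = nn.Sequential(
            nn.Conv2d(1, hidden, kernel_size, padding=pad, padding_mode="circular"),
            nn.Tanh(),
            nn.Conv2d(hidden, 2, kernel_size, padding=pad, padding_mode="circular"),
        )  # circular padding matches the periodic boundary conditions of the lattice

    def forward(self, frozen_field):       # shape (batch, 1, L, L)
        s, t = self.net(frozen_field).chunk(2, dim=1)
        return s, t
```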

Section 6: This section provides a useful, albeit quite formal, discussion of gauge symmetry and equivariance. It would be nice to see implementations with actual numerical experiments, but I can see that that is not in scope for these lecture notes.

14) the acronym NVP is never given in full. Footnote 10 has been added to explain the acronym.

Sec 6.1
15) page 52: arbitrary element U, then the process is given for W. I assume this is meant to be U? Corrected.

16) eq 219: Lie algebra vs group element. Could it have been different, or am I missing the point? While we are not entirely sure what the question regarding "Lie algebra vs group element" refers to, we have checked and believe the equation is correct.

17) eq 220 and the discussion: it might help to add a figure? Sentence added: "See figure 2 of \cite{favoni_lattice_2022-1} for a pictorial depiction."

Sec. 6.2
18) below eq 224: 'context functions': this concept has not been introduced yet. What does it mean? They are called I, but are not the identity. A few lines down, the I's are called features. This could be presented and explained a bit better. We have added the sentence "In the above, $I$ are functions, often referred to as context functions in the machine learning literature, containing contextual information that is used to determine the transformation."

19) fig 14 is comparable to figs 4,5, see comment 7. But the starting point is U_mu(x), which is somewhere on the RHS, quite hidden. I would expect that quantity to appear at the top of the graph, for clarity. The figure is actually correct as a computational graph: the starting point is P_{\mu\nu}, which depends on U.

20) fig 15: for colour-blind people (like myself) the distinction between green and orange was challenging. Hence it took me longer than needed to realise one is really updating only the single vertical link. This then helped in understanding the difference between passive and frozen. I think fig 15 can be improved by putting this single link in the centre, and not the green plaquette. The single link is now placed in the centre.

21) i-ResNet has not been introduced in these terms (what is i?) i-ResNet (short for invertible ResNet) is just the name of the architecture; we explain what it is conceptually, and it does not play an important further role in the lecture notes.

21) trivialising flow: 'first, he noted...'. We all know who 'he' is, but it is better to put the name. We replaced "he" with "the author of [122]".

22) eq 242: linear dependence → nonlinear dependence. We do mean linear dependence, as depicted in eq. (242): the linear spans of different sets of trace operators exhibit linear dependences.

Sec. 6.3
23) eq 246: p(U) depends on $\det DD^\dagger(U)$. To sample from p(U), how is the det incorporated? Explicit evaluations of the det are expensive. Below it is suggested that the flow model for the marginal distribution f_m(z) is easy, but this needs $DD^\dagger(U)$. Please clarify.

The point of training the transformation $f_m$ is to avoid directly sampling from the complicated distribution involving the determinant. One only needs to evaluate or approximate the determinant on the samples proposed by the flow, rather than at every step of a conventional Monte Carlo process (whose samples typically have low acceptance rates and are highly correlated). This is in line with the usual philosophy of flow-based sampling as a high-quality proposal generator.
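For concreteness, a minimal sketch of the independence-Metropolis step built on flow proposals (not taken from the notes; `sample_with_log_prob` and `log_p_target` are illustrative names, with the expensive determinant factor absorbed into `log_p_target`):

```python
import torch

def flow_metropolis(flow, log_p_target, n_steps):
    # Independence Metropolis with the flow as proposal: the expensive target
    # evaluation (e.g. including the fermion determinant) is done once per proposal.
    phi, log_q = flow.sample_with_log_prob(1)
    log_p = log_p_target(phi)
    chain = []
    for _ in range(n_steps):
        phi_new, log_q_new = flow.sample_with_log_prob(1)
        log_p_new = log_p_target(phi_new)
        # Accept with probability min(1, p(phi')q(phi) / (p(phi)q(phi'))).
        log_accept = (log_p_new - log_q_new) - (log_p - log_q)
        if torch.log(torch.rand(())) < log_accept:
            phi, log_q, log_p = phi_new, log_q_new, log_p_new
        chain.append(phi)
    return chain
```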

Section 7: While I share the enthusiasm with the author to explore these methods, the sentence 'prohibitive scaling of costs of traditional methods' is too strong, given that a) current LFT simulations take place at the physical point, taking the continuum limit; b) the scaling of NFs is far from settled and might be prohibitively expensive (!) So I would suggest to weaken this statement somewhat.

We have changed "the prohibitive scaling of costs" to "the steep scaling of costs". We think the scaling and other potential challenges of AI-based methods are captured by "it is by no means an easy task".

Current status:
Accepted in target Journal

Editorial decision: For Journal SciPost Physics Lecture Notes: Publish
(status: Editorial decision fixed and (if required) accepted by authors)


Reports on this Submission

Report #2 by Anonymous (Referee 1) on 2025-10-8 (Invited Report)

Strengths

The authors have taken on board the minor revisions suggested by both referees, while leaving the overall structure as it was. The manuscript can now be accepted for publication.

Report

Accept; useful for newcomers to the field, with references to other work to gain a full understanding.

Recommendation

Publish (easily meets expectations and criteria for this Journal; among top 50%)

  • validity: high
  • significance: high
  • originality: good
  • clarity: good
  • formatting: excellent
  • grammar: perfect

Report #1 by Anonymous (Referee 2) on 2025-10-3 (Invited Report)

Report

I believe that the points raised by both referees have been addressed sufficiently well, hence the review can be published as is.

Recommendation

Publish (meets expectations and criteria for this Journal)

  • validity: -
  • significance: -
  • originality: -
  • clarity: -
  • formatting: -
  • grammar: -
