
SciPost Submission Page

Super-Resolving Normalising Flows for Lattice Field Theories

by Marc Bauer, Renzo Kapust, Jan Martin Pawlowski, Finn Leon Temmen

Submission summary

Authors (as registered SciPost users): Renzo Kapust
Submission information
Preprint Link: scipost_202502_00013v1  (pdf)
Date submitted: Feb. 8, 2025, 10:06 a.m.
Submitted by: Kapust, Renzo
Submitted to: SciPost Physics
Ontological classification
Academic field: Physics
Specialties:
  • Condensed Matter Physics - Computational
  • High-Energy Physics - Theory
  • Quantum Physics
Approach: Computational

Abstract

We propose a renormalisation group inspired normalising flow that combines benefits from traditional Markov chain Monte Carlo methods and standard normalising flows to sample lattice field theories. Specifically, we use samples from a coarse lattice field theory and learn a stochastic map to the targeted fine theory. The devised architecture allows for systematic improvements and efficient sampling on lattices as large as $128 \times 128$ in all phases while having sampling access only on a $4\times 4$ lattice. This paves the way for reaping the benefits of traditional MCMC methods on coarse lattices while using normalising flows to learn transformations towards finer grids, aligning nicely with the intuition of super-resolution tasks. Moreover, by optimising the base distribution, this approach allows for further structural improvements besides increasing the expressivity of the model.
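
As a rough illustration of the pipeline sketched in the abstract: coarse MCMC samples are repeatedly passed through a stochastic upscaling step until the target lattice size is reached. The NumPy sketch below is purely schematic; all names (`upsample`, `stochastic_step`) are hypothetical placeholders, and the learned flow itself is elided.

```python
import numpy as np

def upsample(phi):
    """Naive nearest-neighbour upsampling: each site becomes a 2x2 block."""
    return np.repeat(np.repeat(phi, 2, axis=0), 2, axis=1)

def stochastic_step(phi_coarse, rng, sigma=0.1):
    """Placeholder for one super-resolution step: upsample, inject noise;
    the trained normalising flow acting on the result is omitted here."""
    phi_fine = upsample(phi_coarse)
    return phi_fine + sigma * rng.standard_normal(phi_fine.shape)

rng = np.random.default_rng(0)
phi = rng.standard_normal((4, 4))   # stand-in for a 4x4 MCMC sample
while phi.shape[0] < 128:           # 4 -> 8 -> 16 -> 32 -> 64 -> 128
    phi = stochastic_step(phi, rng)
print(phi.shape)                    # (128, 128)
```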

Author indications on fulfilling journal expectations

  • Provide a novel and synergetic link between different research areas
  • Open a new pathway in an existing or a new research direction, with clear potential for multi-pronged follow-up work
  • Detail a groundbreaking theoretical/experimental/computational discovery
  • Present a breakthrough on a previously-identified and long-standing research stumbling block
Current status:
Awaiting resubmission

Reports on this Submission

Report #2 by Anonymous (Referee 2) on 2025-5-2 (Invited Report)

Report

This work is a study of normalizing flows for lattice field theories, a class of ML methods explored in recent years to improve lattice Monte Carlo calculations. In particular, it explores the use of "super-resolving" normalizing flow architectures, which employ an RG-inspired hierarchical structure, to sample lattice scalar field theories. The work also proposes and studies a new avenue for improving model quality, called "IR matching": optimizing the parameters of the base distribution alongside the flow model parameters. The authors find that better performance can be obtained than with an unstructured continuous normalizing flow (CNF) architecture, especially in the symmetry-broken phase of the scalar field theory.

This is a nice piece of work overall. The statistical formalism is correct and (other than some notational issues and missing details) the presentation is clear and readable. The numerical results appear to be thorough and demonstrate that the proposed approach is effective for the problem treated. Absent a few details, the descriptions should be sufficient to reproduce the results. The paper goes out of its way to make physical interpretations of various features of the ML setup and results, which is interesting and informative.

There are two notable but excusable weaknesses to the paper: 1) the application to scalar field theory, and 2) the emphasis on the "super-resolving" structure as a novel aspect of this work. On 2), RG-inspired hierarchical architectures have been explored and discussed since the earliest studies of this subject: the Neural Network Renormalization Group (NNRG) paper was one of the very first flows-for-lattice papers. On 1), scalar field theory is a common testbed for these methods, and excellent results have been obtained for it using many different flow model approaches; it is not a challenging theory to model, or to sample from in the first place. Nevertheless, given the expense of testing in more complicated systems such as non-Abelian gauge theories, it is still worth first checking whether an idea works in scalar theory. The most convincing demonstrations in scalar field theory are of methods that generalize without complications to gauge theories, but that is not the case for the hierarchical architectures discussed here. The generalization of this sort of approach to gauge theories is complicated, as explored in Ref. [8].

However, the "IR matching" aspect of this work is novel so far as I know, generalizes immediately without any complications to any lattice field theory including QCD, and seems like a promising avenue for exploration to construct improved flow models. I don't think it's necessary to restructure the paper around this point (although I would not object if the authors wanted to do so), but I do think some adjustment of the framing and discussion of novelty would be good. This aspect of the work by itself is more than sufficient to meet the journal's acceptance criteria, once minor issues are addressed.

Some detailed comments and questions for the authors:

  1. As noted above, the paper needs some explicit discussion of what the novel aspects are relative to previous works. For example, it seems that the super-resolving architecture proposed here, where noise is injected in hierarchical upscaling steps, is the same as in NNRG. Apart from fine details, the new parts over NNRG appear to be the use of an ODE flow, as well as the "IR matching" part of the training procedure.

  2. What batch size is used for training? What batch size is used in ESS evaluations? (The ESS can be extremely inflated at small batch sizes, so this is important; a minimal estimator sketch appears after this list.) These should be noted somewhere.

  3. The ESS for a fixed standard flow (when it can be evaluated on multiple volumes) scales between volumes exponentially, like $\mathrm{ESS}(V) \approx \mathrm{ESS}(V_0)^{V/V_0}$. For a plot like Fig. 5 with a standard flow, using this relation to rescale and compare all the curves at some fiducial volume $V_*$ will result in curve collapse (up to finite-volume effects at small volumes and effects due to volume averaging at finite batch size); see the rescaling helper in the sketch after this list. However, for these super-resolving flows, there is extra structure relating different volumes, which might change the picture. If you can think of some fair comparison, it would be very interesting to know whether the volume scaling properties are qualitatively different for these super-resolving flows.

  4. Some comments are made about avoidance of mode collapse in the symmetry-broken phase, but as written, these conclusions seem to be based on the numerical value of the ESS. When evaluated on insufficient samples, the ESS can be artificially inflated when mode collapse occurs (it can appear to be finite when it should be $\approx 0$). Are you explicitly verifying that the model distribution is bimodal?

  5. The left inverse of the naive upsampling isn't unique: block spinning with any weights that sum to one will invert the operation just as well (a numerical check follows this list). Why doesn't this cause ambiguities in which Jacobian needs to be computed?

  6. Equation 20 introduces some additional breaking of translational symmetry: the noise added to the first $b^d - 1$ sites is uncorrelated site-to-site, but the noise added to the last site is correlated with everything else (a toy illustration follows this list). Is there anywhere else in the architecture that can learn to compensate for this, or is it necessarily inherited by the final model density?

  7. On page 8, it says $D^{(II)}_{KL}$ can be estimated using only Monte Carlo samples from the coarse lattice. Why is this so, if $\log Z_L$ and $\log Z_\mathcal{L}$ both appear in Eq. 33?

  8. Notational quibble: the flow map is only a diffeomorphism when considered as a flow between (coarse dof, noise dof) and (fine dof). As defined in Eq. 8, the super-resolving flow map $\mathcal{T}_\theta$ is not a diffeomorphism, so in Eq. 9, $\tilde{p}(\tilde{\phi})$ is not the push-forward of a density on the coarse degrees of freedom $\varphi$; it is the push-forward of a density over $(\varphi, \zeta)$ (the change of variables is written out after this list). Similarly, $\log \det J_\mathcal{T}(\varphi)$ in Eqs. 29 and 32 should really be $\log \det J_\mathcal{T}(\varphi, \zeta)$. On a related point, Eq. 15 is technically correct as written, but it is maybe worth noting that because $\mathcal{T}^\dagger_\theta$ is many-to-one, $\tilde{p}(\tilde{\phi})$ is constant over all $\tilde{\phi}$ that map to the same $\varphi$, and these constants tile the fine-grid manifold of $\tilde{\phi}$.
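
To make comments 2-4 concrete, here is a minimal sketch of the standard normalised ESS estimator, together with the volume rescaling from comment 3. It assumes only access to unnormalised log importance weights $\log p - \log q$ for model samples; the function names are illustrative, not the authors'.

```python
import numpy as np

def ess(log_w):
    """Normalised ESS in (0, 1]: (sum w)^2 / (N * sum w^2),
    computed stably by shifting the log-weights first."""
    log_w = np.asarray(log_w) - np.max(log_w)
    w = np.exp(log_w)
    return w.sum() ** 2 / (len(w) * (w ** 2).sum())

def ess_at_fiducial_volume(ess_V, V, V_star):
    """Rescale an ESS measured at volume V to a fiducial volume V_star,
    using the exponential scaling ESS(V) ~ ESS(V_0)^(V/V_0)."""
    return ess_V ** (V_star / V)

# Caveat behind comments 2 and 4: with too few samples this estimator is
# biased upward; a mode-collapsed model can show ESS near 1 simply
# because no sample probes the missing mode.
```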
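
Comment 5 can be checked numerically: any block-spin weights that sum to one give a left inverse of naive nearest-neighbour upsampling. A self-contained NumPy check for $b = 2$, $d = 2$ (all names are mine):

```python
import numpy as np

def upsample(phi):
    """Each coarse site becomes a constant 2x2 block."""
    return np.repeat(np.repeat(phi, 2, axis=0), 2, axis=1)

def block_spin(phi_fine, w):
    """Weighted average over each 2x2 block; w is any 2x2 weight array."""
    L = phi_fine.shape[0] // 2
    blocks = phi_fine.reshape(L, 2, L, 2).transpose(0, 2, 1, 3)
    return (blocks * w).sum(axis=(2, 3))

rng = np.random.default_rng(1)
phi = rng.standard_normal((4, 4))
for _ in range(3):
    w = rng.random((2, 2))
    w /= w.sum()              # arbitrary weights summing to one
    assert np.allclose(block_spin(upsample(phi), w), phi)
```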
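
The correlation structure raised in comment 6 can be illustrated with a toy model. Assuming, purely for illustration and not taken from Eq. 20 itself, that the last site of each $2 \times 2$ block carries minus the sum of the other three noise draws (so that the block average is preserved):

```python
import numpy as np

rng = np.random.default_rng(2)
zeta = rng.standard_normal((100_000, 3))    # i.i.d. noise on 3 sites
last = -zeta.sum(axis=1, keepdims=True)     # constrained fourth site
block = np.concatenate([zeta, last], axis=1)

# Covariance: identity among the first three sites, but the fourth row
# and column carry covariance -1 with each of them (and variance 3).
print(np.cov(block.T).round(2))
```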
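
For comment 8, the intended change of variables is the standard one over the joint space; writing $r(\zeta)$ for the noise density (my notation), the push-forward reads:

```latex
\tilde{p}(\tilde{\phi})
  = p(\varphi)\, r(\zeta)\,
    \bigl|\det J_{\mathcal{T}_\theta}(\varphi, \zeta)\bigr|^{-1},
\qquad (\varphi, \zeta) = \mathcal{T}_\theta^{-1}(\tilde{\phi}) .
```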

Recommendation

Ask for minor revision

  • validity: high
  • significance: ok
  • originality: ok
  • clarity: high
  • formatting: perfect
  • grammar: excellent

Report #1 by Anonymous (Referee 1) on 2025-4-24 (Invited Report)

Report

Super-Resolving Normalising Flows for Lattice Field Theories

Marc Bauer, Renzo Kapust, Jan M. Pawlowski, Finn L. Temmen

The paper explores new ways to merge normalising flows, a machine learning method, with Markov chain Monte Carlo techniques in lattice field theory. The paper is interesting and very well written, and deserves to be published after the issues below are addressed.

1) My main comment is very general. The following questions kept bothering me until they were partly settled in Section V: how are the theories on the fine and the coarse lattice related, given that they are supposed to describe the same 'physics'? Since lattice couplings depend on the ultraviolet cutoff scale, how is their relation ('flow') established? Do the transformations, as the lattice volume increases, keep the theory on a line of constant physics (LCP)?

This is partly addressed in Section V, but ideally the conceptual idea should be presented earlier.

1a) Let me go into detail now. The authors state that the physical size of the lattice is kept fixed and that the lattice spacing 'a' is reduced under their transformation. Couplings in the lattice action (i.e. at the UV scale) depend on the cutoff scale. In Section V this is addressed by studying the flow of couplings, denoted as c[a]. It is then made clear that the theory on the coarse lattice is not known a priori, but has to be learned from the constructed theory on the fine lattice. This is a nontrivial conceptual step, which, as I stated, should be mentioned earlier. It is also slightly unsatisfactory, since it means we need to learn both the transformation moving to finer lattices and the flow of couplings moving to coarser lattices. A pessimist might say that we hence know nothing on either side! An optimist would say that this problem is now solved. What do the authors say?

1b) What can be addressed better is whether, during the entire scaling up of the lattice volume through L = 4, 8, 16, 32, 64, 128, the theory is moving on a line of constant physics. This I could not deduce from the paper: do the couplings on the coarse lattice need to be redetermined for every volume, or is there indeed a trajectory in coupling space? If they need to be redetermined, does this make the calculation more and more expensive going to larger lattice volumes? It would be elegant to stay on the LCP.

2a) It is stated that starting with a lattice of size L=4 helps in building in correlations. I wonder how useful this is close to the critical point. On a 4x4 lattice the transition is hardly visible in, e.g., the magnetic susceptibility. The critical coupling depends on the lattice size, $\kappa_c(L)$, which can be studied using finite-size scaling. Hence, close to the critical point, the correlations on 4x4 may not be representative of those on 128x128, and may even lie in the 'wrong' phase, due to the L dependence of the critical coupling. As I understand from Fig. 3, the transformation is learned from an ensemble which is not related to the ensemble on the fine lattice via the RG flow. I wonder whether the authors have studied (or can study) the dependence on the initial ensemble on the coarse lattice.
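
For reference, the finite-size shift of the pseudo-critical coupling mentioned in 2a) is conventionally parametrised as below, assuming the two-dimensional Ising universality class ($\nu = 1$) relevant for the 2d scalar theory:

```latex
\kappa_c(L) \simeq \kappa_c(\infty) + c\, L^{-1/\nu}
```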

2b) Potentially related to this is Fig. 4b and the gap in the $\kappa_L$ (horizontal) direction. The gap seems to reflect the uncertainty of where the transition sits on the coarsest lattice. Does the gap get smaller when starting from 8x8 instead of 4x4?

3) I appreciate the connection with the inverse RG, especially Ref. [5], and the discussion in App. A. In Ref. [5] the flow is exactly in the opposite direction compared to the body of the paper, and each (inverse) RG step takes one closer to the critical point, as mentioned by the authors. I wonder whether both approaches could be combined. Have the authors considered this?

4) Very briefly: is $\sigma_\theta^2$ in Eq. 19 just one parameter, or does it depend on the step of the algorithm?

Recommendation

Ask for minor revision

  • validity: -
  • significance: -
  • originality: -
  • clarity: -
  • formatting: -
  • grammar: -
