SciPost Phys. 12, 081 (2022)
published 3 March 2022

We explicitly construct the quantum field theory corresponding to a general class of deep neural networks encompassing both recurrent and feedforward architectures. We first consider the mean-field theory (MFT) obtained as the leading saddle point in the action, and derive the condition for criticality via the largest Lyapunov exponent. We then compute the loop corrections to the correlation function in a perturbative expansion in the ratio of depth T to width N, and find a precise analogy with the well-studied O(N) vector model, in which the variance of the weight initializations plays the role of the 't Hooft coupling. In particular, we compute both the O(1) corrections quantifying fluctuations from typicality in the ensemble of networks, and the subleading O(T/N) corrections due to finite-width effects. These provide corrections to the correlation length that controls the depth to which information can propagate through the network, and thereby sets the scale at which such networks are trainable by gradient descent. Our analysis provides a first-principles approach to the rapidly emerging NN-QFT correspondence, and opens several interesting avenues to the study of criticality in deep neural networks.
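The mean-field criticality condition referenced above can be illustrated numerically. In the standard mean-field treatment of wide random networks, the per-layer perturbation growth factor is χ₁ = σ_w² E[φ′(√q* z)²] evaluated at the fixed point q* of the variance map, and the largest Lyapunov exponent is log χ₁; criticality (the "edge of chaos") sits at χ₁ = 1. The sketch below, assuming tanh activations and zero bias variance for simplicity (so the critical point lies at σ_w = 1), is an illustration of this condition, not the paper's full field-theoretic computation:

```python
import numpy as np

# Gauss-Hermite quadrature nodes/weights for E_{z ~ N(0,1)}[f(z)]
_x, _w = np.polynomial.hermite.hermgauss(64)

def gauss_expect(f):
    """Expectation of f(z) over a standard normal via Gauss-Hermite quadrature."""
    return np.dot(_w, f(np.sqrt(2.0) * _x)) / np.sqrt(np.pi)

def q_star(sigma_w, sigma_b=0.0, iters=500, q0=1.0):
    """Fixed point of the mean-field variance map
    q -> sigma_b^2 + sigma_w^2 * E[tanh(sqrt(q) z)^2]."""
    q = q0
    for _ in range(iters):
        q = sigma_b**2 + sigma_w**2 * gauss_expect(
            lambda z: np.tanh(np.sqrt(q) * z) ** 2)
    return q

def chi1(sigma_w, sigma_b=0.0):
    """Perturbation growth factor chi_1 = sigma_w^2 * E[phi'(sqrt(q*) z)^2];
    log(chi_1) plays the role of the largest Lyapunov exponent, and
    chi_1 = 1 marks criticality."""
    q = q_star(sigma_w, sigma_b)
    # phi'(h) = 1 - tanh(h)^2 for phi = tanh
    return sigma_w**2 * gauss_expect(
        lambda z: (1.0 - np.tanh(np.sqrt(q) * z) ** 2) ** 2)
```

With these (illustrative) parameters, χ₁ < 1 for σ_w < 1 (ordered phase, perturbations decay) and χ₁ > 1 for σ_w > 1 (chaotic phase, perturbations grow), so the Lyapunov exponent changes sign at the critical initialization.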
Johanna Erdmenger, Kevin T. Grosvenor, Ro Jefferson
SciPost Phys. 12, 041 (2022)
published 26 January 2022

We investigate the analogy between the renormalization group (RG) and deep neural networks, wherein subsequent layers of neurons are analogous to successive steps along the RG. In particular, we quantify the flow of information by explicitly computing the relative entropy or Kullback-Leibler divergence in both the one- and two-dimensional Ising models under decimation RG, as well as in a feedforward neural network as a function of depth. We observe qualitatively identical behavior characterized by the monotonic increase to a parameter-dependent asymptotic value. On the quantum field theory side, the monotonic increase confirms the connection between the relative entropy and the c-theorem. For the neural networks, the asymptotic behavior may have implications for various information maximization methods in machine learning, as well as for disentangling compactness and generalizability. Furthermore, while both the two-dimensional Ising model and the random neural networks we consider exhibit nontrivial critical points, the relative entropy appears insensitive to the phase structure of either system. In this sense, more refined probes are required in order to fully elucidate the flow of information in these models.
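The decimation RG invoked above has a particularly simple closed form in one dimension: summing out every other spin of the Ising chain maps the dimensionless coupling K = J/(k_B T) to K′ = ½ ln cosh(2K), and iterating this map drives any finite K toward the trivial fixed point K = 0, with information about the microscopic coupling progressively lost. A minimal sketch of this standard recursion (not the paper's relative-entropy computation itself):

```python
import numpy as np

def decimate(K):
    """One decimation step for the 1d Ising chain: summing out every
    other spin maps the coupling K = J/(k_B T) to K' = (1/2) ln cosh(2K)."""
    return 0.5 * np.log(np.cosh(2.0 * K))

def rg_flow(K0, steps):
    """Iterate the decimation map, returning the full trajectory of couplings."""
    Ks = [K0]
    for _ in range(steps):
        Ks.append(decimate(Ks[-1]))
    return Ks
```

Starting from any K₀ > 0 the coupling decreases monotonically (K′ ≈ K² for small K), reflecting the absence of a finite-temperature phase transition in 1d; it is against this flow that the layer-by-layer loss of information in a deep network is compared.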
Johanna Erdmenger, Kevin T. Grosvenor, Ro Jefferson
SciPost Phys. 8, 073 (2020)
published 6 May 2020

Motivated by the increasing connections between information theory and high-energy physics, particularly in the context of the AdS/CFT correspondence, we explore the information geometry associated to a variety of simple systems. By studying their Fisher metrics, we derive some general lessons that may have important implications for the application of information geometry in holography. We begin by demonstrating that the symmetries of the physical theory under study play a strong role in the resulting geometry, and that the appearance of an AdS metric is a relatively general feature. We then investigate what information the Fisher metric retains about the physics of the underlying theory by studying the geometry for both the classical 2d Ising model and the corresponding 1d free fermion theory, and find that the curvature diverges precisely at the phase transition on both sides. We discuss the differences that result from placing a metric on the space of theories vs. states, using the example of coherent free fermion states. We compare the latter to the metric on the space of coherent free boson states and show that in both cases the metric is determined by the symmetries of the corresponding density matrix. We also clarify some misconceptions in the literature pertaining to different notions of flatness associated to metric and non-metric connections, with implications for how one interprets the curvature of the geometry. Our results indicate that in general, caution is needed when connecting the AdS geometry arising from certain models with the AdS/CFT correspondence, and seek to provide a useful collection of guidelines for future progress in this exciting area.
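The claim that an AdS metric appears rather generically can be seen in the textbook example of the univariate normal family: its Fisher metric g_ij = E[∂_i log p ∂_j log p] works out to ds² = (dμ² + 2dσ²)/σ², the Poincaré upper-half-plane metric of constant negative curvature, i.e. Euclidean AdS₂. The sketch below, assuming the standard Monte Carlo estimator of the score covariance (not any construction specific to the paper), recovers this metric numerically:

```python
import numpy as np

def fisher_metric_normal(mu, sigma, n_samples=400_000, seed=0):
    """Monte Carlo estimate of the Fisher metric g_ij = E[s_i s_j]
    for the univariate normal family p(x | mu, sigma), where s_i are
    the components of the score (gradient of log p in the parameters)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(mu, sigma, n_samples)
    # score components: d/dmu log p and d/dsigma log p
    s_mu = (x - mu) / sigma**2
    s_sigma = (x - mu) ** 2 / sigma**3 - 1.0 / sigma
    scores = np.stack([s_mu, s_sigma])
    return scores @ scores.T / n_samples
```

For any (μ, σ) the estimate approaches diag(1/σ², 2/σ²) with vanishing off-diagonal terms, matching the hyperbolic metric above; the μ-translation and scaling symmetries of the family fix this form, illustrating how the symmetries of the underlying theory shape the resulting geometry.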