SciPost Submission Page
Improved Pseudolikelihood Regularization and Decimation methods on Nonlinearly Interacting Systems with Continuous Variables
by Alessia Marruzzo, Payal Tyagi, Fabrizio Antenucci, Andrea Pagnani, Luca Leuzzi
This is not the latest submitted version.
Submission summary
Authors (as registered SciPost users):  Luca Leuzzi 
Submission information  

Preprint Link:  http://arxiv.org/abs/1708.00787v3 (pdf) 
Date submitted:  2018-03-06 01:00 
Submitted by:  Leuzzi, Luca 
Submitted to:  SciPost Physics 
Ontological classification  

Academic field:  Physics 
Specialties: 

Approaches:  Theoretical, Computational 
Abstract
We propose and test improvements to state-of-the-art techniques of Bayesian statistical inference based on pseudolikelihood maximization with $\ell_1$ regularization and with decimation. In particular, we present a method to determine the best value of the regularizer parameter starting from a hypothesis testing technique. Concerning the decimation, we also analyze the worst-case scenario in which there is no sharp peak in the tilded-pseudolikelihood function, first defined as a criterion to stop the decimation. The techniques are applied to noisy systems with nonlinear dynamics, mapped onto multi-variable interacting Hamiltonian effective models for waves and phasors. Results are analyzed varying the number of available samples and the externally tunable temperature-like parameter mimicking real data noise. Finally, the behavior of the inference procedures described is tested against a wrong hypothesis: nonlinearly generated data are analyzed with a pairwise interacting hypothesis. Our analysis shows that, looking at the behavior of the inverse graphical problem as data size increases, the methods exposed allow one to rule out a wrong hypothesis.
Author comments upon resubmission
Reply to referee 1, Pan Zhang
We thank the referee for his review of our manuscript and his comments. We recognize that the previous version was actually lengthy and we made an effort to shorten the revised version.
Pan asks us to give comparisons against other methods, not based on the PLM. In previous papers, though (P. Tyagi et al. JSTAT 2015 and PRB 2016, Refs. [15,31] in the revised version), for pairwise interacting systems, we studied other estimators based on mean-field approximations and compared their performances to those of the pseudolikelihood maximization estimators. With the PLM we obtained better performances even in the low-sampling regime. As rightly pointed out by the referee in his comments, in this work we did not repeat the comparison to other standard methods but concentrated on techniques based on pseudolikelihood maximization. We notice, however, that in the cases under study, in which strong nonlinearities and multi-body interactions occur among variables, techniques other than PLM-based ones would be even more computationally demanding than in the cases already analyzed in Refs. [15,31]. To clarify this point to the readers we explicitly write, in the conclusions: “In a previous\cite{Tyagi16} work, the Pseudolikelihood algorithm proved to give much better performances with respect to mean field methods for continuous spin models. We have concentrated then on different possible approaches and implementations of the PLM estimator. The results showed that the algorithms are able to reconstruct the network of interactions, with higher accuracy close to the critical region, as well as the distributions of the couplings.”
We added to the caption of Fig. 9 the specification that blue and red colours refer to correlations computed on differently inferred system networks.
Reply to referee 2. We thank the referee for his/her careful analysis of our work and we herewith reply to his/her comments.
1) The referee comments: “it is unclear to me whether the authors’ main objective was to write their paper to report about progress about laser mode physics or about pseudolikelihood inference. Both are interesting, but the reader may have a hard time to understand from the current formulation what is specific to the details of Hamiltonian (1) and what is more general here. Please clearly focus on one application, or state clearly what is generic.”
This is a very important issue, and we thank the referee for pointing it out, since our original motivation (to understand random laser behaviour and/or the contribution of nonlinear effects to light propagation in random media) may have generated confusion about the machine learning methods that we analyze. The paper is about progress in pseudolikelihood inference, in particular on systems whose interactions are not pairwise but multi-body. To clarify this, in the introduction, in the first paragraph of page 3 of the revised version we now write: “Even though our original motivation arises in the framework of nonlinear optics and laser physics [14,20-24], in this work we will concentrate on state-of-the-art techniques to solve inverse problems. Indeed, the techniques here presented can be applied to a large class of models with multibody interactions.”
Results are generic, but we test the methods on a set of models (diverse variables, qualitatively diverse network connectivities). These are introduced in a separate Section 2, “Test Model”, where we spend a few lines reporting how the models stem from the study of nonlinearly interacting waves. However, we keep this part, interesting to both the nonlinear optics and statistical mechanics (sub)communities, conceptually separated from the rest. To clarify this point we begin Sec. 2 with the head sentence: “In this section the models and the physical systems of interest are introduced. The interested reader can find more details in [14]. This paper is organized in such a way that the reader interested only in inference techniques applied to nonlinear systems can skip this section.”
2) The referee asks “why the authors refer at several positions in the paper to random laser data, they actually work on simulated data. This raises some issues. For instance, is hypothesis (5) well justified? This is a confusing point as the authors seem to say that (5) is experimentally correct, while they also write right after equation (26) that there are cases where the distinction between zero and small couplings is not easy, which is the case for continuously distributed interactions.”
Actually, nowhere do we claim that Eq. (5) is experimentally satisfied for all possible materials and applications, or that it is the only possible distribution of couplings. We have added a small paragraph to explain in more detail why we choose Eq. (5), as well as a Gaussian distribution, as probability distributions for the coupling constants: “Because of the partial knowledge of modes localization and the very poor knowledge of the nonlinear response so far in experiments, the random values for the $J$s can be taken from any physically reasonable arbitrary probability distribution. We will then take couplings of a multibody interacting network, with number of coupling nodes $N_q$ scaling as $N_q \sim N^z$ with the number of variables $N$, as generated through a bimodal distribution, i.e., \begin{equation} P(J)=1/2[\delta(J-\hat J)+\delta(J+\hat J)], \end{equation} with $\hat J=1/N^{(z-1)/2}$, or through a Normal Gaussian distribution of mean square displacement $\sigma \sim \hat{J}$. Indeed, we can analyze the performance of the inference techniques for both discrete and continuously distributed couplings.”
Bimodal stands for a test on discrete couplings, Gaussian for a test on continuous couplings. We simulate both systems, and analyse both kinds of data with the same techniques. Our aim is to show the performance of the methods treated in both cases, for the most general applicability.
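As an illustration, both coupling ensembles can be sampled in a few lines of NumPy. This is only a sketch: the function name `sample_couplings` and its signature are our own illustrative choices, not code from the paper; the scale $\hat J = 1/N^{(z-1)/2}$ follows the distribution quoted above.

```python
import numpy as np

def sample_couplings(n_couplings, N, z, dist="bimodal", rng=None):
    """Draw coupling constants J for a multi-body network.

    J_hat = 1 / N**((z - 1) / 2) sets the coupling scale.
    "bimodal"  : J = +/- J_hat with equal probability (discrete test case)
    "gaussian" : zero-mean normal with sigma = J_hat (continuous test case)
    """
    rng = np.random.default_rng(rng)
    J_hat = 1.0 / N ** ((z - 1) / 2)
    if dist == "bimodal":
        return J_hat * rng.choice([-1.0, 1.0], size=n_couplings)
    if dist == "gaussian":
        return rng.normal(0.0, J_hat, size=n_couplings)
    raise ValueError(f"unknown distribution: {dist}")

# Example: N = 64 variables, a network with N_q ~ N^2 couplings
Js = sample_couplings(64 ** 2, N=64, z=2, dist="bimodal", rng=0)
```

The same call with `dist="gaussian"` produces the continuous test data; everything else in the inference pipeline is unchanged.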
3) The referee asks “Questions: What happens if the data are generated from Hamiltonians where (5) is not exactly satisfied?”
The PLM methods we expose reproduce the coupling values however they are generated. That is, actually, the aim: to provide a machine learning procedure to infer unknown couplings. Therefore we test the methods using Eq. (5) for the $J$ values (thus bimodal), using Gaussian distributions with zero average, and using constant values of the couplings. Besides the values, the networks themselves are random, and we generated networks with different connectivities, scaling like $N$, $N^2$, or $N^3$. The local topologies we have adopted for the networks are also of two different kinds: one is uniformly random (Poissonian-like and, thus, termed Erdős–Rényi), one is based on a mode-locking rule, associating quenched “frequencies” to the variables. The difference smooths out in the thermodynamic limit but is rather important at finite $N$, as reported in Appendices A and B.
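A minimal sketch of the uniformly random (Erdős–Rényi-like) case for a 4-body network follows; the helper name `random_quadruplets` is hypothetical. The mode-locked topology would additionally keep only quadruplets satisfying a frequency-matching condition on the quenched frequencies, which we omit here.

```python
import numpy as np

def random_quadruplets(N, z, rng=None):
    """Uniformly random 4-body network: draw N_q ~ N**z distinct
    quadruplets of variable indices (Erdos-Renyi-like topology)."""
    rng = np.random.default_rng(rng)
    n_q = int(N ** z)
    quads = set()
    while len(quads) < n_q:
        # one quadruplet = 4 distinct variable indices, stored sorted
        quad = tuple(sorted(rng.choice(N, size=4, replace=False)))
        quads.add(quad)
    return sorted(quads)

# Example: N = 16 variables, connectivity scaling as N^2
quads = random_quadruplets(N=16, z=2, rng=1)
```

Each quadruplet would then receive a coupling drawn from the bimodal or Gaussian distribution discussed in the reply to point 2.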
4) “What happens, more generally, if some random and small Hamiltonian is added to (1) when the data are generated? Section 5 is an attempt to answer partially this question in an extreme case, when the Hamiltonian used to infer parameters is blatantly different from the one used to generate the data. For the special case of pairwise couplings such as in (38), the authors find that all inferred couplings tend to 0 if the number M of data points is sufficiently high. Why is it so? why do not they get some complicated set of effective pairwise interactions, varying with M?”
What happens depends on the random Hamiltonian, which is unknown in the real world. If a 4-body interacting system is inferred via a 4-body interacting model, any 4-body perturbation to it is automatically taken into account, as is clear from the answer to question 3. If the original system has, instead, say, linear perturbations, or is more generically a “2+4” interacting model, then we need a 2+4 Hamiltonian (see, e.g., Refs. [20,21]) to “catch” all couplings. We did not go through this analysis in the present paper because it would be lengthy and would require expensive simulations to generate data, but we do not need it to display the power of PLM-based methods: the core of the result is already in the analysis of the blatantly wrong Hamiltonian of Sec. 5. There, we show that if we use a wrong Hamiltonian model, and we have enough data to analyze, the parameters of the wrong model are simply inferred to be zero. In other words, the inference procedure is able to find that the wrong model is wrong: it is not there. The method does not adjust things to adapt to the wrong model, yielding “some complicated set of effective pairwise interactions”. Having proper experimental data (which we do not have), and using a generic model, say a combined 2+3+4 model combining, respectively, linearly interacting waves and nonlinearly interacting waves with both a possible non-centrosymmetric potential (3-body) and a centrosymmetric potential (4-body), would rule out nonexistent contributions (setting them to zero) and only infer the values of existing contributions, with no prior knowledge. The quality of this discrimination increases with M and with the range of external tuning variables such as the temperature. Indeed, the risky statistical artefact of inferring “some complicated set of effective pairwise interactions” appears to be washed out by the PLM if M is large enough.
5) Furthermore, the referee asks a “general question about the use of PLM close to a transition: there are general necessary conditions for the success of PLM, which were completely worked out in the Ising case, for instance; see the paper by Ravikumar, Lafferty, Wainwright. In particular, some susceptibility matrix must have a norm smaller than unity to avoid amplifying errors during the iterative maximisation of the pseudolikelihood. Are these conditions satisfied here, even above the transition temperature where the authors operate (see Figure 4 for instance)?”
Concerning this point we added a small paragraph in the Section "Improved Pseudolikelihood Maximization with $\ell_1$ regularization: hypothesis testing", where we explain the requirements the Fisher Information Matrix needs to satisfy in order for the PLM with $\ell_1$ regularization to have a unique solution and to correctly reconstruct the neighbors of a node: "Note that, in order for the procedure to be consistent, the eigenvalues of the Fisher Information Matrix need to be bounded from below. This condition, together with the requirement that the entries of $\mathcal{I}^i_{a b}$ related to non-neighbors of $i$ cannot exercise an overly strong effect on the subset related to the neighbors of $i$, assures that the PLM with $\ell_1$ regularization has a unique solution and correctly reconstructs the neighbors of $i$ if a sufficient number of samples is provided\cite{Ravikumar10}."
The first requirement is checked directly: we evaluate the eigenvalues and verify that they are bounded from below, as required for the procedure to be consistent. The second requirement is checked a posteriori.
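The two checks can be sketched numerically as follows. This is an illustrative sketch of the conditions of Ravikumar et al., not the authors' code: the function name, the thresholds `c_min` and `alpha`, and the way the empirical Fisher matrix `I` is obtained are all placeholders.

```python
import numpy as np

def check_plm_conditions(I, neighbors, c_min=1e-3, alpha=0.1):
    """Check the two consistency conditions for PLM with l1
    regularization (Ravikumar et al.), for a single node i.

    I         : empirical Fisher Information Matrix of node i (symmetric).
    neighbors : indices of the true neighbors of i (support set S).
    Condition 1: eigenvalues of I_SS bounded below by c_min.
    Condition 2 (incoherence): the non-neighbor block must not act
    too strongly on the neighbor block,
    || I_{S^c S} @ I_{SS}^{-1} ||_inf <= 1 - alpha.
    """
    S = np.asarray(neighbors)
    Sc = np.setdiff1d(np.arange(I.shape[0]), S)
    I_SS = I[np.ix_(S, S)]
    I_ScS = I[np.ix_(Sc, S)]
    lam_min = np.linalg.eigvalsh(I_SS).min()          # condition 1
    incoherence = np.abs(I_ScS @ np.linalg.inv(I_SS)).sum(axis=1).max()
    return bool(lam_min >= c_min and incoherence <= 1 - alpha)
```

With a well-conditioned, nearly diagonal Fisher matrix both conditions pass; a matrix with strong entries coupling non-neighbors to neighbors violates the incoherence bound, which is the a posteriori check mentioned above.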
6) “section 4 is very hard to read as it is lengthy, and report many results of the inference procedure applied to a variety of cases. Could the author rewrite it in a more synthetic way, extracting only meaningful results and messages for the readers?”
We agree with the referee. We tried our best to shorten Sec. 4 and make it easier to read in the revised version.
List of changes
In general the text has been considerably shortened, as required by both referees.
Further changes are the following.
Introduction:
On page 3 we add the paragraph:
“Even though our original motivation arises in the framework of nonlinear optics and laser physics [14,20-24], in this work we will concentrate on state-of-the-art techniques to solve inverse problems. Indeed, the techniques here presented can be applied to a large class of models with multibody interactions.”
Sec. 2:
after Eq. (5) we modify the text as:
“ Because of the partial knowledge of modes localization and
the very poor knowledge of the nonlinear response so far in experiments, the random
values for the $J$s can be taken from any physically reasonable arbitrary probability distribution.
We will then take couplings of a multibody interacting network, with number of coupling nodes $N_q$ scaling as $N_q \sim N^z$ with the number of variables $N$, as generated through a bimodal distribution, i.e.,
\begin{equation}
P(J)=1/2[\delta(J-\hat J)+\delta(J+\hat J)],
\end{equation}
with $\hat J=1/N^{(z-1)/2}$,
or through a Normal Gaussian distribution of mean square displacement
$\sigma \sim \hat{J}$.
Indeed, we can analyze the performance of the inference techniques for both discrete and continuously distributed couplings. “
Sec. 3.2:
The paragraph has been added:
"Note that, in order for the procedure to be consistent, the eigenvalues of the Fisher Information Matrix need to be bounded from below. This condition, together with the requirement that the entries of $\mathcal{I}^i_{a b}$ related to non-neighbors of $i$ cannot exercise an overly strong effect on the subset related to the neighbors of $i$, assures that the PLM with $\ell_1$ regularization has a unique solution and correctly reconstructs the neighbors of $i$ if a sufficient number of samples is provided\cite{Ravikumar10}."
Fig. 9 caption: color references added
Conclusions:
paragraph added: “In a previous\cite{Tyagi16} work, the Pseudolikelihood algorithm proved to give much better performances with respect to mean field methods for continuous spin models. We have concentrated then on different possible approaches and implementations of the PLM estimator. The results showed that the algorithms are able to reconstruct the network of interactions, with higher accuracy close to the critical region, as well as the distributions of the couplings.”
Reports on this Submission
Anonymous Report 2 on 2018-3-30 (Invited Report)
 Cite as: Anonymous, Report on arXiv:1708.00787v3, delivered 2018-03-30, doi: 10.21468/SciPost.Report.401
Report
I think that the manuscript has been improved with respect to the previous version. Section 4, in particular, is more accessible, and some points (see reports) have been clarified.
Prior to publication, I would appreciate if the authors could again consider the comment I made in my initial report, that is, point 4 in their rebuttal letter. It is hard for me to understand why inference with a wrong model leads to vanishing couplings, and not to something complicated and essentially inconsistent, but that fits the data! I find the answer rather unconvincing on this aspect. Let me be clear: I am not saying that the authors are wrong and their study is incorrect, I am simply saying that I do not understand their findings and that they do not present compelling arguments allowing me to get what is going on. If I take a bunch of data generated by model 1 and fit with model 2, I will get a set of meaningless parameters for model 2, fitting the data as best as possible. Why should these parameters vanish? It will depend on the expressive power of model 2 with respect to model 1, won't it?
I think the paper would benefit if some comments on this point could be inserted prior to publication.