SciPost Submission Page
Step: a tool to perform tests of smoothness on differential distributions based on Chebyshev polynomials of the first kind
by Patrick L. S. Connor, Radek Žlebčík
This is not the latest submitted version; this Submission thread has since been published.
Submission summary
Authors (as registered SciPost users):  Patrick Connor 
Submission information  

Preprint Link:  https://arxiv.org/abs/2111.09968v2 (pdf) 
Code repository:  https://gitlab.cern.ch/step/library 
Data repository:  https://gitlab.cern.ch/step/library 
Date submitted:  2021-11-29 11:07 
Submitted by:  Connor, Patrick 
Submitted to:  SciPost Physics 
Ontological classification  

Academic field:  Physics 
Approach:  Computational 
Abstract
We motivate and describe a method based on fits with Chebyshev polynomials to test the smoothness of differential distributions. We also provide a header-only tool in C++ called STEP to perform such tests. As a demonstration, we apply the method in the context of the measurement of the inclusive jet double-differential cross section in the jet transverse momentum and rapidity at the Tevatron and LHC. This method opens new possibilities to test the quality of differential distributions used for the extraction of physics quantities such as the strong coupling.
Reports on this Submission
Anonymous Report 2 on 2022-02-07 (Invited Report)
Cite as: Anonymous, Report on arXiv:2111.09968v2, delivered 2022-02-07, doi: 10.21468/SciPost.Report.4332
Strengths
1) The idea of examining the smoothness of a differential distribution of jet-related observables, such that they can be easily incorporated in PDF fits etc., is certainly a very useful one. Such tests have been extensively performed by the ATLAS and CMS experiments recently, and by D0 and CDF previously, especially in the context of searches for narrow (and wide) resonances decaying to jets; see for example the review:
R. M. Harris and K. Kousouris, "Searches for dijet resonances at hadron colliders", Int. J. Mod. Phys. A 26 (2011) 5005, doi:10.1142/S0217751X11054905, arXiv:1110.5302.
2) The idea to provide a tool to be able to perform such tests quickly and in an automated way is certainly commendable, and should be pursued and developed further with the addition of more functional forms and tests to determine the needed number of freely floating parameters.
Weaknesses
Specific comments and questions on the current paper draft
1) Have you examined other families of functions, as for example in the review above (eqs. 47, 48, 49), to perform the same tests? If not, and there is a specific reason why only Chebyshev polynomials are utilized, please state this in the paper.
2) Please provide the probabilities of fits, rather than only chi-square/ndf, in order to be able to assess how likely these descriptions of the data are.
3) Have you tried utilizing the Fisher Test to determine how many parameters to use in each function, and when to stop? If not, and given that this test is widely used in searches to test how many parameters to use in the functions describing the background, please consider it, and state if what you do is equivalent and/or superior.
Fisher, R. A. (1922). "On the interpretation of χ2 from contingency tables, and the calculation of P". Journal of the Royal Statistical Society. 85 (1): 87–94. doi:10.2307/2340521. JSTOR 2340521.
4) How do you address the possible inadequacy of the function you utilize to describe the experimental data? In searches, the pulls ((data - fit)/data statistical uncertainty) of the distributions/fits are often also examined to see if there are continuous runs that would indicate that a different function or more parameters should be added. See for example Fig. 5 of
CMS Collaboration, "Search for high mass dijet resonances with a new background prediction method in proton-proton collisions at $\sqrt{s} =$ 13 TeV", JHEP 05 (2020), 033
and Fig.1 of
ATLAS Collaboration, "Search for new phenomena in dijet events using 37 fb$^{-1}$ of $pp$ collision data collected at $\sqrt{s} =$ 13 TeV with the ATLAS detector", Phys. Rev. D 96 (2017) 052004, doi:10.1103/PhysRevD.96.052004, arXiv:1703.09127.
Report
If the points above are addressed I would recommend publishing in this journal.
Requested changes
The points above listed as "Weaknesses" would be good to be addressed and the relevant and needed information to be incorporated in the paper.
Anonymous Report 1 on 2022-01-07 (Invited Report)
Cite as: Anonymous, Report on arXiv:2111.09968v2, delivered 2022-01-07, doi: 10.21468/SciPost.Report.4160
Strengths
1 The underlying idea is very simple
Weaknesses
1 The motivation for the paper is questionable
2 The details of the methodology are unclear
3 The conclusion and examples are unclear
4 The results are quite modest
Report
The paper presents an algorithm for fitting spectra with Chebyshev polynomials, which is presented as a "test of smoothness". The main motivation for this appears to have to do with PDF fits.
First of all, it should be mentioned that the motivation is quite weak: firstly, available evidence suggests that the problems in PDF fits the Authors refer to have to do with properties of the experimental covariance matrix (specifically its being ill-conditioned) so it is unclear that smoothness has anything to do with them. Moreover, given that PDFs are related to data through a convolution it is unclear that data smoothness could possibly be a problem.
Coming to the method itself, what it amounts to is simply an iterative fit of Chebyshev polynomials. This is a very simple idea and it is unclear that having implemented it in a piece of code is worthy of publication.
The motivations for the use of Chebyshev polynomials appear to be very generic: on pag. 2 the authors list a couple of desirable properties of these polynomials, but it is unclear that other orthogonal polynomials wouldn't have similar or perhaps better properties. So the specific choice appears to be somewhat haphazard.
Furthermore, there are many specific points that make both the details of the methodology and its proposed uses somewhat obscure:
1) at point 3 on pag. 3 as a stopping criterion it is mentioned that $\chi^2$ is "compatible with the number of degrees of freedom". What does "compatible" mean? What is the exact criterion?
2) on pag. 5 a "procedure of early stopping" is mentioned. But this was never mentioned previously! What is this early stopping?
3) The authors talk about a "test" but what they present is simply a sequence of fits. Presumably, the test would amount to comparing the fit results to some criterion? It is completely unclear what the test is.
Finally, the discussion of results is somewhat disappointing: the authors end up discussing qualitative features of the spectra that they display, but it is not at all clear what the added value of their proposed "test" is.
In summary, I do not think that this paper contains enough original material or results to justify a publication in a scientific journal.
Author: Patrick Connor on 2022-06-13 [id 2577]
(in reply to Report 1 on 2022-01-07)
Dear referee,
thank you very much for reviewing our paper. Please find answers point by point below.
The paper has undergone significant changes since your first review.
Best regards, Patrick Connor & Radek Zlebcik
First of all, it should be mentioned that the motivation is quite weak: firstly, available evidence suggests that the problems in PDF fits the Authors refer to have to do with properties of the experimental covariance matrix (specifically its being illconditioned) so it is unclear that smoothness has anything to do with them. Moreover, given that PDFs are related to data through a convolution it is unclear that data smoothness could possibly be a problem.
The smoothness of the distribution is certainly not a sufficient condition for the usability of the data in a QCD interpretation. However, it is a necessary condition. Indeed, by construction of the QCD fits, where the PDFs at a starting scale are first evolved and consequently convoluted with the partonic cross section, the resulting spectra must be smooth, as both operations smear the original PDF. In the QCD fits, steps or outliers in the experimental spectrum that are not covered by the statistical uncertainties will lead to larger χ²/ndf values for the data set, sometimes even prohibiting the use of the experimental data at all.
In the new version of the paper, we now demonstrate the smoothness of the predictions by repeating our test of smoothness on Asimov data, which have, by construction, central values identical with the theoretical model. As expected, the fit performance of Asimov data is at least as good as, and often significantly better than, that of the real data.
We have now also clarified what we mean with covariance matrix, which describes the statistical and systematic correlations among the bins of the truth level distribution, to make sure that it is not confused with the response matrix, which is used in the unfolding procedure to describe the migrations from the truth level to the detector level. The response matrix, indeed known to often be ill-conditioned, is sometimes called correlation matrix, which can lead to the confusion.
Coming to the method itself, what it amounts to is simply an iterative fit of Chebyshev polynomials. This is a very simple idea and it is unclear that having implemented it in a piece of code is worthy of publication.
We believe that such a tool is useful for experimentalists to test the quality of their data distributions before their release. In fact, since our first submission, colleagues from the ATLAS Collaboration have already started to use our technique to investigate their data (Ref. [18] in the new version of the paper draft).
In the context of inclusive jet measurements, it is an alternative to tests of smoothness using QCD fits, without the danger of biasing toward the theory. In the paper draft, we have shown that it can help to identify steps corresponding to trigger thresholds. Another application consists in testing the procedure of unfolding, as a similar fit performance is expected before and after the procedure (although the fitted smooth function is always different). For instance, a typical mistake in the procedure of unfolding is to underestimate the uncertainties from the simulated data, used for the response matrix and for the estimation of migrations through the edges of the phase space. Unfortunately, this cannot be illustrated directly, since only unfolded (i.e. truth level) data are public. On the other hand, certain uncertainties may be overestimated, as is probably the case for the flat 1% bin-to-bin uncorrelated uncertainty added in the 8 TeV inclusive measurement from the CMS Collaboration; if experimentalists had had such a test available at that time, they might have been able to better identify the origin of the deviations from a smooth behaviour before the unfolding (e.g. steps due to triggers, as illustrated in the paper draft), without any need to increase the experimental uncertainty. This would lead to a higher impact of the data in the global PDF fits.
We have significantly modified Section 3, trying to clarify all these points.
The motivations for the use of Chebyshev polynomials appear to be very generic: on pag. 2 the authors list a couple desirable properties of these polynomials, but it is unclear that other orthogonal polynomials wouldn’t have similar or perhaps better properties. So the specific choice appears to be somewhat haphazard.
We have also tested alternative orthogonal bases, such as Legendre polynomials, which seem to work too; instead, the use of standard polynomials (1, 𝑋, 𝑋^2, etc.) results in limited stability of the iterative procedure. We have now mentioned in the paper draft the possibility to use alternative bases, and adapted the C++ implementation to allow the user to use alternative polynomial bases.
However, the aim of this paper is not to find all polynomial bases that work. For the purpose of our test, which is to find a function that factors out the global shape of the spectrum so that one can focus on the description of the scattering of the data points around a smooth behaviour, it is sufficient to find only one family of polynomials.
Furthermore, there are many specific points that make both the details of the methodology and its proposed uses somewhat obscure: 1. at point 3 on pag. 3 as a stopping criterion it is mentioned that χ2 is “compatible with the number of degrees of freedom”. What does “compatible” mean? What is the exact criterion? 2. on pag. 5 a “procedure of early stopping” is mentioned. But this was never mentioned previously! What is this early stopping? 3. The authors talk about a “test” but what they present is simply a sequence of fits. Presumably, the test would amount to comparing the fit results to some criterion? It is completely unclear what the test is.
1. A χ² distribution is expected to be centred at the number of degrees of freedom and its variance is expected to be twice the number of degrees of freedom (k). With "compatible", we mean that we expect the following inequality to be satisfied: χ² − k < √(2k). This is now clarified in the text.
2. The procedure of (early) stopping is explicitly described in the definition of the fit algorithm. Since the former version of the paper draft, we have implemented two more early stopping criteria, one based on the F-test and another based on cross-validation. We have added detailed descriptions and discussions of the outputs of the different early stopping criteria.
3. We "test" whether the scattering of the points around a smooth function is described by the uncertainties (typically the statistical uncertainties). This smooth function includes one hyperparameter describing the number of fit parameters. A bad fit performance (e.g. χ²/k ≈ 4) for a large value of this hyperparameter would indicate that the uncertainties alone cannot cover the deviations. For instance, in the absence of bin-to-bin partial correlations (e.g. D0, CDF), this would typically appear as outliers or steps. In the presence of such correlations (e.g. CMS, ATLAS), introduced for instance through the procedure of unfolding, a direct interpretation in terms of outliers or steps is no longer guaranteed, but the test can still indicate whether the provided uncertainties are sufficient.
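The compatibility criterion of point 1 fits in a few lines; the sketch below is a generic illustration of that inequality, not the actual STEP code:

```cpp
#include <cmath>

// Stopping-criterion sketch: a chi2 statistic with k degrees of freedom has
// mean k and variance 2k, so the fit is declared "compatible" once the excess
// chi2 - k drops below one standard deviation, sqrt(2k).
bool isCompatible(double chi2, int k) {
    return chi2 - static_cast<double>(k) < std::sqrt(2.0 * k);
}
```

For example, with k = 50 degrees of freedom a fit with χ² = 55 passes (excess 5 < √100 = 10), while χ² = 80 does not.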
We have significantly changed the text to clarify these points.
Finally, the discussion of results is somewhat disappointing: the authors end up discussing qualitative features of the spectra that they display, but it is not at all clear what the added value of their proposed “test” is.
The approach followed by experimentalists to test the quality of their data is to perform a QCD interpretation with software such as xFitter. However, this assumes that the theory reasonably describes the data within certain systematic uncertainties. Large 𝜒2 values are then difficult to interpret, since they may have various origins. With our approach, we factor out possible sources of tensions related to underestimated uncorrelated uncertainties and to possible issues inherent to the theory curves. The test of smoothness focuses on the quality of the data at any stage of the experimental analysis. The smoothness is a very weak—hence powerful—assumption.
Publishing an experimental analysis is a long process that easily spans over a few years within a collaboration. Once the data have been published, it is generally too late to investigate issues in the experimental analysis. By publishing our method, we want to provide an additional tool for our present and future colleagues to test the quality of their spectra before publishing them. The impact of such a test will lead to an improved quality of the published data and likely to a better perturbative QCD interpretation (under the assumption that QCD is the right theory).
Anonymous on 2022-12-09 [id 3118]
(in reply to Patrick Connor on 2022-06-13 [id 2577])
I am satisfied with the Author's revision and replies to my original criticism. My only residual comment is that I find the terminology "early stopping" very confusing. To me, "early stopping" suggests that there exists also "late stopping", and possibly "correct stopping", which is neither early nor late? Indeed, before the Author's explanations, I misunderstood what they meant here. While in actual fact, what the Authors call "early stopping" is just the way they implement their test. Also, in this respect I find it a bit confusing that the various criteria that they suggest correspond to rather different ways of implementing a smoothness test: e.g. with criterion 3 (the process is interrupted when a given p-value is reached) surely the test is passed or failed according to whether the fit does or does not stop, while with criterion 4 (χ² stops improving) the fit is passed or not according to what the final χ² value is. Upon careful reading of the paper this may be understood, but it seems to me that the paper would be easier to understand if this were spelled out explicitly.
Author: Patrick Connor on 2022-06-13 [id 2578]
(in reply to Report 2 on 2022-02-07)
Dear referee,
thank you very much for your comments and suggestions. Please find answers to the points that you have raised below our signature.
Since your first review, a significant number of modifications and improvements have been applied to the paper.
Best regards, Radek Zlebcik & Patrick Connor
We have tested the other families of functions from the review that you have mentioned. We have added them to the paper draft for all four measurements (the green dotted points labelled "HK" in the legend). In general, these functions give a comparable description with a similar number of parameters. However, for the ATLAS data points, which have smaller uncertainties, these functions do not describe the shape of the data well, and a more general prescription with more parameters is needed. Here, the advantage of our approach is that the generalisation of the model is straightforward.
We have tested also other orthogonal polynomials, like Legendre polynomials: they exhibit similar performance. In our C++ implementation, we have now implemented an option to use alternative bases of polynomials in the fits.
We have added tables with fit probabilities in the new version.
We have added the F-test as an alternative early stopping criterion. Furthermore, we have also implemented an early stopping criterion based on statistical replicas of the data sets and cross-validation. We have added tables and detailed discussions on the results of the different early stopping criteria.
The functionality of obtaining the pulls was already provided in the GitLab repository of the tool, but not detailed in the first version of the paper. We have now added figures in the paper and compared the fit performance with our approach. In this way, one ensures that the function really describes the overall shape and is suitable for spotting outliers.
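As a minimal sketch of the pull computation discussed above (our own illustration, not the interface provided in the GitLab repository):

```cpp
#include <vector>

// Per-bin pulls: (data - fit) / statistical uncertainty. Long runs of
// same-sign pulls hint that the fitted function misses a feature of the
// spectrum, even when the overall chi2 looks acceptable.
std::vector<double> pulls(const std::vector<double>& data,
                          const std::vector<double>& fit,
                          const std::vector<double>& sigma) {
    std::vector<double> p(data.size());
    for (std::size_t i = 0; i < data.size(); ++i)
        p[i] = (data[i] - fit[i]) / sigma[i];
    return p;
}
```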