# High Efficiency Configuration Space Sampling -- probing the distribution of available states

### Submission summary

 As Contributors: Paweł Jochym · Jan Łażewski Preprint link: scipost_202101_00011v1 Date submitted: 2021-01-20 09:59 Submitted by: Jochym, Paweł Submitted to: SciPost Physics Academic field: Physics Specialties: Condensed Matter Physics - Theory Approaches: Theoretical, Computational

### Abstract

Substantial acceleration of research and more efficient utilization of resources can be achieved in modelling investigated phenomena by identifying the limits of system's accessible states instead of tracing the trajectory of its evolution. The proposed strategy uses the Metropolis-Hastings Monte-Carlo sampling of the configuration space probability distribution coupled with physically-motivated prior probability distribution. We demonstrate this general idea by presenting a high performance method of generating configurations for lattice dynamics and other computational solid state physics calculations corresponding to non-zero temperatures. In contrast to the methods based on molecular dynamics where only a small fraction of obtained data is consumed, the proposed scheme is distinguished by a considerably higher, reaching even 80%, acceptance ratio.

###### Current status:
Has been resubmitted

### Submission & Refereeing History

Resubmission scipost_202101_00011v3 on 21 May 2021
Resubmission scipost_202101_00011v2 on 26 April 2021

Submission scipost_202101_00011v1 on 20 January 2021

## Reports on this Submission

### Report 2 by Bjorn Wehinger on 2021-3-10 Post-Editorial Recommendation Report (Invited Report)

• Cite as: Bjorn Wehinger, Report on arXiv:scipost_202101_00011v1, delivered 2021-03-10, doi: 10.21468/SciPost.Report.2676

### Strengths

Original new approach for the study of lattice lattice dynamics at finite temperatures.

### Weaknesses

Presentation should be improved.

### Report

The authors of the manuscript "High Efficiency Configuration Space Sampling – probing the distribution of available states" present a new method for studying lattice dynamics at finite temperatures.
Their approach is based on configuration sampling the distribution of available states using the Metropolis-Hasting algorithm with a prior probability distribution derived from harmonic lattice dynamics.
The authors compute anharmonic phonon dispersions and lifetimes for 3C-SiC to validate their approach and claim high computational performance due to a large observed acceptance ratio.
The idea is highly original and I expect significant impact for the study of thermal properties at finite temperatures in large crystalline systems containing many atoms per unit cells.
In order to fully convince the reader and justify publication in SciPost Physics, I recommend the authors to address and clarify the following:

1. The lattice dynamics of 3C-SiC at room temperature can be described fairly well by the harmonic approximation. It seems thus no big surprise that a prior probability distribution derived from harmonic lattice dynamics converges successfully and quickly. But how well does it work for a more anharmonic situation?
Although the chosen potentials might not be accurate close to melting it would be a very nice illustration to compare the dispersions and lifetimes to molecular dynamics simulations at a temperature where anharmonic effects are more important.

2. The performance of the new approach is based on comparing its acceptance ratio to molecular dynamics simulations. How do actual computation times compare? How does the performance (computation time) scale with system size including the possibility to run calculation in parallel on many cores?

3. Presentation. Title and abstract suggest application of the method to a wild variety of problems in solid state physics. However, such are mentioned only marginally in the conclusions while the main text fully focuses on the application to lattice dynamics.
Experts in lattice dynamics may thus overlook this work and its relevance if not highlighted better.
At several points the manuscript would profit from more quantitative statements.

For instance,

Lines 80-81: Limitations and applications should be discussed in more details.

Lines 102-103: Phase transitions are excluded by the "reasonable" class. This should me mentioned and the application of the approach to different kind of phase transitions could be addressed.

Lines 114-115: "too wild" and "very quick" should be quantified.

Lines 136- 137: "barely noticeable" and "hardly visible" obviously depend on how the data is plotted. Please quantify.

Figs. 1 and 2. correlations between $E_{k var}$ and $E_{p var}$ could be discussed.

Lines 186-189: Asymptotic production of target distribution for any non-vanishing prior distribution requires a citation.

Lines 213-214: Please explain why parameters are independent and their values not critical. Are there limitations?

Figures 5 and 6 should be discussed in more detail. Agreement and differences need to be pointed out. Fig. 5 is confusing. It's caption suggest that molecular dynamics was used to extract harmonic phonon frequencies, while the text states that higher order (anharmonic) force constants were extracted. It would be nice to compare harmonic phonon frequencies to both anharmonic phonon frequencies obtained from molecular dynamics and from the new method and discuss agreement and differences in detail for at least two different temperatures.
Fig. 6 is lacking information on lifetimes of the acoustic branches with small momenta and small energies close to the $\Gamma$-point. It would be nice to discuss convergence and numerical limitations for these.

Both figures are very difficult to read because they are small and contain too much data. Splitting into sub-panels where the same number of samples are compared could help.

In summary, the presented approach is highly innovative and worth to be published but the presentation needs to be improved to make it convincing.

### Requested changes

• validity: high
• significance: high
• originality: top
• clarity: good
• formatting: reasonable
• grammar: reasonable

### Author:  Paweł Jochym  on 2021-04-26  [id 1383]

(in reply to Report 2 by Bjorn Wehinger on 2021-03-10)
Category:
remark

# Reply to the report of Dr Wehinger

We would like to thank Dr Wehinger for careful reading of the manuscript and his positive opinion on our work.

The authors of the manuscript "High Efficiency Configuration Space Sampling – probing the distribution of available states" present a new method for studying lattice dynamics at finite temperatures. Their approach is based on configuration sampling the distribution of available states using the Metropolis-Hasting algorithm with a prior probability distribution derived from harmonic lattice dynamics. The authors compute anharmonic phonon dispersions and lifetimes for 3C-SiC to validate their approach and claim high computational performance due to a large observed acceptance ratio. The idea is highly original and I expect significant impact for the study of thermal properties at finite temperatures in large crystalline systems containing many atoms per unit cells. In order to fully convince the reader and justify publication in SciPost Physics, I recommend the authors to address and clarify the following:

We would like to point out that the presented approach does not depend on strict harmonicity of the system. The Eq. (4) and its description (l. 78-87) explicitly point to the impact of the anharmonicity on the formulas used in the proposed method. In particular, the normality of the distribution is not impacted - since it originates from the Central Limit Theorem (CLT, Eq. 5). What may be influenced is the value of the mean and the variance of the distribution - which will skew the temperature scale and possibly diminish the fidelity of our approximation of the thermal equilibrium state. Since both referees missed this point we have expanded our explanation of this issue to make it more clear to the reader.

Additionally, while we use lattice dynamics as an example in the text, the potential applicability of the proposed method is broader - it may be useful in other places where we need to reproduce the configuration of the system of atoms in thermal equilibrium in non-zero temperature. This fact is mentioned in the abstract but we will expand the conclusions by mentioning it there as well.

1. The lattice dynamics of 3C-SiC at room temperature can be described fairly well by the harmonic approximation. It seems thus no big surprise that a prior probability distribution derived from harmonic lattice dynamics converges successfully and quickly. But how well does it work for a more anharmonic situation? Although the chosen potentials might not be accurate close to melting it would be a very nice illustration to compare the dispersions and lifetimes to molecular dynamics simulations at a temperature where anharmonic effects are more important.

Indeed, the chosen system is not strongly anharmonic at T=300K. But still there is enough anharmonicity in the model to produce 5ps phonon lifetimes plotted in Fig. 6. Also, the Tersoff potential selected for the study is not a simplistic, harmonic, two-body potential. It is a published, effective model of interactions in the Si-C compounds.

We would like to stress that the closeness of the prior distribution to the target (Figs 3 and 4) originates from the size of the system and careful selection of the prior generating algorithm (Eq. 6 and description in l. 172-183). As we noted in the reply to the first referee the extreme cases of anharmonicity dominating the right hand side of the Eq. 4 for all, or most coordinates may be beyond the direct applicability of the proposed method. To illustrate the point, we have added to Figures 3, 4, 5, and 6 the calculations performed for higher temperatures (up to T=2000K) closer to the melting point of 3C-SiC demonstrating effectiveness of the proposed approach even in high temperatures.

1. The performance of the new approach is based on comparing its acceptance ratio to molecular dynamics simulations. How do actual computation times compare? How does the performance (computation time) scale with system size including the possibility to run calculation in parallel on many cores?

The computational cost is essentially proportional to the number of requested configurations plus necessary burn-in samples (1-10, can be limited to 1-2 with careful selection of initial displacement variation). Due to the details of the Metropolis-Hastings algorithm this cost is independent of the acceptance ratio. Low acceptance ratio leads to low quality of the generated distribution, not a direct increase in computational cost. This increase stems from the fact that with low acceptance more samples are required for the reasonable fidelity of the produced distribution. In comparison with MD calculations, each generated configuration is equivalent to one time step in trajectory. However, in case of the DFT-based calculations, the MD procedure can be optimized by starting each step from the charge density/wave functions converged in the previous step. Due to the fact that samples generated by HECSS are independent, this optimization is not easily available in DFT-based calculations. This amounts to approximately twice as many electronic SCF steps per evaluated configuration. Thus, n-configurations HECSS run is equivalent to approximately 2*(n+10) time steps of the MD calculation. In our experience this is not enough to provide even single, well-thermalized sample for n<500.

Regarding the parallel computation: In current implementation each configuration evaluation may be run on multiple cores but the sample generation is strictly serial. The near-independence of generated samples provides opportunity for future splitting of the computation to multiple processes. Naturally, each temperature scan may be run as a separate process with full linear scaling.

We will add analysis of the computational cost of the HECSS approach to the final paragraph of the text.

1. Presentation. Title and abstract suggest application of the method to a wild variety of problems in solid state physics. However, such are mentioned only marginally in the conclusions while the main text fully focuses on the application to lattice dynamics. Experts in lattice dynamics may thus overlook this work and its relevance if not highlighted better. At several points the manuscript would profit from more quantitative statements.

We will expand the abstract to better reflect our focus - which is indeed, at this moment, on lattice dynamics applications. The other applications mentioned in the abstract are our suggestions of other fields where this type of procedure may be beneficial.

Lines 80-81: Limitations and applications should be discussed in more details. Lines 102-103: Phase transitions are excluded by the "reasonable" class. This should me mentioned and the application of the approach to different kind of phase transitions could be addressed.

Following the comment of the first referee we have expanded the description of probable limitations of the proposed method (phase transitions, highly anharmonic systems).

Lines 114-115: "too wild" and "very quick" should be quantified.

Lines 136- 137: "barely noticeable" and "hardly visible" obviously depend on how the data is plotted. Please quantify.

We have replaced these imprecise phrases with quantitative description showing the speed of convergence and cited the appropriate literature.

Figs. 1 and 2. correlations between E_kvar and E_pvar could be discussed.

The correlation between variances of the kinetic and potential distribution comes directly from energy conservation and statistical mechanics. Both energies are part of the Hamiltonian and sum up to the total energy. Thus, due to the energy conservation their variances should match. We have added the appropriate sentence to the discussion at the end of section 2.

Lines 186-189: Asymptotic production of target distribution for any non-vanishing prior distribution requires a citation.

Appropriate citation has been added to the list of references.

Lines 213-214: Please explain why parameters are independent and their values not critical. Are there limitations?

The independence from the system (supercell) size stems from the connection with the displacement distribution - it is our conclusion drawn from the experience gained during the development of the HECSS scheme. If the interactions are reproduced reasonably well in the small supercell (e.g. single crystallographic unit cell) the average size of thermal displacement is expected to be the same as in larger supercell due to the same energy per degree of freedom (i.e. temperature) and very similar shape of the potential. The independence and the practical ranges of the parameters cited in the text are derived from the multiple tests run during the development of the HECSS code. We have added a sentence explaining this property and rephrased the surrounding text to make this issue more clear to the reader.

Figures 5 and 6 should be discussed in more detail. Agreement and differences need to be pointed out. Fig. 5 is confusing. It's caption suggest that molecular dynamics was used to extract harmonic phonon frequencies, while the text states that higher order (anharmonic) force constants were extracted. It would be nice to compare harmonic phonon frequencies to both anharmonic phonon frequencies obtained from molecular dynamics and from the new method and discuss agreement and differences in detail for at least two different temperatures.

The phonon frequencies presented in Fig. 5 are derived by fitting of a third order anharmonic model to both datasets and the frequencies are derived from this model. The lifetimes from the Fig. 6 are obtained from the same model using ALAMODE to compute anharmonic self-energy and phonon lifetimes from the third order coefficients in the fit (using relaxation time approximation). We have corrected a misleading description of Figs 5 and 6 and expanded the description to make the point clear. We have also included the RMS differences between phonon frequencies derived by both methods.

Fig. 6 is lacking information on lifetimes of the acoustic branches with small momenta and small energies close to the Γ-point. It would be nice to discuss convergence and numerical limitations for these.

The access to the vicinity of the zone-center is limited by the supercell size used in the calculation. The closest point provided by the supercell used in the paper (5x5x5, 1000 atoms) and reciprocal space sampling grid (20x20x20) is located at 1/10 of the zone size from the center. All data between this point and the zone center are interpolated from the fitted force constant matrices in real space. We will add information about the reciprocal space sampling to the text. Additionally we have expanded the presented data to include more temperatures: 100K 600K and 2000K.

Both figures are very difficult to read because they are small and contain too much data. Splitting into sub-panels where the same number of samples are compared could help.

Figure 5 is intended to show small difference between frequencies computed from both data sets. We agree that the presentation in both Fig. 5 and 6 will benefit from such split and we have replaced both figures by separate panels containing data sets of the same size. We have also added additional temperatures - as mentioned above. The description has been modified appropriately.

In summary, the presented approach is highly innovative and worth to be published but the presentation needs to be improved to make it convincing.

We hope that the above explanations and corrections to the text make our paper convincing and clarify all the issues raised by Dr Wehinger.

Paweł T. Jochym, Jan Łażewski

### Anonymous Report 1 on 2021-2-26 (Invited Report)

• Cite as: Anonymous, Report on arXiv:scipost_202101_00011v1, delivered 2021-02-26, doi: 10.21468/SciPost.Report.2627

### Report

The authors of the manuscript "High Efficiency Configuration Space Sampling – probing the distribution of available states" present a configuration sampling approach based on the Metropolis-Hastings algorithm.
The authors claim that their approach is more efficient than other established approaches based on molecular dynamics (MD) propagation. They validate their numerical approach by computing the phonon dispersion and lifetimes
of the cubic 3C-SiC crystal and comparing against MD benchmark results.

I have found a few gaps in the presentation which prevent a full grasp of their assumptions and numerical validation. Their arguments and derivations cannot be fully reproduced by qualified experts

For instance, the authors claim they sampled at finite temperature, but the value of the temperature is not provided anywhere in the text.

If the temperature is much smaller than the melting temperature (~3,000 K for SiC), the harmonic approximation is expected to be quite accurate and the the Gaussian "prior" sampling described in Sec. 3 should be almost exact. This seems to agree with the large acceptance ratio (up to 80%) observed by the authors.

A very high acceptance ratio is not necessarily an advantage of the approach as it may imply large correlation in the sample. The authors should discuss the position autocorrelation function obtained from their approach and compare against the one obtained using canonical MD sampling. The best acceptance rate is the one which minimises the autocorrelation time.

The agreement between the Monte Carlo and MD phonon dispersion shown in Fig. 5 is to be expected if the anharmonicity is negligible. The agreement on the phonon lifetimes is a tougher check, but it is hard to draw any quantitative conclusion from Fig. 6. The points are pretty scattered over a semi-log plot, which means that the error can be rather large.
The authors should discuss a more quantitative estimator, e.g., the square root of the sum over the wave-vectors and bands of the square deviation of the Monte Carlo and MD phonon lifetimes.

Lines 34-35: The authors mention "running a 30000 steps MD". Why exactly this number of steps?

Eq. (1): Not all the symbols have been introduced in the text.

Eq. (2): The authors implicitly assume a two-body force field. What would happen in the case of a many-body force field like the embedded atom model or Tersoff potentials?

Lines 80-81: The sentence "Only experience can tell us how good this approximation is and how wide its applicability range is" is not correct and underrate the role of a large body of numerical analysis.

Lines 102-103: The sentence "The reasonable class is very broad here, certainly containing all physically interesting cases by virtue of requiring a finite variance and a well-defined mean" is not correct, unless phase transitions are excluded. The variance of several quantities diverges close to a phase transition.

Eq. (5): The equal sign is not correct as for N\to\infty the variance of the distribution of the right-hand side is zero.

Line 125: A "single server" is not a well-defined object: how many CPU's were used?

In comparing the computational efficiency of MD and Monte Carlo methods, one has to take into consideration the time spent generating random numbers, which may be more computationally demanding than propagating a trajectory, given the same number of force-field evaluations.

In conclusion, I do not feel comfortable in suggesting the manuscript in its current form.

• validity: -
• significance: -
• originality: -
• clarity: -
• formatting: -
• grammar: -

### Author:  Paweł Jochym  on 2021-03-04  [id 1286]

(in reply to Report 1 on 2021-02-26)

We thank the referee for reading our paper and spotting omissions and mistakes in our text. We would like to immediately address the referee's comments and correct mistakes pointed in his report. Naturally, we intend to include these corrections in the resubmitted manuscript. We hope that the following clarifications will enable full understanding of our approach. We address the comments paragraph by paragraph quoting the referee before our response.

I have found a few gaps in the presentation which prevent a full grasp of their assumptions and numerical validation. Their arguments and derivations cannot be fully reproduced by qualified experts For instance, the authors claim they sampled at finite temperature, but the value of the temperature is not provided anywhere in the text.

The text was intended as a demonstration of the proposed method not as a reproduction or prediction of a particular experimental result. This may perhaps explain unfortunate omission of sampled temperature in the text. While the temperature can in fact be extracted from the average energy in Fig. 3 and 4, it should also be stated in the text explicitly. The sampling temperature for all presented data is 300 K, chosen as standard ambient temperature. However, we have tested our algorithm for a number of other temperatures, up to 2000 K and there was no qualitative difference in its performance or properties.

If the temperature is much smaller than the melting temperature (~3,000 K for SiC), the harmonic approximation is expected to be quite accurate and the the Gaussian "prior" sampling described in Sec. 3 should be almost exact. This seems to agree with the large acceptance ratio (up to 80%) observed by the authors.

The shape of the energy distribution is not determined by harmonicity of the potentials but by the Central Limit Theorem and the size of the system. The difference between prior distribution (which comes from our approximation of the displacement distribution) and the target distribution does not stem from the anharmonicity of the potential but mainly from the fact that the displacements of the atoms in the crystal are not independent (clearly stated in our text). Please note that we have no direct access to the potential energy of the system - we can only specify the geometry and calculate the resulting energy. Thus, we have no ability to directly generate the target energy distribution from Fig. 3. The high acceptance ratio comes from our selection of the displacement distribution and tuning algorithm described in the paper. The Metropolis-Hastings (M-H) algorithm can generate a target distribution from any prior which is non-zero over the domain. However, the acceptance ratio may be very low if you fail to use the prior which is a good approximation of target distribution. It is a well-known fact in the numerical statistics community that the good selection of the prior distribution is the key to an effective use of probability distribution sampling algorithms.

A very high acceptance ratio is not necessarily an advantage of the approach as it may imply large correlation in the sample. The authors should discuss the position autocorrelation function obtained from their approach and compare against the one obtained using canonical MD sampling. The best acceptance rate is the one which minimises the autocorrelation time.

The issue of sample correlation would be indeed important if we used a 'random walk'-type algorithm for the prior generation (which is a popular variant of the M-H algorithm). Instead, we use independent samples and the only possible correlation between them arises from a very small change (less than 2%) in the variance of the position distribution. We discuss this issue in the paragraph starting in line 196. On the other hand, the MD derived data has obvious autocorrelations - note that we can derive phonon frequencies from the Fourier transform of the velocity autocorrelation function along the MD trajectory. Thus, it is necessary to separate sampling points on the trajectory by substantial intervals allowing for these correlations to die out. Nevertheless, all time steps between the sampling points still need to be calculated, which leads to large inefficiency of the MD as a configuration generator. To further clarify the issue we suggest expanding the explanation in the text by the following paragraph:

'The possible correlations introduced in the HECSS generated data result only from the fact that if the n-th sample leads to exceptionally small or large energy, the next sample is drawn from the positional distribution with variance increased or reduced, respectively, by a small amount (no more than $(1+\delta)$ times in extreme cases). Thus, the probability of larger energy following the exceptionally small energy in the sampling chain (or the other way around: a smaller sample following an exceptionally large one) is slightly increased. Note, however, that this does not introduce any correlations in any particular coordinate. In the MD trajectory the correlations arise from the non-random character of the particle trajectory.'

The agreement between the Monte Carlo and MD phonon dispersion shown in Fig. 5 is to be expected if the anharmonicity is negligible. The agreement on the phonon lifetimes is a tougher check, but it is hard to draw any quantitative conclusion from Fig. 6. The points are pretty scattered over a semi-log plot, which means that the error can be rather large. The authors should discuss a more quantitative estimator, e.g., the square root of the sum over the wave-vectors and bands of the square deviation of the Monte Carlo and MD phonon lifetimes.

Indeed, for the purely harmonic system the phonon frequency test is not useful since phonon frequencies are independent from the displacement size. However, if the system considered in the paper were close to harmonic, we would expect to obtain very long phonon lifetimes (since they are infinite in the harmonic system). The data in Fig. 6 demonstrates that most of the phonon modes exhibit lifetimes below 10ps - showing non-negligible anharmonicity in the model. This fact provides justification for the validity of the phonon frequency test. The phonon lifetimes are very sensitive to the accuracy of the model. This is especially true in case of large values which indicate small deviations from harmonicity and usually carry large error bars. Unfortunately, the large range (close to two orders of magnitude) of the values of lifetimes makes the simple RMS measure of differences very misleading - since the differences at high end of the range will dominate the sum. Thus, we are going to consider using RMS of logarithms of lifetimes as a better measure of relative changes in the values obtained using MD and HECSS approaches.

Lines 34-35: The authors mention "running a 30000 steps MD". Why exactly this number of steps?

The number of steps (30 000) used in the introduction was a typical relaxation time of a long-run MD suggested by the often used "rule of thumb" in MD calculations (50 times period of typical vibrations in the system). For 3C-SiC: $f\approx 10$THz = $10^{13}$Hz $\Rightarrow t=10^{-13}$s = 100fs; 50 * 100fs = 5ps. With 1fs time step that equals 5000 steps minimum run where we can use at most half of it for actual data (you need to provide time to obtain thermal equilibrium). If we need approx. 30 data points (as required by anharmonic calculations, see Fig. 6) and they should be separated by at least 1ps interval (at least 10 typical vibrations) we get approximately 30 000 steps. The cited number itself has no 'magical' value and results from the setup of the calculations presented in the paper. To avoid impression that the number 30 000 has any special meaning, we are going to replace the number by the phrase: "thousands of MD steps".

Eq. (1): Not all the symbols have been introduced in the text.

Eq. (1): The sentence introducing missing symbols will be added to the revised text: $x_n$ - generalized coordinate, $H$ - Hamiltonian, $T$ - temperature, $k_B$ - Boltzmann constant, $\delta_{mn}$ - Kronecker delta

Eq. (2): The authors implicitly assume a two-body force field. What would happen in the case of a many-body force field like the embedded atom model or Tersoff potentials?

Eq. (2): We make no two-body assumption, neither implied nor explicit. The formulation of equipartition theorem, Eq. (1), explicitly concerns single coordinates (the only non-zero term due to the Kronecker delta) and makes no assumption on the form of the Hamiltonian H. The Taylor expansion, Eq. (2), is not a complete expansion in all coordinates $q$ (note the scalar $q$ symbol). It is the Taylor expansion in single coordinate with coefficients ($C_n$ - proportional to partial derivatives of energy with respect to this coordinate) which are functions of all the other coordinates in the system. Furthermore, the calculations presented in the paper use the mentioned Tersoff potential developed in refs 17, 18. We understand that due to the formulation of the surrounding text this may not be entirely clear and may confuse the reader. To avoid this we are going to add a clarifying sentence before Eq. (2).

Lines 80-81: The sentence "Only experience can tell us how good this approximation is and how wide its applicability range is" is not correct and underrate the role of a large body of numerical analysis.

The unfortunate sentence 80-81 brings nothing of importance to the text. We are going to remove it in the resubmitted text.

Lines 102-103: The sentence "The reasonable class is very broad here, certainly containing all physically interesting cases by virtue of requiring a finite variance and a well-defined mean" is not correct, unless phase transitions are excluded. The variance of several quantities diverges close to a phase transition.

Variance of several quantities is indeed divergent in some phase transitions. In cases where the transition involves divergent heat capacity this includes energy variance. Thus, our phrase :"...all physically interesting cases..." was indeed wrong. The sentence is going to be corrected. We are going to specify where we can use the described procedure and clearly state that in cases where energy variance diverges, the procedure cannot be used. We thank the referee for spotting this important fact.

Eq. (5): The equal sign is not correct as for N\to\infty the variance of the distribution of the right-hand side is zero.

The Eq. (5) was an attempt to formally write asymptotic relation of the Central Limit Theorem described in the paragraph 99-102. The CLT is indeed not a limit relation but asymptotic distribution convergence relation and the Eq. (5) should use appropriate notation for such relations as convergence in distribution:

$$\sqrt{3N}\left(\frac{1}{N} \sum_i E_i -\langle E\rangle \right) \xrightarrow{d}\mathcal{N}(0, \sigma).$$

The mistake in notation has no consequences for the arguments and conclusions presented in the text. The Eq. (5) is going to be corrected in the revised text.

Line 125: A "single server" is not a well-defined object: how many CPU's were used?

"Single server" mentioned in line 125 was used as a rough indication of the computational effort involved in the described task. It is nothing out of ordinary: 2x4 cores CPU and 32GB RAM. This is actually a fairly under-powered and old machine, less powerful than some of newer generation laptops. The information will be added to the sentence in revised text.

In comparing the computational efficiency of MD and Monte Carlo methods, one has to take into consideration the time spent generating random numbers, which may be more computationally demanding than propagating a trajectory, given the same number of force-field evaluations.

In some cases the random number generation may be indeed fairly expensive but in the case of typical systems of tens of atoms, the energy and forces evaluation is much more time-consuming. For instance, the random number generator we have used (from SciPy.stats library) takes 180$\mu$s to generate 3000 random numbers required to create one sample for the 5x5x5 supercell of 3C-SiC. The single evaluation of energy for the same cell (1000 atoms) takes 4ms (20 times longer) using ASAP3 with OpenKIM model from our calculations. A more sophisticated interaction model is bound to be even more time-consuming. Furthermore, molecular dynamics requires calculating multiple time steps per every generated sample. Considering this facts, we maintain that the proposed HECSS approach offers substantial advantage over MD as a source of configuration data.