SciPost Submission Page
Performance in solving the Hermitian and pseudo-Hermitian Bethe-Salpeter equation with the Yambo code
by Petru Milev, Blanca Mellado-Pinto, Muralidhar Nalabothula, Ali Esquembre Kucukalic, Fernando Alvarruiz, Enrique Ramos, Alejandro Molina-Sanchez, Ludger Wirtz, Jose E. Roman, Davide Sangalli
This is not the latest submitted version.
Submission summary
| Authors (as registered SciPost users): | Petru Milev |
| Submission information | |
|---|---|
| Preprint Link: | scipost_202507_00069v1 (pdf) |
| Code repository: | https://gitlab.com/lumen-code/lumen |
| Code version: | Lumen Fork |
| Code license: | GPL-2.0 license |
| Data repository: | https://github.com/Petru-Milev/data_for_scipost_submission_1 |
| Date submitted: | July 25, 2025, 12:40 p.m. |
| Submitted by: | Petru Milev |
| Submitted to: | SciPost Physics Codebases |
| Ontological classification | |
|---|---|
| Academic field: | Physics |
| Specialties: | |
| Approach: | Computational |
Abstract
We analyze the performance of two strategies in solving the structured eigenvalue problem deriving from the Bethe-Salpeter equation (BSE) in condensed matter physics. The BSE matrix is constructed with the Yambo code, and the two strategies are implemented by interfacing Yambo with the ScaLAPACK and ELPA libraries for direct diagonalization, and with the SLEPc library for the iterative approach. We consider both the Hermitian (Tamm-Dancoff approximation) and pseudo-Hermitian forms, addressing dense matrices of three different sizes. A description of the implementation is also provided, with details for the pseudo-Hermitian case. Timing and memory utilization are analyzed on both CPU and GPU clusters. Our results demonstrate that it is now feasible to handle dense BSE matrices of the order of 10^5.
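To put the quoted matrix order in perspective, a rough storage estimate is sketched below. It assumes double-precision complex entries (16 bytes each) and full dense storage, and it ignores eigenvectors and solver workspaces. The helper name is hypothetical and is not part of Yambo or any of the libraries mentioned.

```python
def dense_matrix_bytes(n, bytes_per_element=16):
    """Bytes needed to store an n x n dense matrix.

    The default of 16 bytes per element is an assumption: one
    double-precision complex number (two 8-byte floats).
    """
    return n * n * bytes_per_element


if __name__ == "__main__":
    for n in (10**4, 10**5):
        gigabytes = dense_matrix_bytes(n) / 1e9  # decimal GB
        print(f"n = {n}: about {gigabytes:.1f} GB for the matrix alone")
```

Under these assumptions the storage grows with the square of the order, so a tenfold increase in matrix order costs a hundredfold increase in memory. This is why such matrices generally have to be distributed over many nodes or devices.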
Reports on this Submission
Strengths
1- Comparison of different solvers interfaced with the Yambo electronic structure code in terms of runtime, scalability and memory utilization
2- Detailed analysis of results
3- Good readability and writing style
Weaknesses
1- Suitability for journal questionable
2- Structure and conciseness could be improved
3- Details on used solvers are unclear
Report
The manuscript does not present a "Codebase" per se and therefore does not seem to align with the publication criteria of the journal "SciPost Physics Codebases". "Yambo" would be an example of a codebase, as I understand it, but it is not presented in its entirety here. Instead, only some new features, i.e. the ability to interface with other solver libraries, are evaluated. These additions are not (yet?) implemented in Yambo's production code, but only in the development fork "lumen", further limiting their relevance given the journal's scope.
While the manuscript is pleasantly readable, it would benefit from a clearer and more concise structure. If one wants to understand, say, one figure, the relevant information (algorithm, library, hardware, ...) seems scattered all over the manuscript.
In particular, perhaps due to the confusing structure of the manuscript, it is unclear exactly which algorithms are meant by "non-Hermitian Algorithms" and "pseudo-Hermitian Algorithms". For "non-Hermitian Algorithms", I am assuming a general eigensolver that completely ignores the matrix structure, i.e. LAPACK's "zgeev". But this should be stated explicitly. Even more unclear are the "pseudo-Hermitian Algorithms". Section 2.1.1 implies that the (complex) eigenvalue problem is recast into a real skew-symmetric eigenvalue problem, which is then solved by a special skew-symmetric solver (see references 9 and 32). However, as far as I'm aware, this type of solver is not available in standard (Sca)LAPACK, but only in ELPA. So what exactly was used in the (Sca)LAPACK case? Or was the eigenvalue problem interpreted as a complex generalized Hermitian problem? In that case, Section 2.1.1 is misleading. This should be clarified.
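For illustration only, two of the alternatives raised above can be contrasted on a small synthetic matrix. The sketch assumes the usual block form H = [[A, B], [-conj(B), -conj(A)]] with A Hermitian and B complex symmetric, and it assumes that the Hermitian companion matrix [[A, B], [conj(B), conj(A)]] is positive definite. It is not taken from the manuscript, Yambo, ScaLAPACK or ELPA. All function names are hypothetical, and the skew-symmetric route of Section 2.1.1 is not reproduced here.

```python
import numpy as np


def build_bse_like(n, seed=0):
    """Synthetic test matrices with BSE-like block structure (assumption)."""
    rng = np.random.default_rng(seed)
    g = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    a = g @ g.conj().T + n * np.eye(n)        # Hermitian, positive definite
    c = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    b = 0.05 * (c + c.T)                      # complex symmetric, small norm
    h = np.block([[a, b], [-b.conj(), -a.conj()]])     # pseudo-Hermitian form
    h_hat = np.block([[a, b], [b.conj(), a.conj()]])   # Hermitian companion
    return h, h_hat


def eig_general(h):
    """General solver that ignores the structure (NumPy calls LAPACK geev)."""
    return np.linalg.eigvals(h)


def eig_generalized_hermitian(h_hat):
    """Treat H x = lam x as Sigma x = (1/lam) H_hat x, Sigma = diag(I, -I).

    With the Cholesky factor H_hat = L L^H, this reduces to the standard
    Hermitian problem (L^-1 Sigma L^-H) y = (1/lam) y.
    """
    n = h_hat.shape[0] // 2
    sigma = np.diag(np.concatenate([np.ones(n), -np.ones(n)]))
    l = np.linalg.cholesky(h_hat)
    m = np.linalg.solve(l, np.linalg.solve(l, sigma).conj().T)
    mu = np.linalg.eigvalsh(m)                # real, nonzero values 1/lam
    return np.sort(1.0 / mu)


if __name__ == "__main__":
    h, h_hat = build_bse_like(6)
    lam_general = eig_general(h)
    lam_structured = eig_generalized_hermitian(h_hat)
    print("max |Im| from general solver:", np.abs(lam_general.imag).max())
    print("max difference:",
          np.abs(np.sort(lam_general.real) - lam_structured).max())
```

On such a test matrix both routes should return the same real spectrum, in pairs of opposite sign. The structured route works with a Hermitian matrix throughout, which is the kind of distinction the manuscript would need to spell out for each library.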
The manuscript should be sent to a journal whose scope better fits its contents, and should improve its clarity.
Requested changes
Some further open questions and points to improve:
1- Footnotes should not be used as sources, but should be actual footnotes or incorporated into the text where it makes sense.
2- The number of computed eigenvalues (100) in the iterative scheme seems arbitrary and specific, but the conclusions drawn are very general. (Example: “As already observed, iterative solvers are orders faster than diagonalization-based approaches.”) They would be more convincing if the number of computed eigenvalues were varied as well, if the conclusions were supported by algorithmic arguments (instead of being observed empirically), or if there were a good reason to choose this number specifically.
3- How was GPU memory measured? The Linux time command does not provide this functionality.
4- In the hardware overview, details on the node interconnect are missing. These are relevant to the experiments of Figure 3, which go beyond the single-node regime when using more than 32 cores. The impact of the interconnect should be addressed.
Recommendation
Accept in alternative Journal (see Report)
Report #1 by Anonymous (Referee 1) on 2025-9-29 (Invited Report)
The referee discloses that the following generative AI tools have been used in the preparation of this report:
OpenAI ChatGPT (GPT-5), used solely to check the English grammar of the report
Report
The present manuscript is clearly written, technically sound, and addresses a computational bottleneck in solving the Bethe-Salpeter equation. This is relevant for condensed matter physics and materials science. The benchmarking on CPU/GPU clusters, as well as the comparison between Hermitian, non-Hermitian, and pseudo-Hermitian formulations, is accurate. The work will be valuable for the community, especially Yambo users. Overall, the manuscript is technically strong, with solid contributions.
I support the publication of this work, and would like the authors to reply to the following comments:
1) While the manuscript focuses on performance, it would benefit from a short comment on some scientific applications. For example: what system sizes or particular physical phenomena (e.g. excitons in specific 2D materials, Rydberg states, etc.) become accessible with these advances? Some readers may miss why a BSE matrix with size 10^5 matters physically, or what kind of excitonic spectra these calculations are enabling.
2) As far as I can see, the benchmarks are primarily performed using the BSE Hamiltonian of CrI3. Could the authors clarify to what extent the conclusions obtained from this material are expected to be general? In particular, are there material-dependent features (e.g. band dispersion, screening, etc.) that might affect the efficiency or scaling?
Recommendation
Publish (easily meets expectations and criteria for this Journal; among top 50%)
