Description
Details
I'm running calculations with the input files from the ABACUS Test Report for HSE (#6294), but everything other than the SCF iterations is very slow. For example, for Si the LOCAL POTENTIAL step is extremely slow, as the log shows:
Initial plane wave basis and FFT box
---------------------------------------------------------
DONE(0.260526 SEC) : INIT PLANEWAVE
DONE(30.8853 SEC) : LOCAL POTENTIAL
In addition, writing the output files after SCF convergence takes a long time. Here are the TIME STATISTICS from stdout:
----------------------------------------------------------------------
CLASS_NAME          NAME                  TIME/s  CALLS   AVG/s   PER/%
----------------------------------------------------------------------
                    total                 214.96     13   16.54  100.00
Driver              atomic_world          214.96      1  214.96  100.00
ESolver_KS_LCAO     before_all_runners     33.02      1   33.02   15.36
NOrbital_Lm         extra_uniform           6.49   1875    0.00    3.02
Mathzone_Add1       Uni_Deriv_Phi           6.33   1875    0.00    2.94
Exx_LRI             init                   32.38      1   32.38   15.06
Matrix_Orbs21       init                    6.05      2    3.03    2.82
Matrix_Orbs21       init_radial_table      18.78      2    9.39    8.74
Center2_Orb         cal_ST_Phi12_R         15.67   3439    0.00    7.29
LRI_CV              set_orbitals           19.27      1   19.27    8.96
Matrix_Orbs11       init_radial_table       5.77      1    5.77    2.69
Ions                opt_ions              181.84      1  181.84   84.59
ESolver_KS          runner                128.16      1  128.16   59.62
ESolver_KS_LCAO     before_scf              3.01      1    3.01    1.40
Exx_LRI             cal_exx_ions            2.95      1    2.95    1.37
Potential           cal_veff                2.22     35    0.06    1.03
PotXC               cal_veff                2.19     35    0.06    1.02
XC_Functional       v_xc                   52.51     22    2.39   24.43
HSolverLCAO         solve                   7.82     34    0.23    3.64
HamiltLCAO          updateHk                2.31   3536    0.00    1.07
HSolverLCAO         hamiltSolvePsiK         3.87   3536    0.00    1.80
DiagoElpa           elpa_solve              3.10   3536    0.00    1.44
RI_2D_Comm          split_m2D_ktoR        100.91      7   14.42   46.94
Exx_LRI             cal_exx_elec           12.43      7    1.78    5.78
XC_Functional_Libxc v_xc_libxc              2.16     27    0.08    1.01
ESolver_KS_LCAO     cal_force              53.68      1   53.68   24.97
Force_Stress_LCAO   getForceStress         53.68      1   53.68   24.97
Exx_LRI             cal_exx_force          17.47      1   17.47    8.13
Exx_LRI             cal_exx_stress         36.11      1   36.11   16.80
----------------------------------------------------------------------
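To make the hotspots in a table like this easier to spot, a small script along the following lines could parse the rows and rank them by wall time. This is only a sketch: `parse_time_statistics` and `rank_by_time` are hypothetical helper names, not part of ABACUS, and the parser assumes each data row ends with the four numeric columns (TIME/s, CALLS, AVG/s, PER/%) preceded by an optional class name and a timer name. The sample rows are copied from the table above. The timers appear to be nested, so wrapper entries such as `total` and `atomic_world` include the time of everything they call and will rank first.

```python
# Hypothetical helpers (not part of ABACUS) for ranking TIME STATISTICS rows.

SAMPLE = """\
total 214.96 13 16.54 100.00
Driver atomic_world 214.96 1 214.96 100.00
Exx_LRI init 32.38 1 32.38 15.06
XC_Functional v_xc 52.51 22 2.39 24.43
RI_2D_Comm split_m2D_ktoR 100.91 7 14.42 46.94
Exx_LRI cal_exx_stress 36.11 1 36.11 16.80
"""


def parse_time_statistics(text):
    """Return one dict per data row; header and separator lines are skipped."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 5:
            continue  # separator lines or blank lines
        try:
            time_s = float(parts[-4])
            calls = int(parts[-3])
            avg_s = float(parts[-2])
            percent = float(parts[-1])
        except ValueError:
            continue  # header row: the last four fields are not numbers
        label = parts[:-4]
        rows.append({
            # Assumption: a row with a single label (e.g. "total") has no class name.
            "class_name": label[0] if len(label) >= 2 else "",
            "name": label[-1],
            "time_s": time_s,
            "calls": calls,
            "avg_s": avg_s,
            "percent": percent,
        })
    return rows


def rank_by_time(rows):
    """Sort rows by total time, slowest first (ties keep their input order)."""
    return sorted(rows, key=lambda r: r["time_s"], reverse=True)


if __name__ == "__main__":
    for row in rank_by_time(parse_time_statistics(SAMPLE)):
        print(f'{row["time_s"]:8.2f} s  {row["percent"]:6.2f} %  '
              f'{row["class_name"]} {row["name"]}')
```

Because the parser splits on whitespace and reads the numeric columns from the right, it should work on the full table pasted from stdout whether or not the column alignment has been preserved.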
My execution environment is as follows:
ABACUS version: v3.9.0.7
Compilation: Dockerfile.intel with intel-oneapi-mkl set to 2025.1
Command: OMP_NUM_THREADS=16 mpirun -np 2 abacus
CPU: 32 cores of Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz
Is this kind of speed expected? Do you have any advice?
Task list for Issue attackers (only for developers)
- Reproduce the performance issue on a similar system or environment.
- Identify the specific section of the code causing the performance issue.
- Investigate the issue and determine the root cause.
- Research best practices and potential solutions for the identified performance issue.
- Implement the chosen solution to address the performance issue.
- Test the implemented solution to ensure it improves performance without introducing new issues.
- Optimize the solution if necessary, considering trade-offs between performance and other factors (e.g., code complexity, readability, maintainability).
- Review and incorporate any relevant feedback from users or developers.
- Merge the improved solution into the main codebase and notify the issue reporter.