Description
Details
When testing Si16 with OMP=12 and `mpirun -n 1 abacus`, the FFT should ideally run up to 12 times faster than on a single core. In practice, the run with OMP=12 is much slower than the single-core run. The FFT also accounts for 80% of the total runtime, which makes it the bottleneck of the multi-core plane wave (pw) calculation. Here is the timing table for the pw run:
Here is the INPUT file used for the test:
```
INPUT_PARAMETERS
#Parameters (1.General)
suffix autotest
calculation scf
#nbands 8
symmetry 1
#Parameters (2.Iteration)
ecutwfc 60
scf_thr 1e-8
scf_nmax 100
cal_force 1
cal_stress 1
#Parameters (3.Basis)
basis_type pw
#Parameters (4.Smearing)
smearing_method gauss
smearing_sigma 0.002
#Parameters (5.Mixing)
mixing_type broyden
mixing_beta 0.3
ks_solver dav
```
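To make the comparison easy to repeat, the two runs could be timed with a small script like the one below. This is only a sketch under assumptions: it takes "OMP=12" to mean the standard `OMP_NUM_THREADS=12` environment variable, it assumes `mpirun` and `abacus` are on `PATH`, and it assumes the INPUT file above sits in the working directory together with the structure and k-point files the run needs. The helper names `build_run`, `time_run`, and `compare_threads` are hypothetical and not part of ABACUS.

```python
import os
import subprocess
import time


def build_run(threads, mpi_ranks=1, binary="abacus"):
    """Return (command, environment) for one run with a given OpenMP thread count."""
    env = dict(os.environ)
    # Assumption: "OMP=12" in the report means OMP_NUM_THREADS=12.
    env["OMP_NUM_THREADS"] = str(threads)
    cmd = ["mpirun", "-n", str(mpi_ranks), binary]
    return cmd, env


def time_run(threads, workdir="."):
    """Run the job once in workdir and return the wall time in seconds."""
    cmd, env = build_run(threads)
    start = time.perf_counter()
    subprocess.run(cmd, env=env, cwd=workdir, check=True)
    return time.perf_counter() - start


def compare_threads(thread_counts=(1, 12), workdir="."):
    """Time the same input at each thread count and print the wall times."""
    for n in thread_counts:
        print(f"OMP_NUM_THREADS={n}: {time_run(n, workdir):.1f} s")
```

Calling `compare_threads()` from the test directory runs the same input once with 1 thread and once with 12 threads. It only measures total wall time, so the per-routine timings still have to come from the ABACUS timing table.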
Task list for Issue attackers (only for developers)
- Reproduce the performance issue on a similar system or environment.
- Identify the specific section of the code causing the performance issue.
- Investigate the issue and determine the root cause.
- Research best practices and potential solutions for the identified performance issue.
- Implement the chosen solution to address the performance issue.
- Test the implemented solution to ensure it improves performance without introducing new issues.
- Optimize the solution if necessary, considering trade-offs between performance and other factors (e.g., code complexity, readability, maintainability).
- Review and incorporate any relevant feedback from users or developers.
- Merge the improved solution into the main codebase and notify the issue reporter.
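When testing a fix, it may help to state the target in terms of speedup and parallel efficiency rather than raw times. The sketch below uses the usual definitions (speedup = serial time / parallel time, efficiency = speedup / thread count) plus Amdahl's law as an upper bound. The function names are hypothetical, and any numbers passed in should be measured timings; none are taken from this issue.

```python
def speedup(t_serial, t_parallel):
    """Speedup of the parallel run relative to the single-core run."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, threads):
    """Parallel efficiency: 1.0 is ideal scaling, below 1/threads is slower than serial."""
    return speedup(t_serial, t_parallel) / threads


def amdahl_speedup(parallel_fraction, threads):
    """Upper bound on speedup when only parallel_fraction of the serial runtime scales."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / threads)
```

An efficiency below `1 / threads` corresponds to the situation reported here, where the 12-thread run is slower than the single-core run. Note that `parallel_fraction` in `amdahl_speedup` refers to the share of the single-core runtime, so the FFT share measured in the 12-thread run should not be plugged in directly.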