LCAO too slow on GPU, an example from Si with 512 atoms

### Details

[Si-512-abacus-gpu-test.tar.gz](https://github.com/user-attachments/files/21196731/Si-512-abacus-gpu-test.tar.gz)

The main bottleneck is on the CPU side: most of the run time is spent in CPU computation, while the GPU is active for less than 20% of the total time. For multi-GPU runs, I have correctly compiled the CUDA version of ELPA. However, practical tests show that the multi-GPU speedup comes mainly from the proportionally larger number of CPU cores allocated. Specifically, scaling from [6 cores paired with 1 V100 SXM2 16GB GPU] to [24 cores paired with 4 V100 SXM2 16GB GPUs] yields a speedup of about 2.5x, whereas 24 cores with a single V100 SXM2 16GB GPU already gives a speedup of more than 2x.
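To make the arithmetic behind these numbers explicit, here is a rough back-of-the-envelope sketch in Python. It only reuses the figures quoted above (GPU active for under 20% of the time, 2.5x and 2x speedups) and treats them as approximate. The variable names and the use of Amdahl's law as the model are my own assumptions, not something measured by ABACUS itself.

```python
# Approximate figures taken from the report above; all names are illustrative.
gpu_fraction = 0.20       # GPU is active for at most ~20% of the wall time
speedup_24c_4gpu = 2.5    # 6 cores + 1 GPU  ->  24 cores + 4 GPUs
speedup_24c_1gpu = 2.0    # 6 cores + 1 GPU  ->  24 cores + 1 GPU (reported as "more than 2x")
resource_ratio = 4        # both cores and GPUs were multiplied by 4


def amdahl_limit(accelerated_fraction: float) -> float:
    """Upper bound on the overall speedup when only `accelerated_fraction`
    of the run time is made infinitely fast (Amdahl's law)."""
    return 1.0 / (1.0 - accelerated_fraction)


# Even an infinitely fast GPU part could not speed the whole run up by more than this.
max_gain_from_gpu = amdahl_limit(gpu_fraction)

# Fraction of the ideal 4x scaling that is actually reached with 4x the resources.
parallel_efficiency = speedup_24c_4gpu / resource_ratio

# Extra speedup attributable to the 3 additional GPUs once 24 cores are already in use.
# Because the single-GPU speedup is "more than 2x", this is an upper bound.
gain_from_extra_gpus = speedup_24c_4gpu / speedup_24c_1gpu

print(f"Amdahl limit from GPU-side work only: {max_gain_from_gpu:.2f}x")
print(f"Parallel efficiency at 4x resources:  {parallel_efficiency:.1%}")
print(f"Gain from 3 extra GPUs (upper bound): {gain_from_extra_gpus:.2f}x")
```

Under these assumptions, the three extra GPUs account for at most about 1.25x of the 2.5x speedup, which matches the observation that most of the gain comes from the additional CPU cores.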



LCAO too slow on GPU, an example from Si with 512 atoms #6380
