LCAO too slow on GPU, an example from Si with 512 atoms #6380

@mohanchen

Si-512-abacus-gpu-test.tar.gz

The main bottleneck is on the CPU side: most of the run time is spent in CPU computation, and the GPU is active for less than 20% of the total time. For multi-GPU runs, I have compiled the CUDA version of ELPA correctly. However, practical tests show that the multi-GPU speedup comes mainly from the proportionally larger number of CPU cores allocated. Specifically, scaling from [6 cores + 1 V100 SXM2 16GB GPU] to [24 cores + 4 V100 SXM2 16GB GPUs] can give about a 2.5x speedup, while 24 cores with a single V100 SXM2 16GB GPU already gives more than 2x.
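To put the numbers above in perspective, here is a rough Amdahl-style estimate, not a measurement. It assumes the run splits cleanly into a CPU-only part (80% of the time) and a GPU part (20%, the upper bound quoted above), and that each part scales ideally with the number of cores or GPUs. The function and variable names are hypothetical and are not part of ABACUS.

```python
def estimated_speedup(cpu_fraction, cpu_scale, gpu_scale):
    """Amdahl-style estimate for a run split into a CPU part and a GPU part.

    cpu_fraction: share of the baseline run time spent on the CPU (assumed).
    cpu_scale:    factor by which the CPU part gets faster (ideal scaling assumed).
    gpu_scale:    factor by which the GPU part gets faster (ideal scaling assumed).
    """
    gpu_fraction = 1.0 - cpu_fraction
    return 1.0 / (cpu_fraction / cpu_scale + gpu_fraction / gpu_scale)


# Assumption: the GPU accounts for 20% of the baseline time (6 cores + 1 GPU).
CPU_FRACTION = 0.8

gpus_only = estimated_speedup(CPU_FRACTION, cpu_scale=1, gpu_scale=4)   # 6 cores, 4 GPUs
cores_only = estimated_speedup(CPU_FRACTION, cpu_scale=4, gpu_scale=1)  # 24 cores, 1 GPU
both = estimated_speedup(CPU_FRACTION, cpu_scale=4, gpu_scale=4)        # 24 cores, 4 GPUs

print(f"4x GPUs only:  {gpus_only:.2f}x")
print(f"4x cores only: {cores_only:.2f}x")
print(f"4x both:       {both:.2f}x")
```

Under these assumptions, adding GPUs alone cannot give much more than about 1.2x, while adding CPU cores alone accounts for roughly 2.5x. That is consistent with the observation that the speedup tracks the core count rather than the GPU count. Real scaling is never ideal, so these figures are only upper bounds for this simple model.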

Labels: Performance