Slow BoomerAMG performance using gpu-aware mpi with Nalu-Wind on Kestrel #1241
Comments
I am sad to report that CUDA 12.6 with all the proper linkages did not change the behavior. I am pulling fresh profiles of it and will add the data.
@dreachem tagging for reference.
Setting MPICH_GPU_IPC_ENABLED=0 allows us to bypass the issue by disabling the GPU IPC path responsible for the slow calls.
We would still like to run with it enabled, but we are seeing some performance benefit even as-is for our initial test cases.
As a further test, I will also try an Umpire memory pool, based on discussions of other workarounds to the IPC issue that we found.
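For reference, below is a minimal sketch of what such a pool could look like using Umpire's QuickPool strategy. This is not Nalu-Wind's or HYPRE's actual allocation path; the pool name "DEVICE_POOL" and the block sizes are illustrative assumptions, and wiring such a pool into HYPRE's device allocations depends on how HYPRE/Nalu-Wind were built.

```cpp
// A minimal sketch (not Nalu-Wind's implementation) of a device memory pool
// built with Umpire's QuickPool strategy. Pool name and sizes are assumptions.
#include <umpire/ResourceManager.hpp>
#include <umpire/strategy/QuickPool.hpp>

int main()
{
  auto& rm = umpire::ResourceManager::getInstance();

  // Wrap the raw "DEVICE" allocator in a pool so repeated solver allocations
  // reuse already-allocated device memory instead of requesting fresh device
  // buffers each time (one of the suggested workarounds to the IPC overhead).
  auto device = rm.getAllocator("DEVICE");
  auto pool = rm.makeAllocator<umpire::strategy::QuickPool>(
      "DEVICE_POOL", device,
      512ull * 1024 * 1024,   // initial pool block (assumed size)
      128ull * 1024 * 1024);  // growth block size (assumed size)

  void* p = pool.allocate(1 << 20);  // served from the pool
  pool.deallocate(p);
  return 0;
}
```

The idea is simply that a pool keeps device memory mapped and reuses it, so the underlying allocations (and any per-allocation IPC handling) happen far less often.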
Hi!
I put up this issue to track the ongoing effort to figure out why Nalu-Wind struggles with GPU-aware MPI enabled in HYPRE on the Kestrel cluster. The slowdown shows up as a set of slow MPI calls inside BoomerAMG, specifically when GPU-aware MPI is enabled. If I run with pure GMRES in HYPRE and GPU-aware MPI, things look good.
It is not clear that HYPRE is doing anything wrong; this may be a CUDA or HPE problem, and people from both parties have been contacted about it. I have attached screenshots showing an nsys profile with and without GPU-aware MPI enabled. Note the differences in the MPI rows.
As a first step toward resolving this, I am currently trying CUDA 12.6.2, which might contain a related bug fix.
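For context, here is a minimal sketch of the kind of HYPRE configuration involved: GMRES preconditioned with BoomerAMG, run on the device. This is not Nalu-Wind's actual solver setup; the tolerances and the PMIS/l1-Jacobi choices are illustrative assumptions. It only shows where BoomerAMG, and therefore its communication-heavy setup and solve phases, hooks into the Krylov solver that otherwise behaves well.

```cpp
// Hedged sketch of a device-resident GMRES + BoomerAMG setup in HYPRE.
// Options below are illustrative, not Nalu-Wind's configuration.
#include <mpi.h>
#include <HYPRE_utilities.h>
#include <HYPRE_krylov.h>
#include <HYPRE_parcsr_ls.h>

int main(int argc, char** argv)
{
  MPI_Init(&argc, &argv);
  HYPRE_Init();

  // Run HYPRE on the GPU; communication buffers then live in device memory,
  // which is what GPU-aware MPI exchanges directly.
  HYPRE_SetMemoryLocation(HYPRE_MEMORY_DEVICE);
  HYPRE_SetExecutionPolicy(HYPRE_EXEC_DEVICE);

  HYPRE_Solver solver, precond;
  HYPRE_ParCSRGMRESCreate(MPI_COMM_WORLD, &solver);
  HYPRE_GMRESSetMaxIter(solver, 100);
  HYPRE_GMRESSetTol(solver, 1e-8);

  // BoomerAMG with GPU-friendly options (PMIS coarsening, l1-Jacobi smoothing).
  HYPRE_BoomerAMGCreate(&precond);
  HYPRE_BoomerAMGSetCoarsenType(precond, 8);   // PMIS
  HYPRE_BoomerAMGSetRelaxType(precond, 18);    // l1-Jacobi
  HYPRE_BoomerAMGSetMaxIter(precond, 1);
  HYPRE_BoomerAMGSetTol(precond, 0.0);

  // The slow MPI calls reported here appear inside the BoomerAMG setup/solve
  // phases hooked in below; unpreconditioned GMRES does not hit them.
  HYPRE_ParCSRGMRESSetPrecond(solver,
                              (HYPRE_PtrToParSolverFcn) HYPRE_BoomerAMGSolve,
                              (HYPRE_PtrToParSolverFcn) HYPRE_BoomerAMGSetup,
                              precond);

  // ... assemble the ParCSR matrix and vectors, then call
  // HYPRE_ParCSRGMRESSetup / HYPRE_ParCSRGMRESSolve here ...

  HYPRE_BoomerAMGDestroy(precond);
  HYPRE_ParCSRGMRESDestroy(solver);
  HYPRE_Finalize();
  MPI_Finalize();
  return 0;
}
```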