Invalid results of type 1 transform into (64, 64, 64) grid on A100 GPU #575
Comments
Smaller reproducer with just one point:
The spectra should be 1 everywhere, which it is for
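As a sanity check on that expectation, here is a direct-sum sketch in plain NumPy (grid size chosen for illustration, not taken from the reproducer): the type-1 transform computes f_k = sum_j c_j * exp(i k . x_j), so a single unit-strength point at the origin should give f_k = 1 for every mode.

```python
import numpy as np

# Direct evaluation of the type-1 sum f_k = sum_j c_j * exp(i k . x_j)
# for a single unit-strength point. The grid size here is illustrative
# (the issue uses a (64, 64, 64) grid); the math is the same.
n = 8
k = np.arange(-n // 2, n // 2)
kx, ky, kz = np.meshgrid(k, k, k, indexing="ij")

x0 = np.array([0.0, 0.0, 0.0])  # one point at the origin, strength 1
f = np.exp(1j * (kx * x0[0] + ky * x0[1] + kz * x0[2]))

print(f.shape)              # (8, 8, 8)
print(np.allclose(f, 1.0))  # True: the spectrum is 1 everywhere
```

Any output mode whose magnitude is orders of magnitude above 1, as reported on the A100, therefore cannot be a tolerance effect.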
What happens if we use GM instead of SM? See https://finufft.readthedocs.io/en/latest/c_gpu.html#options-for-gpu-code; gpu_method should be supported in Python too.
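For reference, switching methods from Python could look like the sketch below. This is a hedged sketch, not a verified reproduction of the issue: the single-point input and grid size mirror the reproducer, extra keyword arguments are assumed to be forwarded to the cufinufft options struct, and it requires a CUDA GPU with GPU arrays (CuPy here).

```python
import cupy as cp
import cufinufft

# Hypothetical single-point input on a (64, 64, 64) grid, as in the reproducer.
x = cp.zeros(1)
y = cp.zeros(1)
z = cp.zeros(1)
c = cp.ones(1, dtype=cp.complex128)

# gpu_method=1 selects GM (global memory), gpu_method=2 selects SM (shared
# memory); the keyword is assumed to be passed through to the options struct.
f_gm = cufinufft.nufft3d1(x, y, z, c, n_modes=(64, 64, 64), eps=1e-6, gpu_method=1)
f_sm = cufinufft.nufft3d1(x, y, z, c, n_modes=(64, 64, 64), eps=1e-6, gpu_method=2)

# If the bug is in the SM spreader, only f_sm should blow up.
print(cp.abs(f_gm).max(), cp.abs(f_sm).max())
```

Comparing the two outputs on an affected A100 would narrow the bug down to one spreading path.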
With
A100:
On T4 all good:
@janden could you provide the command to do a debug build with pip? I have seen this type of error when using debug symbols. In my tests, if I compile with

@pavel-shmakov could you try a bigger eps? 1e-2 or 1e-3?
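Not an official recipe, but one way to attempt a debug build through pip, assuming the Python package is built with scikit-build-core (which reads CMake settings from environment variables); if the build backend differs, the variable name will too.

```shell
# Build cufinufft from source instead of using the binary wheel, and ask
# the assumed scikit-build-core backend for a Debug CMake build type so
# that libcufinufft.so carries debug symbols.
SKBUILD_CMAKE_BUILD_TYPE=Debug pip install -v --no-binary cufinufft cufinufft
```

The `-v` flag surfaces the actual nvcc invocations, which also answers the question below about which compile flags pip passes.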
@pavel-shmakov for the local compilation, which version of CUDA are you using? If we move to email, we could share binary wheels with different flags to narrow down the issue.
I'm using CUDA 12.3.
Great, please feel free to reach out on [email protected]
I am able to reproduce the issue locally on my machine with an A6000 GPU:
I think the issue might be the nvcc version. If I build it locally with:

gpu_method=1: (1.000000238418579+0j)
@janden we should investigate the compile flags we pass to pip. Or can we test this with CUDA 11.2? Maybe it is a problem specific to 11.2; in that case, upgrading to 11.3 or newer might be the solution. We could also ship cufinufft as source only? If one installs the NVIDIA runtime, nvcc is also present, so in principle they can compile it locally.
I've compiled the master branch for CUDA 11.3 and 11.4 here: https://users.flatironinstitute.org/~janden/cufinufft-2.4.0dev0-cuda11.3/ Let me know how these work out. FWIW, I can't reproduce the bug above on my local machine with the published 2.3.1 binary wheels. |
With the CUDA 11.3 build the bug still reproduces.
With the CUDA 11.4 build it doesn't!
I would call it either a CUDA bug or me forgetting some sort of synchronisation that might have been needed before; since I develop on the latest toolkit, this could have been relaxed there. @janden for the next release can we upgrade to 11.4? It was released in 2022.
@pavel-shmakov Can confirm the same behavior on an FI machine (i.e., reproduce the bug for CUDA 11.3 and not for CUDA 11.4). @DiamonDinoia That's fine with me. We're talking 2.4.0 here, right? |
Yes
That would be great! Any chance for an even newer CUDA version? CUDA release notes mention many improvements to |
I would recommend doing

@janden shall we follow torch and have
An alternative is shared linking of cuFFT, but in cufinufft the FFT time is negligible.
My colleagues working with FFTs on GPU mentioned to me that VkFFT (https://github.com/DTolm/VkFFT) is the way to go if high performance is required. But I agree that it should not have a big impact on overall NUFFT performance. |
I agree, this is something to consider when we target non-NVIDIA GPUs. It does not take much to use it, as we have the CMake facility in place. But I'd imagine having problems with the complex data type and the different API naming.
We've encountered an issue where cufinufft.nufft3d1 outputs wildly incorrect results for very specific inputs and only on certain GPUs. This can be reproduced by running the following code on an A100 GPU.

Here's an archive with points.pt and values.pt: inputs.zip

The value is many orders of magnitude greater than it should be. It also grows quickly with decreasing eps.

Notes:

libcufinufft.so is built, that would be helpful, and we can investigate further!
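To illustrate the note that the error grows as eps shrinks, here is a hedged sketch of an eps sweep over the attached inputs. The file names come from the archive above, but the tensor shapes and the exact loading code are our assumptions, and it requires a CUDA GPU with cufinufft installed.

```python
import torch
import cufinufft

# Load the nonuniform points and strengths from the attached archive.
# Assumed shapes: points (3, M) with coordinates in [-pi, pi), values (M,).
points = torch.load("points.pt").cuda()
values = torch.load("values.pt").cuda()

# Sweep the requested tolerance. On an affected GPU/CUDA combination the
# maximum output magnitude blows up as eps decreases instead of staying O(1).
for eps in (1e-2, 1e-3, 1e-6):
    f = cufinufft.nufft3d1(points[0], points[1], points[2], values,
                           n_modes=(64, 64, 64), eps=eps)
    print(eps, torch.abs(f).max().item())
```

Printing the maximum magnitude per tolerance makes the "grows quickly with decreasing eps" behavior easy to confirm on a suspect machine.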