Possible bug in NNlib when using CUDA's unified memory #568
Comments
Here's your NNlib- and Flux-free MWE if you were looking for one. For the second scenario, broadcasting …
@ToucheSir OK, I think I know why I couldn't reproduce it with simple addition: when the dimensions match, the expression dispatches to plain addition, and it only goes through the broadcast machinery when the dimensions differ. `CUDA.rand(Float32, 30, 30) .+ cu(rand(Float32, 30, 30), unified=true)` works fine, but `CUDA.rand(Float32, 30, 30) .+ cu(rand(Float32, 30, 30, 1), unified=true)` fails. For the second example, though, broadcasting between two CuArrays on unified memory cannot reproduce the issue: `cu(rand(Float32, 30, 30), unified=true) .+ cu(rand(Float32, 30, 30, 1), unified=true)` works fine. Thanks for the guidance!
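Collecting the three broadcasts from that comment into one snippet (assuming a recent CUDA.jl where `cu` accepts `unified=true`):

```julia
using CUDA

# Same shapes: reported to work (dispatches to plain elementwise addition).
CUDA.rand(Float32, 30, 30) .+ cu(rand(Float32, 30, 30), unified=true)

# Mismatched trailing dimension forces a genuine broadcast: reported to fail.
CUDA.rand(Float32, 30, 30) .+ cu(rand(Float32, 30, 30, 1), unified=true)

# Both operands on unified memory: reported to work fine.
cu(rand(Float32, 30, 30), unified=true) .+ cu(rand(Float32, 30, 30, 1), unified=true)
```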
Should have been fixed by JuliaGPU/CUDA.jl#2290, so I think this can be closed.
Hi,
I've been testing a simple MWE to see whether I can get my model to train using the unified memory feature of the latest CUDA.jl release.
The problem does not appear to be in CUDA.jl itself; rather, NNlib.jl seems to have conflicting broadcast definitions when the model is in device memory and the data is in unified memory.
The first scenario is when the model is in device memory and the data is in a unified memory buffer; this combination errors out.
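A minimal sketch of what that setup might look like (a hypothetical reconstruction; the original code block did not survive here), assuming a small Flux `Dense` layer whose parameters live in device memory while the input lives in a unified buffer:

```julia
using CUDA, Flux

# Model parameters in ordinary device memory.
model = Dense(30 => 30, relu) |> gpu

# Input batch in a unified-memory buffer.
x = cu(randn(Float32, 30, 100), unified=true)

# The forward pass (NNlib's fused bias + activation broadcast) is where the error was reported.
model(x)
```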
The second scenario is when both the model and the data are on unified memory, which leads to a different compilation error.
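Again a hypothetical sketch (the original snippet is missing): constructing the same layer directly from unified-memory arrays so that parameters and data share the unified buffer type:

```julia
using CUDA, Flux

# Layer parameters placed in unified memory explicitly.
W = cu(randn(Float32, 30, 30), unified=true)
b = cu(zeros(Float32, 30), unified=true)
model_u = Dense(W, b, relu)

# Input also in unified memory.
x_u = cu(randn(Float32, 30, 100), unified=true)

# Reported to fail with a different compilation error than the first scenario.
model_u(x_u)
```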
I suspect these might be related.