RTX5070Ti is not fully supported by CUDA 12.8.0 #2714

Open
Yongda1 opened this issue Mar 23, 2025 · 14 comments · Fixed by #2717
Labels
bug Something isn't working

Comments

Yongda1 commented Mar 23, 2025

When I run the following code,

import Pkg
Pkg.add("CUDA")
using CUDA
using Test
CUDA.versioninfo()
N = 100
x_d = CUDA.fill(1.0f0, N)  # a vector stored on the GPU filled with 1.0 (Float32)
y_d = CUDA.fill(2.0f0, N)  # a vector stored on the GPU filled with 2.0
y_d .+= x_d
@test all(Array(y_d) .== 3.0f0)

Error

1 device:
  0: NVIDIA GeForce RTX 5070 Ti (sm_120, 13.432 GiB / 15.921 GiB available)
┌ Warning: Your NVIDIA GeForce RTX 5070 Ti GPU (compute capability 12.0) is not fully supported by CUDA 12.8.0.
│ Some functionality may be broken. Ensure you are using the latest version of CUDA.jl in combination with an up-to-date NVIDIA driver.
│ If that does not help, please file an issue to add support for the latest CUDA toolkit.
└ @ CUDA C:\Users\Administrator\.julia\packages\CUDA\sWPBr\lib\cudadrv\state.jl:236 
ERROR: 
Stacktrace:
  [1] error(s::String)
    @ Base .\error.jl:35
  [2] CuModule(data::Vector{UInt8}, options::Dict{CUDA.CUjit_option_enum, Any})
    @ CUDA C:\Users\Administrator\.julia\packages\CUDA\sWPBr\lib\cudadrv\module.jl:69
  [3] CuModule
    @ C:\Users\Administrator\.julia\packages\CUDA\sWPBr\lib\cudadrv\module.jl:49 [inlined]
  [4] link(job::GPUCompiler.CompilerJob, compiled::@NamedTuple{image::Vector{UInt8}, entry::String})
    @ CUDA C:\Users\Administrator\.julia\packages\CUDA\sWPBr\src\compiler\compilation.jl:414
  [5] actual_compilation(cache::Dict{…}, src::Core.MethodInstance, world::UInt64, cfg::GPUCompiler.CompilerConfig{…}, compiler::typeof(CUDA.compile), linker::typeof(CUDA.link))
    @ GPUCompiler C:\Users\Administrator\.julia\packages\GPUCompiler\OGnEB\src\execution.jl:262
  [6] cached_compilation(cache::Dict{…}, src::Core.MethodInstance, cfg::GPUCompiler.CompilerConfig{…}, compiler::Function, linker::Function)
    @ GPUCompiler C:\Users\Administrator\.julia\packages\GPUCompiler\OGnEB\src\execution.jl:151
  [7] macro expansion
    @ C:\Users\Administrator\.julia\packages\CUDA\sWPBr\src\compiler\execution.jl:373 [inlined]
  [8] macro expansion
    @ .\lock.jl:273 [inlined]
  [9] cufunction(f::GPUArrays.var"#gpu_broadcast_kernel_linear#38", tt::Type{…}; kwargs::@Kwargs{…})
    @ CUDA C:\Users\Administrator\.julia\packages\CUDA\sWPBr\src\compiler\execution.jl:368
 [10] macro expansion
    @ C:\Users\Administrator\.julia\packages\CUDA\sWPBr\src\compiler\execution.jl:112 [inlined]
 [11] (::KernelAbstractions.Kernel{…})(::CuArray{…}, ::Vararg{…}; ndrange::Tuple{…}, workgroupsize::Nothing)
    @ CUDA.CUDAKernels C:\Users\Administrator\.julia\packages\CUDA\sWPBr\src\CUDAKernels.jl:103
 [12] _copyto!
    @ C:\Users\Administrator\.julia\packages\GPUArrays\uiVyU\src\host\broadcast.jl:71 [inlined]
 [13] materialize!
    @ C:\Users\Administrator\.julia\packages\GPUArrays\uiVyU\src\host\broadcast.jl:38 [inlined]
 [14] materialize!(dest::CuArray{…}, bc::Base.Broadcast.Broadcasted{…})
    @ Base.Broadcast .\broadcast.jl:880
 [15] top-level scope
    @ e:\Juliatest\cudatest.jl:9

caused by: CUDA error: no kernel image is available for execution on the device (code 209, ERROR_NO_BINARY_FOR_GPU)
Stacktrace:
  [1] checked_cuModuleLoadDataEx(_module::Base.RefValue{…}, image::Ptr{…}, numOptions::Int64, options::Vector{…}, optionValues::Vector{…})
    @ CUDA C:\Users\Administrator\.julia\packages\CUDA\sWPBr\lib\cudadrv\module.jl:28
  [2] CuModule(data::Vector{UInt8}, options::Dict{CUDA.CUjit_option_enum, Any})     
    @ CUDA C:\Users\Administrator\.julia\packages\CUDA\sWPBr\lib\cudadrv\module.jl:60
  [3] CuModule
    @ C:\Users\Administrator\.julia\packages\CUDA\sWPBr\lib\cudadrv\module.jl:49 [inlined]
  [4] link(job::GPUCompiler.CompilerJob, compiled::@NamedTuple{image::Vector{UInt8}, entry::String})
    @ CUDA C:\Users\Administrator\.julia\packages\CUDA\sWPBr\src\compiler\compilation.jl:414
  [5] actual_compilation(cache::Dict{…}, src::Core.MethodInstance, world::UInt64, cfg::GPUCompiler.CompilerConfig{…}, compiler::typeof(CUDA.compile), linker::typeof(CUDA.link))
    @ GPUCompiler C:\Users\Administrator\.julia\packages\GPUCompiler\OGnEB\src\execution.jl:262
  [6] cached_compilation(cache::Dict{…}, src::Core.MethodInstance, cfg::GPUCompiler.CompilerConfig{…}, compiler::Function, linker::Function)
    @ GPUCompiler C:\Users\Administrator\.julia\packages\GPUCompiler\OGnEB\src\execution.jl:151
  [7] macro expansion
    @ C:\Users\Administrator\.julia\packages\CUDA\sWPBr\src\compiler\execution.jl:373 [inlined]
  [8] macro expansion
    @ .\lock.jl:273 [inlined]
  [9] cufunction(f::GPUArrays.var"#gpu_broadcast_kernel_linear#38", tt::Type{…}; kwargs::@Kwargs{…})
    @ CUDA C:\Users\Administrator\.julia\packages\CUDA\sWPBr\src\compiler\execution.jl:368
 [10] macro expansion
    @ C:\Users\Administrator\.julia\packages\CUDA\sWPBr\src\compiler\execution.jl:112 [inlined]
 [11] (::KernelAbstractions.Kernel{…})(::CuArray{…}, ::Vararg{…}; ndrange::Tuple{…}, workgroupsize::Nothing)
    @ CUDA.CUDAKernels C:\Users\Administrator\.julia\packages\CUDA\sWPBr\src\CUDAKernels.jl:103
 [12] _copyto!
    @ C:\Users\Administrator\.julia\packages\GPUArrays\uiVyU\src\host\broadcast.jl:71 [inlined]
 [13] materialize!
    @ C:\Users\Administrator\.julia\packages\GPUArrays\uiVyU\src\host\broadcast.jl:38 [inlined]
 [14] materialize!(dest::CuArray{…}, bc::Base.Broadcast.Broadcasted{…})
    @ Base.Broadcast .\broadcast.jl:880
 [15] top-level scope
    @ e:\Juliatest\cudatest.jl:9
Some type information was truncated. Use `show(err)` to see complete types.

Details on CUDA:

CUDA runtime 12.8, artifact installation
CUDA driver 12.8
NVIDIA driver 572.83.0

CUDA libraries:

  • CUBLAS: 12.8.3
  • CURAND: 10.3.9
  • CUFFT: 11.3.3
  • CUSOLVER: 11.7.2
  • CUSPARSE: 12.5.7
  • CUPTI: 2025.1.0 (API 26.0.0)
  • NVML: 12.0.0+572.83

Julia packages:

  • CUDA: 5.7.0
  • CUDA_Driver_jll: 0.12.0+0
  • CUDA_Runtime_jll: 0.16.0+0

Toolchain:

  • Julia: 1.11.4
  • LLVM: 16.0.6
Yongda1 (Author) commented Mar 24, 2025

Sorry, I don't understand how to use Runic.jl to accept these changes in VS Code. Could you show me how? Thanks.

Best regards
Jim

maleadt (Member) commented Mar 24, 2025

how to use Runic.jl to accept these changes in VS Code

There's nothing you need to do. Runic.jl is a code formatter, and that comment is intended for the PR author (i.e., me).

Yongda1 (Author) commented Mar 24, 2025

Thank you.

Yongda1 (Author) commented Mar 24, 2025

Excuse me, so I just need to wait for the next release?

maleadt (Member) commented Mar 24, 2025

You can check out the master branch of CUDA.jl for the time being.

Yongda1 (Author) commented Mar 24, 2025

Thanks. Could you tell me how to install the master branch in VS Code? I only know Pkg.add("CUDA").

maleadt (Member) commented Mar 24, 2025

https://pkgdocs.julialang.org/v1/managing-packages/#Adding-packages

If a branch (or a certain commit) of Example has a hotfix that is not yet included in a registered version, we can explicitly track that branch (or commit) by appending #branchname (or #commitSHA1) to the package name
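
For example, a minimal sketch of that (assuming the registered package name CUDA and the master branch; the exact Pkg invocation here is an illustration, not taken from the thread):

import Pkg
# Track the development (master) branch of CUDA.jl instead of the latest registered release.
Pkg.add(name="CUDA", rev="master")
# Or track the repository URL directly:
# Pkg.add(url="https://github.com/JuliaGPU/CUDA.jl", rev="master")
using CUDA
CUDA.versioninfo()   # confirm which CUDA.jl version is now in use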

Yongda1 (Author) commented Mar 24, 2025

Thanks.
I used this line:
Pkg.add(url="https://github.com/JuliaGPU/CUDA.jl")
However, when I run the following code, I still get the error.

import Pkg
using Test
Pkg.add(url="https://github.com/JuliaGPU/CUDA.jl")
#CUDA.versioninfo()
N = 100
x_d = CUDA.fill(1.0f0, N)  # a vector stored on the GPU filled with 1.0 (Float32)
y_d = CUDA.fill(2.0f0, N)  # a vector stored on the GPU filled with 2.0
y_d .+= x_d
@test all(Array(y_d) .== 3.0f0)
ERROR: 
Stacktrace:
  [1] error(s::String)
    @ Base .\error.jl:35
  [2] CuModule(data::Vector{UInt8}, options::Dict{CUDA.CUjit_option_enum, Any})
    @ CUDA C:\Users\Administrator\.julia\packages\CUDA\sWPBr\lib\cudadrv\module.jl:69
  [3] CuModule
    @ C:\Users\Administrator\.julia\packages\CUDA\sWPBr\lib\cudadrv\module.jl:49 [inlined]
  [4] link(job::GPUCompiler.CompilerJob, compiled::@NamedTuple{image::Vector{UInt8}, entry::String})
    @ CUDA C:\Users\Administrator\.julia\packages\CUDA\sWPBr\src\compiler\compilation.jl:414
  [5] actual_compilation(cache::Dict{…}, src::Core.MethodInstance, world::UInt64, cfg::GPUCompiler.CompilerConfig{…}, compiler::typeof(CUDA.compile), linker::typeof(CUDA.link))
    @ GPUCompiler C:\Users\Administrator\.julia\packages\GPUCompiler\OGnEB\src\execution.jl:262
  [6] cached_compilation(cache::Dict{…}, src::Core.MethodInstance, cfg::GPUCompiler.CompilerConfig{…}, compiler::Function, linker::Function)
    @ GPUCompiler C:\Users\Administrator\.julia\packages\GPUCompiler\OGnEB\src\execution.jl:151
  [7] macro expansion
    @ C:\Users\Administrator\.julia\packages\CUDA\sWPBr\src\compiler\execution.jl:373 [inlined]
  [8] macro expansion
    @ .\lock.jl:273 [inlined]
  [9] cufunction(f::GPUArrays.var"#gpu_broadcast_kernel_linear#38", tt::Type{…}; kwargs::@Kwargs{…})
    @ CUDA C:\Users\Administrator\.julia\packages\CUDA\sWPBr\src\compiler\execution.jl:368
 [10] macro expansion
    @ C:\Users\Administrator\.julia\packages\CUDA\sWPBr\src\compiler\execution.jl:112 [inlined]
 [11] (::KernelAbstractions.Kernel{…})(::CuArray{…}, ::Vararg{…}; ndrange::Tuple{…}, workgroupsize::Nothing)
    @ CUDA.CUDAKernels C:\Users\Administrator\.julia\packages\CUDA\sWPBr\src\CUDAKernels.jl:103
 [12] _copyto!
    @ C:\Users\Administrator\.julia\packages\GPUArrays\uiVyU\src\host\broadcast.jl:71 [inlined]
 [13] materialize!
    @ C:\Users\Administrator\.julia\packages\GPUArrays\uiVyU\src\host\broadcast.jl:38 [inlined]
 [14] materialize!(dest::CuArray{…}, bc::Base.Broadcast.Broadcasted{…})
    @ Base.Broadcast .\broadcast.jl:880
 [15] top-level scope
    @ e:\Juliatest\cudatest.jl:8

caused by: CUDA error: no kernel image is available for execution on the device (code 209, ERROR_NO_BINARY_FOR_GPU)
Stacktrace:
  [1] checked_cuModuleLoadDataEx(_module::Base.RefValue{…}, image::Ptr{…}, numOptions::Int64, options::Vector{…}, optionValues::Vector{…})
    @ CUDA C:\Users\Administrator\.julia\packages\CUDA\sWPBr\lib\cudadrv\module.jl:28
  [2] CuModule(data::Vector{UInt8}, options::Dict{CUDA.CUjit_option_enum, Any})
    @ CUDA C:\Users\Administrator\.julia\packages\CUDA\sWPBr\lib\cudadrv\module.jl:60
  [3] CuModule
    @ C:\Users\Administrator\.julia\packages\CUDA\sWPBr\lib\cudadrv\module.jl:49 [inlined]
  [4] link(job::GPUCompiler.CompilerJob, compiled::@NamedTuple{image::Vector{UInt8}, entry::String})
    @ CUDA C:\Users\Administrator\.julia\packages\CUDA\sWPBr\src\compiler\compilation.jl:414
  [5] actual_compilation(cache::Dict{…}, src::Core.MethodInstance, world::UInt64, cfg::GPUCompiler.CompilerConfig{…}, compiler::typeof(CUDA.compile), linker::typeof(CUDA.link))
    @ GPUCompiler C:\Users\Administrator\.julia\packages\GPUCompiler\OGnEB\src\execution.jl:262
  [6] cached_compilation(cache::Dict{…}, src::Core.MethodInstance, cfg::GPUCompiler.CompilerConfig{…}, compiler::Function, linker::Function)
    @ GPUCompiler C:\Users\Administrator\.julia\packages\GPUCompiler\OGnEB\src\execution.jl:151
  [7] macro expansion
    @ C:\Users\Administrator\.julia\packages\CUDA\sWPBr\src\compiler\execution.jl:373 [inlined]
  [8] macro expansion
    @ .\lock.jl:273 [inlined]
  [9] cufunction(f::GPUArrays.var"#gpu_broadcast_kernel_linear#38", tt::Type{…}; kwargs::@Kwargs{…})
    @ CUDA C:\Users\Administrator\.julia\packages\CUDA\sWPBr\src\compiler\execution.jl:368
 [10] macro expansion
    @ C:\Users\Administrator\.julia\packages\CUDA\sWPBr\src\compiler\execution.jl:112 [inlined]
 [11] (::KernelAbstractions.Kernel{…})(::CuArray{…}, ::Vararg{…}; ndrange::Tuple{…}, workgroupsize::Nothing)
    @ CUDA.CUDAKernels C:\Users\Administrator\.julia\packages\CUDA\sWPBr\src\CUDAKernels.jl:103
 [12] _copyto!
    @ C:\Users\Administrator\.julia\packages\GPUArrays\uiVyU\src\host\broadcast.jl:71 [inlined]
 [13] materialize!
    @ C:\Users\Administrator\.julia\packages\GPUArrays\uiVyU\src\host\broadcast.jl:38 [inlined]
 [14] materialize!(dest::CuArray{…}, bc::Base.Broadcast.Broadcasted{…})
    @ Base.Broadcast .\broadcast.jl:880
 [15] top-level scope
    @ e:\Juliatest\cudatest.jl:8
Some type information was truncated. Use `show(err)` to see complete types.

maleadt reopened this Mar 24, 2025

maleadt (Member) commented Mar 24, 2025

Sorry, I hadn't looked closely enough at the error message; this isn't simply a compatibility issue.

maleadt (Member) commented Mar 25, 2025

I'm afraid this may have to wait until I get hold of Blackwell hardware, which hopefully happens in a couple of weeks. In the meantime, if somebody is interested, feel free to take a look. We're probably invoking ptxas incorrectly, e.g., targeting a wrong compute capability or PTX ISA.
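
For anyone who wants to poke at this before then, a rough diagnostic sketch (it relies on the reflection macros CUDA.jl exports, such as @device_code_ptx; the exact output will vary per setup):

using CUDA

# What CUDA.jl believes the device and toolkit support.
@show CUDA.capability(CUDA.device())   # e.g. v"12.0" for an sm_120 card
@show CUDA.runtime_version()
@show CUDA.driver_version()

# Dump the PTX generated for a trivial kernel; the `.target` and `.version`
# header lines show which compute capability and PTX ISA are being requested.
kernel() = nothing
@device_code_ptx @cuda launch=false kernel()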

Yongda1 (Author) commented Mar 26, 2025

It is ok. Thanks for your effort.

zipeilee commented Apr 4, 2025

I took a look at PR #2717 by @maleadt, and it seems that LLVM requires v20 or higher. Could it be that the LLVM version is too low? Currently, my Julia 1.11.4 ships with LLVM v16. I'm not sure whether this information is helpful, or whether there is a way to work around it.

Additionally, I can provide the CUDA.versioninfo() output for my 5070 Ti if needed.

CUDA runtime 12.8, artifact installation
CUDA driver 12.8
NVIDIA driver 570.133.7

CUDA libraries: 
- CUBLAS: 12.8.4
- CURAND: 10.3.9
- CUFFT: 11.3.3
- CUSOLVER: 11.7.3
- CUSPARSE: 12.5.8
- CUPTI: 2025.1.1 (API 26.0.0)
- NVML: 12.0.0+570.133.7

Julia packages: 
- CUDA: 5.7.1
- CUDA_Driver_jll: 0.12.1+1
- CUDA_Runtime_jll: 0.16.1+0

Toolchain:
- Julia: 1.11.4
- LLVM: 16.0.6

2 devices:
  0: NVIDIA GeForce RTX 5070 Ti (sm_120, 9.878 GiB / 15.921 GiB available)
  1: NVIDIA GeForce RTX 3070 Ti (sm_86, 6.953 GiB / 8.000 GiB available)

maleadt (Member) commented Apr 9, 2025

It seems that LLVM requires v20 or higher.

Not necessarily; we invoke ptxas manually using a higher .version attribute:

# if LLVM couldn't target the requested PTX ISA, bump it in the assembly.
if job.config.target.ptx != job.config.params.ptx
    ptx = job.config.params.ptx
    asm = replace(asm, r"(\.version .+)" => ".version $(ptx.major).$(ptx.minor)")
end
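
To illustrate what that substitution does, here is a toy example with made-up PTX header lines and version numbers (ISA 8.7 is just an assumed target, not a value from the thread):

# Hypothetical PTX header as emitted by an older LLVM, targeting sm_120:
asm = """
.version 7.8
.target sm_120
.address_size 64
"""

# Bump the ISA version line the same way as above.
ptx = (major = 8, minor = 7)
asm = replace(asm, r"(\.version .+)" => ".version $(ptx.major).$(ptx.minor)")
print(asm)   # the header now starts with ".version 8.7"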

Yongda1 (Author) commented Apr 9, 2025

Thanks. I will try it later.
