Can not find device after 565+ on GH200 NVL2 #774
Comments
Could you also generate an nvidia-bug-report.log.gz for 560? It may help to compare logs between working and failing configurations.
nvidia-bug-report.log.gz
If I'm reading the log correctly, 560.35.05 has the same symptom as 570.86.15:
Once the GPU gets into this state, I think the problem will persist until a reboot. Could I trouble you to do the following for each of 560.35.05 and 570.86.15?
I did reboot between installs; otherwise nvidia-smi reports a driver mismatch. I will try to run the sequence again for both versions tonight during downtime and get back to you. Thanks!
NVIDIA Open GPU Kernel Modules Version
nvidia-driver-565-open (565.57.01), nvidia-driver-570-open (570.86.15)
Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.
Operating System and Version
Ubuntu 24.04.1 LTS
Kernel Release
Linux 6.11.0-1002-nvidia-64k #2-Ubuntu SMP PREEMPT_DYNAMIC Wed Oct 23 19:17:25 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux
Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.
Hardware: GPU
GPU 0: NVIDIA GH200 144G HBM3e | GPU 1: NVIDIA GH200 144G HBM3e
Describe the bug
Installing Ubuntu 24.04 on a new GH200 NVL2 system, using the NVIDIA open kernel driver from apt.
Neither GPU is found when using the nvidia-driver-565-open or nvidia-driver-570-open apt package.
Both GPUs are found when using nvidia-driver-560-open.
To Reproduce
sudo apt install nvidia-driver-565-open
nvidia-smi
No devices were found
sudo apt install nvidia-driver-570-open
nvidia-smi
No devices were found
sudo apt install nvidia-driver-560-open
nvidia-smi
both GPUs found.
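When repeating these steps across driver versions, it can help to record the outcome of each `nvidia-smi` run in a uniform way. The following is a minimal sketch, not an official tool: `classify_smi` is a hypothetical helper that only matches two message strings, "No devices were found" (as reported above) and "Driver/library version mismatch" (the message nvidia-smi typically prints when the module was replaced without a reboot). Anything else is reported as "other".

```shell
#!/bin/sh
# classify_smi: hypothetical helper that labels nvidia-smi output.
# It inspects the text passed as its first argument and prints one label.
classify_smi() {
    case $1 in
        *"No devices were found"*)           echo "no-devices" ;;
        *"Driver/library version mismatch"*) echo "mismatch" ;;
        *)                                   echo "other" ;;
    esac
}

# Only query the driver when nvidia-smi is actually installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    classify_smi "$(nvidia-smi 2>&1)"
fi
```

A "mismatch" label suggests the machine should be rebooted before the result is compared with another driver version.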
Both GPUs are at VBIOS 96.00.A0.00.01, which should be newer than 96.00.68.00.xx; see the known issues in the release notes:
https://docs.nvidia.com/datacenter/tesla/tesla-release-notes-565-57-01/index.html#known-issues
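The VBIOS comparison can be checked mechanically. This is a minimal sketch under two assumptions that are not confirmed by the release notes: each dotted field is a hexadecimal number, and versions are ordered field by field from left to right. `vbios_cmp` is a hypothetical helper, and `FF` stands in for the unspecified `xx`.

```shell
#!/bin/sh
# vbios_cmp A B: print "newer", "older", or "same" for A relative to B.
# Assumption: every dotted field is hexadecimal (A0 = 160, 68 = 104).
# POSIX sh has no local variables, so a, b, x, y are globals here.
vbios_cmp() {
    a=$1
    b=$2
    while [ -n "$a" ] || [ -n "$b" ]; do
        # Take the leading field of each version string.
        x=${a%%.*}
        y=${b%%.*}
        # Drop the leading field; with no dot left, the string is consumed.
        case $a in *.*) a=${a#*.} ;; *) a= ;; esac
        case $b in *.*) b=${b#*.} ;; *) b= ;; esac
        # Convert hex to decimal; a missing field counts as 0.
        x=$((0x${x:-0}))
        y=$((0x${y:-0}))
        if [ "$x" -gt "$y" ]; then echo newer; return 0; fi
        if [ "$x" -lt "$y" ]; then echo older; return 0; fi
    done
    echo same
}

vbios_cmp 96.00.A0.00.01 96.00.68.00.FF   # prints "newer"
```

The result is decided at the third field (A0 versus 68), so the value of `xx` does not affect it.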
nvidia-bug-report.log.gz
Bug Incidence
Always
nvidia-bug-report.log.gz
nvidia-bug-report.log.gz
More Info
No response