Random boot fail to initialize with timeout or reset required #709
Comments
I think some AM5 boards had problems with the 40xx generation on old BIOS versions. Did you try upgrading to the latest available BIOS? On my machine (9950X, ASRock X670E, 4070 Super) I do not see such issues.
I'm using the latest BIOS and also tried older versions, with the same result. Changing BIOS settings (disabling TPM, TSME, and other options) did not help either. As a last resort I plugged in an RTX 3060, and it works fine. I decided to try another GPU because, after installing Windows 10 (with the latest drivers), the system booted but showed poor performance and freezes, and only became stable after some random reboots. With the RTX 3060 both operating systems work as expected. I suspect the GPU could be faulty, but I'm not sure, because once the system boots everything runs fine: I can play games (loading the GPU at 60-70) for more than 4 hours without any issue. I also read that a faulty 12VHPWR connection could cause these issues, so I double-checked the connection and tried another cable, with the same result. The power supply is sufficient (1000 W, ATX 5.0 certified). My last suspicion is an incompatibility between the motherboard and the GPU, but I'm not sure. It seems strange, because the motherboard and GPU are from the same manufacturer and neither is very new hardware. At this point I really don't know whether this is a driver bug, faulty hardware, or some sort of hardware/firmware incompatibility. I'm continuing to investigate.
I'm in contact with the manufacturer, because I'm very sure this is a hardware incompatibility between the GPU and the motherboard that causes improper initialization of the GPU internals. The issue also occurs in Windows with the latest drivers: the system sometimes takes a long time to boot, and performance is very poor afterwards. The GPU is fine: I tested it in another system and checked it with different tools (including the NVIDIA MODS/MATS tools) without any errors. I also tested the motherboard and CPU separately with another GPU, and everything works as expected on Linux. So I will close this issue, because it isn't related to a driver bug.
NVIDIA Open GPU Kernel Modules Version
560.35.03
Please confirm this issue does not happen with the proprietary driver (of the same version). This issue tracker is only for bugs specific to the open kernel driver.
Operating System and Version
Gentoo Linux
Kernel Release
6.6.52
Please confirm you are running a stable release kernel (e.g. not a -rc). We do not accept bug reports for unreleased kernels.
Hardware: GPU
NVIDIA GeForce RTX 4070 Ti SUPER
Describe the bug
In general, the NVIDIA driver fails to load properly: sometimes with a "RESET REQUIRED" error, other times with a timeout in communication with the GPU.
I tried other kernels (6.8 and 6.11) with the same results, and also the proprietary driver with and without GSP firmware loading. In all cases the result is the same.
It's strange and I cannot isolate the root cause, so I will try to explain the case.
I changed the motherboard and CPU, from a Gigabyte X570 and Ryzen 3700X to a Gigabyte X670 and Ryzen 9700X. The GPU is the same, a 4070 Ti SUPER. Previously the system worked fine with the same OS and the 560.35.03 driver.
After the hardware upgrade I cannot boot reliably; most of the time GPU initialization fails. I went through a lot of trial and error (different kernel and NVIDIA driver configurations) with the same result.
The driver produces two types of errors (I don't know why): one is a GPU communication timeout and the other is related to a required reset.
I tried the closed-source driver (same issues), also with GSP firmware loading disabled.
The most intriguing thing is that I can boot properly after a few tries, generally 2 or 3, by combining soft resets with cold boots. When I get a "RESET REQUIRED" error I do a cold boot, and when I get a "TIMEOUT" I do a soft reset.
Once the GPU initializes properly, everything works fine.
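To make the recovery routine explicit, here is a minimal sketch of the decision as a small Python helper. It is only an illustration: the function name `suggest_recovery` is hypothetical, and the marker strings are assumptions, since the exact wording of the driver messages is not quoted here and may differ.

```python
from typing import Optional

# Assumed substrings, matched case-insensitively. The real driver
# messages may be worded differently; adjust these to the actual log.
RESET_MARKER = "reset required"
TIMEOUT_MARKER = "timeout"


def suggest_recovery(log_text: str) -> Optional[str]:
    """Map the failure seen in a kernel log to the recovery step that worked.

    Returns "cold boot" for a reset-required failure, "soft reset" for a
    communication timeout, and None when neither marker is present.
    If both markers appear, the reset-required case takes precedence.
    """
    text = log_text.lower()
    if RESET_MARKER in text:
        return "cold boot"
    if TIMEOUT_MARKER in text:
        return "soft reset"
    return None
```

The helper could be fed the text of the kernel log (for example the output of `dmesg`) after a failed boot. It only encodes the manual routine described above and does nothing to fix the underlying initialization problem.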
I think there is an issue in the initialization order or logic in combination with AGESA.
The most stable configuration (it still needs some reboots to work) is the closed-source driver with GSP firmware disabled.
If you need more details or logs, please tell me what you need.
P.S.: I'm posting here because, as I understand it, the open-source driver is the default option in the 560 driver.
To Reproduce
On almost every boot
Bug Incidence
Always
nvidia-bug-report.log.gz
More Info
At least a consistent boot, but a full fix would be great.