Basically, when Incus invokes the Nvidia hook defined in LXC, the hook returns a non-zero exit status. After configuring /etc/nvidia-container-runtime/config.toml to output debugging information, I get the following logs:
-- WARNING, the following logs are for debugging purposes only --
I0924 03:55:25.997466 4 nvc.c:393] initializing library context (version=1.16.1, build=4c2494f16573b585788a42e9c7bee76ecd48c73d)
I0924 03:55:25.997530 4 nvc.c:364] using root /
I0924 03:55:25.997544 4 nvc.c:365] using ldcache /etc/ld.so.cache
I0924 03:55:25.997557 4 nvc.c:366] using unprivileged user 0:0
I0924 03:55:25.997586 4 nvc.c:410] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
I0924 03:55:25.997669 4 nvc.c:412] dxcore initialization failed, continuing assuming a non-WSL environment
I0924 03:55:25.998002 21 rpc.c:71] starting driver rpc service
I0924 03:55:26.004017 4 rpc.c:135] driver rpc service terminated with signal 15
I0924 03:55:26.004082 4 nvc.c:452] shutting down library context
The log says that the driver rpc service terminated with signal 15 (SIGTERM) and nothing more. I'm not sure how to troubleshoot this, and at this point I cannot tell whether it's a bug in the Nvidia container toolkit or a problem with how the toolkit and the Nvidia hook work together.
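For anyone comparing their own debug output against this one, here is a minimal sketch that scans a log for the "rpc service terminated with signal N" line and translates the number into a signal name. The regular expression is only an assumption based on the lines pasted above, and `find_rpc_signals` and `SAMPLE` are hypothetical names used for illustration; none of this is part of the Nvidia container toolkit.

```python
import re
import signal

# Assumed line shape, taken from the log excerpt in this issue:
# I0924 03:55:26.004017 4 rpc.c:135] driver rpc service terminated with signal 15
RPC_EXIT = re.compile(r"rpc\.c:\d+\] (\w+) rpc service terminated with signal (\d+)")


def find_rpc_signals(log_text):
    """Return (service, signal number, signal name) for each terminated rpc service."""
    results = []
    for line in log_text.splitlines():
        match = RPC_EXIT.search(line)
        if not match:
            continue
        number = int(match.group(2))
        try:
            # signal.Signals maps a number to its symbolic name on this platform.
            name = signal.Signals(number).name
        except ValueError:
            name = "UNKNOWN"
        results.append((match.group(1), number, name))
    return results


# Two lines copied from the log above (hypothetical sample input).
SAMPLE = """\
I0924 03:55:25.998002 21 rpc.c:71] starting driver rpc service
I0924 03:55:26.004017 4 rpc.c:135] driver rpc service terminated with signal 15
"""

print(find_rpc_signals(SAMPLE))  # [('driver', 15, 'SIGTERM')]
```

This only confirms what the log already states (signal 15 is SIGTERM); it does not show which process sent the signal or why.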
I tried using Podman with CDI and succeeded in running CUDA workloads inside a container, so at first glance it doesn't seem to be a driver problem. I tried different driver versions as well.
Context of the issue here: https://discuss.linuxcontainers.org/t/nvidia-hook-not-working-with-opensuse-leap-15-6/21686