Docker install fails due to "missing" library #698

Closed
factor3 opened this issue May 30, 2024 · 11 comments
Labels
bug Something isn't working

Comments

@factor3

factor3 commented May 30, 2024

Has this issue been opened before?
No, I do not see this issue in this bug list or the FAQ.

The "download stage appears to work without problems, but when I do the command:

docker compose --profile auto up --build

I get the following failure message:

Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as ‘legacy’
nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown

I do have the nvidia drivers on my system:

sudo ubuntu-drivers list --gpgpu
nvidia-driver-555-open, (kernel modules provided by nvidia-dkms-555-open)
nvidia-driver-550-server, (kernel modules provided by linux-modules-nvidia-550-server-generic)
nvidia-driver-550, (kernel modules provided by linux-modules-nvidia-550-generic)
nvidia-driver-550-server-open, (kernel modules provided by linux-modules-nvidia-550-server-open-generic)
nvidia-driver-555, (kernel modules provided by nvidia-dkms-555)
nvidia-driver-550-open, (kernel modules provided by linux-modules-nvidia-550-open-generic)

and, as it turns out, the library actually exists on my system:

$ sudo ldconfig -p|grep libnvidia-ml.so.1
libnvidia-ml.so.1 (libc6,x86-64) => /lib/x86_64-linux-gnu/libnvidia-ml.so.1

which means that, for some reason, the container is failing to load the library.

Which UI
auto

Hardware / Software

  • OS: Ubuntu
  • OS version: 22.04
  • Docker Version: 24.0.5
  • Docker compose version: 2.20.3
  • Repo version: from master
  • RAM: 64 GB
  • GPU/VRAM: NVIDIA RTX 4070 Ti

Steps to Reproduce

  1. Use git to clone the stable-diffusion-webui-docker repository
  2. cd into the stable-diffusion-webui-docker directory
  3. Perform the download: docker compose --profile download up --build
  4. Perform the install: docker compose --profile auto up --build
  5. See the failure.
@factor3 factor3 added the bug Something isn't working label May 30, 2024
@AbdBarho
Owner

AbdBarho commented May 30, 2024

Do you have the NVIDIA Container Toolkit installed? This might also be related to #694.
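
For reference, the usual Ubuntu setup (roughly following NVIDIA's Container Toolkit install docs; adjust the repository setup for your distribution) looks like this:

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

# register the nvidia runtime with Docker, then restart the daemon
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker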

@factor3
Author

factor3 commented May 30, 2024

Greetings, AbdBarho:

Thanks for the response. I did not have the Container Toolkit installed, but I have since installed it and restarted my system. I still get the error.
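
In case it helps, here is how I am checking the toolkit side (whether the nvidia runtime is actually registered with Docker is my guess at the relevant thing to verify):

nvidia-ctk --version           # confirms the toolkit CLI is installed
docker info | grep -i runtimes # "nvidia" should appear among the listed runtimes
cat /etc/docker/daemon.json    # nvidia-ctk runtime configure writes the runtime entry here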

I am looking at #694, though I believe the drivers on my system are the same as the ones that solved the problem there. I am taking a closer look, though, in case I missed something.

@factor3
Author

factor3 commented May 31, 2024

I have some questions:

Referencing #694: is the CUDA toolkit related at all to the NVIDIA Container Toolkit? The Container Toolkit documentation says that the CUDA toolkit is not necessary if the Container Toolkit is installed, but in #694 it looks like part of the solution involved using the CUDA toolkit.

Also: they talked about downgrading the NVIDIA drivers (or the CUDA drivers?) in order for things to work properly. I do not know how to do that downgrade. Could someone point me to where I can learn how?

One more point: it doesn't look like the problem described in #694 is the same as the problem I describe here. It is similar in that a Docker container is failing to access the drivers through an API, but what is happening in my case is that a container is failing to access an existing library and reporting that the library is "missing". There are no version issues involved in my problem -- unless the container needs a different driver version in order to access the library. Is that the case? Does the container have to use a local NVIDIA driver in order to load NVIDIA libraries?
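
One test I can run to separate this repo from the Docker stack itself (the image tag is just an example; any CUDA base image should behave the same):

docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi

If that fails with the same libnvidia-ml.so.1 error, the problem is in the Docker/driver/toolkit stack rather than in this repo's compose files.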

@MarioLiebisch

Issue #694 is a failure during the build, not when starting the container. It's caused by driver version 555. I guess you circumvented that issue, so it may be that everyone is affected by both problems; we just don't know yet (since the build fails first).

@factor3
Author

factor3 commented Jun 2, 2024

MarioLiebisch:

Isn't my failure also occurring during a build, though a build not of the server but of the UI? The command is:

docker compose --profile auto up --build

after all. The failure occurs while the container is being built, which, based on what I see in the Dockerfile, also builds part of the code itself.

Still: the failure here is different from the failure in #694. That problem was solved (more or less) by downgrading the drivers. If finding the existing library depends on properly working drivers, I may need to downgrade the drivers I am using. I have never done this sort of thing before (I have always upgraded, never needed to downgrade). How do I do this?

@MarioLiebisch

To downgrade, use Display Driver Uninstaller and then just reinstall the older version, ideally with the network disconnected to make sure you don't automatically get the new one.
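
Since you are on Ubuntu rather than Windows, the rough apt equivalent (untested; package names taken from your ubuntu-drivers output above, so double-check them against apt list --installed first) would be something like:

sudo apt-get purge 'nvidia-driver-555*' 'nvidia-dkms-555*'
sudo apt-get install nvidia-driver-550
sudo reboot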

@factor3
Author

factor3 commented Jun 5, 2024

I learned something that folks here -- especially MarioLiebisch -- might find interesting.

I attempted to install another AI system in a docker container on my machine. It also uses the Nvidia drivers and the container toolkit.

This install did not do a build. It was a straightforward docker container.

It is generating the same error as the stable diffusion install: it cannot find the libnvidia-ml.so.1 library despite the fact that the library is installed.

I did further research. This is not a stable-diffusion problem. It is a problem with Docker. Apparently, there is some bug causing Docker to fail to load that shared library.

I am looking further, to see if anyone has found a solution. If I find one, I will share it here. This problem has been plaguing Docker users for several years now, and so far none of the fixes I have found appear to work. I will, however, continue to look...

@MarioLiebisch

From my understanding, it's an issue with the (now outdated) nvidia-cuda-toolkit that is included as part of Docker Desktop (i.e. we need an update of that). As an alternative, you could probably install Docker and nvidia-cuda-toolkit inside WSL2, but you won't be able to use Docker Desktop that way.

@factor3
Author

factor3 commented Jun 5, 2024

Greetings, MarioLiebisch:

Actually, I do not care about not using Docker Desktop, because I have never used it. I use the CLI, Dockge, and other services for managing my containers.

Could you point me to where I can learn how to install Docker and nvidia-cuda-toolkit inside WSL2? I do not know much about WSL2 (actually, I am not sure I even know what it is).

@MarioLiebisch

WSL2 is essentially a virtual machine running parallel to Windows using Hyper-V. Docker Desktop uses it, too, but with its own images etc. There's some added glue, so you've got your Windows file system mounted inside the Linux system (of your choice), you can execute Linux stuff on the Windows command line and vice versa etc. (including X11 GUIs).

If you installed Docker inside your Linux distribution, e.g. using apt install docker-ce, you should™ be able to just run apt install nvidia-cuda-toolkit (but you might have to pass a specific version depending on your platform). However, since I don't know your specific setup and I mostly use Docker Desktop (apart from a pure/native Linux server with no GPU), you might want to look elsewhere for help. Keep in mind that you might waste disk space or even cause other issues by installing packages you don't need in the end. I can't really advise you here.
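
Concretely, inside the WSL2 distribution that would be something like this (assuming Ubuntu, and assuming Docker's apt repository is already set up; treat it as a sketch):

sudo apt update
sudo apt install docker-ce             # the Docker engine inside WSL2, not Docker Desktop
sudo apt install nvidia-cuda-toolkit   # Ubuntu's packaged CUDA toolkit; may lag behind NVIDIA's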

The easiest way to solve the issue would be to just downgrade the GPU drivers for now and wait for an update.

@factor3
Author

factor3 commented Jun 8, 2024

The problem has been solved.

I decided to reinstall Ubuntu on my system. I did upgrade from v22.04 to v24.04, though I doubt that this upgrade affected anything; I only mention it to make the report more thorough.

I installed the NVIDIA driver before installing Docker. I put the Container Toolkit on, then used an apt-based procedure to make certain that I was actually installing docker-ce. I think this was critical, because the standard procedure for installing Docker on Ubuntu involves using snap -- and I do not believe that the Docker installed by snap is the same as docker-ce.
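
For anyone repeating this, the apt-based procedure I followed is roughly the one from Docker's own install docs:

# add Docker's official GPG key and apt repository
sudo apt-get update
sudo apt-get install ca-certificates curl
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu $(. /etc/os-release && echo $VERSION_CODENAME) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update

# install docker-ce itself (not the snap)
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin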

After installing docker-ce, I began the procedure for installing stable diffusion. I did the download, then started the "auto" profile. I got the same failure as before:

Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as ‘legacy’
nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown

This time, however, I decided (for grins and giggles) to install the CUDA toolkit. When I attempted to start stable diffusion again, it started up without difficulty.
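
For anyone else hitting this, the extra step was installing the CUDA toolkit; the Ubuntu-packaged route is simply:

sudo apt install nvidia-cuda-toolkit

(NVIDIA's own apt repository is an alternative if you need a newer toolkit version; I can't promise the packaged one will always suffice.)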

I want to thank those who responded. Without your various suggestions, I would not have solved this problem.

@factor3 factor3 closed this as completed Jun 8, 2024