OpenCL no /etc/OpenCL/vendors/nvidia.icd #682

Open
denisstrizhkin opened this issue Sep 7, 2024 · 2 comments

Comments

denisstrizhkin commented Sep 7, 2024

nvidia-container-toolkit does not expose /etc/OpenCL/vendors/nvidia.icd to containers currently. However, if I just copy this file from the host system into the container, then OpenCL works.

Is this the intended behavior for some particular reason? If not, I think it would be good to expose /etc/OpenCL/vendors/nvidia.icd to containers.


qhaas commented Sep 11, 2024

This might be related to the Vulkan ICD issue.

Update: I cannot get the ICD to mount automatically, but the libnvidia-opencl.so.1 library needed by the OpenCL ICD loader is present in the container. I suspect this file name will remain relatively stable, since it tracks a symbolic link, and can therefore be placed inside an /etc/OpenCL/vendors/nvidia.icd file in the container image (assuming you stay in the NVIDIA ecosystem). But I agree, it should probably be auto-mounted.

Consider:

$ cat /etc/redhat-release 
Rocky Linux release 9.4 (Blue Onyx)

$ modinfo nvidia | grep ^version
version:        535.183.06

$ nvidia-container-toolkit --version
NVIDIA Container Runtime Hook version 1.16.1

$ docker version
Client: Docker Engine - Community
 Version:           27.2.1
...
Server: Docker Engine - Community
 Engine:
  Version:          27.2.1
...
 containerd:
  Version:          1.7.21
...
 runc:
  Version:          1.1.13
...
 docker-init:
  Version:          0.19.0
...

$ cat /etc/docker/daemon.json 
{
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    }
}

$ cat /etc/OpenCL/vendors/nvidia.icd 
libnvidia-opencl.so.1

$ file /usr/lib64/libnvidia-opencl.so.1
/usr/lib64/libnvidia-opencl.so.1: symbolic link to libnvidia-opencl.so.535.183.06

$ clinfo -l
Platform #0: NVIDIA CUDA
 `-- Device #0: Quadro P4000

$ cat opencl.Dockerfile 
FROM ubuntu:22.04

ARG DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y clinfo && apt-get clean

RUN install -d /etc/OpenCL/vendors/ && \
    echo 'libnvidia-opencl.so.1' >> /etc/OpenCL/vendors/nvidia.icd
ENV NVIDIA_DRIVER_CAPABILITIES=all

$ docker build -t test:opencl -f opencl.Dockerfile .
...

$ docker run --runtime=nvidia --gpus all --rm -it test:opencl clinfo -l
Platform #0: NVIDIA CUDA
 `-- Device #0: Quadro P4000
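Beyond baking the file into the image as the Dockerfile above does, a run-time bind mount of the host's copy is another workaround. A minimal sketch, using a scratch directory so the file-creation step runs without root or docker (the /etc/OpenCL path and the test:opencl image name are taken from the transcript above):

```shell
# 1) The same ICD file the Dockerfile bakes in, built locally to inspect it.
#    It holds only the bare soname, not a full path.
mkdir -p ./vendors
printf 'libnvidia-opencl.so.1\n' > ./vendors/nvidia.icd
cat ./vendors/nvidia.icd

# 2) Alternative: bind-mount the host's copy at run time (shown as a comment
#    since it requires an NVIDIA host; paths assumed from the issue):
#    docker run --runtime=nvidia --gpus all --rm \
#      -v /etc/OpenCL/vendors/nvidia.icd:/etc/OpenCL/vendors/nvidia.icd:ro \
#      test:opencl clinfo -l
```

The bind-mount variant is closer to what an auto-mount by nvidia-container-toolkit would look like, at the cost of a per-invocation flag.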

elezar (Member) commented Sep 23, 2024

Hi @denisstrizhkin, yes, I think this may be a similar issue, in that we're not injecting the nvidia.icd file from the host or using an alternative mechanism such as the environment variables listed here: https://github.com/KhronosGroup/OpenCL-ICD-Loader?tab=readme-ov-file#table-of-debug-environment-variables
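A hedged sketch of that environment-variable mechanism: the Khronos ICD loader documents OCL_ICD_FILENAMES (a colon-separated list of ICD library names or paths) and OCL_ICD_VENDORS (an alternative vendors directory), so a container could in principle set one of these instead of shipping an /etc/OpenCL/vendors file:

```shell
# Point the ICD loader directly at the vendor library by soname.
# (Only the variable is set and echoed here; actually enumerating the
# platform would require the NVIDIA library and clinfo in the container.)
export OCL_ICD_FILENAMES=libnvidia-opencl.so.1
echo "$OCL_ICD_FILENAMES"

# With this set, `clinfo -l` inside the container would find the NVIDIA
# platform without any nvidia.icd file present.
```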

One question I have is the following: the documentation states that the full path to a library should be provided, but your example uses only the library name. Is this also supported?

Requiring the full path becomes an issue if the path in the container and the path on the host are different.

Update: to answer my own question: the loader uses dlopen under the hood, so a bare library name is fine as long as the library is in the ld cache or locatable via LD_LIBRARY_PATH: https://github.com/KhronosGroup/OpenCL-ICD-Loader/blob/3d27d7ca04d29fabe608a2372ce693601bcc4e81/loader/linux/icd_linux.c#L243-L251
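That dlopen resolution can be spot-checked without an NVIDIA host: a bare soname resolves when it appears in the dynamic linker cache. Here libm.so.6 is used as a stand-in for libnvidia-opencl.so.1, which only exists on driver hosts:

```shell
# Show that the bare soname "libm.so.6" is listed in the ld cache, which is
# exactly the case where dlopen("libm.so.6") succeeds without a full path.
# (ldconfig may live in /sbin, which is not always on non-root PATHs.)
{ /sbin/ldconfig -p 2>/dev/null || ldconfig -p; } | grep -m1 -o 'libm\.so\.6' | head -n1
```

The same reasoning applies to libnvidia-opencl.so.1, which nvidia-container-toolkit already registers in the container's ld cache, so the name-only nvidia.icd from the earlier comment works.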
