Commit
feat(ml): introduce support of onnxruntime-rocm for AMD GPU
Zelnes committed Jul 15, 2024
1 parent 04d0f57 commit 18f5d4d
Showing 14 changed files with 261 additions and 16 deletions.
7 changes: 7 additions & 0 deletions .github/workflows/docker.yml
@@ -38,6 +38,13 @@ jobs:
        device: cuda
        suffix: -cuda

+     - image: immich-machine-learning
+       context: machine-learning
+       file: machine-learning/Dockerfile
+       platforms: linux/amd64
+       device: rocm
+       suffix: -rocm

      - image: immich-machine-learning
        context: machine-learning
        file: machine-learning/Dockerfile
4 changes: 2 additions & 2 deletions docker/docker-compose.dev.yml
@@ -75,12 +75,12 @@ services:
    image: immich-machine-learning-dev:latest
    # extends:
    #   file: hwaccel.ml.yml
-   #   service: cpu # set to one of [armnn, cuda, openvino, openvino-wsl] for accelerated inference
+   #   service: cpu # set to one of [armnn, cuda, rocm, openvino, openvino-wsl] for accelerated inference
    build:
      context: ../machine-learning
      dockerfile: Dockerfile
      args:
-       - DEVICE=cpu # set to one of [armnn, cuda, openvino, openvino-wsl] for accelerated inference
+       - DEVICE=cpu # set to one of [armnn, cuda, rocm, openvino, openvino-wsl] for accelerated inference
    ports:
      - 3003:3003
    volumes:
4 changes: 2 additions & 2 deletions docker/docker-compose.prod.yml
@@ -27,12 +27,12 @@ services:
    image: immich-machine-learning:latest
    # extends:
    #   file: hwaccel.ml.yml
-   #   service: cpu # set to one of [armnn, cuda, openvino, openvino-wsl] for accelerated inference
+   #   service: cpu # set to one of [armnn, cuda, rocm, openvino, openvino-wsl] for accelerated inference
    build:
      context: ../machine-learning
      dockerfile: Dockerfile
      args:
-       - DEVICE=cpu # set to one of [armnn, cuda, openvino, openvino-wsl] for accelerated inference
+       - DEVICE=cpu # set to one of [armnn, cuda, rocm, openvino, openvino-wsl] for accelerated inference
    volumes:
      - model-cache:/cache
    env_file:
4 changes: 2 additions & 2 deletions docker/docker-compose.yml
@@ -29,12 +29,12 @@ services:

  immich-machine-learning:
    container_name: immich_machine_learning
-   # For hardware acceleration, add one of -[armnn, cuda, openvino] to the image tag.
+   # For hardware acceleration, add one of -[armnn, cuda, rocm, openvino] to the image tag.
    # Example tag: ${IMMICH_VERSION:-release}-cuda
    image: ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION:-release}
    # extends: # uncomment this section for hardware acceleration - see https://immich.app/docs/features/ml-hardware-acceleration
    #   file: hwaccel.ml.yml
-   #   service: cpu # set to one of [armnn, cuda, openvino, openvino-wsl] for accelerated inference - use the `-wsl` version for WSL2 where applicable
+   #   service: cpu # set to one of [armnn, cuda, rocm, openvino, openvino-wsl] for accelerated inference - use the `-wsl` version for WSL2 where applicable
    volumes:
      - model-cache:/cache
    env_file:
7 changes: 7 additions & 0 deletions docker/hwaccel.ml.yml
@@ -26,6 +26,13 @@ services:
            capabilities:
              - gpu

+  rocm:
+    group_add:
+      - video
+    devices:
+      - /dev/dri:/dev/dri
+      - /dev/kfd:/dev/kfd
+
   openvino:
     device_cgroup_rules:
       - 'c 189:* rmw'
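The new `rocm` service is consumed like the existing backends: the user's compose file extends it and switches the image tag. A minimal sketch, mirroring the docs changes elsewhere in this commit:

```yaml
# Sketch of the user-facing wiring for the new service.
services:
  immich-machine-learning:
    # -rocm tag built by the new CI matrix entry above
    image: ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION:-release}-rocm
    extends:
      file: hwaccel.ml.yml
      service: rocm # maps /dev/dri and /dev/kfd and joins the video group
    volumes:
      - model-cache:/cache
```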
9 changes: 7 additions & 2 deletions docs/docs/features/ml-hardware-acceleration.md
@@ -11,6 +11,7 @@

- ARM NN (Mali)
- CUDA (NVIDIA GPUs with [compute capability](https://developer.nvidia.com/cuda-gpus) 5.2 or higher)
+- ROCM (AMD GPUs)
- OpenVINO (Intel discrete GPUs such as Iris Xe and Arc)

## Limitations
@@ -40,6 +41,10 @@
- The installed driver must be >= 535 (it must support CUDA 12.2).
- On Linux (except for WSL2), you also need to have [NVIDIA Container Toolkit][nvct] installed.

+#### ROCM
+
+- The GPU must be supported by ROCM (or use `HSA_OVERRIDE_GFX_VERSION=<a supported version, e.g. 10.3.0>`)
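For cards that ROCM does not officially support, the override is passed to the machine-learning container as an ordinary environment variable. A hedged sketch (the correct value depends on your GPU; `10.3.0` is just the example from the line above):

```yaml
# Sketch only: force a supported gfx target for an unsupported AMD GPU.
services:
  immich-machine-learning:
    environment:
      - HSA_OVERRIDE_GFX_VERSION=10.3.0
```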

#### OpenVINO

- The server must have a discrete GPU, i.e. Iris Xe or Arc. Expect issues when attempting to use integrated graphics.
@@ -49,7 +54,7 @@

1. If you do not already have it, download the latest [`hwaccel.ml.yml`][hw-file] file and ensure it's in the same folder as the `docker-compose.yml`.
2. In the `docker-compose.yml` under `immich-machine-learning`, uncomment the `extends` section and change `cpu` to the appropriate backend.
-3. Still in `immich-machine-learning`, add one of -[armnn, cuda, openvino] to the `image` section's tag at the end of the line.
+3. Still in `immich-machine-learning`, add one of -[armnn, cuda, rocm, openvino] to the `image` section's tag at the end of the line.
4. Redeploy the `immich-machine-learning` container with these updated settings.

#### Single Compose File
@@ -95,7 +100,7 @@ immich-machine-learning:
Once this is done, you can redeploy the `immich-machine-learning` container.

:::info
-You can confirm the device is being recognized and used by checking its utilization (via `nvtop` for CUDA, `intel_gpu_top` for OpenVINO, etc.). You can also enable debug logging by setting `IMMICH_LOG_LEVEL=debug` in the `.env` file and restarting the `immich-machine-learning` container. When a Smart Search or Face Detection job begins, you should see a log for `Available ORT providers` containing the relevant provider. In the case of ARM NN, the absence of a `Could not load ANN shared libraries` log entry means it loaded successfully.
+You can confirm the device is being recognized and used by checking its utilization (via `nvtop` for CUDA, `radeontop` for ROCM, `intel_gpu_top` for OpenVINO, etc.). You can also enable debug logging by setting `IMMICH_LOG_LEVEL=debug` in the `.env` file and restarting the `immich-machine-learning` container. When a Smart Search or Face Detection job begins, you should see a log for `Available ORT providers` containing the relevant provider. In the case of ARM NN, the absence of a `Could not load ANN shared libraries` log entry means it loaded successfully.
:::

[hw-file]: https://github.com/immich-app/immich/releases/latest/download/hwaccel.ml.yml
4 changes: 2 additions & 2 deletions docs/docs/guides/remote-machine-learning.md
@@ -20,12 +20,12 @@ name: immich_remote_ml
services:
  immich-machine-learning:
    container_name: immich_machine_learning
-   # For hardware acceleration, add one of -[armnn, cuda, openvino] to the image tag.
+   # For hardware acceleration, add one of -[armnn, cuda, rocm, openvino] to the image tag.
    # Example tag: ${IMMICH_VERSION:-release}-cuda
    image: ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION:-release}
    # extends:
    #   file: hwaccel.ml.yml
-   #   service: # set to one of [armnn, cuda, openvino, openvino-wsl] for accelerated inference - use the `-wsl` version for WSL2 where applicable
+   #   service: # set to one of [armnn, cuda, rocm, openvino, openvino-wsl] for accelerated inference - use the `-wsl` version for WSL2 where applicable
    volumes:
      - model-cache:/cache
    restart: always
47 changes: 46 additions & 1 deletion machine-learning/Dockerfile
@@ -17,6 +17,40 @@ RUN mkdir /opt/armnn && \
    cd /opt/ann && \
    sh build.sh

+# Warning: 57.2 GB of disk space is required to pull this image
+# https://github.com/microsoft/onnxruntime/blob/main/dockerfiles/Dockerfile.rocm
+FROM rocm/dev-ubuntu-22.04:6.1.2-complete as builder-rocm
+
+WORKDIR /code
+
+RUN apt-get update && apt-get install -y --no-install-recommends wget git python3.10-venv
+# Install the same CMake version as the Dockerfile provided by onnxruntime
+RUN wget https://github.com/Kitware/CMake/releases/download/v3.27.3/cmake-3.27.3-linux-x86_64.sh && \
+    chmod +x cmake-3.27.3-linux-x86_64.sh && \
+    mkdir -p /code/cmake-3.27.3-linux-x86_64 && \
+    ./cmake-3.27.3-linux-x86_64.sh --skip-license --prefix=/code/cmake-3.27.3-linux-x86_64 && \
+    rm cmake-3.27.3-linux-x86_64.sh
+
+ENV PATH /code/cmake-3.27.3-linux-x86_64/bin:${PATH}
+
+# Clone the onnxruntime repository and build onnxruntime
+RUN git clone --single-branch --branch v1.18.1 --recursive "https://github.com/Microsoft/onnxruntime" onnxruntime
+WORKDIR /code/onnxruntime
+# EDIT PR
+# While this PR is still open, we need to compile with the patch from its branch
+# https://github.com/microsoft/onnxruntime/pull/19567
+COPY ./rocm-PR19567.patch /tmp/
+RUN git apply /tmp/rocm-PR19567.patch
+# END EDIT PR
+RUN /bin/sh ./dockerfiles/scripts/install_common_deps.sh
+# Note: the build is memory-hungry. With 12 threads it needed more than the
+# 16 GB of RAM available on the build machine and failed after more than
+# 1.5 hours of compilation; lowering the parallelism let it complete.
+RUN ./build.sh --allow_running_as_root --config Release --build_wheel --update --build --parallel 9 --cmake_extra_defines \
+    ONNXRUNTIME_VERSION=1.18.1 --use_rocm --rocm_home=/opt/rocm
+RUN mv /code/onnxruntime/build/Linux/Release/dist/*.whl /opt/
FROM builder-${DEVICE} as builder

ARG DEVICE
@@ -34,6 +68,9 @@ RUN poetry config installer.max-workers 10 && \
RUN python3 -m venv /opt/venv

COPY poetry.lock pyproject.toml ./
RUN if [ "$DEVICE" = "rocm" ]; then \
poetry add /opt/onnxruntime_rocm-*.whl; \
fi
RUN poetry install --sync --no-interaction --no-ansi --no-root --with ${DEVICE} --without dev

FROM python:3.11-slim-bookworm@sha256:17ec9dc2367aa748559d0212f34665ec4df801129de32db705ea34654b5bc77a as prod-cpu
@@ -70,10 +107,18 @@ COPY --from=builder-armnn \
/opt/ann/build.sh \
/opt/armnn/

+FROM rocm/dev-ubuntu-22.04:6.1.2-complete as prod-rocm
+
+
FROM prod-${DEVICE} as prod

ARG DEVICE

RUN apt-get update && \
-   apt-get install -y --no-install-recommends tini libmimalloc2.0 && \
+   if [ "${DEVICE}" != "rocm" ]; then \
+       extra=libmimalloc2.0; \
+   fi && \
+   apt-get install -y --no-install-recommends tini "${extra}" && \
    rm -rf /var/lib/apt/lists/*

WORKDIR /usr/src/app
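A quick way to sanity-check the resulting `-rocm` image (not part of this commit, just a sketch) is to confirm that the custom onnxruntime wheel actually exposes the ROCm provider inside the container:

```python
# Run inside the -rocm image: the wheel built above should report
# ROCMExecutionProvider among the available execution providers.
import onnxruntime as ort

providers = ort.get_available_providers()
print(providers)
assert "ROCMExecutionProvider" in providers, "ROCm-enabled onnxruntime is not active"
```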
4 changes: 2 additions & 2 deletions machine-learning/README.md
@@ -7,7 +7,7 @@

This project uses [Poetry](https://python-poetry.org/docs/#installation), so be sure to install it first.
Running `poetry install --no-root --with dev --with cpu` will install everything you need in an isolated virtual environment.
-CUDA and OpenVINO are supported as acceleration APIs. To use them, you can replace `--with cpu` with either of `--with cuda` or `--with openvino`. In the case of CUDA, a [compute capability](https://developer.nvidia.com/cuda-gpus) of 5.2 or higher is required.
+CUDA, ROCM and OpenVINO are supported as acceleration APIs. To use them, you can replace `--with cpu` with one of `--with cuda`, `--with rocm` or `--with openvino`. In the case of CUDA, a [compute capability](https://developer.nvidia.com/cuda-gpus) of 5.2 or higher is required.

To add or remove dependencies, you can use the commands `poetry add $PACKAGE_NAME` and `poetry remove $PACKAGE_NAME`, respectively.
Be sure to commit the `poetry.lock` and `pyproject.toml` files with `poetry lock --no-update` to reflect any changes in dependencies.
@@ -37,4 +37,4 @@
## License and Use Restrictions
We have received permission to use the InsightFace facial recognition models in our project, as granted via email by Jia Guo ([email protected]) on 18th March 2023. However, it's important to note that this permission does not extend to the redistribution or commercial use of their models by third parties. Users and developers interested in using these models should review the licensing terms provided in the InsightFace GitHub repository.

-For more information on the capabilities of the InsightFace models and to ensure compliance with their license, please refer to their [official repository](https://github.com/deepinsight/insightface). Adhering to the specified licensing terms is crucial for the respectful and lawful use of their work.
+For more information on the capabilities of the InsightFace models and to ensure compliance with their license, please refer to their [official repository](https://github.com/deepinsight/insightface). Adhering to the specified licensing terms is crucial for the respectful and lawful use of their work.
2 changes: 1 addition & 1 deletion machine-learning/app/models/constants.py
@@ -52,7 +52,7 @@
}


-SUPPORTED_PROVIDERS = ["CUDAExecutionProvider", "OpenVINOExecutionProvider", "CPUExecutionProvider"]
+SUPPORTED_PROVIDERS = ["CUDAExecutionProvider", "ROCMExecutionProvider", "OpenVINOExecutionProvider", "CPUExecutionProvider"]


def get_model_source(model_name: str) -> ModelSource | None:
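As a hedged illustration (the helper below is hypothetical, not part of this commit), a provider list reported by onnxruntime would be filtered against this constant while preserving order:

```python
# Hypothetical helper: keep only providers Immich knows how to configure.
import onnxruntime as ort

SUPPORTED_PROVIDERS = ["CUDAExecutionProvider", "ROCMExecutionProvider", "OpenVINOExecutionProvider", "CPUExecutionProvider"]

def filter_providers(available: list[str]) -> list[str]:
    # Preserve onnxruntime's reported preference order.
    return [p for p in available if p in SUPPORTED_PROVIDERS]

print(filter_providers(ort.get_available_providers()))
```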
2 changes: 1 addition & 1 deletion machine-learning/app/sessions/ort.py
@@ -86,7 +86,7 @@ def _provider_options_default(self) -> list[dict[str, Any]]:
        options = []
        for provider in self.providers:
            match provider:
-               case "CPUExecutionProvider" | "CUDAExecutionProvider":
+               case "CPUExecutionProvider" | "CUDAExecutionProvider" | "ROCMExecutionProvider":
                    option = {"arena_extend_strategy": "kSameAsRequested"}
                case "OpenVINOExecutionProvider":
                    option = {"device_type": "GPU_FP32", "cache_dir": (self.model_path.parent / "openvino").as_posix()}
2 changes: 1 addition & 1 deletion machine-learning/poetry.lock

(generated file; diff not rendered)

5 changes: 5 additions & 0 deletions machine-learning/pyproject.toml
@@ -47,6 +47,11 @@ optional = true
[tool.poetry.group.cuda.dependencies]
onnxruntime-gpu = {version = "^1.17.0", source = "cuda12"}

+[tool.poetry.group.rocm]
+optional = true
+
+[tool.poetry.group.rocm.dependencies]
+
[tool.poetry.group.openvino]
optional = true

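The `rocm` dependency group is intentionally left empty: the Dockerfile's `poetry add /opt/onnxruntime_rocm-*.whl` step injects the locally built wheel at image build time. Roughly what the group would look like after that step (illustrative only; the package name, version and wheel filename are assumptions):

```toml
# Illustrative result of `poetry add` in the rocm build stage - not committed.
[tool.poetry.group.rocm.dependencies]
onnxruntime-rocm = {path = "/opt/onnxruntime_rocm-1.18.1-cp310-cp310-linux_x86_64.whl"}
```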