From 16f5682ab945b136409f360a6d96d71f12cb4ba9 Mon Sep 17 00:00:00 2001
From: Daniel J Walsh
Date: Tue, 4 Feb 2025 09:27:18 -0500
Subject: [PATCH] Add information for Podman as well as Docker

We believe Podman is a viable alternative to Docker. Many people have
moved to Podman, and the project documentation should support them as well.

Signed-off-by: Daniel J Walsh
---
 README.md                        |  4 +--
 docs/build.md                    | 22 ++++++++----
 docs/{docker.md => container.md} | 60 +++++++++++++++++++++++++++-----
 3 files changed, 70 insertions(+), 16 deletions(-)
 rename docs/{docker.md => container.md} (75%)

diff --git a/README.md b/README.md
index d68330d2a11f60..119b6f881dad8e 100644
--- a/README.md
+++ b/README.md
@@ -242,7 +242,7 @@ The project also includes many example programs and tools using the `llama` libr
 - Clone this repository and build locally, see [how to build](docs/build.md)
 - On MacOS or Linux, install `llama.cpp` via [brew, flox or nix](docs/install.md)
-- Use a Docker image, see [documentation for Docker](docs/docker.md)
+- Use a container image, see [documentation for containers](docs/container.md)
 - Download pre-built binaries from [releases](https://github.com/ggerganov/llama.cpp/releases)

 ## Obtaining and quantizing models
@@ -500,7 +500,7 @@ To learn more about model quantization, [read this documentation](examples/quant
 #### Development documentation

 - [How to build](docs/build.md)
-- [Running on Docker](docs/docker.md)
+- [Running in a container](docs/container.md)
 - [Build on Android](docs/android.md)
 - [Performance troubleshooting](docs/development/token_generation_performance_tips.md)
 - [GGML tips & tricks](https://github.com/ggerganov/llama.cpp/wiki/GGML-Tips-&-Tricks)

diff --git a/docs/build.md b/docs/build.md
index dd6495028b3280..5479d8ffcd7e66 100644
--- a/docs/build.md
+++ b/docs/build.md
@@ -94,13 +94,13 @@ Building through oneAPI compilers will make avx_vnni instruction set available f
 - Using manual oneAPI installation:
   By default, `GGML_BLAS_VENDOR` is set to `Generic`, so if you already sourced the Intel environment script and pass `-DGGML_BLAS=ON` to cmake, the MKL version of BLAS will automatically be selected. Otherwise please install oneAPI and follow the steps below:
   ```bash
-  source /opt/intel/oneapi/setvars.sh # You can skip this step if in oneapi-basekit docker image, only required for manual installation
+  source /opt/intel/oneapi/setvars.sh # You can skip this step if in the oneapi-basekit container image, only required for manual installation
   cmake -B build -DGGML_BLAS=ON -DGGML_BLAS_VENDOR=Intel10_64lp -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DGGML_NATIVE=ON
   cmake --build build --config Release
   ```

-- Using oneAPI docker image:
-  If you do not want to source the environment vars and install oneAPI manually, you can also build the code using intel docker container: [oneAPI-basekit](https://hub.docker.com/r/intel/oneapi-basekit). Then, you can use the commands given above.
+- Using oneAPI container image:
+  If you do not want to source the environment vars and install oneAPI manually, you can also build the code using the Intel [oneAPI-basekit](https://hub.docker.com/r/intel/oneapi-basekit) container image. Then, you can use the commands given above.

 Check [Optimizing and Running LLaMA2 on Intel® CPU](https://www.intel.com/content/www/us/en/content-details/791610/optimizing-and-running-llama2-on-intel-cpu.html) for more information.
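+
+The same image also works with Podman, which accepts the same CLI syntax as Docker. A minimal sketch (the mount path below is illustrative, the image name is the Docker Hub image above written in fully qualified form, and the `:Z` suffix is only needed on SELinux-enabled hosts):
+
+```bash
+# Open a shell in the oneAPI Base Toolkit image with the llama.cpp checkout mounted at /app.
+podman run -it --rm -v "$(pwd):/app:Z" docker.io/intel/oneapi-basekit /bin/bash
+# Inside the container, the cmake commands above can then be run from /app.
+```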
@@ -280,19 +280,29 @@ cmake -B build -DGGML_VULKAN=ON
 cmake --build build --config Release
 ```

-**With docker**:
+**With containers**:

 You don't need to install Vulkan SDK. It will be installed inside the container.

-```sh
-# Build the image
+Docker example:
+
+```sh
 docker build -t llama-cpp-vulkan --target light -f .devops/vulkan.Dockerfile .
+```
+
+Podman example:
+
+```sh
+podman build -t llama-cpp-vulkan --target light -f .devops/vulkan.Dockerfile .
+```
+
+```sh
 # Then, use it:
 docker run -it --rm -v "$(pwd):/app:Z" --device /dev/dri/renderD128:/dev/dri/renderD128 --device /dev/dri/card1:/dev/dri/card1 llama-cpp-vulkan -m "/app/models/YOUR_MODEL_FILE" -p "Building a website can be done in 10 simple steps:" -n 400 -e -ngl 33
+```
+
+or
+
+```sh
+podman run --security-opt label=disable -it --rm -v "$(pwd):/app:Z" --device /dev/dri/renderD128:/dev/dri/renderD128 --device /dev/dri/card1:/dev/dri/card1 llama-cpp-vulkan -m "/app/models/YOUR_MODEL_FILE" -p "Building a website can be done in 10 simple steps:" -n 400 -e -ngl 33
 ```

-**Without docker**:
+**Without a container**:

 Firstly, you need to make sure you have installed [Vulkan SDK](https://vulkan.lunarg.com/doc/view/latest/linux/getting_started_ubuntu.html)

diff --git a/docs/docker.md b/docs/container.md
similarity index 75%
rename from docs/docker.md
rename to docs/container.md
index dac9a9ec164ffe..a60fd9ea8e7d9a 100644
--- a/docs/docker.md
+++ b/docs/container.md
@@ -1,11 +1,11 @@
-# Docker
+# Container

 ## Prerequisites

-* Docker must be installed and running on your system.
+* A container engine, e.g. Docker or Podman, must be installed and running on your system.
 * Create a folder to store big models & intermediate files (ex. /llama/models)

 ## Images

-We have three Docker images available for this project:
+We have three container images available for this project:

 1. `ghcr.io/ggerganov/llama.cpp:full`: This image includes both the main executable file and the tools to convert LLaMA models into ggml and convert into 4-bit quantization. (platforms: `linux/amd64`, `linux/arm64`)
 2. `ghcr.io/ggerganov/llama.cpp:light`: This image only includes the main executable file. (platforms: `linux/amd64`, `linux/arm64`)
@@ -27,13 +27,18 @@ The GPU enabled images are not currently tested by CI beyond being built. They a

 ## Usage

-The easiest way to download the models, convert them to ggml and optimize them is with the --all-in-one command which includes the full docker image.
+The easiest way to download the models, convert them to ggml and optimize them is with the `--all-in-one` command of the full container image.

 Replace `/path/to/models` below with the actual path where you downloaded the models.

 ```bash
 docker run -v /path/to/models:/models ghcr.io/ggerganov/llama.cpp:full --all-in-one "/models/" 7B
 ```

+or
+
+```bash
+podman run --security-opt label=disable -v /path/to/models:/models ghcr.io/ggerganov/llama.cpp:full --all-in-one "/models/" 7B
+```
+
 On completion, you are ready to play!

@@ -41,23 +46,39 @@ On completion, you are ready to play!
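+A note on the `--security-opt label=disable` flag used in the Podman examples below: it turns off SELinux label separation for the container. On SELinux-enabled hosts, an alternative sketch is to relabel the mounted volume with the `:Z` suffix instead (paths and arguments reused from the examples below):
+
+```bash
+# Relabel the model directory for this container instead of disabling SELinux separation.
+podman run -v /path/to/models:/models:Z ghcr.io/ggerganov/llama.cpp:light -m /models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 512
+```
+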
 ```bash
 docker run -v /path/to/models:/models ghcr.io/ggerganov/llama.cpp:full --run -m /models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 512
 ```

+or
+
+```bash
+podman run --security-opt label=disable -v /path/to/models:/models ghcr.io/ggerganov/llama.cpp:full --run -m /models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 512
+```
+
 or with a light image:

 ```bash
 docker run -v /path/to/models:/models ghcr.io/ggerganov/llama.cpp:light -m /models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 512
 ```

+or
+
+```bash
+podman run --security-opt label=disable -v /path/to/models:/models ghcr.io/ggerganov/llama.cpp:light -m /models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 512
+```
+
 or with a server image:

 ```bash
 docker run -v /path/to/models:/models -p 8000:8000 ghcr.io/ggerganov/llama.cpp:server -m /models/7B/ggml-model-q4_0.gguf --port 8000 --host 0.0.0.0 -n 512
 ```

-## Docker With CUDA
+or
+
+```bash
+podman run --security-opt label=disable -v /path/to/models:/models -p 8000:8000 ghcr.io/ggerganov/llama.cpp:server -m /models/7B/ggml-model-q4_0.gguf --port 8000 --host 0.0.0.0 -n 512
+```
+
+## Container engines With CUDA

 Assuming one has the [nvidia-container-toolkit](https://github.com/NVIDIA/nvidia-container-toolkit) properly installed on Linux, or is using a GPU enabled cloud, `cuBLAS` should be accessible inside the container.

-## Building Docker locally
+## Building Container images locally

 ```bash
 docker build -t local/llama.cpp:full-cuda --target full -f .devops/cuda.Dockerfile .
 docker build -t local/llama.cpp:light-cuda --target light -f .devops/cuda.Dockerfile .
 docker build -t local/llama.cpp:server-cuda --target server -f .devops/cuda.Dockerfile .
 ```

+or
+
+```bash
+podman build -t local/llama.cpp:full-cuda --target full -f .devops/cuda.Dockerfile .
+podman build -t local/llama.cpp:light-cuda --target light -f .devops/cuda.Dockerfile .
+podman build -t local/llama.cpp:server-cuda --target server -f .devops/cuda.Dockerfile .
+```
+
 You may want to pass in some different `ARGS`, depending on the CUDA environment supported by your container host, as well as the GPU architecture.

 The defaults are:

@@ -88,17 +117,32 @@ docker run --gpus all -v /path/to/models:/models local/llama.cpp:light-cuda -m /
 docker run --gpus all -v /path/to/models:/models local/llama.cpp:server-cuda -m /models/7B/ggml-model-q4_0.gguf --port 8000 --host 0.0.0.0 -n 512 --n-gpu-layers 1
 ```

-## Docker With MUSA
+or
+
+```bash
+podman run --security-opt label=disable --gpus all -v /path/to/models:/models local/llama.cpp:full-cuda --run -m /models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 512 --n-gpu-layers 1
+podman run --security-opt label=disable --gpus all -v /path/to/models:/models local/llama.cpp:light-cuda -m /models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 512 --n-gpu-layers 1
+podman run --security-opt label=disable --gpus all -v /path/to/models:/models local/llama.cpp:server-cuda -m /models/7B/ggml-model-q4_0.gguf --port 8000 --host 0.0.0.0 -n 512 --n-gpu-layers 1
+```
+
+## Container engines With MUSA

 Assuming one has the [mt-container-toolkit](https://developer.mthreads.com/musa/native) properly installed on Linux, `muBLAS` should be accessible inside the container.
-## Building Docker locally
+## Building Container images locally

 ```bash
 docker build -t local/llama.cpp:full-musa --target full -f .devops/musa.Dockerfile .
 docker build -t local/llama.cpp:light-musa --target light -f .devops/musa.Dockerfile .
 docker build -t local/llama.cpp:server-musa --target server -f .devops/musa.Dockerfile .
 ```

+or
+
+```bash
+podman build -t local/llama.cpp:full-musa --target full -f .devops/musa.Dockerfile .
+podman build -t local/llama.cpp:light-musa --target light -f .devops/musa.Dockerfile .
+podman build -t local/llama.cpp:server-musa --target server -f .devops/musa.Dockerfile .
+```
+
 You may want to pass in some different `ARGS`, depending on the MUSA environment supported by your container host, as well as the GPU architecture.
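+
+For the CUDA images above, Podman can also address NVIDIA GPUs through CDI (the Container Device Interface) instead of `--gpus all`. A rough sketch, assuming a recent NVIDIA Container Toolkit that ships the `nvidia-ctk` utility; the output path and device name follow its documentation and may differ on your system:
+
+```bash
+# Generate a CDI specification for the installed NVIDIA driver and GPUs.
+sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
+
+# Expose all NVIDIA GPUs to the container via CDI.
+podman run --security-opt label=disable --device nvidia.com/gpu=all -v /path/to/models:/models local/llama.cpp:light-cuda -m /models/7B/ggml-model-q4_0.gguf -p "Building a website can be done in 10 simple steps:" -n 512 --n-gpu-layers 1
+```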