Closed
73 commits
408225b
server: use random media marker (#21962)
ngxson Apr 15, 2026
b1be68e
[SYCL] Fix Q8_0 reorder: garbage on 2nd prompt + crash on full VRAM (…
PMZFX Apr 16, 2026
8612ed1
ci : Use ggml-org/ccache-action on RISC-V as well (#21632)
luhenry Apr 16, 2026
82677a6
ggml-webgpu: compute pass batching and removing profiling overhead (#…
reeselevine Apr 16, 2026
90fb96a
devops : added spirv-headers to nix (#21965)
yuannan Apr 16, 2026
5637536
ggml : implemented simd_gemm kernel for riscv vector extension (#20627)
rehan-10xengineer Apr 16, 2026
1e796eb
ggml-cpu: add 128-bit RVV implementation for Quantization Vector Dot …
rehan-10xengineer Apr 16, 2026
ae2d348
metal: Implement ROLL op (#21946)
kushagharahi Apr 16, 2026
3f7c29d
ggml: add graph_reused (#21764)
am17an Apr 16, 2026
03b3d07
Convert: Fix NemotronH Config Parsing (#21664)
anavp-nvidia Apr 16, 2026
b572d1e
codeowners: add team member comments (#21714)
0cc4m Apr 16, 2026
f772f6e
model : support NVFP4 tensors for Gemma4 (#21971)
CISC Apr 16, 2026
9db77a0
model : refactor QKV into common build_qkv and create_tensor_qkv help…
JoursBleu Apr 16, 2026
4adac43
server: tests: fetch random media marker via /apply-template (#21962)…
ServeurpersoCom Apr 16, 2026
e45dbde
opencl: add q5_K gemm and gemv kernels for Adreno (#21595)
shaofeiqi Apr 16, 2026
4fbdabd
model: using single llm_build per arch (#21970)
ngxson Apr 16, 2026
85dde8d
hexagon: optimize HMX matmul operations (#21071)
chraac Apr 16, 2026
089dd41
cmake: use glob to collect src/models sources (#22005)
ngxson Apr 16, 2026
30dce2c
cli : use get_media_marker (#22017)
CISC Apr 16, 2026
5e6c0e1
opencl: refactor q8_0 set_tensor and mul_mat host side dispatch for A…
lhez Apr 17, 2026
fcc7508
model : Gemma4 model type detection (#22027)
EZForever Apr 17, 2026
6990e2f
libs : rename libcommon -> libllama-common (#21936)
ggerganov Apr 17, 2026
268d61e
mtmd: add missing struct tag (#22023)
65a Apr 17, 2026
a279d0f
ci : add android arm64 build and release (#21647)
ykhrustalev Apr 17, 2026
b94050e
CUDA: use LRU based eviction for cuda graphs (#21611)
am17an Apr 17, 2026
45cac7c
ggml-webgpu: fix compiler warnings and refactor FlashAttention encodi…
reeselevine Apr 17, 2026
fd1c0ec
llama: fit ctx size for CPU only (#21568)
JohannesGaessler Apr 18, 2026
89a5474
convert : fix (ignore for now) typings errors (#22002)
CISC Apr 18, 2026
83d58e0
ci : free disk space for rocm release (#22012)
CISC Apr 18, 2026
59accc8
ggml-backend-meta: add multi-segment read support in get_tensor (#22063)
ssam18 Apr 18, 2026
23b8cc4
android : libcommon -> libllama-common (#22076)
CISC Apr 18, 2026
4f02d47
model : refactor bias tensor variable names (#22079)
CISC Apr 18, 2026
9e5647a
server: Expose `media_tag` on /props endpoint. (#22028)
cetarthoriphros Apr 18, 2026
91fef95
rpc : refactor the RPC transport (#21998)
rgerganov Apr 19, 2026
455d8e4
server : speculative checkpointing (#19493)
srogmann Apr 19, 2026
09b4efa
cmake: remove CMP0194 policy to restore MSVC builds (#21934)
texasich Apr 19, 2026
8685e7b
convert : support sentence-transformer 5.4 config files (#22087)
Bing-su Apr 19, 2026
037bfe3
ci : install spirv-headers for vulkan-cross (#22109)
CISC Apr 19, 2026
bcdcc10
ggml : reduce CPU overhead in meta backend (#22041)
gaugarg-nv Apr 19, 2026
1912407
mtmd: add pos_0 to mtmd_image_tokens_get_decoder_pos (breaking change…
ngxson Apr 19, 2026
471540a
HIP: Remove unesscary NCCL_CHECK (#21914)
IMbackK Apr 19, 2026
d5b780a
common/autoparser : allow space after tool call (#22073)
aldehir Apr 19, 2026
4eac5b4
CUDA: refactor mma data loading for AMD (#22051)
JohannesGaessler Apr 19, 2026
e365e65
vendor : update cpp-httplib to 0.42.0 (#21781)
cabelo Apr 19, 2026
9d49acb
server: rename --clear-idle to --cache-idle-slots (#21741)
yychyo Apr 20, 2026
788fcbc
[SYCL] Fix reorder MMVQ assert on unaligned vocab sizes (#22035)
PMZFX Apr 20, 2026
de71b5f
server : refactor "use checkpoint" logic (#22114)
ggerganov Apr 20, 2026
81df3f7
fix: GLM-DSA crash in llama-tokenize when using vocab_only (#22102)
ssam18 Apr 20, 2026
a678916
mtmd: refactor mtmd_decode_use_mrope (#22161)
ngxson Apr 20, 2026
a6cc43c
ggml-webgpu: updated matrix-vector multiplication (#21738)
neha-ha Apr 20, 2026
7f251fd
ggml-cpu: Optimized x86 and generic cpu q1_0 dot (follow up) (#21636)
pl752 Apr 20, 2026
fb19f94
TP: fix 0-sized tensor slices, AllReduce fallback (#21808)
JohannesGaessler Apr 20, 2026
fd6ae4c
Tensor-parallel: Fix delayed AllReduce on Gemma-4 MoE (#22129)
gaugarg-nv Apr 20, 2026
cf8b0db
server : remove /api endpoints (#22165)
ggerganov Apr 20, 2026
86f8daa
mtmd: correct get_n_pos / get_decoder_pos (#22175)
ngxson Apr 20, 2026
9789512
ggml-cuda: flush legacy pool on OOM and retry (#22155)
leonardHONG Apr 20, 2026
ff6b106
server : fix hardcoded proxy connection timeout in router mode (#1876…
xris99 Apr 21, 2026
cfe9838
fit-params : refactor + add option to output estimated memory per dev…
ggerganov Apr 21, 2026
041fe83
ggml : bump version to 0.10.0 (ggml/1463)
ggerganov Apr 21, 2026
4889afb
sync : ggml
ggerganov Apr 21, 2026
cd03ec7
llama-ext : fix exports (#22202)
ggerganov Apr 21, 2026
9998d88
mtmd: correct mtmd_decode_use_mrope() (#22188)
ngxson Apr 21, 2026
82209ef
vulkan: Support F16 OP_FILL (#22177)
jeffbolznv Apr 21, 2026
7fc1c4e
metal : workaround macOS GPU interactivity watchdog (#22216)
ggerganov Apr 21, 2026
606fa42
vendor : update cpp-httplib to 0.43.1 (#22143)
cabelo Apr 21, 2026
52f1096
openvino: driver setup, CI split, thread safety, and NPU optimization…
wine99 Apr 21, 2026
84652b8
arg : add --spec-default (#22223)
ggerganov Apr 21, 2026
98d2d28
mtmd: Add support for Reka Edge 2603 (#21616)
kwajiehao Apr 21, 2026
72d693e
spec : reset i_last when low acceptance streak occurs (#22168)
treo Apr 21, 2026
2248799
hexagon: fix missing v79 entry in libggml-htp.inf (#22194)
mengshengwu Apr 21, 2026
5a4cd67
Hexagon: DAIG op (#22195)
shreyajn Apr 21, 2026
04fe84b
server: allow cancel loading model (#21814)
ngxson Apr 21, 2026
f903fe2
Merge remote-tracking branch 'upstream/master' into experiment/upstre…
chad-loder Apr 21, 2026
2 changes: 2 additions & 0 deletions .devops/nix/package.nix
@@ -19,6 +19,7 @@
spirv-headers,
openssl,
shaderc,
spirv-headers,
useBlas ?
builtins.all (x: !x) [
useCuda
@@ -147,6 +148,7 @@ effectiveStdenv.mkDerivation (finalAttrs: {
ninja
pkg-config
git
spirv-headers
]
++ optionals useCuda [
cudaPackages.cuda_nvcc
50 changes: 48 additions & 2 deletions .devops/openvino.Dockerfile
@@ -2,7 +2,19 @@ ARG OPENVINO_VERSION_MAJOR=2026.0
ARG OPENVINO_VERSION_FULL=2026.0.0.20965.c6d6a13a886
ARG UBUNTU_VERSION=24.04

- # Optional proxy build arguments - empty by default
+ # Intel GPU driver versions. https://github.com/intel/compute-runtime/releases
+ ARG IGC_VERSION=v2.30.1
+ ARG IGC_VERSION_FULL=2_2.30.1+20950
+ ARG COMPUTE_RUNTIME_VERSION=26.09.37435.1
+ ARG COMPUTE_RUNTIME_VERSION_FULL=26.09.37435.1-0
+ ARG IGDGMM_VERSION=22.9.0
+
+ # Intel NPU driver versions. https://github.com/intel/linux-npu-driver/releases
+ ARG NPU_DRIVER_VERSION=v1.32.0
+ ARG NPU_DRIVER_FULL=v1.32.0.20260402-23905121947
+ ARG LIBZE1_VERSION=1.27.0-1~24.04~ppa2
+
+ # Optional proxy build arguments
ARG http_proxy=
ARG https_proxy=

@@ -78,13 +90,47 @@ ARG http_proxy
ARG https_proxy

RUN apt-get update \
- && apt-get install -y libgomp1 libtbb12 curl \
+ && apt-get install -y libgomp1 libtbb12 curl wget ocl-icd-libopencl1 \
&& apt autoremove -y \
&& apt clean -y \
&& rm -rf /tmp/* /var/tmp/* \
&& find /var/cache/apt/archives /var/lib/apt/lists -not -name lock -type f -delete \
&& find /var/cache -type f -delete

+ # Install GPU drivers
+ ARG IGC_VERSION
+ ARG IGC_VERSION_FULL
+ ARG COMPUTE_RUNTIME_VERSION
+ ARG COMPUTE_RUNTIME_VERSION_FULL
+ ARG IGDGMM_VERSION
+ RUN mkdir /tmp/neo/ && cd /tmp/neo/ \
+ && wget https://github.com/intel/intel-graphics-compiler/releases/download/${IGC_VERSION}/intel-igc-core-${IGC_VERSION_FULL}_amd64.deb \
+ && wget https://github.com/intel/intel-graphics-compiler/releases/download/${IGC_VERSION}/intel-igc-opencl-${IGC_VERSION_FULL}_amd64.deb \
+ && wget https://github.com/intel/compute-runtime/releases/download/${COMPUTE_RUNTIME_VERSION}/intel-ocloc-dbgsym_${COMPUTE_RUNTIME_VERSION_FULL}_amd64.ddeb \
+ && wget https://github.com/intel/compute-runtime/releases/download/${COMPUTE_RUNTIME_VERSION}/intel-ocloc_${COMPUTE_RUNTIME_VERSION_FULL}_amd64.deb \
+ && wget https://github.com/intel/compute-runtime/releases/download/${COMPUTE_RUNTIME_VERSION}/intel-opencl-icd-dbgsym_${COMPUTE_RUNTIME_VERSION_FULL}_amd64.ddeb \
+ && wget https://github.com/intel/compute-runtime/releases/download/${COMPUTE_RUNTIME_VERSION}/intel-opencl-icd_${COMPUTE_RUNTIME_VERSION_FULL}_amd64.deb \
+ && wget https://github.com/intel/compute-runtime/releases/download/${COMPUTE_RUNTIME_VERSION}/libigdgmm12_${IGDGMM_VERSION}_amd64.deb \
+ && wget https://github.com/intel/compute-runtime/releases/download/${COMPUTE_RUNTIME_VERSION}/libze-intel-gpu1-dbgsym_${COMPUTE_RUNTIME_VERSION_FULL}_amd64.ddeb \
+ && wget https://github.com/intel/compute-runtime/releases/download/${COMPUTE_RUNTIME_VERSION}/libze-intel-gpu1_${COMPUTE_RUNTIME_VERSION_FULL}_amd64.deb \
+ && dpkg --install *.deb \
+ && rm -rf /tmp/neo/
+
+ # Install NPU drivers
+ ARG NPU_DRIVER_VERSION
+ ARG NPU_DRIVER_FULL
+ ARG LIBZE1_VERSION
+ RUN mkdir /tmp/npu/ && cd /tmp/npu/ \
+ && wget https://github.com/intel/linux-npu-driver/releases/download/${NPU_DRIVER_VERSION}/linux-npu-driver-${NPU_DRIVER_FULL}-ubuntu2404.tar.gz \
+ && tar -xf linux-npu-driver-${NPU_DRIVER_FULL}-ubuntu2404.tar.gz \
+ && dpkg --install *.deb \
+ && rm -rf /tmp/npu/
+
+ RUN cd /tmp \
+ && wget https://snapshot.ppa.launchpadcontent.net/kobuk-team/intel-graphics/ubuntu/20260324T100000Z/pool/main/l/level-zero-loader/libze1_${LIBZE1_VERSION}_amd64.deb \
+ && dpkg --install libze1_${LIBZE1_VERSION}_amd64.deb \
+ && rm libze1_${LIBZE1_VERSION}_amd64.deb
+
COPY --from=build /app/lib/ /app/

### Full (all binaries)
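
One detail of the GPU driver step in this Dockerfile is easy to miss: the debug-symbol packages are downloaded with a `.ddeb` suffix, and the shell glob in `dpkg --install *.deb` does not match that suffix, so only the regular packages are handed to dpkg. A minimal sketch of the glob behaviour, assuming empty placeholder files in a temporary directory (nothing is downloaded; `list_deb_matches` is a hypothetical helper, and the file names use the ARG default from the Dockerfile):

```shell
#!/bin/sh
# Sketch only: create empty placeholder files named like two of the downloads
# in the GPU driver step and list what the `*.deb` glob would pass to dpkg.
set -eu

COMPUTE_RUNTIME_VERSION_FULL=26.09.37435.1-0   # ARG default from the Dockerfile

# Hypothetical helper: print the names matched by *.deb in directory $1.
list_deb_matches() {
    (
        cd "$1"
        for f in *.deb; do
            # Skip the literal pattern when nothing matches.
            if [ -e "$f" ]; then
                printf '%s\n' "$f"
            fi
        done
    )
}

tmp=$(mktemp -d)
touch "$tmp/intel-ocloc_${COMPUTE_RUNTIME_VERSION_FULL}_amd64.deb" \
      "$tmp/intel-ocloc-dbgsym_${COMPUTE_RUNTIME_VERSION_FULL}_amd64.ddeb"

matches=$(list_deb_matches "$tmp")
echo "$matches"
rm -rf "$tmp"
```

Only the `.deb` placeholder is listed. Whether skipping the `-dbgsym` packages is intentional is not stated in the diff.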
2 changes: 1 addition & 1 deletion .github/workflows/build-android.yml
@@ -51,7 +51,7 @@ jobs:
distribution: zulu

- name: Setup Android SDK
- uses: android-actions/setup-android@9fc6c4e9069bf8d3d10b2204b1fb8f6ef7065407 # v3
+ uses: android-actions/setup-android@40fd30fb8d7440372e1316f5d1809ec01dcd3699 # v4.0.1
with:
log-accepted-android-sdk-licenses: false

1 change: 1 addition & 0 deletions .github/workflows/build-cross.yml
@@ -246,6 +246,7 @@ jobs:
apt-get install -y --no-install-recommends \
build-essential \
glslc \
+ spirv-headers \
gcc-14-loongarch64-linux-gnu \
g++-14-loongarch64-linux-gnu \
libvulkan-dev:loong64
120 changes: 120 additions & 0 deletions .github/workflows/build-openvino.yml
@@ -0,0 +1,120 @@
name: CI (openvino)

on:
workflow_dispatch: # allows manual triggering
push:
branches:
- master
paths: [
'.github/workflows/build-openvino.yml',
'**/CMakeLists.txt',
'**/.cmake',
'**/*.h',
'**/*.hpp',
'**/*.c',
'**/*.cpp',
]

pull_request:
types: [opened, synchronize, reopened]
paths: [
'.github/workflows/build-openvino.yml',
'ggml/src/ggml-openvino/**'
]

concurrency:
group: ${{ github.workflow }}-${{ github.head_ref && github.ref || github.run_id }}
cancel-in-progress: true

env:
GGML_NLOOP: 3
GGML_N_THREADS: 1
LLAMA_LOG_COLORS: 1
LLAMA_LOG_PREFIX: 1
LLAMA_LOG_TIMESTAMPS: 1

jobs:
ubuntu-24-openvino:
name: ubuntu-24-openvino-${{ matrix.openvino_device }}

concurrency:
group: openvino-${{ matrix.variant }}-${{ github.head_ref || github.ref }}
cancel-in-progress: false

strategy:
matrix:
include:
- variant: cpu
runner: '"ubuntu-24.04"'
openvino_device: "CPU"
- variant: gpu
runner: '["self-hosted","Linux","Intel","OpenVINO"]'
openvino_device: "GPU"

runs-on: ${{ fromJSON(matrix.runner) }}

env:
# Sync versions in build-openvino.yml, build-self-hosted.yml, release.yml, build-cache.yml, .devops/openvino.Dockerfile
OPENVINO_VERSION_MAJOR: "2026.0"
OPENVINO_VERSION_FULL: "2026.0.0.20965.c6d6a13a886"

steps:
- name: Clone
id: checkout
uses: actions/checkout@v6

- name: ccache
if: runner.environment == 'github-hosted'
uses: ggml-org/ccache-action@v1.2.21
with:
key: ubuntu-24-openvino-${{ matrix.variant }}-no-preset-v1
evict-old-files: 1d
save: ${{ github.event_name == 'push' && github.ref == 'refs/heads/master' }}

- name: Dependencies
id: depends
run: |
sudo apt-get update
sudo apt-get install -y build-essential libssl-dev libtbb12 cmake ninja-build python3-pip
sudo apt-get install -y ocl-icd-opencl-dev opencl-headers opencl-clhpp-headers intel-opencl-icd

- name: Use OpenVINO Toolkit Cache
if: runner.environment == 'github-hosted'
uses: actions/cache@v5
id: cache-openvino
with:
path: ./openvino_toolkit
key: openvino-toolkit-v${{ env.OPENVINO_VERSION_FULL }}-${{ runner.os }}

- name: Setup OpenVINO Toolkit
if: steps.cache-openvino.outputs.cache-hit != 'true'
uses: ./.github/actions/linux-setup-openvino
with:
path: ./openvino_toolkit
version_major: ${{ env.OPENVINO_VERSION_MAJOR }}
version_full: ${{ env.OPENVINO_VERSION_FULL }}

- name: Install OpenVINO dependencies
run: |
cd ./openvino_toolkit
chmod +x ./install_dependencies/install_openvino_dependencies.sh
echo "Y" | sudo -E ./install_dependencies/install_openvino_dependencies.sh

- name: Build
id: cmake_build
run: |
source ./openvino_toolkit/setupvars.sh
cmake -B build/ReleaseOV -G Ninja \
-DCMAKE_BUILD_TYPE=Release \
-DGGML_OPENVINO=ON
time cmake --build build/ReleaseOV --config Release -j $(nproc)

- name: Test
id: cmake_test
# TODO: fix and re-enable the `test-llama-archs` test below
run: |
cd ${{ github.workspace }}
if [ "${{ matrix.openvino_device }}" = "GPU" ]; then
export GGML_OPENVINO_DEVICE=GPU
fi
ctest --test-dir build/ReleaseOV -L main -E "test-llama-archs" --verbose --timeout 2000
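
The workflow-level `concurrency.group` in this file uses the `&&` / `||` idiom of GitHub Actions expressions. `github.head_ref` is only set for pull request events, so pull request runs share a group keyed on `github.ref`, while other runs fall back to the unique `github.run_id` and never share a group. A rough shell emulation of that selection, with made-up example values (the function name and its inputs are illustrative and not part of the workflow):

```shell
#!/bin/sh
# Emulates: ${{ github.workflow }}-${{ github.head_ref && github.ref || github.run_id }}
# Hypothetical helper; the arguments stand in for the github.* context values.
concurrency_group() {
    workflow=$1
    head_ref=$2
    ref=$3
    run_id=$4
    if [ -n "$head_ref" ] && [ -n "$ref" ]; then
        # head_ref is set (pull request): key the group on the ref.
        printf '%s-%s\n' "$workflow" "$ref"
    else
        # head_ref is empty (e.g. push): key the group on the run id.
        printf '%s-%s\n' "$workflow" "$run_id"
    fi
}

# Pull request run.
concurrency_group "CI (openvino)" "my-feature" "refs/pull/123/merge" "1001"
# Push to master.
concurrency_group "CI (openvino)" "" "refs/heads/master" "1002"
```

The job-level group in the same file uses the simpler `github.head_ref || github.ref` together with `cancel-in-progress: false`, so a new run for the same branch and variant waits for the running one instead of cancelling it.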
24 changes: 6 additions & 18 deletions .github/workflows/build-riscv.yml
@@ -47,22 +47,10 @@ jobs:
steps:
- name: Install dependencies
run: |
sudo apt-get update

# Install necessary packages
sudo apt-get install -y libatomic1 libtsan2 gcc-14 g++-14 cmake build-essential wget git-lfs

# Set gcc-14 and g++-14 as the default compilers
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-14 100
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-14 100

if ! which rustc; then
# Install Rust stable version
sudo apt-get install -y rustup
rustup install stable
rustup default stable
fi

git lfs install

- name: GCC version check
@@ -74,12 +62,12 @@ jobs:
id: checkout
uses: actions/checkout@v6

- # FIXME: Enable when ggml-org/ccache-action works on riscv64
- # - name: ccache
- # uses: ggml-org/ccache-action@v1.2.21
- # with:
- # key: ubuntu-riscv64-native-sanitizer-${{ matrix.sanytizer }}-${{ matrix.build_type }}
- # save: ${{ github.event_name == 'push' && github.ref == 'refs/heads/master' }}
+ - name: ccache
+ uses: ggml-org/ccache-action@afde29e5b5422e5da23cb1f639e8baecadeadfc3 # https://github.com/ggml-org/ccache-action/pull/1
+ with:
+ key: ubuntu-riscv64-native-sanitizer-${{ matrix.sanitizer }}-${{ matrix.build_type }}
+ evict-old-files: 1d
+ save: ${{ github.event_name == 'push' && github.ref == 'refs/heads/master' }}

- name: Build
id: cmake_build
34 changes: 34 additions & 0 deletions .github/workflows/build-self-hosted.yml
@@ -97,6 +97,36 @@ jobs:
vulkaninfo --summary
GG_BUILD_VULKAN=1 bash ./ci/run.sh ~/results/llama.cpp /mnt/llama.cpp

+ # TODO: investigate slight precision issues in some operations for test-backend-ops on the WebGPU backend.
+ #ggml-ci-nvidia-webgpu:
+ # runs-on: [self-hosted, Linux, NVIDIA]
+
+ # steps:
+ # - name: Clone
+ # id: checkout
+ # uses: actions/checkout@v6
+
+ # - name: Dawn Dependency
+ # id: dawn-depends
+ # run: |
+ # DAWN_VERSION="v20260317.182325"
+ # DAWN_OWNER="google"
+ # DAWN_REPO="dawn"
+ # DAWN_ASSET_NAME="Dawn-18eb229ef5f707c1464cc581252e7603c73a3ef0-ubuntu-latest-Release"
+ # echo "Fetching release asset from https://github.com/google/dawn/releases/download/${DAWN_VERSION}/${DAWN_ASSET_NAME}.tar.gz"
+ # curl -L -o artifact.tar.gz \
+ # "https://github.com/google/dawn/releases/download/${DAWN_VERSION}/${DAWN_ASSET_NAME}.tar.gz"
+ # mkdir dawn
+ # tar -xvf artifact.tar.gz -C dawn --strip-components=1
+
+ # - name: Test
+ # id: ggml-ci
+ # run: |
+ # GG_BUILD_WEBGPU=1 \
+ # GG_BUILD_WEBGPU_DAWN_PREFIX="$GITHUB_WORKSPACE/dawn" \
+ # GG_BUILD_WEBGPU_DAWN_DIR="$GITHUB_WORKSPACE/dawn/lib64/cmake/Dawn" \
+ # bash ./ci/run.sh ~/results/llama.cpp /mnt/llama.cpp
# TODO: provision AMX-compatible machine
#ggml-ci-cpu-amx:
# runs-on: [self-hosted, Linux, CPU, AMX]
@@ -235,6 +265,10 @@ jobs:
ggml-ci-intel-openvino-gpu-low-perf:
runs-on: [self-hosted, Linux, Intel, OpenVINO]

+ concurrency:
+ group: openvino-gpu-${{ github.head_ref || github.ref }}
+ cancel-in-progress: false
+
env:
# Sync versions in build.yml, build-self-hosted.yml, release.yml, build-cache.yml, .devops/openvino.Dockerfile
OPENVINO_VERSION_MAJOR: "2026.0"