Releases · ServeurpersoCom/llama.cpp

18 Oct 13:50

ee09828

b6795 Latest


HIP: fix GPU_TARGETS (#16642)

Assets 15

cudart-llama-bin-win-cuda-12.4-x64.zip
sha256:8c79a9b226de4b3cacfd1f83d24f962d0773be79f1e7b75c6af4ded7e32ae1d6
373 MB 2025-10-18T13:50:10Z

llama-b6795-bin-macos-arm64.zip
sha256:eae9fd8285411b7f24bd9b088ab62c2f62af58b154f18e2d997923d9b052e0c9
10.4 MB 2025-10-18T13:50:21Z

llama-b6795-bin-macos-x64.zip
sha256:e8358bd84302942967741cfff56a8b350d6b2f262cf02165847396126a5f6cb8
27 MB 2025-10-18T13:50:22Z

llama-b6795-bin-ubuntu-vulkan-x64.zip
sha256:3a2a15723c039b875ad42aa775be1b9f0c9cf773683b9cb8e00edefa47e027db
25.9 MB 2025-10-18T13:50:24Z

llama-b6795-bin-ubuntu-x64.zip
sha256:807eae587d34b234b8efac68355bffa14fa16aafaf4db58427381da5425c089a
12.5 MB 2025-10-18T13:50:25Z

llama-b6795-bin-win-cpu-arm64.zip
sha256:f4e4e13ee1c134c2ea7a2a69ff1c5db333987a551ef9406ea71a7e8b52461717
10.6 MB 2025-10-18T13:50:26Z

llama-b6795-bin-win-cpu-x64.zip
sha256:9ccb6091ac38404b333fab7eef8542ffe93d846bd82bf7e7235291b083fe9db5
13.7 MB 2025-10-18T13:50:27Z

llama-b6795-bin-win-cuda-12.4-x64.zip
sha256:9737ea117b0b04ef5a442f3fcfc1a0370d9d7272bdec8ea4e8afe1b4728d6741
169 MB 2025-10-18T13:50:29Z

llama-b6795-bin-win-hip-radeon-x64.zip
sha256:effe52c1ed88035922ee5356f406cab2ae3a81df1004e3221db1d2b2fa49745d
321 MB 2025-10-18T13:50:35Z

llama-b6795-bin-win-opencl-adreno-arm64.zip
sha256:0abd21378bd296f10dbd793e0b7d98d2ec1a7be6f06e82e878c2291c92c46ed0
11 MB 2025-10-18T13:50:48Z

Source code (zip)
2025-10-18T12:47:32Z

Source code (tar.gz)
2025-10-18T12:47:32Z
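
Each binary asset above is published with a `sha256:` digest, so a download can be checked before it is unpacked. The following is a minimal sketch of such a check using only Python's standard library. The helper names `sha256_of_file` and `matches_published_digest` are hypothetical and are not part of llama.cpp or of GitHub's tooling.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, published):
    """Compare a file against a digest written as 'sha256:<hex>' or as bare hex."""
    expected = published.strip().lower()
    if expected.startswith("sha256:"):
        expected = expected[len("sha256:"):]
    return sha256_of_file(path) == expected
```

For example, assuming the Ubuntu x64 archive has been saved to the current directory, `matches_published_digest("llama-b6795-bin-ubuntu-x64.zip", "sha256:807eae587d34b234b8efac68355bffa14fa16aafaf4db58427381da5425c089a")` should return `True` for an intact download.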

18 Oct 12:51

github-actions

b6794

e56abd2

b6794

vulkan: Implement topk_moe fused shader, ported from CUDA (#16641)

This is similar to the CUDA shader from #16130, but doesn't use shared memory
and handles different subgroup sizes.
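
The release note does not spell out which operations the fused shader combines. As a rough CPU-side reference, and assuming the usual mixture-of-experts routing pattern (softmax over the expert logits, keep the `k` largest, optionally renormalise the kept weights), the computation for one token can be sketched as below. The function name `topk_moe_reference` and the `renormalize` flag are hypothetical. The sketch says nothing about how the Vulkan shader uses subgroups.

```python
import math


def topk_moe_reference(logits, k, renormalize=True):
    """Route one token: softmax over expert logits, then keep the k largest weights."""
    # Numerically stable softmax.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Expert ids sorted by descending probability; ties go to the lower id.
    expert_ids = sorted(range(len(probs)), key=lambda i: (-probs[i], i))[:k]
    weights = [probs[i] for i in expert_ids]

    # Assumption: the selected weights are rescaled to sum to 1.
    if renormalize:
        kept = sum(weights)
        weights = [w / kept for w in weights]
    return expert_ids, weights
```

A fused GPU kernel would perform these steps in a single pass instead of launching one kernel per step, which is the motivation usually given for this kind of fusion.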

Assets 15

18 Oct 10:33

github-actions

b6793

38355c6

b6793

CUDA: use registers instead of smem in topk-moe (#16647)

Uses the technique used in the vulkan PR #16641. Neat trick!

Assets 15

18 Oct 06:48

github-actions

b6792

8138785

b6792

opencl: transposed gemm/gemv moe kernel with mxfp4,f32 (#16602)

* opencl: transposed gemm/gemv moe kernel with mxfp4,f32

* add restore kernel for moe transpose

* fix trailing whitespaces

* resolve compilation warnings

Assets 15

17 Oct 17:03

github-actions

b6791

66b0dbc

b6791

llama-model: fix inconsistent ctxs <-> bufs order (#16581)

Assets 15

17 Oct 15:46

github-actions

b6790

41386cf

b6790

rpc : report actual free memory (#16616)

* rpc : report actual free memory

Start reporting the free memory on every device instead of using
fixed values. Now llama-cli users can get a nice memory breakdown
when using RPC devices.

* drop --mem in rpc-server

Assets 15

17 Oct 11:55

github-actions

b6788

342c728

b6788

ggml : fix SpaceMit IME array out-of-bounds in task assignment (#16629)

Fix incorrect task-to-batch index calculation in the quantization phase.

The bug caused out-of-bounds access to qnbitgemm_args array when
compute_idx exceeded per_gemm_block_count_m, leading to invalid
pointer dereferences and SIGBUS errors.

Correctly map tasks to batches by dividing compute_idx by
per_gemm_block_count_m instead of block_size_m.

Example:
  batch_feature=1, gemm_m=30, block_size_m=4
  per_gemm_block_count_m = 8, task_count = 8

  Old: gemm_idx = 4/4 = 1 (out of bounds)
  New: gemm_idx = 4/8 = 0 (correct)

Tested on SpaceMit K1 RISC-V64 with qwen2.5:0.5b model.
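
The arithmetic in the example can be replayed with a short sketch. It assumes that `per_gemm_block_count_m` is the ceiling of `gemm_m / block_size_m` and that `task_count` is `batch_feature * per_gemm_block_count_m`. Both assumptions reproduce the numbers quoted above but are not taken from the actual source. The helper names are hypothetical.

```python
def ceil_div(a, b):
    """Integer ceiling division for positive integers."""
    return (a + b - 1) // b


def gemm_index_old(compute_idx, block_size_m):
    # Buggy mapping described in the note: divides by the block size.
    return compute_idx // block_size_m


def gemm_index_new(compute_idx, per_gemm_block_count_m):
    # Fixed mapping: divides by the number of blocks per GEMM.
    return compute_idx // per_gemm_block_count_m


batch_feature = 1
gemm_m = 30
block_size_m = 4
per_gemm_block_count_m = ceil_div(gemm_m, block_size_m)  # assumed formula
task_count = batch_feature * per_gemm_block_count_m       # assumed formula

old_indices = [gemm_index_old(i, block_size_m) for i in range(task_count)]
new_indices = [gemm_index_new(i, per_gemm_block_count_m) for i in range(task_count)]

# Valid batch indices are 0 .. batch_feature - 1.
old_out_of_bounds = [i for i, g in enumerate(old_indices) if g >= batch_feature]
new_out_of_bounds = [i for i, g in enumerate(new_indices) if g >= batch_feature]
```

With a single batch, the old mapping sends every task from `compute_idx = 4` onward to batch index 1, which does not exist. The new mapping keeps all eight tasks at index 0.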

Co-authored-by: muggle

Assets 15

17 Oct 08:44

github-actions

b6786

b194915

b6786

vulkan: fix debug build (add_rms_len/data not found) (#16624)

Assets 15

17 Oct 05:07

github-actions

b6783

ceff6bb

b6783

SYCL SET operator optimized for F32 tensors (#16350)

* SYCL/SET: implement operator + wire-up; docs/ops updates; element_wise & ggml-sycl changes

* sycl(SET): re-apply post-rebase; revert manual docs/ops.md; style cleanups

* move SET op to standalone file, GPU-only implementation

* Update SYCL SET operator for F32

* ci: fix editorconfig issues (LF endings, trailing spaces, final newline)

* fixed ggml-sycl.cpp

---------

Co-authored-by: Gitty Burstein

Assets 15

16 Oct 17:23

github-actions

b6782

1bb4f43

b6782

mtmd : support home-cooked Mistral Small Omni (#14928)

Assets 15
