Releases: teleprint-me/llama.cpp

b5936

18 Jul 18:18
9fb1042
graph : fix graph reuse reset of params (#14760)

ggml-ci

b5849

09 Jul 02:09
6efcd65
vulkan: optimize flash attention split_k_reduce (#14554)

* vulkan: allow FA split_k with smaller KV values

* vulkan: spread split_k_reduce work across more threads

k_num can get rather large. Use the whole workgroup to reduce the M/L values.

Launch a thread for each element in the HSV dimension of the output. Helps a
lot for large HSV (like deepseek).
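
For readers unfamiliar with split-k flash attention, the reduce step described above can be sketched in plain Python. This is only an illustration of the general technique, not the Vulkan shader from the PR: the function names are hypothetical, and it assumes each split stores an unnormalized partial output together with its running max (M) and exponent sum (L).

```python
import math

def partial_attention(scores, values):
    """One KV split: return (max M, exponent sum L, unnormalized output O)."""
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    l = sum(weights)
    dim = len(values[0])
    o = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return m, l, o

def split_k_reduce(partials):
    """Merge the per-split (M, L, O) triples into one output row."""
    # Global max across all k_num splits keeps the exponentials stable.
    m_global = max(m for m, _, _ in partials)
    # Rescale each split from its local max to the global max.
    scales = [math.exp(m - m_global) for m, _, _ in partials]
    l_global = sum(s * l for s, (_, l, _) in zip(scales, partials))
    dim = len(partials[0][2])
    # Each output element d is independent of the others, which is why
    # a GPU can assign one thread per element of the value dimension.
    return [
        sum(s * o[d] for s, (_, _, o) in zip(scales, partials)) / l_global
        for d in range(dim)
    ]

def attention_reference(scores, values):
    """Unsplit softmax-weighted sum, used to check the reduce."""
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [
        sum(w * v[d] for w, v in zip(weights, values)) / total
        for d in range(dim)
    ]
```

Splitting the scores and values into chunks, calling `partial_attention` on each chunk, and passing the results to `split_k_reduce` should give the same row as `attention_reference` up to floating-point rounding.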

b5813

02 Jul 20:24
e75ba4c
gguf-py : add support for chat template jinja files (#14508)

* add support for chat template jinja files

* remove gemma3n hack

b5787

30 Jun 18:57
0a5a3b5
Add Conv2d for CPU (#14388)

* Conv2D: Add CPU version

* Half decent

* Tiled approach for F32

* remove file

* Fix tests

* Support F16 operations

* add assert about size

* Review: further formatting fixes, add assert and use CPU version of fp32->fp16

b5648

13 Jun 01:12
ed52f36
sycl: Remove not needed copy f16->f32 for dnnl mul mat (#14125)

b5599

05 Jun 22:28
1caae7f
gguf-py : add add_classifier_output_labels method to writer (#14031)

* add add_classifier_output_labels

* use add_classifier_output_labels

b5575

02 Jun 17:15
bfd3227
mtmd : fix memory leak in mtmd_helper_eval_chunk_single (#13961)

* mtmd : fix memory in mtmd_helper_eval_chunk_single

* mtmd-cli : fix mem leak

* Update tools/mtmd/mtmd-cli.cpp

Co-authored-by: Georgi Gerganov <[email protected]>


b5572

01 Jun 22:11
7675c55
gguf: fix failure on version == 0 (#13956)

b5548

30 May 20:32
e562eec
CUDA: fix typo in FlashAttention code (#13926)

b5538

30 May 01:04
ec9e030
cmake: Guard GGML_CPU_ALL_VARIANTS by architecture (#13890)