Releases · CodeLinaro/llama.cpp
b4450
fix: add missing msg in static_assert (#11143)
Signed-off-by: hydai <[email protected]>
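For context: C++ only made the message argument of `static_assert` optional in C++17, so builds targeting older standards fail when it is omitted. A minimal sketch of the pattern the fix restores (condition and message here are illustrative):

```cpp
#include <cstdint>

// Pre-C++17, the second argument is mandatory; omitting it is a
// compile error under -std=c++11/14.
static_assert(sizeof(std::uint64_t) == 8, "uint64_t must be 8 bytes");
```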
b4382
rpc-server : add support for the SYCL backend (#10934)
b4324
Opt class for positional argument handling (#10508)
Added support for positional arguments `model` and `prompt`, plus the ability to download models via strings such as:
    llama-run llama3
    llama-run ollama://granite-code
    llama-run ollama://granite-code:8b
    llama-run hf://QuantFactory/SmolLM-135M-GGUF/SmolLM-135M.Q2_K.gguf
    llama-run huggingface://bartowski/SmolLM-1.7B-Instruct-v0.2-GGUF/SmolLM-1.7B-Instruct-v0.2-IQ3_M.gguf
    llama-run https://example.com/some-file1.gguf
    llama-run some-file2.gguf
    llama-run file://some-file3.gguf
Signed-off-by: Eric Curtin <[email protected]>
b4302
ggml: load all backends from a user-provided search path (#10699)
* feat: load all backends from a user-provided search path
* fix: Windows search path
* refactor: rename `ggml_backend_load_all_in_search_path` to `ggml_backend_load_all_from_path`
* refactor: rename `search_path` to `dir_path`
* fix: change `NULL` to `nullptr`
Co-authored-by: Diego Devesa <[email protected]>
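A minimal sketch of calling the renamed entry point, declared in `ggml-backend.h`; the directory path is an assumption for illustration:

```cpp
#include <cstdio>
#include "ggml-backend.h"

int main() {
    // Load every ggml backend shared library found in this directory
    // (illustrative path; point it at your build's backend .so/.dll files).
    ggml_backend_load_all_from_path("/opt/llama.cpp/backends");

    // Enumerate whatever was registered.
    for (size_t i = 0; i < ggml_backend_reg_count(); ++i) {
        printf("backend: %s\n", ggml_backend_reg_name(ggml_backend_reg_get(i)));
    }
    return 0;
}
```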
b4301
vulkan: request round-to-even for fp16 in im2col/rope_head (#10767)
Vulkan doesn't mandate a specific rounding mode, but the shader_float_controls feature allows a rounding mode to be requested if the implementation supports it.
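A hedged sketch of the corresponding capability check on the host side, assuming Vulkan 1.2's `VkPhysicalDeviceFloatControlsProperties` (the helper name is made up):

```cpp
#include <vulkan/vulkan.h>

// Returns true if the device supports requesting round-to-even (RTE)
// for 16-bit floats, i.e. the shader may declare RoundingModeRTE.
bool supports_fp16_rte(VkPhysicalDevice dev) {
    VkPhysicalDeviceFloatControlsProperties fc = {};
    fc.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_FLOAT_CONTROLS_PROPERTIES;

    VkPhysicalDeviceProperties2 props = {};
    props.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PROPERTIES_2;
    props.pNext = &fc;

    vkGetPhysicalDeviceProperties2(dev, &props);
    return fc.shaderRoundingModeRTEFloat16 == VK_TRUE;
}
```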
b4291
server : fix format_infill (#10724)
Fixes format_infill in the server; also renames internals and updates the tests, switching the test model and including a test_invalid_input_extra_req case.
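For orientation, a hedged sketch of the request shape that format_infill consumes on the server's `/infill` endpoint; the field names follow the server API referenced by test_invalid_input_extra_req, the values are illustrative, and nlohmann::json is used since the server itself depends on it:

```cpp
#include <iostream>
#include <nlohmann/json.hpp>

int main() {
    // Illustrative /infill body: prefix/suffix around the cursor, plus
    // input_extra chunks of additional context (the field validated by
    // test_invalid_input_extra_req).
    nlohmann::json req = {
        {"input_prefix", "def fib(n):\n    "},
        {"input_suffix", "\n    return fib(n - 1) + fib(n - 2)\n"},
        {"input_extra", nlohmann::json::array({
            {{"filename", "util.py"}, {"text", "def memoize(f):\n    ...\n"}}
        })},
    };
    std::cout << req.dump(2) << std::endl;
}
```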
b4267
Update deprecation-warning.cpp (#10619)
Fixed path separator handling for cross-platform support (Windows file systems).
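The portable pattern behind such a fix is usually to search for both separators instead of only `/` (a generic sketch, not the literal patch):

```cpp
#include <string>

// Return the file-name portion of a path, accepting both '/' and '\\'
// so the same code behaves on Windows and POSIX file systems.
static std::string path_basename(const std::string & path) {
    const size_t pos = path.find_last_of("/\\");
    return pos == std::string::npos ? path : path.substr(pos + 1);
}
```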
b4255
vulkan: optimize and reenable split_k (#10637)
Use vector loads when possible in mul_mat_split_k_reduce. Use split_k when there aren't enough workgroups to fill the shaders.
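The second heuristic amounts to a host-side decision along these lines (an illustrative sketch with made-up names and an assumed cap, not the actual dispatch code):

```cpp
#include <algorithm>
#include <cstdint>

// If the M x N tile grid alone cannot occupy every compute unit, split
// the K dimension across extra workgroups; their partial sums are then
// combined in a reduction pass (mul_mat_split_k_reduce above).
static uint32_t pick_split_k(uint32_t m_tiles, uint32_t n_tiles,
                             uint32_t compute_units) {
    const uint32_t workgroups = m_tiles * n_tiles;
    if (workgroups >= compute_units) {
        return 1; // grid already fills the device; no split needed
    }
    // The cap of 4 is an arbitrary illustrative limit.
    return std::min(4u, compute_units / std::max(workgroups, 1u));
}
```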
b4242
llama : add enum for built-in chat templates (#10623)
* llama : add enum for supported chat templates
* use "built-in" instead of "supported"
* arg: print list of built-in templates
* fix test
* update server README
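A hedged sketch of the pattern (identifiers are illustrative, not the actual llama.cpp symbols, and the real enum covers far more templates):

```cpp
#include <string>

// Map a template name, e.g. from --chat-template, to an enum value so
// the formatting code can switch on it instead of comparing strings.
enum class chat_template {
    CHATML,
    LLAMA2,
    ZEPHYR,
    UNKNOWN,
};

static chat_template chat_template_from_name(const std::string & name) {
    if (name == "chatml") return chat_template::CHATML;
    if (name == "llama2") return chat_template::LLAMA2;
    if (name == "zephyr") return chat_template::ZEPHYR;
    return chat_template::UNKNOWN;
}
```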
b4226
ggml : move AMX to the CPU backend (#10570)
Co-authored-by: Georgi Gerganov <[email protected]>