Pull requests: TheTom/llama-cpp-turboquant
(forked from ggml-org/llama.cpp; 192 forks)
Sync upstream: speculative checkpointing for hybrid models
#100, opened Apr 21, 2026 by chad-loder
HIP mixed TurboQuant vec FA on gfx900/gfx906
labels: build, ggml, Nvidia GPU
#99, opened Apr 21, 2026 by 2bigO
sparse V: skip negligible attention weights across all backends
labels: Apple Metal, ggml, Nvidia GPU, Vulkan
#98, opened Apr 21, 2026 by TheTom (Owner)
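The "skip negligible attention weights" idea in #98 can be illustrated with a minimal sketch: after softmax, value rows whose weight falls below a threshold contribute almost nothing to the output, so their loads and multiply-adds can be skipped. This is a generic reference model only, not the PR's backend code; the function name `attend_sparse` and the `eps` cutoff are hypothetical stand-ins for whatever criterion the PR actually uses.

```python
import math

def attend_sparse(q, keys, values, eps=1e-4):
    """Attention for one query vector: softmax(q.k) weights over the
    keys, skipping any value row whose weight is below eps.
    Hypothetical sketch; eps and the skip rule are assumptions."""
    scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)                           # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    out = [0.0] * len(values[0])
    for w, v in zip(weights, values):
        if w < eps:                           # "negligible": skip V load/FMA
            continue
        for j, vj in enumerate(v):
            out[j] += w * vj
    return out
```

With a small `eps` the result is numerically indistinguishable from dense attention while avoiding work for near-zero weights, which is presumably why the PR title claims applicability across all backends.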
perf: turbo VEC flash attention — +9% decode on CUDA via autoresearch
labels: ggml, Nvidia GPU, script
#53, opened Apr 4, 2026 by signalnine (7 tasks done)
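For context on what a "vec flash attention" kernel computes, here is the standard single-pass (online-softmax) formulation for one query row: stream over K/V once, keeping a running score maximum and a rescaled accumulator, so the full score vector is never materialized. This is the well-known reference algorithm only, not the PR's CUDA kernel.

```python
import math

def flash_attend(q, keys, values):
    """One-pass attention for a single query vector using the
    online-softmax recurrence: running max m, running denominator z,
    and an accumulator rescaled whenever m increases."""
    m = -math.inf                  # running max of scores
    z = 0.0                        # running softmax denominator
    acc = [0.0] * len(values[0])
    for k, v in zip(keys, values):
        s = sum(qi * ki for qi, ki in zip(q, k))
        m_new = max(m, s)
        scale = math.exp(m - m_new) if m != -math.inf else 0.0
        e = math.exp(s - m_new)
        z = z * scale + e
        acc = [a * scale + e * vj for a, vj in zip(acc, v)]
        m = m_new
    return [a / z for a in acc]
```

The output matches a naive two-pass softmax-then-weighted-sum exactly; the win on GPUs comes from the single streaming pass over K/V, which is the shape of kernel the PR title is tuning.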
fix: HIP/ROCm compatibility — check cudaMemcpyToSymbol errors, guard …
labels: ggml, Nvidia GPU
#41, opened Apr 1, 2026 by terrysimons (Draft)