Closed
Description
First of all: congrats on your amazing research work.

Considering that this is using GGML and seems based directly on llama.cpp:

- Why is this a separate project from llama.cpp, given that llama.cpp already supports BitNet ternary quants? (ggml-org/llama.cpp#8151)
- Are these simply more optimised kernels? If so, how do they compare to llama.cpp's implementation?
- Can/should they be contributed back to llama.cpp?