Support q2-k to q4-k #434
Hi @wenhuach21 @n1ck-guo, does export for q4_k work right now? I tried to adapt it for torchao and to serve the result with vLLM.
Can you help me take a look at https://gist.github.com/jerryzh168/fac8f8c8f89c65ef7cc3d76fdc74ba04#file-gistfile1-txt-L48? I'm wondering whether the argument list for ggml_quant is correct:
auto-round/auto_round/data_type/int.py, line 77 at commit 37341f5
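For context, one way to sanity-check a Q4_K buffer independently of the `ggml_quant` call in the gist is to dequantize it with gguf-py and compare it against the original weights. This is only a minimal sketch: it assumes a recent gguf-py whose `gguf.quants.dequantize` supports Q4_K, and `packed` below is a placeholder for whatever buffer the export path under test actually produced.

```python
import numpy as np
from gguf import GGMLQuantizationType, GGML_QUANT_SIZES
from gguf.quants import dequantize

qtype = GGMLQuantizationType.Q4_K
block_size, type_size = GGML_QUANT_SIZES[qtype]  # Q4_K: 256 values -> 144 bytes per super-block

# Original fp32 weights; the last dimension must be a multiple of 256 for Q4_K.
weight = np.random.randn(32, 512).astype(np.float32)
assert weight.shape[-1] % block_size == 0

# `packed` stands in for the uint8 buffer produced by the export path under test
# (e.g. the ggml_quant call in the gist); load it however that path stores it.
packed = np.zeros((weight.shape[0], weight.shape[-1] // block_size * type_size), dtype=np.uint8)

restored = dequantize(packed, qtype)  # back to fp32, same shape as `weight`
print("max abs error vs original:", np.abs(restored - weight).max())
```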
Thank you for reporting this; we will check the related issues immediately.
@n1ck-guo if you want to repro the issue, here are the steps:
We have tested the q4_k_s export code. It works well for some other models, but for microsoft/Phi-4-mini-instruct the export fails with an error. This is because our code relies on the original export code from llama.cpp (convert_hf_to_gguf.py), which does not seem to work well with Phi-4.
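One way to check whether the failure comes from auto-round or from the upstream converter is to run llama.cpp's convert_hf_to_gguf.py on the same checkpoint directly. A rough sketch, assuming a local llama.cpp checkout and a locally downloaded model snapshot (both paths below are placeholders):

```python
import subprocess, sys

# Placeholders: adjust to your local llama.cpp checkout and model directory.
llama_cpp_dir = "/path/to/llama.cpp"
model_dir = "/path/to/microsoft/Phi-4-mini-instruct"

# If this plain f16 conversion already fails for Phi-4, the problem is in the
# upstream converter (which the gguf export reuses), not in auto-round itself.
subprocess.run(
    [
        sys.executable,
        f"{llama_cpp_dir}/convert_hf_to_gguf.py",
        model_dir,
        "--outfile", "phi4-mini-f16.gguf",
        "--outtype", "f16",
    ],
    check=True,
)
```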
@jerryzh168 Thank you for waiting. This issue seems to be caused by a problem with the llama.cpp version. Could you please try with this pr #524 and the lastest gguf-py. |
Need to support double quant in the algorithm part.
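For reference, "double quant" here refers to the second level of quantization used by k-quants such as Q4_K: within each 256-element super-block, the eight 32-element sub-block scales (and mins) are themselves stored as 6-bit integers against a single fp16 super-scale. Below is a rough numpy sketch of that idea only; it is not auto-round's actual implementation and it ignores the mins/asymmetric part for brevity.

```python
import numpy as np

def double_quant_scales(weights: np.ndarray):
    """Illustrative only: quantize per-sub-block scales to 6 bits per super-block.

    weights: (num_superblocks, 8, 32) fp32, i.e. 256 values per super-block
    split into eight 32-element sub-blocks, as in Q4_K.
    """
    # First level: one fp32 scale per 32-element sub-block (absmax over a 4-bit range).
    sub_scales = np.abs(weights).max(axis=-1) / 15.0              # (num_superblocks, 8)

    # Second level ("double quant"): store those scales as 6-bit integers
    # relative to one fp16 super-scale per 256-element super-block.
    super_scale = sub_scales.max(axis=-1, keepdims=True) / 63.0   # (num_superblocks, 1)
    q_scales = np.clip(np.round(sub_scales / np.maximum(super_scale, 1e-12)), 0, 63)

    # What actually gets stored: fp16 super-scale + 6-bit sub-scales;
    # at dequant time each sub-block scale is reconstructed as super_scale * q_scale.
    return super_scale.astype(np.float16), q_scales.astype(np.uint8)

d, q = double_quant_scales(np.random.randn(4, 8, 32).astype(np.float32))
print(d.shape, q.shape)  # (4, 1) fp16, (4, 8) uint8 in [0, 63]
```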