Alternatives to BitsAndBytes for HF models #1337
Unanswered
Timelessprod asked this question in Q&A
Replies: 1 comment
-
I'd like to ask if you found what you were looking for, as I am in a similar situation.
-
Hello,
I'm loading and fine-tuning a model from HF to use it with Ollama afterwards, and so far I have relied on BitsAndBytes for quantization (resource limitations). However, it turns out that even with the following config, the safetensors ended up exported as uint8 (U8) instead of float16 (F16)*, which makes them impossible to use with Ollama, which only supports F16, BF16 and F32:
*My understanding is that bnb_4bit_quant_storage is supposed to be the dtype in which the weights are stored when saving the model; correct me if I'm wrong.
So, do you know of any other library/framework that quantizes a model while (or after) loading it from HF, works similarly to BitsAndBytes, but exports tensors in one of the valid dtypes above?
I looked around on the web but couldn't find anything fitting my needs.
Thank you very much.
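For context, a minimal sketch of the kind of BitsAndBytes setup described above (the model id and exact settings are assumptions, not taken from the original question). Note that bnb_4bit_quant_storage controls the dtype used to store the *packed* 4-bit weights, so setting it to float16 still saves packed quantized data rather than plain F16 tensors:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Hypothetical config: settings chosen for illustration only.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    # Storage dtype for the packed 4-bit weights. This does NOT
    # unpack them back into ordinary F16 tensors when saving.
    bnb_4bit_quant_storage=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # placeholder model id
    quantization_config=bnb_config,
    device_map="auto",
)
```

If the goal is an Ollama-compatible export, one approach (assuming enough disk/RAM for the full-precision weights) is to save the fine-tuned model in float16 without BitsAndBytes and then let the llama.cpp/GGUF tooling that Ollama builds on do the quantization itself.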