Alternatives to BitsAndBytes for HF models #1337
Unanswered
Timelessprod asked this question in Q&A
Replies: 1 comment
-
I'd like to ask if you found what you were looking for, as I am in a similar situation.
-
Hello,
I'm loading and fine-tuning a model from HF to use it with Ollama afterwards, and so far I have relied on BitsAndBytes for quantization (resource limitations). However, it turns out that even with the following config, the safetensors ended up exported as uint8 (U8) instead of float16 (F16)*, which makes them impossible to use with Ollama, which only supports F16, BF16 and F32:
*My understanding is that bnb_4bit_quant_storage is supposed to be the dtype in which the weights are stored when saving the model; correct me if I'm wrong.
So, do you know of any other library/framework that quantizes a model while (or after) loading it from HF, works similarly to BitsAndBytes, but exports tensors in one of the valid dtypes above?
I looked around on the web but couldn't find anything fitting my needs.
Thank you very much.
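For context, a minimal sketch of the kind of BitsAndBytes setup described above (the model id and exact settings are assumptions, not taken from the original question). Note that bnb_4bit_quant_storage controls the dtype used to store the *packed* 4-bit weights, so setting it to float16 still saves packed quantized data rather than plain F16 tensors:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Hypothetical config: settings chosen for illustration only.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    # Storage dtype for the packed 4-bit weights. This does NOT
    # unpack them back into ordinary F16 tensors when saving.
    bnb_4bit_quant_storage=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # placeholder model id
    quantization_config=bnb_config,
    device_map="auto",
)
```

If the goal is an Ollama-compatible export, one approach (assuming enough disk/RAM for the full-precision weights) is to save the fine-tuned model in float16 without BitsAndBytes and then let the llama.cpp/GGUF tooling that Ollama builds on do the quantization itself.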