
Gemma 3 INT4

Gemma 3 provides FP32 checkpoints trained with INT4 Quantization Aware Training (QAT). They have the following characteristics:

  • INT4 symmetric quantization (i.e. no zero point) with group size = 32
  • Tied embeddings

No established library (in PyTorch) can take full advantage of the above, so this library aims to fill that gap.
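For reference, here is a minimal PyTorch sketch of symmetric group-wise INT4 quantization. It is illustrative only: the absmax-to-[-8, 7] mapping is a common convention, not necessarily the exact one used by the QAT checkpoints or by this repo. (Tied embeddings simply mean the output head shares the embedding matrix, so that weight is stored once.)

import torch

def quantize_int4_symmetric(w: torch.Tensor, group_size: int = 32):
    # Split each row into groups of `group_size` (in_features must divide evenly).
    oc, ic = w.shape
    wg = w.reshape(oc, ic // group_size, group_size)
    # Symmetric quantization: one scale per group, no zero point.
    scale = wg.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 7
    q = torch.clamp(torch.round(wg / scale), -8, 7).to(torch.int8)
    return q, scale

def dequantize_int4_symmetric(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # Rescale each group and flatten back to the original 2D weight shape.
    return (q * scale).reshape(q.shape[0], -1)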

Plan:

  • vLLM / SGLang integration

Convert checkpoint

uv run convert_flax.py --ckpt_dir gemma3-1b-it-int4 --save_dir gemma-3-1b-it-int4 --format int4
# other possible --format values: awq, gguf

For GGUF, you will need a "donor" GGUF to copy the tokenizer from (I'm too lazy to re-implement the logic to serialize the tokenizer in GGUF format).

# create the donor GGUF; convert_hf_to_gguf.py comes from the llama.cpp repo
python convert_hf_to_gguf.py $(huggingface-cli download google/gemma-3-4b-it) --outtype bf16 --outfile gemma-3-4b-it-BF16.gguf

# combine weights from convert_flax.py and metadata from convert_hf_to_gguf.py
uv run convert_gguf.py --ckpt gemma-3-4b-it-int4-gguf/model.safetensors --metadata gemma-3-4b-it-BF16.gguf --save_path gemma-3-4b-it-q4_0.gguf

# convert vision tower for vision-enabled checkpoints
python examples/llava/gemma3_convert_encoder_to_gguf.py $(huggingface-cli download google/gemma-3-4b-it) --outtype bf16 --outfile mmproj-bf16.gguf
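The resulting GGUF should run in any llama.cpp-compatible runtime. A quick sanity check using the llama-cpp-python bindings (an assumption on my part; the llama.cpp CLI works just as well):

from llama_cpp import Llama  # pip install llama-cpp-python

# Load the q4_0 GGUF produced by convert_gguf.py above.
llm = Llama(model_path="gemma-3-4b-it-q4_0.gguf", n_ctx=4096)
resp = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=64,
)
print(resp["choices"][0]["message"]["content"])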

Chat demo

python chat_hf.py --model gaunernst/gemma-3-1b-it-int4
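If you would rather use plain transformers than the bundled chat script, the AWQ exports should load through the standard AutoAWQ integration. A sketch, where the repo id is a placeholder (use the AWQ links in the table below):

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gaunernst/gemma-3-1b-it-awq"  # placeholder id; see the table below
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Explain INT4 QAT in one sentence."}]
inputs = tok.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
out = model.generate(inputs, max_new_tokens=64)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))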
Model          Vision   INT4 link   AWQ link   GGUF link
Gemma 3 1B     No       Link        Link       Link
Gemma 3 4B     Yes      Link        Link       Link
Gemma 3 12B    Yes      Link        Link       Link
Gemma 3 27B    Yes      Link        Link       Link
