Releases · mobiusml/hqq
v.0.2.3.post1
Bug fixes:
- Check `W_q` in the state dict to fix peft issue #151
- Fix bugs related to `AutoHQQHFModel.save_to_safetensors` (see the sketch below)
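A minimal sketch of the save path touched by the second fix, assuming the usual hqq import paths; the exact signature of `save_to_safetensors` is inferred from its name and should be treated as an assumption, not a documented API.

```python
# Minimal sketch (import paths assumed; save_to_safetensors signature inferred).
import torch
from transformers import AutoModelForCausalLM
from hqq.core.quantize import BaseQuantizeConfig
from hqq.models.hf.base import AutoHQQHFModel

model = AutoModelForCausalLM.from_pretrained("some-org/some-model")  # placeholder id
AutoHQQHFModel.quantize_model(
    model,
    quant_config=BaseQuantizeConfig(nbits=4, group_size=64),
    compute_dtype=torch.float16,
    device="cuda",
)
AutoHQQHFModel.save_to_safetensors(model, "quantized_model_dir")  # path fixed in this release
```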
v0.2.3
- VLLM support via patching
- GemLite backend + on-the-fly quantization
- Add support for Aria
- Add support for loading quantized SequenceClassification models
- Faster decoding via custom CUDA graphs, the SDPA math backend, etc.
- Fix bugs related to torch.compile and `hf_generator` with the newer transformers versions
- Fix bugs related to saving quantized models with no grouping
- Fix bugs related to saving large quantized models
- Update examples
- Add support for `HQQLinear.to(device)` (sketch below)
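A minimal sketch of the new `HQQLinear.to(device)` support, assuming the `HQQLinear` and `BaseQuantizeConfig` API from `hqq.core.quantize`; the config values are illustrative, not recommendations.

```python
# Minimal sketch (constructor arguments assumed from the hqq core API).
import torch
import torch.nn as nn
from hqq.core.quantize import BaseQuantizeConfig, HQQLinear

qlayer = HQQLinear(
    nn.Linear(4096, 4096),
    BaseQuantizeConfig(nbits=4, group_size=64),
    compute_dtype=torch.float16,
    device="cuda",
)
qlayer = qlayer.to("cpu")  # moving a quantized layer: the capability added in this release
```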
v0.2.2
v.0.2.1
HQQ v0.2.1
- Support `HQQLinear.state_dict()` for non-initialized layers; mainly used for huggingface/transformers#33141 (sketch below)
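A minimal sketch of the non-initialized case, assuming `HQQLinear` accepts `linear_layer=None` together with an `initialize=False` flag; both are assumptions about the hqq API, not confirmed by these notes.

```python
# Minimal sketch (linear_layer=None and initialize=False are assumptions):
# create a shell layer without weights, as transformers does while loading,
# and serialize it.
from hqq.core.quantize import BaseQuantizeConfig, HQQLinear

qlayer = HQQLinear(None, BaseQuantizeConfig(nbits=4, group_size=64), initialize=False)
sd = qlayer.state_dict()  # previously failed for non-initialized layers
```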
v.0.2.0
HQQ v0.2.0
- Bug fixes
- Safetensors support for transformers via huggingface/transformers#33141
- `quant_scale`, `quant_zero` and `offload_meta` are now deprecated. You can still use them with the hqq lib, but you can't use them with the transformers lib (see the sketch after this list).
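A minimal sketch of where these flags live, assuming they are keyword arguments of `BaseQuantizeConfig` as in earlier hqq versions; the values shown are illustrative.

```python
# Minimal sketch (flag placement assumed from earlier hqq versions).
# These options still work with the hqq lib directly, but are not
# supported through the transformers integration.
from hqq.core.quantize import BaseQuantizeConfig

cfg = BaseQuantizeConfig(
    nbits=4,
    group_size=64,
    quant_scale=False,   # deprecated
    quant_zero=False,    # deprecated
    offload_meta=False,  # deprecated
)
```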
v.0.1.8
v0.1.7.post3
HQQ v0.1.7.post3
- Enable CPU quantization and runtime (sketch below)
- `_load_state_dict` fix
- Fix `extra_repr` in `HQQLinear`
- Fix `from_quantized` bugs
- Fix `|` typing
- Fix 3-bit `axis=1` slicing bug
- Add 5/6-bit for testing
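A minimal sketch of the CPU path enabled by the first item, assuming the same `HQQLinear` constructor as on GPU; config values are illustrative.

```python
# Minimal sketch (constructor arguments assumed): quantize and run on CPU.
import torch
import torch.nn as nn
from hqq.core.quantize import BaseQuantizeConfig, HQQLinear

qlayer = HQQLinear(
    nn.Linear(512, 512),
    BaseQuantizeConfig(nbits=8, group_size=128),
    compute_dtype=torch.float32,
    device="cpu",  # CPU quantization and runtime enabled in this release
)
out = qlayer(torch.randn(1, 512))
```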
v0.1.7.post2
HQQ v0.1.7.post2
- Various bug fixes, especially with `AutoHQQHFModel` and the patching logic, to make it work with any transformers model (sketch below).
- Readme refactoring.
- Whisper example.
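A minimal sketch of `AutoHQQHFModel` on a non-LLM architecture (Whisper, matching the new example), assuming the `quantize_model` classmethod and its keyword arguments; the model id and config values are placeholders.

```python
# Minimal sketch (quantize_model arguments assumed from the hqq API).
import torch
from transformers import AutoModelForSpeechSeq2Seq
from hqq.core.quantize import BaseQuantizeConfig
from hqq.models.hf.base import AutoHQQHFModel

model = AutoModelForSpeechSeq2Seq.from_pretrained("openai/whisper-tiny")
AutoHQQHFModel.quantize_model(
    model,
    quant_config=BaseQuantizeConfig(nbits=4, group_size=64),
    compute_dtype=torch.float16,
    device="cuda",
)
```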