
Releases: mobiusml/hqq

v.0.2.3.post1

20 Feb 11:12

Bug fixes:

  • Check for W_q in the state dict to fix PEFT issue #151
  • Fix bugs related to AutoHQQHFModel.save_to_safetensors

v0.2.3

17 Feb 08:43
  • vLLM support via patching: GemLite backend + on-the-fly quantization
  • Add support for Aria
  • Add support for loading quantized SequenceClassification models
  • Faster decoding (custom CUDA graphs, SDPA math backend, etc.)
  • Fix bugs related to torch.compile and hf_generator with newer transformers versions
  • Fix bugs related to saving quantized models with no grouping
  • Fix bugs related to saving large quantized models
  • Update examples
  • Add support for HQQLinear.to(device) (see the sketch below)
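
A minimal sketch of on-the-fly quantization and the new device move, assuming the HQQLinear/BaseQuantizeConfig API from the hqq README (exact keyword names may differ across versions):

    import torch
    import torch.nn as nn
    from hqq.core.quantize import HQQLinear, BaseQuantizeConfig

    # Quantize a regular linear layer on-the-fly (4-bit, group size 64).
    layer = nn.Linear(4096, 4096, bias=False)
    quant_config = BaseQuantizeConfig(nbits=4, group_size=64)
    hqq_layer = HQQLinear(layer, quant_config=quant_config,
                          compute_dtype=torch.float16, device="cuda")

    # New in this release: move the quantized layer like any nn.Module.
    hqq_layer = hqq_layer.to("cpu")
    hqq_layer = hqq_layer.to("cuda")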

v0.2.2

12 Sep 15:23

HQQ v0.2.2

  • Support static-cache compilation without using HFGenerator (see the sketch below)
  • Fix various issues related to torch.compile
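
A minimal sketch of static-cache decoding through plain transformers instead of HFGenerator; the model id is illustrative, and the model would typically be HQQ-quantized first:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "meta-llama/Llama-2-7b-hf"  # illustrative
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16,
                                                 device_map="cuda")

    # Compile the forward pass; the static cache keeps tensor shapes fixed across steps.
    model.forward = torch.compile(model.forward, mode="reduce-overhead", fullgraph=True)

    inputs = tokenizer("Hello, my name is", return_tensors="pt").to("cuda")
    out = model.generate(**inputs, max_new_tokens=32, cache_implementation="static")
    print(tokenizer.decode(out[0], skip_special_tokens=True))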

v.0.2.1

29 Aug 16:25

HQQ v0.2.1

v.0.2.0

28 Aug 10:05

HQQ v0.2.0

  • Bug fixes
  • Safetensors support for transformers via huggingface/transformers#33141 (see the sketch below)
  • quant_scale, quant_zero, and offload_meta are now deprecated. They can still be used with the hqq library, but not with the transformers integration.
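
A minimal sketch of quantizing through the transformers integration and saving to safetensors; the model id is illustrative, and the HqqConfig arguments follow the transformers documentation:

    import torch
    from transformers import AutoModelForCausalLM, HqqConfig

    # 4-bit HQQ quantization on load; no quant_scale/quant_zero/offload_meta here.
    quant_config = HqqConfig(nbits=4, group_size=64)

    model = AutoModelForCausalLM.from_pretrained(
        "meta-llama/Llama-2-7b-hf",   # illustrative model id
        torch_dtype=torch.float16,
        device_map="cuda",
        quantization_config=quant_config,
    )

    # save_pretrained writes safetensors shards by default.
    model.save_pretrained("llama2-7b-hqq-4bit")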

v.0.1.8

11 Jul 12:00

HQQ v0.1.8

  • Add BitBlas backend support
  • Simpler HQQLinear creation from raw weights via HQQLinear.from_weights(W, bias, etc.) (see the sketch below)
  • Fix memory leak while swapping layers for the TorchAO backend
  • Add HQQLinear.unpack() call
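
A minimal sketch of building an HQQLinear directly from a weight tensor; the keyword names for from_weights are assumptions based on the bullet above, so check the hqq source for the exact signature:

    import torch
    from hqq.core.quantize import HQQLinear, BaseQuantizeConfig

    W = torch.randn(4096, 4096, dtype=torch.float16)  # illustrative weight matrix
    quant_config = BaseQuantizeConfig(nbits=4, group_size=64)

    # Assumed keyword names; the bullet only documents from_weights(W, bias, etc.).
    hqq_layer = HQQLinear.from_weights(W, bias=None, quant_config=quant_config,
                                       compute_dtype=torch.float16, device="cuda")

    # Also new in this release: unpack the bit-packed quantized weights.
    W_q_unpacked = hqq_layer.unpack()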

v0.1.7.post3

28 May 07:48

HQQ v0.1.7.post3

  • Enable CPU quantization and runtime (see the sketch below)
  • Fix _load_state_dict
  • Fix extra_repr in HQQLinear
  • Fix from_quantized bugs
  • Fix | typing
  • Fix 3-bit axis=1 slicing bug
  • Add 5/6-bit quantization for testing
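
A minimal sketch of the CPU path, assuming the same HQQLinear API as on GPU with device="cpu":

    import torch
    import torch.nn as nn
    from hqq.core.quantize import HQQLinear, BaseQuantizeConfig

    layer = nn.Linear(1024, 1024)
    quant_config = BaseQuantizeConfig(nbits=8, group_size=64)

    # Quantize and run entirely on CPU; float32 compute is the safe default there.
    hqq_layer = HQQLinear(layer, quant_config=quant_config,
                          compute_dtype=torch.float32, device="cpu")
    y = hqq_layer(torch.randn(1, 1024))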

v0.1.7.post2

06 May 16:41

HQQ v0.1.7.post2

  • Various bug fixes, especially with AutoHQQHFModel and the patching logic, to make it work with any transformers model.
  • Readme refactoring.
  • Whisper example.

v0.1.7

24 Apr 08:59

HQQ v0.1.7

  • Faster inference with torchao/Marlin 4-bit kernels (see the sketch below)
  • Multi-GPU support for model.quantize()
  • Custom HF generator
  • Various bug fixes/improvements
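
A minimal sketch of quantizing with the hqq engine, switching to a faster 4-bit backend, and decoding with the custom HF generator; the model id is illustrative, and prepare_for_inference/HFGenerator follow the hqq README (they may postdate this exact release):

    import torch
    from hqq.engine.hf import HQQModelForCausalLM, AutoTokenizer
    from hqq.core.quantize import BaseQuantizeConfig
    from hqq.utils.patching import prepare_for_inference
    from hqq.utils.generation_hf import HFGenerator

    model_id = "meta-llama/Llama-2-7b-hf"  # illustrative
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = HQQModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

    # axis=1 quantization is required by the fast backends; bfloat16 for torchao_int4.
    model.quantize_model(quant_config=BaseQuantizeConfig(nbits=4, group_size=64, axis=1),
                         compute_dtype=torch.bfloat16, device="cuda")

    # Swap in the torchao 4-bit kernels, then decode with the custom generator.
    prepare_for_inference(model, backend="torchao_int4")
    gen = HFGenerator(model, tokenizer, max_new_tokens=256, do_sample=True, compile="partial")
    gen.generate("Explain quantization in one paragraph.", print_tokens=True)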

v0.1.6.post2

19 Mar 18:24

HQQ v0.1.6.post2

Same as v0.1.6 with setup.py fixes:

  • find_packages fix: #25
  • Auto-build CUDA kernels via the PyPI package: #26