
Releases: intel/auto-round

v0.5.1: bug fix release

23 Apr 08:50
73669aa

What's Changed

Full Changelog: v0.5.0...v0.5.1

v0.5.0

22 Apr 08:05
e90f991

Highlights

  • Refine auto-round format inference: support 2-, 3-, 4-, and 8-bit quantization and the Marlin kernel, and fix several bugs in the auto-round format
  • Support XPU in tuning and inference by @wenhuach21 in #481
  • Support more VLMs by @n1ck-guo in #390
  • Rename the quantization method and make several refinements by @wenhuach21 in #500
  • Support RTN (round-to-nearest) via iters=0 by @wenhuach21 in #510
  • Fix a bug with mixed calibration datasets by @n1ck-guo in #492
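RTN skips the iterative tuning loop entirely and simply rounds each weight to the nearest point on a per-group grid. A minimal pure-Python sketch of symmetric per-group RTN (illustrative only; the function name and grouping scheme are assumptions, not auto-round's implementation):

```python
def rtn_quantize(weights, bits=4, group_size=4):
    """Symmetric per-group RTN: scale = max|w| / qmax, q = round(w / scale)."""
    qmax = 2 ** (bits - 1) - 1          # e.g. 7 for 4-bit symmetric
    dequantized = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        # One scale per group; fall back to 1.0 for an all-zero group.
        scale = max(abs(w) for w in group) / qmax or 1.0
        for w in group:
            # Round to nearest integer level, clamped to the signed int range.
            q = max(-qmax - 1, min(qmax, round(w / scale)))
            dequantized.append(q * scale)
    return dequantized

weights = [0.12, -0.53, 0.97, -0.08, 0.33, 0.71, -0.44, 0.05]
print(rtn_quantize(weights))
```

Because there is no optimization, RTN is fast but typically less accurate than the tuned rounding the default iterations produce.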

What's Changed

Full Changelog: v0.4.7...v0.5.0

v0.4.7

01 Apr 09:50

Highlights

Support W4AFP8 for HPU by @yiliu30 in #467. Please refer to Intel Neural Compressor for guidance on running these models.

Support immediate packing in the new quantization API to reduce RAM usage by @wenhuach21 in #466

20x AWQ and 4x GPTQ packing speedup on CUDA by @wenhuach21 in #459

Support auto-round-light to speed up the tuning process by @WeiweiZhang1 in #454

Fix a critical bug of MXFP4 in tuning by @wenhuach21 in #451

What's Changed

Full Changelog: v0.4.6...v0.4.7

v0.4.6

24 Feb 09:23

Highlights:

1. Set torch compile to False by default in #447
2. Fix a packing hang and force FP16 at exporting in #430
3. Align auto_quantizer with Transformers 4.49 in #437

What's Changed

Full Changelog: v0.4.5...v0.4.6

v0.4.5

27 Jan 12:12

Highlights:
We have enhanced support for extremely large models with the following updates:

Multi-Card Tuning Support: Added basic support for naive multi-GPU tuning in #415

Accelerated Packing Stage: Improved packing speed (2x-4x) for AutoGPTQ and AutoAWQ formats by leveraging CUDA in #407

Deepseek V3 GGUF Export: Introduced support for exporting models to the Deepseek V3 GGUF format in #416

What's Changed

Full Changelog: v0.4.4...v0.4.5

v0.4.4 release

10 Jan 01:47
86767b0

Highlights:
1. Fix an install issue in #387
2. Support exporting GGUF q4_0 and q4_1 formats in #393
3. Fix the LLM command-line seqlen issue in #399

What's Changed

Full Changelog: v0.4.3...v0.4.4

v0.4.3: bug fix release

16 Dec 03:24
3323371

Highlights:
Fix incorrect device setting in auto-round format inference by @WeiweiZhang1 in #383
Remove the dependency on AutoGPTQ by @XuehaoSun in #380

What's Changed

Full Changelog: v0.4.2...v0.4.3

v0.4.2: bug fix release

09 Dec 09:44

Highlights

1. Fix an AutoAWQ exporting issue
2. Skip exporting bias when possible in the AutoGPTQ format

What's Changed

Full Changelog: v0.4.1...v0.4.2

v0.4.1: bug fix release

27 Nov 09:53

Highlights:

  • Fixed a vLLM calibration infinite-loop issue
  • Corrected the default value for the sym argument in the API configuration
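The sym argument controls whether the quantization grid is zero-centered (scale only) or shifted by a zero-point to cover the group's exact [min, max] range. A hedged pure-Python sketch of that difference (the function name and 4-bit default are illustrative assumptions, not auto-round's code; sym here mirrors the role of the library's argument):

```python
def quantize(ws, bits=4, sym=True):
    if sym:
        # Symmetric: zero-centered grid in [-qmax-1, qmax], a single scale.
        qmax = 2 ** (bits - 1) - 1
        scale = max(abs(w) for w in ws) / qmax or 1.0
        return [max(-qmax - 1, min(qmax, round(w / scale))) * scale for w in ws]
    # Asymmetric: scale plus zero-point so the grid spans [min, max].
    levels = 2 ** bits - 1
    lo, hi = min(ws), max(ws)
    scale = (hi - lo) / levels or 1.0
    zero = round(-lo / scale)
    return [(max(0, min(levels, round(w / scale) + zero)) - zero) * scale
            for w in ws]

ws = [0.1, 0.4, 0.9, 1.5]            # all-positive group: asym fits it better
print(quantize(ws, sym=True))
print(quantize(ws, sym=False))
```

For skewed or one-sided weight groups the asymmetric grid wastes fewer levels, which is why the default for sym matters for accuracy.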

What's Changed

Full Changelog: v0.4...v0.4.1

v0.4

22 Nov 13:32

Highlights

[Experimental Feature] We provide API support for VLMs
[Kernel] We add IPEX support for Intel CPUs
[Bug fix] We fix a tuning bug for the GLM-4 model
[Enhancement] Better align gradient_accumulate_steps behavior for varied-length inputs

What's Changed

Full Changelog: v0.3.1...v0.4