Releases: intel/auto-round
v0.6.0
Highlights
- provide experimental support for the gguf q*_k formats and customized mixed-bit settings (see the sketch after this list)
- support xpu in triton backend by @wenhuach21 in #563
- add torch backend by @WeiweiZhang1 in #555
- provide initial support for the llmcompressor format (only INT8 W8A8 dynamic quantization is supported) by @xin3he in #646
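A minimal sketch of how the experimental GGUF q*_k export and the customized mixed-bit setting could be combined through the Python API. The model name, the layer names in `layer_config`, and the exact `gguf:q4_k_m` format string are illustrative assumptions, not a verified recipe.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "Qwen/Qwen2.5-0.5B-Instruct"  # placeholder model for illustration
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Customized mixed bits: override the default 4-bit setting for a few layers.
# The layer names below are hypothetical examples.
layer_config = {
    "model.layers.0.self_attn.q_proj": {"bits": 8},
    "model.layers.0.self_attn.k_proj": {"bits": 8},
}

autoround = AutoRound(model, tokenizer, bits=4, group_size=128, layer_config=layer_config)
autoround.quantize()
# Export to one of the experimental GGUF q*_k formats.
autoround.save_quantized("./Qwen2.5-0.5B-q4_k_m", format="gguf:q4_k_m")
```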
What's Changed
- bump version into v0.5.1 by @XuehaoSun in #540
- Freeze pytorch & ipex version in CI by @XuehaoSun in #541
- fix quantization config for inference by @WeiweiZhang1 in #542
- [critical bug] remove redundant round in dq simulation by @wenhuach21 in #543
- update readme by @wenhuach21 in #550
- add recipes for qwen3 8b and 14b by @n1ck-guo in #552
- itrex requires torch<2.7 by @XuehaoSun in #548
- [GGUF STEP4] fix search bug and improve packing & eval speed by @n1ck-guo in #545
- refine xpu requirement/config json and fix several issues by @wenhuach21 in #558
- add UE5M3 simulation by @wenhuach21 in #562
- support xpu in triton backend by @wenhuach21 in #563
- fix typo in backend by @wenhuach21 in #564
- update habana docker to 1.21.0 by @XuehaoSun in #566
- Support for more gguf formats and float zp for Q*_1 by @n1ck-guo in #560
- update readme by @wenhuach21 in #569
- update readme by @wenhuach21 in #571
- support for llava-based hf model by @n1ck-guo in #568
- add gguf accuracy data by @wenhuach21 in #574
- add sym & asym gguf quant for gguf baseline (iter==0) by @n1ck-guo in #573
- modify default asym 4bits auto-round format to awq, fix save folder typo for mllm by @WeiweiZhang1 in #575
- improve the robustness of parsing vlm config by @wenhuach21 in #577
- switch to transformers API in cpu ut by @wenhuach21 in #580
- add torch backend by @WeiweiZhang1 in #555
- fix awq exporting at group_size=-1 by @wenhuach21 in #579
- refactor cuda ut to facilitate automation by @n1ck-guo in #559
- fix tensor shape mismatch error for API usage by @WeiweiZhang1 in #582
- fix device bug at calibration by @wenhuach21 in #587
- Update gguf_accuracy (q3_ks) by @SinpackKonmakan in #590
- add recipes for deepseek-r1-0528 by @n1ck-guo in #588
- correct errors of deepseek-r1-0528 recipes by @n1ck-guo in #591
- fix cuda ut by @wenhuach21 in #592
- Bump protobuf from 3.20.1 to 3.20.2 in /test/test_cuda by @dependabot[bot] in #585
- rm unnecessary forward to improve speed by @wenhuach21 in #593
- update readme by @wenhuach21 in #597
- fix q2k bug by @n1ck-guo in #599
- support for q4_k_m by @n1ck-guo in #596
- fix vlm unit test path error by @WeiweiZhang1 in #601
- fix lots of critical gguf bugs and support imatrix in rtn mode by @wenhuach21 in #595
- fix gguf bug by @wenhuach21 in #610
- mv some checkers by @wenhuach21 in #611
- fix gguf packing bug and moe regression by @wenhuach21 in #614
- support customized mixed bits for gguf by @wenhuach21 in #615
- fix double quant sym bug by @wenhuach21 in #616
- FP8 WOQ export by @wenhuach21 in #617
- fix bug of q5_k_s w/ imatrix by @n1ck-guo in #620
- add auto-round related vllm and transformers UT by @WeiweiZhang1 in #613
- refine docs (0624) by @WeiweiZhang1 in #619
- fix not using imatrix for gguf at rtn mode by @wenhuach21 in #623
- fix vlm hf config loading issue by @WeiweiZhang1 in #624
- refine gguf rtn algorithm and fix bugs by @wenhuach21 in #630
- fix gguf bug of moe models and lmhead/embedding bits setting regression by @n1ck-guo in #628
- [BUG FIX] fix bug of deepseek gguf:q*k by @n1ck-guo in #637
- support packing immediately for gguf to reduce ram usage by @wenhuach21 in #638
- support llmcompressor format by @xin3he in #646
- fix norm_bias_tuning by @wenhuach21 in #639
- [W4A8]Fix Packing by @yiliu30 in #648
- Integrate RTN quantization into GGUF packing to enhance robustness by @n1ck-guo in #644
- Remove vlm cuda UT dependencies version restrictions by @XuehaoSun in #651
- speedup mxfp tuning and fix nvfp bug by @wenhuach21 in #647
- support two more calib datasets and fix embedding layer bug by @wenhuach21 in #653
- fix some issues by @wenhuach21 in #655
- fix bug of q4_0 and q5_0 at iters==0 by @n1ck-guo in #658
- support vlm models for gguf format by @n1ck-guo in #654
- fix bug of block-wise quant imatrix by @n1ck-guo in #663
- fix gguf block-wise issue by @wenhuach21 in #664
- fix bugs of export deepseek gguf format when iters=0 and q3k accuracy by @n1ck-guo in #665
- handle zeros in imatrix by @wenhuach21 in #667
- fix ut issue by @WeiweiZhang1 in #668
- fix cuda hanging issue during packing by @WeiweiZhang1 in #669
- support to use lm_eval for vlm by @n1ck-guo in #670
- add trust remote code to gguf format load tokenizer by @n1ck-guo in #675
- fix 3bits asym accuracy and calib dataset issues by @WeiweiZhang1 in #674
- restrict accelerate version to reduce ram usage by @wenhuach21 in #673
- rm low_cpu when loading the model by @wenhuach21 in #676
- remove old vlm cuda ut by @WeiweiZhang1 in #678
- update gguf convert file and fix permute bug by @n1ck-guo in #679
- fix gguf regression for large models by @wenhuach21 in #680
- fix gemma vlm gguf regression by @wenhuach21 in #685
New Contributors
- @SinpackKonmakan made their first contribution in #590
- @xin3he made their first contribution in #646
Full Changelog: v0.5.1...v0.6.0
v0.5.1: bug fix release
What's Changed
- bump version into v0.5.0 by @XuehaoSun in #538
- fix triton multiple gpus and some other issues by @wenhuach21 in #539
Full Changelog: v0.5.0...v0.5.1
v0.5.0
Highlights
- refine auto-round format inference: support 2, 3, 4, and 8 bits and the marlin kernel, and fix several bugs in the auto-round format
- support xpu in tuning and inference by @wenhuach21 in #481
- support for more vlms by @n1ck-guo in #390
- change quantization method name and make several refinements by @wenhuach21 in #500
- support rtn via iters==0 (see the sketch after this list) by @wenhuach21 in #510
- fix bug of mix calib dataset by @n1ck-guo in #492
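A minimal sketch of the iters==0 (RTN) path mentioned above; the model name is a placeholder and all other arguments keep their defaults.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "facebook/opt-125m"  # placeholder model for illustration
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# iters=0 skips the rounding/clipping optimization loop and falls back to
# plain round-to-nearest (RTN): much faster, usually somewhat less accurate.
autoround = AutoRound(model, tokenizer, bits=4, group_size=128, iters=0)
autoround.quantize()
autoround.save_quantized("./opt-125m-w4g128-rtn", format="auto_round")
```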
What's Changed
- support xpu in tuning and inference by @wenhuach21 in #481
- add light ut, fix typos by @WeiweiZhang1 in #483
- bump into v0.4.7 by @XuehaoSun in #487
- fix dataset combine bug by @wenhuach21 in #489
- fix llama 8b time cost by @WeiweiZhang1 in #490
- update 2bits acc results by @WeiweiZhang1 in #491
- fix bug of mix calib dataset by @n1ck-guo in #492
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #494
- [GGUF support step3] patch for double quant by @n1ck-guo in #473
- refine inference backend/code step 1 by @wenhuach21 in #486
- refine inference step 2 by @wenhuach21 in #498
- change quantization method name and make several refinements by @wenhuach21 in #500
- fix bug of awq/gptq modules_to_not_convert by @n1ck-guo in #501
- use --tasks to control evaluation enabling by @wenhuach21 in #505
- fix gguf eval regression bug by @n1ck-guo in #506
- change to new api in readme by @wenhuach21 in #507
- fix setup issue on cuda machine by @wenhuach21 in #511
- support rtn via iters==0 by @wenhuach21 in #510
- fix critical bug of get_multimodal_block_names by @n1ck-guo in #509
- Update requirements-lib.txt by @yiliu30 in #513
- add group_size divisible check in backend by @wenhuach21 in #512
- support for more vlms by @n1ck-guo in #390
- move gguf-dq test to cuda by @n1ck-guo in #520
- fix bs!=1 for gemma and MiniMax-Text-01 by @wenhuach21 in #515
- add regex support in layer_config setting by @wenhuach21 in #519
- patch for vlm by @n1ck-guo in #518
- rename backend to packing_format in config.json by @wenhuach21 in #521
- fix example's model_dtype by @WeiweiZhang1 in #523
- rm fp16 export in autoround format by @wenhuach21 in #525
- update convert_hf_to_gguf to support more models by @n1ck-guo in #524
- fix light config by @WeiweiZhang1 in #526
- fix typos, add model card link for VLMs by @WeiweiZhang1 in #527
- add backend readme by @wenhuach21 in #528
- update mllm readme by @WeiweiZhang1 in #530
- fix bug of cuda ut by @n1ck-guo in #532
- fix inference issue by @wenhuach21 in #529
- update readme by @wenhuach21 in #531
- refine readme by @WeiweiZhang1 in #536
- fix cuda ut by @n1ck-guo in #537
Full Changelog: v0.4.7...v0.5.0
v0.4.7
Highlights
Support W4AFP8 for HPU; please refer to Intel Neural Compressor for guidance on running these models, by @yiliu30 in #467
Support packing immediately in the new quantization API to save RAM usage (see the sketch below) by @wenhuach21 in #466
20x for awq and 4x for gptq packing speedup on cuda by @wenhuach21 in #459
Support auto-round-light to speed up the tuning process by @WeiweiZhang1 in #454
Fix critical bug of mxfp4 in tuning by @wenhuach21 in #451
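A sketch of the combined quantize-and-export call; `quantize_and_save` is the name used in current auto-round documentation and is assumed here to be the "new quantization api" the highlight refers to.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "facebook/opt-125m"  # placeholder model for illustration
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

autoround = AutoRound(model, tokenizer, bits=4, group_size=128)
# Packing each block right after it is tuned, rather than packing the whole
# model in a separate pass at export time, is what keeps peak RAM lower.
autoround.quantize_and_save("./opt-125m-w4g128", format="auto_round")
```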
What's Changed
- step-1 support naive double quant in tuning by @wenhuach21 in #442
- fix critical bug of mxfp4 by @wenhuach21 in #451
- update readme by @wenhuach21 in #455
- update eval by @n1ck-guo in #450
- awq exporting bugfix by @WeiweiZhang1 in #456
- Support force loading into autoround Format by @WeiweiZhang1 in #453
- 20x for awq and 4x for gptq packing speedup by @wenhuach21 in #459
- fix eval bug by @n1ck-guo in #461
- [STEP-1] W4AFP8 export by @wenhuach21 in #378
- [HPU] Update W4A8 for HPU by @yiliu30 in #467
- support for gemma3 by @n1ck-guo in #468
- upload auto-round-light results by @WeiweiZhang1 in #454
- GGUF support step2: add naive Q2_KS and Q4_KS by @n1ck-guo in #448
- fix incorrect recipe data by @WeiweiZhang1 in #471
- support for mistral3 by @n1ck-guo in #472
- support to export gemma3 gguf format by @n1ck-guo in #470
- Increase unit test timeout from 120 to 240 minutes by @XuehaoSun in #474
- support packing immediately in new quantization api to save ram usage by @wenhuach21 in #466
- rm redundant line break by @WeiweiZhang1 in #475
- Temporarily close qxk api for new release by @n1ck-guo in #478
- add restrict for exporting act-quant models by @n1ck-guo in #480
Full Changelog: v0.4.6...v0.4.7
v0.4.6
Highlights:
1. Set torch compile to false by default in #447
2. Fix packing hang and force to fp16 at exporting in #430
3. Align auto_quantizer with Transformers 4.49 in #437
What's Changed
- Fix packing hang, torch compile and force to fp16 at exporting by @wenhuach21 in #430
- fix nblocks issues by @wenhuach21 in #432
- rm gc collect in packing by @wenhuach21 in #438
- align auto_quantizer with main branch in Transformers by @WeiweiZhang1 in #437
- [HPU]Fix compile bug when quant layer by @yiliu30 in #441
- remove tricky setting in mxfp4 by @wenhuach21 in #445
- fix bug of evaluate user model by @n1ck-guo in #444
- Refine funcs by @WeiweiZhang1 in #446
- set torch compile to false by default by @WeiweiZhang1 in #447
Full Changelog: v0.4.5...v0.4.6
v0.4.5
Highlights:
We have enhanced support for extremely large models with the following updates:
Multi-Card Tuning Support: Added basic support for multi-GPU tuning (#415); see the sketch below
Accelerated Packing Stage: Improved the packing speed (2x-4x) for AutoGPTQ and AutoAWQ formats by leveraging CUDA (#407)
Deepseek V3 GGUF Export: Introduced support for exporting models to the Deepseek V3 GGUF format (#416)
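A sketch of the multi-card path, assuming the AutoRound device argument accepts "auto" as referenced elsewhere in these notes; the model name is a placeholder (the feature targets much larger models).

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "facebook/opt-125m"  # placeholder; the feature targets much larger models
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# device="auto" spreads the tuning workload naively across the visible GPUs
# (the multi-card support added in #415); with one visible card it behaves
# like an ordinary single-GPU run.
autoround = AutoRound(model, tokenizer, bits=4, group_size=128, device="auto")
autoround.quantize()
autoround.save_quantized("./opt-125m-w4g128", format="auto_gptq")
```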
What's Changed
- update format readme by @wenhuach21 in #411
- fix log bug and device "auto" bug by @n1ck-guo in #409
- speedup packing stage for autogptq and autoawq format by @wenhuach21 in #407
- support naive multi-card tuning by @wenhuach21 in #415
- support bf16 inference for autoround format by @wenhuach21 in #420
- enable backup pile dataset loading by @WeiweiZhang1 in #417
- fix evaluation device bug, relate to issue 413 by @n1ck-guo in #419
- support to export deepseek v3 gguf format by @n1ck-guo in #416
- fix cuda UT torch_dtype by @WeiweiZhang1 in #423
- fix eval trust_remote_code by @n1ck-guo in #424
Full Changelog: v0.4.4...v0.4.5
v0.4.4 release
Highlights:
1. Fix install issue in #387
2. Support exporting gguf q4_0 and q4_1 formats in #393
3. Fix llm cmd line seqlen issue in #399
What's Changed
- fix a critical bug of static activation quantization by @wenhuach21 in #392
- vlm 70B+ in single card by @n1ck-guo in #395
- enhance calibration dataset and add awq pre quantization warning by @wenhuach21 in #396
- support awq format for vlms by @WeiweiZhang1 in #398
- [critical bug] fix llm example seqlen issue by @WeiweiZhang1 in #399
- fix device auto issue by @wenhuach21 in #400
- Fix auto-round install & bump into 0.4.4 by @XuehaoSun in #387
- fix dtype converting issue by @wenhuach21 in #403
- support for deepseek vl2 by @n1ck-guo in #401
- llm layer_config bugfix by @WeiweiZhang1 in #406
- support awq with qbits, only support sym by @wenhuach21 in #402
- support to export gguf q4_0 and q4_1 format by @n1ck-guo in #393
Full Changelog: v0.4.3...v0.4.4
v0.4.3: bug fix release
Highlights:
fix incorrect device setting in autoround format inference (see the sketch below) by @WeiweiZhang1 in #383
remove the dependency on AutoGPTQ by @XuehaoSun in #380
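For context, a sketch of loading an auto-round-format checkpoint for inference through transformers with no AutoGPTQ dependency; importing AutoRoundConfig registers the quantization backend, and the checkpoint path is a placeholder.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRoundConfig  # noqa: F401  (import registers the auto-round backend)

quantized_dir = "./opt-125m-w4g128"  # placeholder: a model exported in auto-round format
model = AutoModelForCausalLM.from_pretrained(quantized_dir, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(quantized_dir)

inputs = tokenizer("There is a girl who likes adventure,", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```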
What's Changed
- support llava hf vlm example by @WeiweiZhang1 in #381
- fix block_name_to_quantize by @WeiweiZhang1 in #382
- fix incorrect device setting in autoround format inference by @WeiweiZhang1 in #383
- refine homepage, update model links by @WeiweiZhang1 in #385
- update eval basic usage by @n1ck-guo in #384
- refine error msg and dump more log in the tuning by @wenhuach21 in #386
- remove the dependency on AutoGPTQ for CPU and bump to V0.4.3 by @XuehaoSun in #380
Full Changelog: v0.4.2...v0.4.3
v0.4.2: bug fix release
Highlights
1. Fix autoawq exporting issue
2. Remove bias exporting if possible in autogptq format
What's Changed
- bump version into v0.4.1 by @XuehaoSun in #350
- Update docker user and remove baseline UT by @XuehaoSun in #347
- delete llm example and refine readme by @wenhuach21 in #354
- Simulated W4Afp8 Quantization by @wenhuach21 in #331
- add QWQ-32B, VLM, Qwen2.5, Llama3.1 int4 models by @wenhuach21 in #356
- fix awq exporting by @wenhuach21 in #358
- Tensor reshape bugfix by @WeiweiZhang1 in #364
- fix awq backend and fp_layers issue by @wenhuach21 in #363
- fix awq exporting bugs by @wenhuach21 in #365
- fix bug of only_text_test check due to inference issue on cpu by @n1ck-guo in #362
- add gpu test by @wenhuach21 in #367
- using multicard when device set to "auto" by @n1ck-guo in #368
- quant_block_names enhancement by @WeiweiZhang1 in #369
- [HPU] Add lazy mode back by @yiliu30 in #371
- remove bias exporting if possible in autogptq format by @wenhuach21 in #375
- save processor automatically by @n1ck-guo in #372
- Add gpu ut by @wenhuach21 in #370
- fix gpu ut by @n1ck-guo in #376
- fix typos by @wenhuach21 in #377
Full Changelog: v0.4.1...v0.4.2
v0.4.1: bug fix release
Highlights:
- Fixed vllm calibration infinite loop issue
- Corrected the default value for the sym argument in the API configuration.
What's Changed
- fix typo by @wenhuach21 in #342
- vllm/llama-vision llava calibration infinite loop fix by @WeiweiZhang1 in #343
- [HPU] Enhance numba check by @yiliu30 in #345
- [VLM] fix bs and grad reset by @n1ck-guo in #344
- [HPU] Enhance installation check by @yiliu30 in #346
- [Critical Bug] use sym as default in the API by @wenhuach21 in #349
- triton backend requires triton < 3.0 by @wenhuach21 in #348
Full Changelog: v0.4...v0.4.1