Description
I applied three quantization methods to the same model, but the relationship between the resulting file sizes does not look reasonable. Why?
```python
from torchao.quantization import quantize_, int8_weight_only
quantize_(new_model, int8_weight_only())

# from torchao.quantization import quantize_, int8_dynamic_activation_int8_weight
# quantize_(new_model, int8_dynamic_activation_int8_weight())

# from torchao.quantization import quantize_, int8_dynamic_activation_int4_weight
# quantize_(new_model, int8_dynamic_activation_int4_weight())
```
The results:

```
20786584 Feb  5 13:46 a8w4SWaT.pte
20373272 Feb  5 13:45 a8w8SWaT.pte
29685120 Oct  5 13:12 pytorch_checkpoint.pth
20262664 Feb  5 13:44 w8onlySWaT.pte
```
Theoretically, the model quantized with A8W4 should be the smallest, but the actual results show otherwise.
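One possible explanation (my assumption, not confirmed against torchao/ExecuTorch internals): int4 weight quantization is typically groupwise, so it stores a scale and zero point per group, and if the export path serializes the 4-bit values unpacked (one value per byte, e.g. because the target backend has no packed-int4 kernels), the payload is as large as int8 plus the extra per-group metadata. A back-of-the-envelope sketch, with all storage-layout numbers being hypothetical assumptions for illustration:

```python
# Rough size estimate for a single weight matrix under three hypothetical
# storage layouts (NOT torchao's actual format, just an illustration):
# - W8:           1 byte per weight + one fp32 scale per output channel
# - W4 packed:    2 weights per byte + fp32 scale and zero point per group
# - W4 unpacked:  each 4-bit value stored in its own byte + group metadata

def w8_bytes(n_elements: int, n_channels: int) -> int:
    # int8 payload plus one 4-byte scale per output channel
    return n_elements * 1 + n_channels * 4

def w4_packed_bytes(n_elements: int, group_size: int = 32) -> int:
    # two int4 values per byte, plus 8 bytes (scale + zero point) per group
    n_groups = n_elements // group_size
    return n_elements // 2 + n_groups * 8

def w4_unpacked_bytes(n_elements: int, group_size: int = 32) -> int:
    # one byte per int4 value (no packing), plus 8 bytes per group
    n_groups = n_elements // group_size
    return n_elements * 1 + n_groups * 8

# Example: a 512x512 linear layer (262,144 weights)
n, channels = 512 * 512, 512
print(w8_bytes(n, channels))    # 264192
print(w4_packed_bytes(n))       # 196608  -> smaller than W8, as expected
print(w4_unpacked_bytes(n))     # 327680  -> larger than W8
```

If something like the unpacked case applies here, it would explain why `a8w4SWaT.pte` comes out slightly larger than the two int8 variants. One way to check is to inspect the dtypes and shapes of the quantized model's tensors (e.g. via `new_model.state_dict()`) before export and compare the actual bytes stored per layer.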