
Model size after quantization #1701


Description

@TaylorYangX

Why don't the model sizes follow the expected relationship after I apply these three quantization methods to the same model?

```python
from torchao.quantization import (
    quantize_,
    int8_weight_only,
    int8_dynamic_activation_int8_weight,
    int8_dynamic_activation_int4_weight,
)

# Weight-only int8 (W8)
quantize_(new_model, int8_weight_only())

# Dynamic int8 activations + int8 weights (A8W8)
# quantize_(new_model, int8_dynamic_activation_int8_weight())

# Dynamic int8 activations + int4 weights (A8W4)
# quantize_(new_model, int8_dynamic_activation_int4_weight())
```

The result:

```
20786584 Feb  5 13:46 a8w4SWaT.pte
20373272 Feb  5 13:45 a8w8SWaT.pte
29685120 Oct  5 13:12 pytorch_checkpoint.pth
20262664 Feb  5 13:44 w8onlySWaT.pte
```

In theory, the model quantized with A8W4 should be the smallest of the three, but the actual results show the opposite: the a8w4 file is the largest.
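One plausible factor (an assumption, not a confirmed diagnosis): int4 weight schemes typically store per-group scales and zero points, and depending on the chosen layout the int4 values may not actually be packed two-per-byte in the serialized file. The following back-of-envelope sketch, with hypothetical helpers `w8_bytes`/`w4_bytes` and assumed overheads (per-channel fp32 scale for int8; group size 32 with fp16 scale + fp16 zero point for int4), shows how unpacked int4 plus group metadata can exceed plain int8:

```python
# Rough size estimate for a single Linear weight of shape (out_f, in_f).
# All overhead figures below are assumptions for illustration, not the
# exact torchao serialization format.

def w8_bytes(out_f, in_f):
    # 1 byte per weight + one fp32 scale per output channel (assumption)
    return out_f * in_f + out_f * 4

def w4_bytes(out_f, in_f, group=32, packed=True):
    n = out_f * in_f
    payload = n // 2 if packed else n   # unpacked int4: one byte per value
    groups = n // group
    return payload + groups * (2 + 2)   # fp16 scale + fp16 zero per group (assumption)

out_f, in_f = 1024, 1024
print(w8_bytes(out_f, in_f))                 # → 1052672
print(w4_bytes(out_f, in_f))                 # → 655360 (packed int4 is smaller)
print(w4_bytes(out_f, in_f, packed=False))   # → 1179648 (unpacked int4 exceeds int8)
```

If the exported .pte keeps int4 values in an int8 container, the per-group metadata makes the A8W4 file larger than W8-only, which matches the observed ordering.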
