
[Question] Static Quantization for Open-Source LLMs #1724

Open
@yang-ahuan

Description


Hi, I am a beginner in quantization and would like to experiment with INT8 dynamic and static quantization on open-source LLMs.

  • For dynamic quantization, I found that int8_dynamic_activation_int8_weight is available in torchao/quantization/quant_api.py.
  • For static quantization, I did not find an INT8 version. Instead, I only found float8_static_activation_float8_weight.
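To make the distinction concrete, here is a minimal, dependency-free sketch of per-tensor symmetric INT8 quantization (not torchao's implementation; all function names here are illustrative). The only difference between the two modes is where the activation scale comes from: dynamic quantization derives it from the live tensor at runtime, while static quantization reuses a scale fixed ahead of time from calibration data.

```python
# Illustrative sketch of per-tensor symmetric INT8 quantization.
# Dynamic quantization computes the activation scale at runtime from the
# current tensor; static quantization reuses a precomputed scale.

def int8_scale(values):
    """Symmetric per-tensor scale mapping the absolute max onto [-127, 127]."""
    absmax = max(abs(v) for v in values)
    return absmax / 127 if absmax else 1.0

def quantize_int8(values, scale):
    """Round to the nearest INT8 value, clamping to [-127, 127]."""
    return [max(-127, min(127, round(v / scale))) for v in values]

def dequantize(qvalues, scale):
    """Map INT8 values back to approximate real values."""
    return [q * scale for q in qvalues]

acts = [0.5, -1.5, 3.0, -0.25]

# Dynamic: scale derived from the current activation batch.
dyn_scale = int8_scale(acts)
q_dynamic = quantize_int8(acts, dyn_scale)

# Static: scale precomputed once from calibration data, reused every batch.
static_scale = 3.5 / 127  # e.g. absmax observed during calibration
q_static = quantize_int8(acts, static_scale)
```

The practical trade-off follows directly: dynamic quantization adapts to each batch but pays the runtime cost of computing scales, while static quantization is cheaper at inference time but depends on the calibration data covering the real activation range.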

Questions

  • Why is only INT8 dynamic quantization provided? Is there a specific concern that prevents static INT8 quantization?
  • If I want to implement INT8 static quantization, can I follow tutorials/calibration_flow/static_quant.py as a reference?
  • For float8_static_activation_float8_weight, a precomputed scale parameter is required. What would be a recommended way to determine this value?
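On the last question, a common approach (sketched here in plain Python; the observer class and constants are illustrative, not torchao's API, whose reference flow is tutorials/calibration_flow/static_quant.py) is to run representative calibration batches through the model, track the running absolute maximum of the activations, and derive the scale from the target format's maximum representable magnitude, e.g. 448 for float8 e4m3 or 127 for INT8:

```python
# Hypothetical calibration pass for deriving a static activation scale.
FLOAT8_E4M3_MAX = 448.0  # max representable magnitude in float8 e4m3

class AbsMaxObserver:
    """Tracks the running absolute maximum seen across calibration batches."""
    def __init__(self):
        self.absmax = 0.0

    def observe(self, batch):
        self.absmax = max(self.absmax, max(abs(v) for v in batch))

    def scale(self, qmax=FLOAT8_E4M3_MAX):
        # Map the observed activation range onto the quantized format's range.
        return self.absmax / qmax if self.absmax else 1.0

# Usage: feed representative calibration data, then freeze the scale.
obs = AbsMaxObserver()
for batch in ([0.1, -2.0, 0.7], [1.5, -0.3], [4.48, 0.2]):
    obs.observe(batch)

static_scale = obs.scale()  # candidate for the precomputed activation scale
```

Absmax is the simplest choice; percentile-based or moving-average statistics are common variants when outlier activations would otherwise inflate the scale and waste the quantized range.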

Any insights or guidance would be greatly appreciated. Thanks in advance! 😊
