Description
Hi, I am a beginner in quantization and would like to experiment with INT8 dynamic and static quantization on open-source LLMs.
- For dynamic quantization, I found that `int8_dynamic_activation_int8_weight` is available in `torchao/quantization/quant_api.py`.
- For static quantization, I did not find an INT8 version. Instead, I only found `float8_static_activation_float8_weight`.
Questions
- Why is only INT8 dynamic quantization provided? Is there a specific concern that prevents static INT8 quantization?
- If I want to implement INT8 static quantization, can I follow `tutorials/calibration_flow/static_quant.py` as a reference?
- `float8_static_activation_float8_weight` requires a scalar parameter. What would be a recommended way to determine this parameter?
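
To make the last question concrete, here is a minimal sketch of the approach I would guess at: run some calibration data through the model, record the largest absolute activation value, and divide by the largest representable value of the target type. This is only my assumption, not something taken from torchao. The helper `absmax_scale` and the toy `batches` are hypothetical; the constants are the maximum finite value of float8 e4m3fn (448) and the symmetric INT8 maximum (127).

```python
# Hypothetical helper, not part of torchao: derive a single static per-tensor
# scale from calibration activations using the absmax rule.
FP8_E4M3_MAX = 448.0  # largest finite value of float8 e4m3fn
INT8_MAX = 127.0      # symmetric INT8 range is [-127, 127]


def absmax_scale(calibration_batches, qmax):
    """Return a scale so that the largest |x| seen in calibration maps to qmax."""
    running_max = 0.0
    for batch in calibration_batches:
        for x in batch:
            running_max = max(running_max, abs(x))
    # Guard against all-zero calibration data, which would give a zero scale.
    return max(running_max, 1e-12) / qmax


# Toy calibration data standing in for real activations (made up for illustration).
batches = [[0.5, -2.0, 1.25], [3.5, -0.75]]
fp8_scale = absmax_scale(batches, FP8_E4M3_MAX)
int8_scale = absmax_scale(batches, INT8_MAX)
```

With real models the values would come from observed layer inputs rather than a Python list, and I am not sure whether a plain absmax or something more robust to outliers (for example a percentile) is preferred.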
Any insights or guidance would be greatly appreciated. Thanks in advance! 😊