[Question] Static Quantization for Open-Source LLMs #1724

## Description
Hi, I am a beginner in quantization and would like to experiment with INT8 dynamic and static quantization on open-source LLMs.

* For dynamic quantization, I found that `int8_dynamic_activation_int8_weight` is available in `torchao/quantization/quant_api.py`.
* For static quantization, I did not find an INT8 version. Instead, I only found `float8_static_activation_float8_weight`.
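
To make sure I am using the terms correctly, here is how I currently understand the difference between the two modes. This is a minimal pure-Python sketch of symmetric per-tensor INT8 quantization, not torchao code; all function names are hypothetical and chosen only for illustration.

```python
def quantize_int8(values, scale):
    """Symmetric INT8 quantization: round(x / scale), clamped to [-128, 127]."""
    return [max(-128, min(127, round(v / scale))) for v in values]


def absmax_scale(values, qmax=127):
    """Scale that maps the largest magnitude in `values` onto qmax."""
    amax = max(abs(v) for v in values)
    return max(amax, 1e-12) / qmax  # guard against an all-zero input


def dynamic_quantize(activations):
    # Dynamic: the scale is recomputed from every incoming activation tensor.
    scale = absmax_scale(activations)
    return quantize_int8(activations, scale), scale


def static_quantize(activations, calibrated_scale):
    # Static: the scale was fixed ahead of time (e.g. from calibration data),
    # so values outside the calibrated range are clipped.
    return quantize_int8(activations, calibrated_scale), calibrated_scale
```

In this sketch the only difference is where the activation scale comes from: the dynamic path derives it from the current input, while the static path reuses a precomputed value and therefore needs a calibration step.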

## Questions
* Why is only INT8 dynamic quantization provided? Is there a specific concern that prevents static INT8 quantization?
* If I want to implement INT8 static quantization, can I follow `tutorials/calibration_flow/static_quant.py` as a reference?
* `float8_static_activation_float8_weight` requires a `scale` parameter for the activations. What would be a recommended way to determine this value?
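
For the last question, the approach I am considering is a simple abs-max calibration: run a few representative batches, track the largest activation magnitude, and divide it by the largest representable value of the target dtype. The sketch below is pure Python and not torchao code. `AbsMaxObserver` is a hypothetical helper, the calibration values are made up, and the `scale = amax / qmax` convention is my assumption about what the API expects.

```python
FP8_E4M3_MAX = 448.0  # largest finite value of float8 e4m3fn
INT8_MAX = 127


class AbsMaxObserver:
    """Hypothetical helper: tracks the running max |x| over calibration batches."""

    def __init__(self):
        self.amax = 0.0

    def observe(self, batch):
        self.amax = max(self.amax, max(abs(v) for v in batch))

    def scale(self, qmax):
        # Assumed convention: quantized = x / scale, so scale = amax / qmax.
        return max(self.amax, 1e-12) / qmax


# Made-up activation samples standing in for real calibration data.
calibration_batches = [[0.1, -3.5, 2.0], [7.0, -0.2, 1.5], [-4.4, 0.9, 0.0]]

observer = AbsMaxObserver()
for batch in calibration_batches:
    observer.observe(batch)

fp8_scale = observer.scale(FP8_E4M3_MAX)
int8_scale = observer.scale(INT8_MAX)
```

With a real model I would expect to collect the statistics from the inputs of each linear layer instead of a hand-written list. Abs-max is sensitive to outliers, so I am unsure whether a percentile or moving-average statistic would be preferable.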

Any insights or guidance would be greatly appreciated. Thanks in advance! 😊
