MX-scale discrepancy during quantization and dequantization #1104

@mariosfourn

Description

In the case of very small input numbers, around the subnormal range of torch.float or torch.bfloat16, the scale exponent takes its smallest unbiased value, -127. However, quantization only allows division by a scale of 2**-126 (line 143 of mx_tensor.py) because of an incompatibility with Triton.
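To make the mismatch concrete, here is a toy numerical illustration (plain Python, not torchao's actual code paths): a value near 2**-127 gets quantized with the floored scale but dequantized with the raw one, so the round trip is off by a factor of 2.

```python
x = 2.0 ** -127                          # input near float32's subnormal range
raw_exp = -127                           # unbiased scale exponent derived from x
quant_scale = 2.0 ** max(raw_exp, -126)  # quantization floors the scale at 2**-126
dequant_scale = 2.0 ** raw_exp           # dequantization uses the raw 2**-127
q = x / quant_scale                      # quantized value: 0.5
print(q * quant_scale)                   # 2**-127, a faithful round trip
print(q * dequant_scale)                 # 2**-128, off by a factor of 2
```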

However, during dequantization the smaller scale of 2**-127 is used when calling

s_fp = get_fp_scale(scale_e8m0).reshape(-1, 1).to(target_dtype)

on line 235. Why not clip the exponent to -126 in the get_fp_scale function?
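A minimal sketch of what that clamp could look like, assuming an e8m0 scale with bias 127 (this is the proposed change, not torchao's current implementation of get_fp_scale):

```python
import torch

E8M0_BIAS = 127  # bias of the e8m0 scale format

def get_fp_scale(scale_e8m0: torch.Tensor) -> torch.Tensor:
    # Clamp the unbiased exponent to -126 so the dequantization scale
    # matches the 2**-126 floor that quantization already enforces.
    exponent = scale_e8m0.to(torch.int32) - E8M0_BIAS
    exponent = torch.clamp(exponent, min=-126)
    return torch.exp2(exponent.to(torch.float32))
```

With this clamp, both directions of the round trip use the same 2**-126 scale for inputs whose raw exponent would be -127.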
