enable smoothquant for int8 static tensor #3468

base: main

Conversation
Summary: This PR creates a new Int8Tensor and updates the configs to use the new Int8Tensor flow.

Test Plan:

To ensure BC:

```
pytest test/quantization/test_quant_api.py
```

To test the new Int8Tensor:

```
pytest test/quantization/quantize_/workflows/int8/test_int8_tensor.py
```
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3468

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures as of commit 2586ab6 with merge base f99105a.

This comment was automatically generated by Dr. CI and updates every 15 minutes.
cc @Xia-Weiwen and @cyxlily fyi
torchao/prototype/smoothquant/api.py (outdated review thread)
```python
qw = quant_mod.weight

# Add smoothing factor metadata
qw = to_weight_tensor_with_linear_activation_scale_metadata(
```
we should not be using this; please check the AWQ prototype for how this should be implemented in the new stack:

ao/torchao/prototype/awq/api.py, lines 108 to 113 in 08e5e20:

```python
assert isinstance(qw, SupportsActivationPreScaling), (
    "weight must support activation scaling through implementing `SupportsActivationPreScaling`"
)
# since we want to do `act` * `act_pre_scale` during runtime for speed, we'll save the
# reciprocal of the `equalization_scale`
qw.act_pre_scale = 1.0 / equalization_scale
```
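To spell out the reasoning about the reciprocal: a per-forward-pass division becomes a multiplication. A minimal sketch, assuming a quantized weight that exposes `act_pre_scale` and a `dequantize()` method (the helper name and both attributes are assumptions for illustration, not the PR's actual kernel path):

```python
import torch
import torch.nn.functional as F

def smoothed_linear(act: torch.Tensor, qweight, bias=None):
    # act_pre_scale holds 1.0 / equalization_scale, so applying the
    # SmoothQuant equalization at runtime is a cheap multiply, not a divide
    act = act * qweight.act_pre_scale
    return F.linear(act, qweight.dequantize(), bias)
```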
| """ | ||
|
|
||
| scale: torch.Tensor | ||
| scale: torch.Tensor = None |
nit: Optional[torch.Tensor]
also, maybe `static_scale` might be more descriptive, I feel
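Combining both suggestions, the field might read like this (a sketch; `StaticQuantKwargs` is a hypothetical stand-in for the class in the diff above):

```python
from dataclasses import dataclass
from typing import Optional

import torch

@dataclass
class StaticQuantKwargs:  # hypothetical stand-in for the class in the diff
    # optional static activation scale; named per the suggestion above
    static_scale: Optional[torch.Tensor] = None
```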
@jcaip Our customer needs activation quantization PerTensor and weight quantization PerRow. Will you implement it, or may I create a new PR to do it?

@cyxlily feel free to open a new PR for activation per tensor x weight per row, it's not something I'm planning to do currently. Thank you for your smoothquant PR btw, I used it to implement this.
Force-pushed from 0c23589 to f389a94
```python
    sqnr_static_compile
    == sqnr_static_eager
    == sqnr_dynamic_compile
    == sqnr_dynamic_eager
```
are we trying to say `static_out_compile == static_out_eager == dynamic_out_compile == dynamic_out_eager` here? If so, I think it would be clearer to just assert that all of these are equal to each other.
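For illustration, one way the flattened assertion could read (a sketch; the `sqnr_*` names come from the diff above):

```python
sqnrs = {
    "static_eager": sqnr_static_eager,
    "static_compile": sqnr_static_compile,
    "dynamic_eager": sqnr_dynamic_eager,
    "dynamic_compile": sqnr_dynamic_compile,
}
first = next(iter(sqnrs.values()))
# comparing every value against one reference gives a readable failure message
assert all(v == first for v in sqnrs.values()), f"SQNR mismatch: {sqnrs}"
```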
```python
else:
    raise ValueError(f"Unexpected step: {step}")

if isinstance(base_config, Int8StaticActivationInt8WeightConfig):
```
I think we shouldn't have a specific config here; maybe change this to a protocol similar to `SupportsActivationPreScaling`, but for configs?
I think figuring out how to do this generally will need a bit more design; we'd need to figure out how to map to the appropriate `QuantizeTensorToInt/FloatXKwargs` object. I agree we should be able to do this, though; can I address it in a later PR?
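For context, a hypothetical sketch of what such a config-side protocol could look like (the protocol name and field are assumptions, not from this PR):

```python
from typing import Optional, Protocol, runtime_checkable

import torch

@runtime_checkable
class SupportsStaticActivationScale(Protocol):
    """Hypothetical config-side analogue of SupportsActivationPreScaling."""

    # a config implementing this advertises that it carries a static
    # activation scale, so the smoothquant flow can dispatch on the
    # protocol instead of on one concrete config class
    static_scale: Optional[torch.Tensor]
```

The `isinstance(base_config, Int8StaticActivationInt8WeightConfig)` check above could then become `isinstance(base_config, SupportsStaticActivationScale)`.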
Force-pushed from f389a94 to 2586ab6
```python
    block_size,
    self.dtype,
    act_quant_kwargs=self.act_quant_kwargs,
    act_scale=self.act_scale,
```
I guess slice didn't work for static quant int8 before; can you add a test for that?
```python
    old_int8_tensor.scale[index],
    old_int8_tensor.block_size[1:],
    old_int8_tensor.dtype,
    old_int8_tensor.act_scale,
```
same for this one; it seems like the select op broke with static quant before
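A rough sketch of what such tests could assert, assuming a fixture `int8_weight` that is an `Int8Tensor` built through the static-quant flow (the fixture and test names are assumptions):

```python
def test_slice_and_select_preserve_act_scale(int8_weight):
    # slicing should carry the static activation scale through
    sliced = int8_weight[0:8]
    assert sliced.act_scale is not None

    # indexing a single row exercises the select op
    selected = int8_weight[0]
    assert selected.act_scale is not None
```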
This PR hooks up the static quant workflow added in #3442 to the prototype smoothquant API.
You can use the new flow as follows:
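The example below sketches that flow, inferred from the diffs in this PR (`SmoothQuantConfig` taking a `base_config` and a `step`, plus `Int8StaticActivationInt8WeightConfig`); exact import paths and signatures may differ:

```python
from torchao.quantization import quantize_
from torchao.prototype.smoothquant import SmoothQuantConfig

# config name taken from this PR's diffs; import path is an assumption
from torchao.quantization import Int8StaticActivationInt8WeightConfig

# `model` and `calibration_data` are assumed to be defined by the caller
base_config = Int8StaticActivationInt8WeightConfig()

# step 1: insert observers to collect activation statistics
quantize_(model, SmoothQuantConfig(base_config, step="prepare"))

# step 2: calibrate on representative inputs
for batch in calibration_data:
    model(batch)

# step 3: compute smoothing factors and swap in static int8 weights
quantize_(model, SmoothQuantConfig(base_config, step="convert"))
```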