Add support for float8 activation for Int4GroupwisePreshuffleTensor #2437
base: jerryzh168/stack/2
Conversation
Helpful links: see artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2437. Note: links to docs will display an error until the docs builds have completed.
Dr. CI status as of commit c13489e (merge base 2d61be8): 2 new failures, plus 1 unrelated failure that was already present on the merge base (broken trunk). Rebase onto the `viable/strict` branch to avoid the broken-trunk failure. This comment was automatically generated by Dr. CI and updates every 15 minutes.
Summary: Note: slice is not working yet, others are working.
Test Plan: python test/dtypes/test_float8_activation_int4_groupwise_preshuffle.py
stack-info: PR: #2437, branch: jerryzh168/stack/4
# making sure param.data is updated
assert param.data.packed_weight[0][0] != orig_value

def test_bmm(self):
unit tests are nice, feels like it would be good to also have an integration test to cover all of these ops in one go
e2e tests can be the ones we run in vllm/sglang integration on real models I think?
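One lightweight shape such an integration-style check could take (a hypothetical pure-Python sketch, not the torchao test harness or the fbgemm kernel; `quant_dequant`, the weights, and the tolerance are all illustrative): quantize a weight to int4 with one scale/zero pair per group, dequantize, and assert the end-to-end linear output stays close to the unquantized reference.

```python
# Hypothetical end-to-end accuracy check (pure Python, illustrative only):
# asymmetric groupwise int4 quantize -> dequantize -> compare dot products.

def quant_dequant(w, group_size=4):
    """Round-trip a weight vector through groupwise int4 quantization."""
    out = []
    for g in range(0, len(w), group_size):
        grp = w[g:g + group_size]
        lo, hi = min(grp), max(grp)
        scale = (hi - lo) / 15 or 1.0  # int4 encodes 16 levels: 0..15
        out += [round((v - lo) / scale) * scale + lo for v in grp]
    return out

w = [0.12, -0.40, 0.33, 0.05, 0.88, -0.17, 0.26, -0.59]  # illustrative weight row
x = [1.0, 0.5, -0.25, 2.0, 0.75, -1.5, 0.0, 1.25]        # illustrative activation
ref = sum(a * b for a, b in zip(w, x))                    # unquantized reference
got = sum(a * b for a, b in zip(quant_dequant(w), x))     # quantized path
assert abs(ref - got) < 0.1  # error bounded by per-group step size
```

A real integration test would run this comparison through the quantized linear/bmm ops on real shapes, which is closer to what the vllm/sglang runs mentioned above would exercise.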
@@ -2040,6 +2040,8 @@ class FbgemmConfig(AOBaseConfig):
    weight_dtype (torch.dtype): weight dtype of the kernel
    output_dtype (torch.dtype): output dtype of the kernel
    group_size (int): The group size for weight
    preshuffle (bool): whether to preshuffle the weights or not
    activation_dtype_for_int4 (str): the dtype for the activation for int4 weight, either bf16 or fp8
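For context on the fields discussed below, this is roughly how the two new options would be set (a hypothetical config fragment; the field names come from the diff above, but the values and the exact signature are illustrative, not verified against this commit):

```python
# Hypothetical fragment: exercising the two new FbgemmConfig fields.
# `preshuffle` and `activation_dtype_for_int4` come from the diff above;
# the dtype values shown are illustrative.
config = FbgemmConfig(
    input_dtype=torch.bfloat16,
    weight_dtype=torch.int4,
    output_dtype=torch.bfloat16,
    group_size=128,
    preshuffle=True,                  # new: preshuffle the weights
    activation_dtype_for_int4="fp8",  # new: quantize the activation to fp8
)
```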
it's confusing to have both input_dtype and activation_dtype_for_int4, what are your thoughts?
yeah it's a bit confusing, although this is temporary; I'm deprecating this later in the stack: #2474. I'll remove this one and just use input_dtype.
)

class Float8ActivationInt4GroupwisePreshuffleTensor(TorchAOBaseTensor):
why not extend the existing Int4GroupwisePreshuffleTensor?
sorry this should be removed
Summary: Added basic op support like linear and bmm. We have both float8 and bf16 in the same Tensor because it's the same dtype; the only difference is whether the activation is quantized or not. There are some differences in implementation, though. bf16 activation: group_scale, group_zero. fp8 activation: group_scale, row_scale.
Test Plan: python test/dtypes/test_float8_activation_int4_groupwise_preshuffle.py
stack-info: PR: #2437, branch: jerryzh168/stack/4
Stacked PRs:
* transpose_input from fbgemm configs (#2422)
* Add support for float8 activation for Int4GroupwisePreshuffleTensor (this PR)
Summary:
Added basic op support like linear and bmm. We have both float8 and bf16 in the same Tensor because it's the same dtype; the only difference is whether the activation is quantized or not. Although there are some differences in implementation:
bf16 activation:
* group_scale
* group_zero
fp8 activation:
* group_scale
* row_scale
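The difference in quantization metadata between the two paths can be illustrated with a shape sketch (hypothetical pure Python; the names mirror the summary above, but the exact shapes are an assumed layout, not read from the implementation):

```python
# Hypothetical shape sketch of the per-path quantization metadata.
# Assumes a weight of shape (rows, cols) split into cols // group_size
# column groups per row; all scale values here are placeholders.
rows, cols, group_size = 2, 8, 4
n_groups = cols // group_size  # 2 groups per row

bf16_activation_path = {
    "group_scale": [[1.0] * n_groups for _ in range(rows)],  # one scale per group
    "group_zero":  [[0.0] * n_groups for _ in range(rows)],  # one zero point per group
}
fp8_activation_path = {
    "group_scale": [[1.0] * n_groups for _ in range(rows)],  # one scale per group
    "row_scale":   [1.0] * rows,                             # one scale per row
}

assert len(bf16_activation_path["group_zero"][0]) == n_groups
assert len(fp8_activation_path["row_scale"]) == rows
```

The fp8 path drops the per-group zero point and instead carries a per-row scale, which is consistent with the activation itself being quantized.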
Test Plan:
python test/dtypes/test_float8_activation_int4_groupwise_preshuffle.py