`Float8Quantizer::create_tensor` calculates `scale_inv` instead of creating an empty buffer #1491
Logically, […]

Before #1083, we needed messy logic for the scale update after the forward pass: update […]. After #1083, we can decouple […]. In this case, I think the problem is that the API and behavior of […]
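For context on the scale bookkeeping being discussed, here is a minimal pure-Python sketch of FP8 delayed scaling: the scale is derived from an amax history, and `scale_inv` is just its reciprocal, needed later for dequantization. All names (`compute_scale`, `quantize`, the E4M3 max of 448) are illustrative assumptions following the usual FP8 recipe, not Transformer Engine's actual implementation:

```python
# Sketch of FP8 delayed scaling (illustrative only, not TE's real API).
# scale comes from the running amax history; scale_inv = 1/scale is what
# this issue is about: it can be written by the cast kernel itself instead
# of a separate reciprocal launch.

FP8_E4M3_MAX = 448.0  # largest representable magnitude in E4M3

def compute_scale(amax_history, margin=0):
    """Derive the quantization scale from the running amax history."""
    amax = max(amax_history)
    return (FP8_E4M3_MAX / amax) / (2 ** margin) if amax > 0 else 1.0

def quantize(x, scale):
    """Fake 'FP8' cast: scale then round (stand-in for the cast kernel).
    A fused kernel could also emit scale_inv as a side output here."""
    q = [round(v * scale) for v in x]
    scale_inv = 1.0 / scale  # the value under discussion in this issue
    return q, scale_inv

def dequantize(q, scale_inv):
    """Recover approximate values using the stored reciprocal scale."""
    return [v * scale_inv for v in q]

x = [0.5, -1.25, 2.0]
scale = compute_scale([2.0])       # amax = 2.0 -> scale = 224.0
q, scale_inv = quantize(x, scale)
x_hat = dequantize(q, scale_inv)   # round-trips exactly for these inputs
```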
@timmoon10 I totally understand what #1083 was doing. My question is why we moved away from it in TE 2.0, for example, calculating […]
Totally agree.
I'd say that this is just a bug: we should remove this `at::reciprocal` call. The rowwise data, I think, was added as a way to pass a preallocated buffer (which we need e.g. for UserBuffers). We will need to extend that, maybe to pass a dictionary of buffers, since e.g. to have UB support for MXFP8 we may need to do the same for the `scale_inv`. @timmoon10?
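A rough sketch contrasting the two behaviors under discussion. The class and function names here are hypothetical stand-ins, not TE's actual C++ API: the "eager" path computes `scale_inv` at tensor-creation time (an extra reciprocal kernel per tensor), while the "lazy" path just allocates a placeholder that the quantization kernel fills as a side output (the post-#1083 design):

```python
# Illustrative sketch (not TE's real API) of the two designs:
#  - eager: create-time reciprocal, i.e. what the at::reciprocal call does,
#  - lazy: an "empty buffer" written later by the cast kernel itself.

class Float8TensorSketch:
    def __init__(self, scale, eager_scale_inv):
        self.scale = scale
        # Eager: pay for the reciprocal now. Lazy: leave a placeholder
        # to be written inside the cast kernel, avoiding an extra launch.
        self.scale_inv = 1.0 / scale if eager_scale_inv else None

def cast_kernel(values, t):
    """Stand-in for a fused FP8 cast kernel that also writes scale_inv."""
    if t.scale_inv is None:
        t.scale_inv = 1.0 / t.scale  # written in-kernel, no extra launch
    return [round(v * t.scale) for v in values]

# Lazy path: scale_inv starts empty and is filled during the cast.
t = Float8TensorSketch(scale=4.0, eager_scale_inv=False)
q = cast_kernel([0.5, 1.5], t)

# Eager path produces the same value, just with an extra up-front step.
t_eager = Float8TensorSketch(scale=4.0, eager_scale_inv=True)
```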
TransformerEngine/transformer_engine/pytorch/csrc/extensions/quantizer.cpp, line 112 in b39397c
This brings some overheads. For example, in `fused_multi_quantize`, the `reciprocal` kernels (along with the launch overheads) take most of the overall time. There was an optimization that updates the FP8 scale-inverse in kernels with FP8 output (#1083); why did we change it?

cc @timmoon10
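To make the launch-overhead concern concrete, here is a toy launch-count model (an assumption for illustration, not measured TE behavior): if the multi-tensor quantization is fused into one kernel but each tensor's `scale_inv` is computed by a separate eager reciprocal kernel, N tensors cost N + 1 launches instead of 1:

```python
# Toy kernel-launch model (illustrative only). The point of writing
# scale_inv inside the fused cast kernel is that the per-tensor
# reciprocal launches, which can dominate fused_multi_quantize's
# runtime, disappear entirely.

def launches_with_eager_reciprocal(num_tensors):
    """1 fused cast kernel + one reciprocal kernel per tensor."""
    return 1 + num_tensors

def launches_with_fused_scale_inv(num_tensors):
    """scale_inv is a side output of the single fused cast kernel."""
    return 1

n = 64  # e.g. quantizing 64 weight tensors in one fused call
eager = launches_with_eager_reciprocal(n)   # 65 launches
fused = launches_with_fused_scale_inv(n)    # 1 launch
```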