FlashAttention tutorial requires relaxed verification in advanced path (`perf_attn`) #2098
Comments
Some different IR from the PoC; we need to check whether they are the same.
I see that for that particular corner case we are more precise (even in the second example). However, could we test this with random inputs, e.g., comparing against XeTLA and other backends such as CUDA or CPU? If this is indeed an XeTLA issue, we could report it to them.
I've verified that Triton's result can match CUDA's.
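A minimal sketch of such a random-input, cross-backend comparison, using PyTorch's `scaled_dot_product_attention` on CPU as the reference. The `triton_attention` import, the `xpu` device string, and the `sm_scale` value are assumptions for illustration, not the tutorial's actual names:

```python
import torch
import torch.nn.functional as F

# Hypothetical import: `triton_attention` stands in for the tutorial's actual
# FlashAttention entry point; adjust to the real function name and signature.
from flash_attention_tutorial import triton_attention

torch.manual_seed(0)

# Shape reported as problematic in this issue: (Z, H, N_CTX, D_HEAD).
Z, H, N_CTX, D_HEAD = 1, 32, 16384, 64
q, k, v = (torch.randn(Z, H, N_CTX, D_HEAD, dtype=torch.float16, device="xpu")
           for _ in range(3))
sm_scale = 0.125  # assumed softmax scale

out = triton_attention(q, k, v, sm_scale).cpu().float()

# Reference computed head-by-head on CPU in float32, which keeps the
# materialized N_CTX x N_CTX attention matrix at roughly 1 GB per head.
qc, kc, vc = (t.float().cpu() for t in (q, k, v))
ref = torch.empty_like(qc)
for h in range(H):
    ref[:, h] = F.scaled_dot_product_attention(
        qc[:, h], kc[:, h], vc[:, h], scale=sm_scale)

# Tolerances discussed in this thread.
print(torch.allclose(out, ref, atol=1e-2, rtol=0))  # upstream tolerance
print(torch.allclose(out, ref, atol=1e-1, rtol=0))  # relaxed tolerance
```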
Thanks for the investigation! Good findings!
Track: pytorch/pytorch#135085
Let's revisit this issue; pytorch/pytorch#135085 is now closed.
Still OOM on the 1100, and it takes too much time on the 1550 (not sure whether it hung or was just a slow execution). Used …
Confirmed SDPA for XPU is a feature targeted for PyTorch 2.7.
Tracking in pytorch/pytorch#140389
Comparing the Triton vs. XeTLA FlashAttention output using `atol=1e-2, rtol=0` as in upstream leads to size `1 32 16384 64` failing verification. A more relaxed `atol=1e-1` value passes verification, but this might be a bit too permissive considering the values will be less than 1 anyway (FlashAttention ends in a softmax).

In order to reproduce, add the following code to the `forward` function, right before the `return`:
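The exact snippet referred to here was not captured above. A minimal sketch of such a check, assuming the non-causal path and that `q`, `k`, `v`, `sm_scale`, and the kernel output `o` are in scope inside `forward` (these names are assumptions, as is `torch` being imported at module level), might look like:

```python
# Sketch of a verification check placed right before the `return` of `forward`.
# Non-causal attention assumed; `q`, `k`, `v`, `sm_scale`, `o` are assumed names.
ref = torch.nn.functional.scaled_dot_product_attention(
    q.float(), k.float(), v.float(), scale=sm_scale).to(o.dtype)

# The upstream tolerance (atol=1e-2) reportedly fails for shape 1x32x16384x64;
# atol=1e-1 passes but is permissive, since softmax-weighted outputs stay small.
torch.testing.assert_close(o, ref, atol=1e-2, rtol=0)
```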