
FlashAttention tutorial requires relaxed verification in advanced path (perf_attn) #2098

Open
victor-eds opened this issue Sep 3, 2024 · 10 comments

@victor-eds

victor-eds commented Sep 3, 2024

Comparing the Triton vs. XeTLA FlashAttention output using atol=1e-2, rtol=0, as in upstream, leads to size 1 32 16384 64 failing verification. A more relaxed atol=1e-1 passes, but this might be a bit too permissive considering the values will be less than 1 anyway (FlashAttention applies a softmax).

In order to reproduce, add the following code to the forward function, right before the return:

torch_output = torch.nn.functional.scaled_dot_product_attention(q, k, v, attn_mask=None, dropout_p=0.0, is_causal=False).to(torch.float32)
torch.testing.assert_close(o, torch_output, atol=1e-2, rtol=0)
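For reference, the more relaxed check mentioned above, which does pass but is arguably too lax for softmax outputs below 1, is the same assertion with the looser tolerance:

torch.testing.assert_close(o, torch_output, atol=1e-1, rtol=0)  # passes, but likely too permissive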
victor-eds changed the title from "FlashAttention tutorial requires relaxed verification in advanced path" to "FlashAttention tutorial requires relaxed verification in advanced path (perf_attn)" on Sep 3, 2024
vlad-penkin added this to the 4.0 [Performance] Core milestone on Sep 5, 2024
@Dewei-Wang-sh

There is some IR that differs from the PoC; we need to check whether they are equivalent.

@quintinwang5

Actually, this is not a problem on the Triton side.
In the attention computation, output = PV, where P is a softmax over the last dimension (in the naive 2D case, a row), so every row of P sums to 1. If we set V to all ones using torch.ones, the output should therefore be all ones (or very close to 1). For 1 32 16384 64, Triton is right, but torch is not (a sketch of this check appears at the end of this comment).

triton_output tensor([[[[1.0000, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0000],
          [1.0000, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0000],
          [1.0000, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0000],
          ...,
          [1.0000, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0000],
          [1.0000, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0000],
          [1.0000, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0000]]]],
       device='xpu:0')
===========================================================
torch tensor([[[[0.8726, 0.8726, 0.8726,  ..., 0.8726, 0.8726, 0.8726],
          [0.8726, 0.8726, 0.8726,  ..., 0.8726, 0.8726, 0.8726],
          [0.8706, 0.8706, 0.8706,  ..., 0.8706, 0.8706, 0.8706],
          ...,
          [0.8716, 0.8716, 0.8716,  ..., 0.8716, 0.8716, 0.8716],
          [0.8706, 0.8706, 0.8706,  ..., 0.8706, 0.8706, 0.8706],
          [0.8711, 0.8711, 0.8711,  ..., 0.8711, 0.8711, 0.8711]]]],
       device='xpu:0')

For 4 48 1024 64, both are OK.

triton_output tensor([[[[1.0000, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0000],
          [1.0000, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0000],
          [1.0000, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0000],
          ...,
          [1.0000, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0000],
          [1.0000, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0000],
          [1.0000, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0000]]]],
       device='xpu:0')
===========================================================
torch tensor([[[[1.0000, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0000],
          [1.0000, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0000],
          [1.0000, 1.0000, 1.0000,  ..., 1.0000, 1.0000, 1.0000],
          ...,
          [0.9995, 0.9995, 0.9995,  ..., 0.9995, 0.9995, 0.9995],
          [0.9995, 0.9995, 0.9995,  ..., 0.9995, 0.9995, 0.9995],
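
For reference, a minimal sketch of the all-ones-V check described above; the shapes follow the failing config, and the attention(...) call is a hypothetical stand-in for the tutorial's Triton forward:

import torch

Z, H, N_CTX, D_HEAD = 1, 32, 16384, 64  # the failing config
q = torch.randn((Z, H, N_CTX, D_HEAD), device='xpu', dtype=torch.float16)
k = torch.randn((Z, H, N_CTX, D_HEAD), device='xpu', dtype=torch.float16)
v = torch.ones((Z, H, N_CTX, D_HEAD), device='xpu', dtype=torch.float16)  # all-ones V

# Every row of P = softmax(QK^T * scale) sums to 1, so O = P @ V should be all ones.
torch_output = torch.nn.functional.scaled_dot_product_attention(
    q, k, v, attn_mask=None, dropout_p=0.0, is_causal=False).to(torch.float32)
print(torch_output)  # for 1 32 16384 64 this prints ~0.87 instead of ~1.0
# triton_output = attention(q, k, v, ...)  # hypothetical: the tutorial's Triton forward
# print(triton_output)                     # prints ~1.0 as expected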

@victor-eds

I see that for that particular corner case we are more precise (even in the second example). However, could we test this with random inputs, e.g., comparing against XeTLA and other vendors such as CUDA or CPU? If this is indeed an XeTLA issue, we could report it to them.
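
One possible sketch of such a check, assuming a hypothetical attention(...) wrapper for the tutorial's Triton kernel and using a float32 CPU SDPA as an extra reference (note this can be slow and memory-heavy for N_CTX=16384):

import torch

Z, H, N_CTX, D_HEAD = 1, 32, 16384, 64
q = torch.randn((Z, H, N_CTX, D_HEAD), dtype=torch.float16)
k = torch.randn((Z, H, N_CTX, D_HEAD), dtype=torch.float16)
v = torch.randn((Z, H, N_CTX, D_HEAD), dtype=torch.float16)

# Float32 CPU reference to reduce accumulation error.
cpu_ref = torch.nn.functional.scaled_dot_product_attention(
    q.float(), k.float(), v.float(), attn_mask=None, dropout_p=0.0, is_causal=False)

# o = attention(q.to('xpu'), k.to('xpu'), v.to('xpu'), ...)  # hypothetical Triton call
# torch.testing.assert_close(o.cpu().float(), cpu_ref, atol=1e-2, rtol=0)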

@quintinwang5

I've verified that Triton's result matches CUDA's using atol=1e-2, rtol=0. So it's clear that PyTorch gives a wrong result for this case. I'll close this issue and file a new issue with our PyTorch team.
Note: if you want to verify the result yourself, be careful with the differing behavior of torch.manual_seed and torch.randn between CUDA and XPU. Even though we choose the same seed before calling randn three times for q, k, v, we get the same q but different k and v. I just save all of q, k, v, then load the same copy to avoid this problem.
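
A minimal sketch of that save/load approach (the file name and seed are illustrative):

import torch

# Generate the inputs once and persist them so the CUDA and XPU runs see identical data.
torch.manual_seed(0)
shape = (1, 32, 16384, 64)
q, k, v = (torch.randn(shape, dtype=torch.float16) for _ in range(3))
torch.save({'q': q, 'k': k, 'v': v}, 'qkv.pt')

# On each backend, load the same copy instead of re-sampling with randn.
data = torch.load('qkv.pt')
q, k, v = data['q'].to('xpu'), data['k'].to('xpu'), data['v'].to('xpu')  # or .to('cuda')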

@victor-eds

> I've verified that Triton's result matches CUDA's using atol=1e-2, rtol=0. So it's clear that PyTorch gives a wrong result for this case. I'll close this issue and file a new issue with our PyTorch team. Note: if you want to verify the result yourself, be careful with the differing behavior of torch.manual_seed and torch.randn between CUDA and XPU. Even though we choose the same seed before calling randn three times for q, k, v, we get the same q but different k and v. I just save all of q, k, v, then load the same copy to avoid this problem.

Thanks for the investigation! Good findings!

@quintinwang5

Track: pytorch/pytorch#135085

@vlad-penkin

Let's revisit this issue now that pytorch/pytorch#135085 is closed.

vlad-penkin reopened this on Dec 18, 2024
@quintinwang5

Still OOM on 1100, and it takes too much time on 1550 (not sure whether it's hung or just executing slowly). The torch used, 2.6.0a0+git61dc5e9, is the latest one from the CI build.

@quintinwang5

Confirmed that SDPA for XPU is a feature targeted for PyTorch 2.7.

@quintinwang5

Tracking in pytorch/pytorch#140389
