
[Attention Performance] FlashAttention2 backward get to 80%~90% of XeTLA #2159

Open
Dewei-Wang-sh opened this issue Sep 9, 2024 · 2 comments

Comments

@Dewei-Wang-sh (Contributor) commented Sep 9, 2024

This serves as an umbrella issue.
Things to start with; more will be added as we dive into the backward code.

  1. Refactor the tt-to-ttgpu-warp pass.
  2. Investigate the XeTLA backward implementation.
@Dewei-Wang-sh (Contributor, Author) commented:

Diving into the backward algorithm and trying to split the task into separate issues.
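
For reference, the FlashAttention-2 backward pass recomputes the attention probabilities from the logsumexp values saved by the forward pass instead of materializing them, then forms dQ, dK, and dV. Below is a minimal, untiled NumPy sketch of that math only; the function name and variables are illustrative and are not taken from the Triton or XeTLA kernels discussed in this issue.

```python
import numpy as np

def flash_attn_bwd_reference(Q, K, V, O, dO, L, scale):
    """Untiled reference of the FlashAttention-2 backward math.

    Q, K, V, O, dO: (seq_len, head_dim) arrays for one head.
    L: (seq_len,) per-row logsumexp saved by the forward pass.
    """
    S = (Q @ K.T) * scale
    P = np.exp(S - L[:, None])        # recomputed softmax probabilities
    dV = P.T @ dO
    dP = dO @ V.T
    D = np.sum(dO * O, axis=1)        # row-wise dot(dO_i, O_i)
    dS = P * (dP - D[:, None])
    dQ = (dS @ K) * scale
    dK = (dS.T @ Q) * scale
    return dQ, dK, dV
```

The actual kernels tile this over BLOCK_M x BLOCK_N chunks and accumulate dK/dV across the query loop and dQ across the key loop, which is where the addressing question in the next comment comes in.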

@Dewei-Wang-sh (Contributor, Author) commented:

There are two ways to make it work; this needs more discussion.

  1. Rewrite the code to use block pointers, then add the backward-related feature support on top (a sketch of the block-pointer style follows this list).
  2. Keep the non-block-pointer approach and follow along with what NVIDIA does.
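
To make the difference between the two options concrete, here is a minimal Triton sketch of loading one K tile either with raw pointer arithmetic (the non-block-pointer style of option 2) or through `tl.make_block_ptr` (option 1). The helper name, parameter names, and tile sizes are hypothetical and do not come from the actual FlashAttention kernel.

```python
import triton
import triton.language as tl

@triton.jit
def load_k_tile(K, stride_kn, stride_kd, N_CTX, start_n,
                BLOCK_N: tl.constexpr, HEAD_DIM: tl.constexpr):
    # Non-block-pointer style (option 2): build offsets and masks by hand.
    offs_n = start_n + tl.arange(0, BLOCK_N)
    offs_d = tl.arange(0, HEAD_DIM)
    k_ptrs = K + offs_n[:, None] * stride_kn + offs_d[None, :] * stride_kd
    k_manual = tl.load(k_ptrs, mask=offs_n[:, None] < N_CTX, other=0.0)

    # Block-pointer style (option 1): shape, strides, and bounds travel
    # with the pointer, so boundary handling moves out of the kernel body.
    k_block_ptr = tl.make_block_ptr(
        base=K,
        shape=(N_CTX, HEAD_DIM),
        strides=(stride_kn, stride_kd),
        offsets=(start_n, 0),
        block_shape=(BLOCK_N, HEAD_DIM),
        order=(1, 0),
    )
    k_block = tl.load(k_block_ptr, boundary_check=(0,))
    return k_manual, k_block
```

Roughly, the block-pointer form gives the backend structured shape/stride information it can lower to block loads, while the raw-pointer form mirrors the NVIDIA-style kernels mentioned in option 2.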
