Fix scheduling of split-K with smem_epilogue on Hopper #4257
!test
Review updated until commit d302129
This can be used to clarify the other tests too, but I will do that in another PR and run codediff etc.
!test --diff
Failures are unrelated.
Introduces a `cacheBefore` utility to match the `cacheAfter` utility, which just propagates entries in `graph_` corresponding to new IDs in the cached tensors. Also avoids re-scheduling tensors if they are split-K sum tensors.

There is a current limitation for 32-bit outputs: we skip stmatrix there, but our current vectorized stores encounter 2-way bank conflicts. This is probably not that important to perf and can be fixed in the scheduling of that store in another PR.
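
For readers unfamiliar with the term, here is a minimal host-side sketch of what a split-K sum is. It is not code from this PR or from the scheduler: the function name `splitKMatmul`, the row-major layout, and the `splits` parameter are assumptions made only for illustration. The K dimension is cut into slices, each slice produces a partial M x N result, and a final reduction over the slices (the "split-K sum") produces the output.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration only, not the scheduler's implementation.
// Row-major layout: a is M x K, b is K x N, the result is M x N.
std::vector<float> splitKMatmul(const std::vector<float>& a,
                                const std::vector<float>& b,
                                std::size_t m, std::size_t n, std::size_t k,
                                std::size_t splits) {
  assert(splits > 0 && a.size() == m * k && b.size() == k * n);

  // One partial M x N result per K-slice.
  std::vector<std::vector<float>> partial(
      splits, std::vector<float>(m * n, 0.0f));
  const std::size_t chunk = (k + splits - 1) / splits;  // ceil(k / splits)

  for (std::size_t s = 0; s < splits; ++s) {
    const std::size_t k_begin = s * chunk;
    const std::size_t k_end = std::min(k, k_begin + chunk);
    for (std::size_t i = 0; i < m; ++i) {
      for (std::size_t j = 0; j < n; ++j) {
        float acc = 0.0f;
        // Empty when k_begin >= k_end (more splits than K elements).
        for (std::size_t kk = k_begin; kk < k_end; ++kk) {
          acc += a[i * k + kk] * b[kk * n + j];
        }
        partial[s][i * n + j] = acc;
      }
    }
  }

  // The split-K sum: reduce the partial results across slices.
  std::vector<float> out(m * n, 0.0f);
  for (std::size_t s = 0; s < splits; ++s) {
    for (std::size_t idx = 0; idx < m * n; ++idx) {
      out[idx] += partial[s][idx];
    }
  }
  return out;
}
```

In this sketch the final reduction loop plays the role of the split-K sum tensor mentioned above; the partial results are produced by one step and consumed by a separate one.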
Fixes #4159