-
Notifications
You must be signed in to change notification settings - Fork 55
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Issue multiple wgmma operations when CTA k dim is a multiple of 16 #3616
Conversation
!test |
tests/cpp/test_matmul.cpp
Outdated
// NOTE Certain combinations of cta k dimension and circular buffer | ||
// prefetching can get incorrect results. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👀
|
||
mparams.supported_vec_size = {8, 8, 4}; | ||
mparams.supported_vec_size = {8, 8, 8}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, although I think it currently has no meaning until we start handling epilogue inputs with supported vec size.
!test |
d366352
to
0c784e0
Compare
!test |
I think this covers the motivation for #3616
* Remove mma-output support from transformLikeMmaOutput
e3826e2
to
603d5c9
Compare
PR Reviewer Guide 🔍(Review updated until commit 74f2056)Here are some key observations to aid the review process:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
e68f85c
to
74f2056
Compare
!test |
@jacobhinkle I ran into issues with tests For
Why? Fixes:
|
Maybe we could handle this explicitly in the |
I tried modifying setupInlining this afternoon. InlineMost behaves differently than InlineSelectedAt so InlineSelectedAt won't place the computeAt at the cta tile position. |
I am wondering why I don't see this issue on #3642. I do the K split also: https://github.com/NVIDIA/Fuser/pull/3642/files#diff-e7aea6139145124edbbea47b4cb1541d90715e3be70e54051e42eb14eec7588fR46. |
Review updated until commit 2e703f8 Description
Changes walkthrough 📝
PR Reviewer Guide 🔍Here are some key observations to aid the review process:
|
f0ffe15
to
0370585
Compare
!test |
Future TODOs:
|
This PR fixes the incorrect results issue when k dimension for CTA tile is a multiple of
getK(mma_macro)
.Why?
scheduleMmaResults
, we need to split thek
reduction bygetK(mma_macro)
. A serial reduction will add the results fromwgmma
along k-dimension.Details
transformLikeMmaOutput
function to not be used inscheduleMmaResults
.