Add epilogue subtiling #948
Open
PaulZhang12 wants to merge 1 commit into main from PaulZhang12/stack/14
+1,727
−130
PaulZhang12
added a commit
that referenced
this pull request
Oct 15, 2025
stack-info: PR: #948, branch: PaulZhang12/stack/14
cf439ac to
fcc7492
Compare
PaulZhang12
added a commit
that referenced
this pull request
Oct 15, 2025
stack-info: PR: #948, branch: PaulZhang12/stack/14
fcc7492 to
cdbedf6
Compare
PaulZhang12
added a commit
that referenced
this pull request
Oct 15, 2025
stack-info: PR: #948, branch: PaulZhang12/stack/14
cdbedf6 to
58496fb
Compare
oulgen
reviewed
Oct 15, 2025
PaulZhang12
added a commit
that referenced
this pull request
Oct 15, 2025
stack-info: PR: #948, branch: PaulZhang12/stack/14
58496fb to
965b193
Compare
jansel
requested changes
Oct 16, 2025
Does this help with matmul perf?
PaulZhang12
added a commit
that referenced
this pull request
Oct 16, 2025
stack-info: PR: #948, branch: PaulZhang12/stack/14
965b193 to
2bc36d0
Compare
PaulZhang12
added a commit
that referenced
this pull request
Oct 17, 2025
stack-info: PR: #948, branch: PaulZhang12/stack/14
2bc36d0 to
1c1e282
Compare
PaulZhang12
added a commit
that referenced
this pull request
Oct 20, 2025
stack-info: PR: #948, branch: PaulZhang12/stack/14
1c1e282 to
cccb0af
Compare
PaulZhang12
added a commit
that referenced
this pull request
Oct 20, 2025
stack-info: PR: #948, branch: PaulZhang12/stack/14
cccb0af to
a6dd082
Compare
PaulZhang12
added a commit
that referenced
this pull request
Oct 20, 2025
stack-info: PR: #948, branch: PaulZhang12/stack/14
a6dd082 to
88d46a8
Compare
PaulZhang12
added a commit
that referenced
this pull request
Oct 20, 2025
stack-info: PR: #948, branch: PaulZhang12/stack/14
88d46a8 to
48eed82
Compare
0826b24 to
0c3d607
Compare
PaulZhang12
added a commit
that referenced
this pull request
Oct 22, 2025
stack-info: PR: #948, branch: PaulZhang12/stack/14
0c3d607 to
bdf0793
Compare
PaulZhang12
added a commit
that referenced
this pull request
Oct 27, 2025
stack-info: PR: #948, branch: PaulZhang12/stack/14
bdf0793 to
9856699
Compare
PaulZhang12
added a commit
that referenced
this pull request
Oct 30, 2025
stack-info: PR: #948, branch: PaulZhang12/stack/14
9856699 to
3ae89e1
Compare
PaulZhang12
added a commit
that referenced
this pull request
Oct 30, 2025
stack-info: PR: #948, branch: PaulZhang12/stack/14
3ae89e1 to
0ef154f
Compare
Any perf data on this one?
PaulZhang12
added a commit
that referenced
this pull request
Nov 3, 2025
stack-info: PR: #948, branch: PaulZhang12/stack/14
0ef154f to
4e19822
Compare
PaulZhang12
added a commit
that referenced
this pull request
Nov 5, 2025
stack-info: PR: #948, branch: PaulZhang12/stack/14
4e19822 to
7e8b05e
Compare
PaulZhang12
added a commit
that referenced
this pull request
Nov 5, 2025
stack-info: PR: #948, branch: PaulZhang12/stack/14
7e8b05e to
a8c83b6
Compare
PaulZhang12
added a commit
that referenced
this pull request
Nov 5, 2025
stack-info: PR: #948, branch: PaulZhang12/stack/14
a8c83b6 to
4ebc4f1
Compare
PaulZhang12
added a commit
that referenced
this pull request
Nov 5, 2025
stack-info: PR: #948, branch: PaulZhang12/stack/14
4ebc4f1 to
5b75ab2
Compare
PaulZhang12
added a commit
that referenced
this pull request
Nov 5, 2025
stack-info: PR: #948, branch: PaulZhang12/stack/14
5b75ab2 to
f48a0a3
Compare
PaulZhang12
added a commit
that referenced
this pull request
Nov 5, 2025
stack-info: PR: #948, branch: PaulZhang12/stack/14
f48a0a3 to
95d9ef0
Compare
PaulZhang12
added a commit
that referenced
this pull request
Nov 5, 2025
stack-info: PR: #948, branch: PaulZhang12/stack/14
95d9ef0 to
a9d2372
Compare
PaulZhang12
added a commit
that referenced
this pull request
Nov 5, 2025
stack-info: PR: #948, branch: PaulZhang12/stack/14
a9d2372 to
a56a3b7
Compare
stack-info: PR: #948, branch: PaulZhang12/stack/14
a56a3b7 to
cd3553e
Compare
Stacked PRs:
Co-author: @yf225
Epilogue Subtiling
Epilogue subtiling is added as an opt-in feature for now, because complex epilogues (such as loading a bias and adding it to the accumulator) are difficult to handle and are not currently supported. Furthermore, most kernels do not need epilogue subtiling; it is mainly useful for GEMMs in which the accumulator lives in TMEM on B200.
GEMM CI shows a ~4% gain: 0.88x with subtiling versus 0.84x without. epilogue_subtiling=[2] is often picked as the final config.
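For readers unfamiliar with the term, here is a rough pure-Python sketch of the idea, not the code in this PR. It assumes epilogue subtiling means splitting the accumulator tile along N into `num_subtiles` pieces and running the epilogue and store once per piece, so that less of the accumulator is live at a time. The helper `store_tile` and its parameters are hypothetical names used only for illustration.

```python
def store_tile(acc, out, row0, col0, epilogue, num_subtiles=1):
    """Write an accumulator tile into `out`, one column subtile at a time.

    Hypothetical helper: `acc` is a BLOCK_M x BLOCK_N list of lists,
    `out` is the full output matrix, and `epilogue` is an elementwise function.
    Returns the number of store steps that were issued.
    """
    block_n = len(acc[0])
    if block_n % num_subtiles != 0:
        raise ValueError("BLOCK_N must be divisible by num_subtiles")
    sub_n = block_n // num_subtiles
    stores = 0
    for s in range(num_subtiles):
        lo, hi = s * sub_n, (s + 1) * sub_n
        for i, row in enumerate(acc):
            # Only the current subtile's columns are read, transformed, and stored.
            out[row0 + i][col0 + lo:col0 + hi] = [epilogue(v) for v in row[lo:hi]]
        stores += 1
    return stores


# A 2 x 4 accumulator tile and a simple elementwise epilogue (scale by 0.5).
acc = [[float(i * 4 + j) for j in range(4)] for i in range(2)]
epilogue = lambda v: v * 0.5

out_full = [[0.0] * 4 for _ in range(2)]
out_sub = [[0.0] * 4 for _ in range(2)]
n_full = store_tile(acc, out_full, 0, 0, epilogue, num_subtiles=1)
n_sub = store_tile(acc, out_sub, 0, 0, epilogue, num_subtiles=2)

# Subtiling changes how the store is issued, not the values written.
assert out_full == out_sub
```

An elementwise epilogue like this one splits cleanly because each output element depends only on its own accumulator value. An epilogue that also loads a bias would need that load split per subtile as well, which is plausibly why such cases are harder to support.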
