Reduction performance with subgroup size 16 #1868

Open
sommerlukas opened this issue Aug 13, 2024 · 0 comments
The investigation in #1371 revealed that some kernels run slower with subgroup size 16 than with subgroup size 32.

Further analysis of one of the outliers (BlenderbotSmallForCausalLM inference with amp_fp16) showed that a kernel containing a reduction was particularly affected by the change in subgroup size.

We should investigate why the change in subgroup size causes this performance difference and fix it if possible.

The example kernel and more information can be found in this comment on #1371.
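
To frame the question, here is a toy pure-Python model of a two-level reduction. It is not the kernel from #1371 and makes no claim about the actual cause of the slowdown. It assumes a butterfly (XOR-shuffle) reduction inside each subgroup followed by a combine step across subgroups; the function names `butterfly_reduce` and `workgroup_reduce` and the 256-element input are hypothetical. It only illustrates how the work splits differently between the two levels when the subgroup size changes.

```python
def butterfly_reduce(values):
    """Sum a power-of-two number of lanes with XOR-shuffle steps.

    Returns (total, shuffle_steps). Models one subgroup only.
    """
    lanes = list(values)
    n = len(lanes)
    assert n > 0 and n & (n - 1) == 0, "lane count must be a power of two"
    steps = 0
    offset = n // 2
    while offset >= 1:
        # Each lane adds the value held by its partner lane (i XOR offset).
        lanes = [lanes[i] + lanes[i ^ offset] for i in range(n)]
        steps += 1
        offset //= 2
    # After the last step every lane holds the full sum.
    return lanes[0], steps


def workgroup_reduce(values, subgroup_size):
    """Reduce `values` per subgroup, then combine the partial sums.

    Returns (total, shuffle_steps_per_subgroup, num_partials).
    """
    assert len(values) % subgroup_size == 0
    partials = []
    steps_per_subgroup = 0
    for start in range(0, len(values), subgroup_size):
        total, steps = butterfly_reduce(values[start:start + subgroup_size])
        partials.append(total)
        steps_per_subgroup = steps  # identical for every subgroup
    # Cross-subgroup combine (in a real kernel this might go through local memory).
    return sum(partials), steps_per_subgroup, len(partials)


if __name__ == "__main__":
    data = list(range(256))
    for sg in (16, 32):
        total, steps, partials = workgroup_reduce(data, sg)
        print(f"subgroup size {sg}: sum={total}, "
              f"shuffle steps per subgroup={steps}, partials to combine={partials}")
```

In this model, subgroup size 16 needs one fewer shuffle step per subgroup but leaves twice as many partial results for the cross-subgroup stage. Whether that trade-off is related to the observed slowdown is exactly what would need to be measured on the real kernel.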
