[SYCL] Skip subgroup tests when subgroup size doesn't divide local size #15520
I believe that the tests assume that the subgroup id for items in the "peeling" subgroups is increased sequentially, but it may not be the case depending on the backend. E.g. in |
I don't think that is right. I don't believe the ordering of the items is guaranteed, but |
I think @PietroGhg might be right. The description of sub-groups in SYCL 2020 is incredibly vague, unfortunately. The line that I think is problematic is this one: size_t SGoff = gid - lid; ...because it assumes that the linear numbering of work-items within a work-group is the same as the linear numbering of work-items within a sub-group. This is probably safe in practice and it's something we wanted to clarify (there's wording in one of the sub-group extensions, somewhere...), but I don't think it's actually guaranteed by the specification.
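To make the concern about `gid - lid` concrete, here is a minimal host-side sketch in plain C++ (no SYCL runtime). It assumes that `gid` is a work-item's linear id within the work-group and `lid` is its id within the sub-group; the thread does not spell this out, and the helper name `offset_is_uniform` is hypothetical. The function checks whether every member of one sub-group yields the same `gid - lid` offset, which only holds when the two numberings agree.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of the test suite under discussion.
// members[lid] holds the (assumed) work-group linear id of the work-item
// that sits at position lid inside one sub-group.
bool offset_is_uniform(const std::vector<std::size_t>& members) {
  if (members.empty()) return true;
  const std::size_t base = members[0];  // gid - lid for lid == 0
  for (std::size_t lid = 0; lid < members.size(); ++lid) {
    // Unsigned subtraction may wrap, but a wrapped value still differs
    // from base whenever members[lid] != base + lid.
    if (members[lid] - lid != base) return false;
  }
  return true;
}
```

With a contiguous sub-group such as {8, 9, 10, 11} the offset is the same for every member, whereas a shuffled sub-group such as {0, 2, 1, 3, 4, 6, 5, 7} gives different offsets for different members, so a test that relies on a single `SGoff` per sub-group would misbehave.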
My point is not only about linearity, but also about how work items are divided into subgroups in case the max_subgroup_size doesn't divide the local size: in |
I agree on the point of linearity. This is something we falsely assumed in the CTS as well and had to change. However, I still believe the "left-over" group should have items with IDs within the local range it returns. I.e. say we have a launch with a local size of 12 and a sub-group size of 8, then I argue that the specification says that a valid set of IDs for the groups would be: Sub-group 0: {0,2,1,3,4,6,5,7} Or some shuffling of these. However, I do not believe it is right to have: Sub-group 0: {0,2,1,3,4,6,5,7} In sub-group 1 there are only 4 items, so even if the max sub-group size is 8, from SYCL's (and the user's) perspective there are only 4 items in that group. As such, the tests should not skip just because the last group might be smaller. We should be able to get enough information from the kernels to do the checks based on the above and just not do anything outside the range of the last group. |
Oh, I see. Sorry, I only looked at the changes for the specific test rather than the changes to common. You are correct that there is no guarantee all the "left over" work-items will be in the same sub-group.
This is true. The issue is that you can't assume you will only get two sub-groups in this case. An implementation is free to put all the work-items in one remainder sub-group, create a bunch of sub-groups of size 1, or even create a few sub-groups of different sizes. The only restriction is that each sub-group must be <= the requested maximum sub-group size. We really need to fix this. 😅
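As an illustration of how loose that restriction is, the following plain C++ sketch models only the rule stated in the comment above: every sub-group is at most the requested maximum size. The helper name `is_valid_partition` is hypothetical, and the extra requirements that sub-groups are non-empty and that their sizes sum to the local size are assumptions added for the model, not quotations from the SYCL specification.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns true if `sizes` is a permissible split of a
// work-group of `local_size` work-items into sub-groups, under the single
// restriction discussed above (each sub-group <= max_sg_size).
bool is_valid_partition(const std::vector<std::size_t>& sizes,
                        std::size_t local_size, std::size_t max_sg_size) {
  assert(max_sg_size > 0);
  std::size_t total = 0;
  for (std::size_t s : sizes) {
    // Assumption: a sub-group is never empty and never exceeds the maximum.
    if (s == 0 || s > max_sg_size) return false;
    total += s;
  }
  // Assumption: every work-item belongs to exactly one sub-group.
  return total == local_size;
}
```

For a local size of 12 and a maximum sub-group size of 8, the splits {8, 4}, {8, 1, 1, 1, 1} and {5, 4, 3} all pass this check, which is why a test cannot assume exactly two sub-groups. A split such as {8, 8} fails because it would account for 16 work-items.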
Ah, I see! Either way, it should be possible to rewrite these tests to work around this by saving each sub-group's local size and basing the checks on that, right? However, if we think the specification should be stricter, I would argue that a first step would be to guarantee that our backends follow the stricter pattern. Since these tests are for our implementation, we could make the stronger assumptions based on that for now.
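One way such a rewritten check could look on the host side is sketched below in plain C++. It is not the approach taken in the existing tests: the `ItemRecord` struct and `check_records` function are hypothetical, and the sketch assumes that each work-item of a single work-group has written out its sub-group id, its id within the sub-group, and the sub-group size it observed. The check then relies only on those recorded sizes, not on any fixed layout of the remainder.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <vector>

// Hypothetical record of what one work-item observed inside the kernel.
struct ItemRecord {
  std::size_t sg_id;           // sub-group id within the work-group
  std::size_t sg_local_id;     // work-item's id within its sub-group
  std::size_t sg_local_range;  // sub-group size as seen by this work-item
};

// Validates the records of one work-group without assuming how the
// implementation split the work-items into sub-groups.
bool check_records(const std::vector<ItemRecord>& records,
                   std::size_t max_sg_size) {
  std::map<std::size_t, std::size_t> reported_range;
  std::map<std::size_t, std::set<std::size_t>> seen_ids;
  for (const ItemRecord& r : records) {
    if (r.sg_local_range == 0 || r.sg_local_range > max_sg_size) return false;
    if (r.sg_local_id >= r.sg_local_range) return false;
    // All members of a sub-group must agree on its size.
    auto [it, inserted] = reported_range.emplace(r.sg_id, r.sg_local_range);
    if (!inserted && it->second != r.sg_local_range) return false;
    // Ids within a sub-group must be unique.
    if (!seen_ids[r.sg_id].insert(r.sg_local_id).second) return false;
  }
  // The number of members seen must match the size each sub-group reported.
  for (const auto& [sg, range] : reported_range) {
    if (seen_ids[sg].size() != range) return false;
  }
  return true;
}
```

Under these assumptions, a launch with local size 12 passes whether the implementation produced sub-groups of sizes {8, 4} or some other split, while a sub-group that reports a size of 8 but contains only 4 work-items is rejected.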
The relevant SPIR-V wording is being discussed over on #11301. @steffenlarsen, @PietroGhg, if you could take a look at that and give feedback it would be very helpful. Once the SPIR-V extension is defined, we can expose equivalent SYCL features.
Thanks @Pennycook, the key point I was missing here is that the SYCL spec doesn't mandate that there can be only one subgroup with |
It does; it says:
It's a little indirect, but this modulo behavior would prevent using multiple sub-groups of size 1 to handle the case where things do not divide nicely.
This is a tricky one, and I think it really comes down to how much work you think it would be to refactor the tests... I think I'm leaning towards making sure that the Native CPU device can pass the tests as written, though, since most users of that backend will probably want the behavior to match GPU devices. Note that it would still be useful for the Native CPU device to support the mode where it generates sub-groups of size 1. Even with the SPIR-V extension, we're planning to provide a mode for developers to declare that they don't need to know the sub-group mapping, and SYCL |
Thanks @Pennycook @steffenlarsen, I understand subgroups better now. I'll close this PR and have a discussion with the rest of my team about the best way to proceed for Native CPU.
Skips tests in Subgroup/barrier.cpp and Subgroup/common.cpp when the maximum sub-group size doesn't divide the local size.