Add option to disable MMA support on Turing (again) #15468
Conversation
In case I haven't made it clear enough: I will not approve this PR. It should already be possible to achieve the exact same effect with the existing compilation options; this just adds additional complexity. Document how to get optimal performance for GTX 1600 instead, ideally by printing a warning when trying to run the code on GTX 1600 with Turing mma enabled.
We could avoid the affected path at runtime. The easiest way would be to check whether the current compile options trigger this problem and use the cuBLAS path if so; presumably that performs at least better. Since someone could have a Turing and a crippled Turing GPU in the same system, I think this is the only really good option.
The optimal path is to use the dp4a mmq kernels, but those aren't being compiled for CC 7.5 because tensor core instructions are available. Quite frankly, the number of GTX 1600 GPUs in circulation is so low and they're so old that I don't think it's worth the maintenance effort to either add dedicated compilation options or refactor the code to compile both dp4a and mma variants. It would be a different story if NVIDIA had just given GTX 1600 a compute capability that allows for differentiation at compile time.
Yeah, I know about the dp4a mma path. But selecting the cuBLAS path in this situation costs almost nothing code-complexity-wise, as it's always there anyhow, and it at least gives the user something that performs okay and works on mixed Turing/crippled-Turing setups. I don't know about circulation, but I don't find them terribly old yet. That presumes that cuBLAS performs reasonably well here, of course, which I don't know.
It seems like
@JohannesGaessler Should a warning be added, or a fallback path to cuBLAS instead?
I think the least bad behavior is to check the compute capabilities, device name, and compilation options during the initialization of the CUDA backend. If the CC and device name match the GTX 1600 GPUs, print a warning that best performance requires different compilation options. |
Is this good enough? pt13762104@4b40a71 |
|
I think all TU11x GPUs have this problem, not just the 16 series. |
To my knowledge there are only the MX450 and the MX550 as other affected GPUs. It's fine to just check whether GTX 16XX or MX 450 or MX 550 is in the GPU name. |
pt13762104@e236d2e One last question: it seems the compilation options will be unsupported in CUDA 13 (it removes support for Pascal entirely), so should users keep using CUDA 12 to compile the code? (edit: Considering that the Windows build still uses CUDA 12.4, probably yes.)
Looks good, I would just change the wording a little. If you make a PR, I can make suggestions for such changes directly in the GitHub UI.
Continuation of #15360.