Conversation

pt13762104
Contributor

Continuation of #15360.

@pt13762104 pt13762104 marked this pull request as draft August 21, 2025 09:08
@pt13762104 pt13762104 marked this pull request as ready for review August 21, 2025 09:20
@github-actions github-actions bot added Nvidia GPU Issues specific to Nvidia GPUs ggml changes relating to the ggml tensor library for machine learning labels Aug 21, 2025
@pt13762104 pt13762104 marked this pull request as draft August 21, 2025 10:21
@JohannesGaessler
Collaborator

In case I haven't made it clear enough: I will not approve this PR. It should already be possible to achieve the exact same effect with the existing compilation options, this just adds additional complexity. Document how to get optimal performance for GTX 1600 instead, ideally by printing a warning when trying to run the code on GTX 1600 with Turing mma enabled.

@IMbackK
Collaborator

IMbackK commented Aug 21, 2025

We could avoid the affected path at runtime. The easiest way would be to check whether the current compile options trigger this problem and use the cuBLAS path if so; presumably that at least performs better.

Since someone could have a regular Turing GPU and a crippled Turing GPU in the same system, I think this is the only really good option.

@JohannesGaessler
Collaborator

The optimal path is to use the dp4a mmq kernels but those aren't being compiled for CC 7.5 because tensor core instructions are available. Quite frankly, the number of GTX 1600 GPUs in circulation is so low and they're so old that I don't think it's worth the maintenance effort to either add dedicated compilation options or refactor the code to compile both dp4a and mma variants. It would be a different story if NVIDIA had just given GTX 1600 a compute capability that allows for differentiation at compile time.

@IMbackK
Collaborator

IMbackK commented Aug 21, 2025

Yeah, I know about the dp4a/mma path. But selecting the cuBLAS path in this situation costs almost nothing in code complexity, since it's always there anyhow, and it at least gives the user something that performs OK and works on mixed Turing/crippled-Turing setups. I don't know about circulation, but I don't find them terribly old yet.

That presumes that cuBLAS performs reasonably well here, of course, which I don't know.

@pt13762104 pt13762104 closed this Aug 21, 2025
@pt13762104 pt13762104 deleted the turing-disable-mma-2 branch August 21, 2025 15:13
@pt13762104
Contributor Author

pt13762104 commented Aug 22, 2025

It seems like `-DCMAKE_CUDA_ARCHITECTURES=61 -DGGML_CUDA_FORCE_MMQ=1` gets the best performance. It is slightly faster than cuBLAS (~8 vs. ~7.5 TFLOPS).
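For reference, a full configure/build invocation using those flags might look like the sketch below (the build directory name is just an example; this is a build-configuration fragment, not part of the PR):

```sh
# Configure llama.cpp to compile only the CC 6.1 (Pascal-level, dp4a) kernels,
# which TU11x GPUs can still execute, and force MMQ over cuBLAS.
cmake -B build -DGGML_CUDA=ON \
      -DCMAKE_CUDA_ARCHITECTURES=61 \
      -DGGML_CUDA_FORCE_MMQ=1
cmake --build build --config Release -j
```

With CC 6.1 only, the Turing mma kernels are never compiled, so the problematic code path cannot be selected at runtime.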

@pt13762104
Contributor Author

pt13762104 commented Aug 24, 2025

@JohannesGaessler Should a warning be added, or a fallback path to cuBLAS instead?
Side note: TU11x can be detected by checking the number of SMs: all TU11x devices have an SM count <= 24, while TU10x devices have >= 30 SMs (excluding the TU106-based GTX 1650, which doesn't have tensor cores anyway).

@JohannesGaessler
Collaborator

I think the least bad behavior is to check the compute capabilities, device name, and compilation options during the initialization of the CUDA backend. If the CC and device name match the GTX 1600 GPUs, print a warning that best performance requires different compilation options.

@pt13762104
Contributor Author

Is this good enough? pt13762104@4b40a71

@JohannesGaessler
Collaborator

  • Do the check using the device name instead of the SM count.
  • Explicitly print the name of the device with suboptimal performance.

@pt13762104
Contributor Author

I think all TU11x GPUs have this problem, not just the 16 series.

@JohannesGaessler
Collaborator

To my knowledge there are only the MX450 and the MX550 as other affected GPUs. It's fine to just check whether GTX 16XX or MX 450 or MX 550 is in the GPU name.

@pt13762104
Contributor Author

pt13762104 commented Aug 25, 2025

pt13762104@e236d2e One last question: it seems these compilation options will be unsupported in CUDA 13 (Pascal support is removed entirely), so should people keep using CUDA 12 to compile the code? (edit: given that the Windows build still uses CUDA 12.4, probably yes.)

@JohannesGaessler
Collaborator

Looks good, I would just change the wording a little. If you make a PR I can make suggestions for such changes directly in the Github UI.
