Override SSM_A op for Qwen3 Next to reduce splits #17587

pwilkin · 2025-11-29T01:16:22Z

This massively reduces the number of splits in the Qwen3 Next graph by placing the initial gate tensor on the backend. Otherwise it is put on the CPU, which recursively poisons all the other layers and leads to splits.

pwilkin · 2025-11-29T01:18:49Z

On the test server this improves pp512 t/s from 900 to 1300.

jeffbolznv · 2025-11-29T05:54:52Z

I don't understand this change, but it also improves perf for Vulkan.

I see that the SSM_SCAN operation in this model isn't supported by the Vulkan backend, but even if I mark it supported it still isn't assigned to the GPU split. If we ran it on the GPU would that help?

ggerganov · 2025-11-29T08:28:52Z

It's better to use a tensor identifier other than LLM_TENSOR_SSM_A, one that is associated with GGML_OP_MUL. If there isn't a suitable one, add a new one.

pwilkin · 2025-11-29T11:36:44Z

@ggerganov I thought about it, but people will kill me for breaking existing GGUFs :)

pwilkin · 2025-11-29T11:46:21Z

> I don't understand this change, but it also improves perf for Vulkan.
>
> I see that the SSM_SCAN operation in this model isn't supported by the Vulkan backend, but even if I mark it supported it still isn't assigned to the GPU split. If we ran it on the GPU would that help?

When the graph is built, weights have to be assigned to a backend. That's where the tensor default operations come in: a weight is assigned based on the operation it's supposed to be used in. If the default operation (for SSM_A it's SSM_SCAN) is unsupported on a given backend, the weight is moved to the CPU. Because of that, any tensor generated from that weight's projection is also placed on the CPU, which influences further graph split decisions (that's what I called "poisoning").
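To make the "poisoning" effect concrete, here is a minimal toy model in C++. It is a sketch under stated assumptions, not the actual ggml scheduler: `Backend`, `place_weight`, `count_splits` and `splits_for` are hypothetical names, and the rule "a node runs where its weight lives" is a deliberate simplification of the real placement logic.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical, simplified stand-ins; these are NOT the real ggml/llama.cpp types.
enum class Op { MUL, SSM_SCAN };
enum class Device { GPU, CPU };

struct Backend {
    std::set<Op> supported; // ops the GPU backend claims to support
};

// A weight stays on the GPU only if the GPU supports the weight's default op;
// otherwise it falls back to the CPU.
Device place_weight(const Backend & gpu, Op default_op) {
    return gpu.supported.count(default_op) ? Device::GPU : Device::CPU;
}

// Count graph splits: one for the first node, plus one per device change
// along the node order.
int count_splits(const std::vector<Device> & nodes) {
    int splits = nodes.empty() ? 0 : 1;
    for (std::size_t i = 1; i < nodes.size(); ++i) {
        if (nodes[i] != nodes[i - 1]) {
            ++splits;
        }
    }
    return splits;
}

// Toy graph: every layer has three nodes, and the middle one (the gate)
// runs wherever the SSM_A weight was placed (assumption of this sketch).
int splits_for(const Backend & gpu, Op ssm_a_default_op, int n_layers) {
    const Device gate_dev = place_weight(gpu, ssm_a_default_op);
    std::vector<Device> nodes;
    for (int il = 0; il < n_layers; ++il) {
        nodes.push_back(Device::GPU); // input projection
        nodes.push_back(gate_dev);    // gate computed from the SSM_A weight
        nodes.push_back(Device::GPU); // rest of the layer
    }
    return count_splits(nodes);
}
```

With a GPU backend that supports only `Op::MUL`, `splits_for(gpu, Op::SSM_SCAN, 3)` gives 7 splits in this toy graph, while `splits_for(gpu, Op::MUL, 3)` gives 1, which is the kind of reduction the override is after.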

CISC · 2025-11-29T16:14:40Z

> @ggerganov I thought about it, but people will kill me for breaking existing GGUFs :)

You can fix it without breaking existing GGUFs as in #17548

pwilkin · 2025-11-29T17:00:00Z

> > @ggerganov I thought about it, but people will kill me for breaking existing GGUFs :)
>
> You can fix it without breaking existing GGUFs as in #17548

Not that way because I'd break other hybrid models that use this tensor.

CISC · 2025-11-29T20:11:10Z

> > > @ggerganov I thought about it, but people will kill me for breaking existing GGUFs :)
>
> > You can fix it without breaking existing GGUFs as in #17548
>
> Not that way because I'd break other hybrid models that use this tensor.

Not really, look closely, you simply create a new identifier and map it to the old tensor name for LLM_ARCH_QWEN3NEXT.
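A rough sketch of that idea in C++, assuming simplified lookup tables rather than the real llama.cpp ones. The identifier `SSM_A_NOSCAN`, the helper functions, and the GGUF name string `"blk.%d.ssm_a"` are assumptions made for illustration; the point is only that two identifiers can share one on-disk name while carrying different default ops.

```cpp
#include <map>
#include <string>

// Hypothetical, simplified stand-ins for the arch / tensor / op enums.
enum class Arch   { MAMBA, QWEN3NEXT };
enum class Tensor { SSM_A, SSM_A_NOSCAN }; // SSM_A_NOSCAN is an illustrative new identifier
enum class Op     { MUL, SSM_SCAN };

// Per-architecture name table. The new identifier maps to the SAME GGUF
// tensor name (assumed here to be "blk.%d.ssm_a"), so existing files still load.
const std::map<Arch, std::map<Tensor, std::string>> TENSOR_NAMES = {
    { Arch::MAMBA,     { { Tensor::SSM_A,        "blk.%d.ssm_a" } } },
    { Arch::QWEN3NEXT, { { Tensor::SSM_A_NOSCAN, "blk.%d.ssm_a" } } },
};

// Default op per identifier. Only the new identifier is tied to MUL,
// so other architectures that use SSM_A keep SSM_SCAN.
const std::map<Tensor, Op> TENSOR_OPS = {
    { Tensor::SSM_A,        Op::SSM_SCAN },
    { Tensor::SSM_A_NOSCAN, Op::MUL      },
};

// Look up the on-disk name for a tensor identifier in a given architecture.
std::string gguf_name(Arch arch, Tensor tensor) {
    return TENSOR_NAMES.at(arch).at(tensor);
}

// Look up the op used to decide where the weight is placed.
Op default_op(Tensor tensor) {
    return TENSOR_OPS.at(tensor);
}
```

In this sketch the Qwen3 Next entry resolves to the same file-level name as before, but its placement is decided by `Op::MUL`, and nothing changes for the other hybrid models that keep using `Tensor::SSM_A`.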

Override SSM_A op for Qwen3 Next to reduce splits

0a60230

pwilkin requested a review from CISC as a code owner November 29, 2025 01:16

loci-dev mentioned this pull request Nov 29, 2025

UPSTREAM PR #17587: Override SSM_A op for Qwen3 Next to reduce splits auroralabs-loci/llama.cpp#357

Open

pwilkin added the model Model specific label Nov 29, 2025
