Qwen3-Next target crashes in speculative rollback hook: Qwen3NextGatedDeltaNet has no attribute in_proj_qkv #33

@JareyLi

## Summary

When using dflash-mlx with a Qwen3-Next target model, DFlash generation crashes in the speculative linear cache hook:

```text
'Qwen3NextGatedDeltaNet' object has no attribute 'in_proj_qkv'
```

## Environment

- dflash-mlx: 0.1.5.1
- mlx-lm: >=0.31.x
- Platform: Apple Silicon / macOS
- Target model examples:
  - Qwen3-Coder-Next-8bit
  - Qwen3-Coder-Next-oQ6
- Draft model:
  - Qwen3-Coder-Next-DFlash

## Error

DFlash streaming generation error: 'Qwen3NextGatedDeltaNet' object has no attribute 'in_proj_qkv'

## Suspected cause

`dflash_mlx/engine/target_qwen_gdn.py` installs a speculative linear cache hook on linear attention layers via `_install_speculative_linear_cache_hook()`.

Inside `speculative_call`, the hook assumes the GatedDeltaNet module has Qwen3.5-style projection attributes:

qkv = self.in_proj_qkv(inputs)
z_proj = self.in_proj_z(inputs)
b = self.in_proj_b(inputs)
a = self.in_proj_a(inputs)

But mlx_lm.models.qwen3_next.Qwen3NextGatedDeltaNet uses the Qwen3-Next projection layout:

self.in_proj_qkvz
self.in_proj_ba
self.fix_query_key_value_ordering(...)

So the hook works for Qwen3.5/Qwen3.6 GatedDeltaNet-style modules, but crashes for Qwen3-Next.
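The mismatch can be checked without running generation by probing which projection attributes a module exposes. This is a minimal sketch that relies only on the attribute names quoted above. `gdn_projection_layout` and the two stand-in classes are hypothetical and are not part of dflash-mlx or mlx-lm:

```python
from typing import Optional

# Attribute names are taken from this issue; everything else is illustrative.
SPLIT_ATTRS = ("in_proj_qkv", "in_proj_z", "in_proj_b", "in_proj_a")
FUSED_ATTRS = ("in_proj_qkvz", "in_proj_ba")


def gdn_projection_layout(module: object) -> Optional[str]:
    """Return "split", "fused", or None for an unrecognized module."""
    if all(hasattr(module, name) for name in SPLIT_ATTRS):
        return "split"
    if all(hasattr(module, name) for name in FUSED_ATTRS):
        return "fused"
    return None


class SplitStandIn:
    """Stand-in for a module with Qwen3.5-style projections."""
    in_proj_qkv = in_proj_z = in_proj_b = in_proj_a = None


class FusedStandIn:
    """Stand-in for a module with Qwen3-Next-style projections."""
    in_proj_qkvz = in_proj_ba = None


print(gdn_projection_layout(SplitStandIn()))
print(gdn_projection_layout(FusedStandIn()))
print(gdn_projection_layout(object()))
```

Running the same probe against the real target's linear attention layers should confirm which layout the hook is being handed.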

## Expected behavior

dflash-mlx should either:

  1. support Qwen3-Next GatedDeltaNet by using in_proj_qkvz + in_proj_ba, or
  2. detect Qwen3-Next and disable the speculative recurrent rollback hook / mark it unsupported with a clear error.
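Option 2 could take the form of a guard that runs before the hook is installed. This is a rough sketch, assuming the hook needs only the four attributes listed under "Suspected cause". `check_rollback_hook_supported` is a hypothetical helper, not an existing dflash-mlx function:

```python
# Projections the current hook reads, per the snippet in this issue.
REQUIRED_ATTRS = ("in_proj_qkv", "in_proj_z", "in_proj_b", "in_proj_a")


def check_rollback_hook_supported(module: object) -> None:
    """Raise a descriptive error if the module lacks a required projection."""
    missing = [name for name in REQUIRED_ATTRS if not hasattr(module, name)]
    if missing:
        raise NotImplementedError(
            f"{type(module).__name__} is missing {', '.join(missing)}; "
            "the speculative rollback hook does not support this projection layout."
        )


class FusedStandIn:
    """Stand-in for a module that only has the fused Qwen3-Next projections."""
    in_proj_qkvz = in_proj_ba = None


try:
    check_rollback_hook_supported(FusedStandIn())
except NotImplementedError as exc:
    print(exc)
```

Failing at install time would replace the bare `AttributeError` during streaming generation with a message that names the unsupported module type.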

## Notes

This does not appear to be a missing-attribute bug in mlx-lm. The Qwen3-Next implementation intentionally uses the fused `in_proj_qkvz` and `in_proj_ba` projections.
