## Summary
When using dflash-mlx with a Qwen3-Next target model, DFlash generation crashes in the speculative linear cache hook:
```text
'Qwen3NextGatedDeltaNet' object has no attribute 'in_proj_qkv'
```

## Environment

- dflash-mlx: 0.1.5.1
- mlx-lm: >=0.31.x
- Platform: Apple Silicon / macOS
- Target model examples:
  - Qwen3-Coder-Next-8bit
  - Qwen3-Coder-Next-oQ6
- Draft model:

## Error

    DFlash streaming generation error: 'Qwen3NextGatedDeltaNet' object has no attribute 'in_proj_qkv'

## Suspected cause

`dflash_mlx/engine/target_qwen_gdn.py` installs `_install_speculative_linear_cache_hook()` for linear attention layers. Inside `speculative_call`, the hook assumes the GatedDeltaNet module has Qwen3.5-style projection attributes:

    qkv = self.in_proj_qkv(inputs)
    z_proj = self.in_proj_z(inputs)
    b = self.in_proj_b(inputs)
    a = self.in_proj_a(inputs)

But `mlx_lm.models.qwen3_next.Qwen3NextGatedDeltaNet` uses the Qwen3-Next projection layout:

    self.in_proj_qkvz
    self.in_proj_ba
    self.fix_query_key_value_ordering(...)

So the hook works for Qwen3.5/Qwen3.6 GatedDeltaNet-style modules but crashes for Qwen3-Next.
## Expected behavior

dflash-mlx should either:

- support Qwen3-Next GatedDeltaNet by using `in_proj_qkvz` + `in_proj_ba`, or
- detect Qwen3-Next and disable the speculative recurrent rollback hook, or mark it unsupported with a clear error.

## Notes

This does not appear to be a missing-attribute bug in mlx-lm. The Qwen3-Next implementation intentionally uses fused `in_proj_qkvz` and `in_proj_ba` projections.