@will-lms Introduced in #298 (Qwen 3.5 Unified).
PatchedQwen3_5TextModel.__call__ returns self.norm(hidden_states), but the class it overrides (Qwen3_5TextModel) is expected to return un-normed hidden states. mlx-lm's TextModel.__call__ applies self.model.norm itself before the lm_head:
# mlx_lm/models/qwen3_5.py in TextModel.__call__
hidden = self.model(...) # expects un-normed hidden
normed = self.model.norm(hidden)
out = self.lm_head(normed)
Since self.model is now PatchedQwen3_5TextModel, hidden is already normed, and self.model.norm is applied a second time.
As a result, RMSNorm is applied twice before the lm_head on every Qwen3.5 inference path, both text and vision.
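To show why the second norm is not a harmless no-op, here is a toy sketch in pure Python. It is not mlx-lm's implementation: `rms_norm`, the `hidden` vector, and the `weight` values are made up for illustration, and the formula assumed is the standard RMSNorm, x / sqrt(mean(x^2) + eps) * weight. With a unit gain the second pass is close to the identity, but with any non-trivial learned gain the gain is applied twice and the output changes.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Standard RMSNorm on a single vector (toy version, assumed formula)."""
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale * w for v, w in zip(x, weight)]


# Hypothetical values, chosen only to make the effect visible.
hidden = [0.5, -1.0, 2.0, 4.0]
weight = [1.5, 0.5, 2.0, 1.0]  # stand-in for a learned gain
ones = [1.0] * len(hidden)

# Expected path: norm applied once.
once = rms_norm(hidden, weight)
# Buggy path: the inner model norms, then the outer model norms again.
twice = rms_norm(once, weight)
max_diff = max(abs(a - b) for a, b in zip(once, twice))

# Same comparison with a unit gain, where the second pass barely matters.
once_unit = rms_norm(hidden, ones)
twice_unit = rms_norm(once_unit, ones)
max_diff_unit = max(abs(a - b) for a, b in zip(once_unit, twice_unit))

print(f"learned gain: max abs diff = {max_diff:.4f}")
print(f"unit gain:    max abs diff = {max_diff_unit:.2e}")
```

So the bug would only be invisible if the final norm's gain were all ones, which is not something to rely on for a trained checkpoint.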
Fix
Return the un-normed hidden states from PatchedQwen3_5TextModel.__call__ in mlx_engine/model_kit/patches/qwen3_5.py, leaving the norm to the caller:
- return self.norm(hidden_states)
+ return hidden_states
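A regression check for this could count how often the final norm runs per forward pass. The sketch below uses stand-in classes only (`CountingNorm`, `InnerBuggy`, `InnerFixed`, `Outer`, and `norm_calls` are all hypothetical names, not the real mlx-lm or mlx-engine classes) to mirror the call structure described above; a real test would need to wrap the actual model's norm in the same way.

```python
class CountingNorm:
    """Stand-in for the final norm layer; records how often it is called."""

    def __init__(self):
        self.calls = 0

    def __call__(self, x):
        self.calls += 1
        return x


class InnerBuggy:
    """Mirrors the patched model before the fix: it norms its own output."""

    def __init__(self):
        self.norm = CountingNorm()

    def __call__(self, x):
        return self.norm(x)


class InnerFixed:
    """Mirrors the patched model after the fix: it returns un-normed hidden states."""

    def __init__(self):
        self.norm = CountingNorm()

    def __call__(self, x):
        return x


class Outer:
    """Mirrors the outer TextModel: it applies self.model.norm itself (lm_head omitted)."""

    def __init__(self, model):
        self.model = model

    def __call__(self, x):
        hidden = self.model(x)
        return self.model.norm(hidden)


def norm_calls(inner):
    """Run one forward pass and report how many times the norm ran."""
    Outer(inner)([1.0, 2.0])
    return inner.norm.calls


print("buggy:", norm_calls(InnerBuggy()))
print("fixed:", norm_calls(InnerFixed()))
```

The invariant to assert is exactly one norm call per forward pass.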