GLM-4.7 PRISM loads and passes inspect after fixing num_nextn_predict_layers, but MTP generation crashes with AttributeError: LanguageModel has no attribute fa_idx #65

@godmonero

Description

mtplx doctor --json

Hi, I’m testing MTPLX 0.3.5 on a Mac Studio M3 Ultra (512 GB RAM) installed via Homebrew.

Environment

  • MTPLX 0.3.5
  • Apple M3 Ultra
  • 512 GiB unified memory
  • Python 3.13.13 arm64
  • MLX 0.31.2
  • mlx_lm 0.31.3

Doctor
I ran mtplx doctor --json and the environment looks healthy overall:

  • native arm64 Python: pass
  • MLX import/device: pass
  • estimated runtime memory well below available memory
  • no low power mode
  • no thermal warnings
The only warnings are unrelated to this problem (default Qwen cache missing, port 8000 already open, no ThermalForge).

Model
Local model:
/Users/macstudio/.lmstudio/models/mlx-community/GLM-4.7-PRISM-8bit-gs64-mlx

Initial behavior
The model originally showed:

  • No MTP head · Glm4MoeForCausalLM

I found that:

  • the converted config had "num_nextn_predict_layers": 0
  • the original BF16 model had "num_nextn_predict_layers": 1
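For anyone reproducing this, the mismatch between the two configs can be checked with a short script. This is a minimal sketch that assumes both model directories contain a standard `config.json`; the helper names are hypothetical and not part of MTPLX.

```python
import json
from pathlib import Path


def read_nextn_layers(model_dir):
    """Return num_nextn_predict_layers from <model_dir>/config.json, or None if the key is absent."""
    config_path = Path(model_dir) / "config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    return config.get("num_nextn_predict_layers")


def compare_nextn_layers(converted_dir, original_dir):
    """Compare the MTP layer count declared by the converted and original checkpoints."""
    converted = read_nextn_layers(converted_dir)
    original = read_nextn_layers(original_dir)
    return {"converted": converted, "original": original, "match": converted == original}
```

Calling `compare_nextn_layers(converted_dir, original_dir)` with the two local model directories returns both values plus a `match` flag, which makes it easy to spot a conversion that zeroed the field.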

After adding mtp.safetensors to the converted model and changing the converted config to:
"num_nextn_predict_layers": 1

mtplx inspect changed to:

  • recognized: true
  • can_run: true
  • runtime_compatibility: native-family-gated
  • mtp_layers: 1
  • mtp_tensors_present: 502

So inspect now considers the model runnable.
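The tensor names in the added mtp.safetensors file can be listed independently of MTPLX by reading the safetensors header directly. The sketch below relies only on the published file layout (an 8-byte little-endian header length followed by a JSON header); the function name is hypothetical, and whether `mtp_tensors_present` is computed from this file alone is an assumption, so the count may not match 502 exactly.

```python
import json
import struct


def list_safetensors_names(path):
    """Return the sorted tensor names stored in a .safetensors file.

    Only the JSON header is read, so no tensor data is loaded into memory.
    """
    with open(path, "rb") as f:
        # First 8 bytes: unsigned little-endian length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional free-form entry, not a tensor.
    return sorted(name for name in header if name != "__metadata__")
```

For example, `len(list_safetensors_names(path_to_mtp_safetensors))` gives a tensor count to compare against the inspect output.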

Runtime result
The model loads successfully in sustained profile and enters:

  • Generation mode: MTP
  • Native-MTP speed path: draft-only LM head is active

But on the first prompt, generation crashes with:

AttributeError: 'LanguageModel' object has no attribute 'fa_idx'

The stack trace points into:

  • generate_mtpk
  • forward_ar_capture
  • forward_with_gdn_capture
  • gdn_capture.py

where it accesses:
cache[inner.fa_idx]
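The failure mode can be illustrated outside MTPLX. This is a sketch only: `LanguageModelStub` is a hypothetical stand-in for the inner model object, and the probe function is not MTPLX code or a proposed fix. It just shows that indexing a cache with `inner.fa_idx` raises `AttributeError` whenever the model class does not define that attribute, and how a `getattr` probe reports the gap without crashing.

```python
class LanguageModelStub:
    """Hypothetical stand-in for an inner model class that does not define fa_idx."""


def access_cache(inner, cache):
    """Mimic the failing access pattern: cache[inner.fa_idx]."""
    return cache[inner.fa_idx]


def probe_fa_idx(inner):
    """Report whether the model object exposes fa_idx, without raising."""
    fa_idx = getattr(inner, "fa_idx", None)
    if fa_idx is None:
        return f"{type(inner).__name__} does not define fa_idx"
    return f"{type(inner).__name__}.fa_idx = {fa_idx}"


def try_access(inner, cache):
    """Return the cache entry, or the error text if fa_idx is missing."""
    try:
        return access_cache(inner, cache)
    except AttributeError as exc:
        return f"AttributeError: {exc}"
```

Running `try_access(LanguageModelStub(), ["kv0", "kv1"])` yields an error string of the same shape as the one in the traceback, which suggests the capture path expects an attribute that the loaded model class does not provide.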

Additional observation
If I run with /mtp off, the model does generate, but it is extremely slow and GPU usage is unstable compared with direct MLX inference.

Question
Is GLM-4.7 PRISM / Glm4MoeForCausalLM expected to work in MTPLX 0.3.5 with MTP, or is this currently a known issue in the GLM runtime path?

doctor_output.json

Exact command

mtplx start --model /Users/macstudio/.lmstudio/models/mlx-community/GLM-4.7-PRISM-8bit-gs64-mlx

Model path or repo id

/Users/macstudio/.lmstudio/models/mlx-community/GLM-4.7-PRISM-8bit-gs64-mlx

Chip, RAM, macOS version

Apple M3 Ultra, 512 GB RAM, macOS 26.4.1
