GLM-4.7 PRISM loads and passes inspect after fixing num_nextn_predict_layers, but MTP generation crashes with AttributeError: LanguageModel has no attribute fa_idx

### mtplx doctor --json

Hi, I’m testing MTPLX 0.3.5, installed via Homebrew, on a Mac Studio M3 Ultra (512 GB RAM).

Environment
- MTPLX 0.3.5
- Apple M3 Ultra
- 512 GiB unified memory
- Python 3.13.13 arm64
- MLX 0.31.2
- mlx_lm 0.31.3

Doctor
I ran `mtplx doctor --json` and the environment looks healthy overall:
- native arm64 Python: pass
- MLX import/device: pass
- estimated runtime memory well below available memory
- no low power mode
- no thermal warnings
The only warnings are unrelated to this issue (default Qwen cache missing, port 8000 already open, no ThermalForge).

Model
Local model:
`/Users/macstudio/.lmstudio/models/mlx-community/GLM-4.7-PRISM-8bit-gs64-mlx`

Initial behavior
The model originally showed:
- `No MTP head · Glm4MoeForCausalLM`

I found that:
- the converted config had `"num_nextn_predict_layers": 0`
- the original BF16 model had `"num_nextn_predict_layers": 1`
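
The mismatch above can be checked with a short script. This is a minimal sketch, assuming both model directories keep their settings in a `config.json` file (the usual Hugging Face layout, not confirmed here); the helper names `read_mtp_layers` and `mtp_layers_differ` are hypothetical and not part of MTPLX.

```python
import json
from pathlib import Path

KEY = "num_nextn_predict_layers"


def read_mtp_layers(model_dir):
    """Return the value of num_nextn_predict_layers, or None if the key is absent."""
    # Assumption: the config lives at <model_dir>/config.json.
    config = json.loads((Path(model_dir) / "config.json").read_text())
    return config.get(KEY)


def mtp_layers_differ(converted_dir, original_dir):
    """True when the converted model's value does not match the original's."""
    return read_mtp_layers(converted_dir) != read_mtp_layers(original_dir)
```

Calling `mtp_layers_differ(converted_dir, original_dir)` with the 8-bit conversion and the BF16 original would return `True` for the values listed above (0 versus 1).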

After adding `mtp.safetensors` and changing the converted config to:
`"num_nextn_predict_layers": 1`

`mtplx inspect` changed to:
- recognized: true
- can_run: true
- runtime_compatibility: native-family-gated
- mtp_layers: 1
- mtp_tensors_present: 502

So inspect now considers the model runnable.
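
For anyone who wants to cross-check the added weights independently of `mtplx inspect`, the tensor names in a `.safetensors` file can be read from its header without loading any weights. This is a sketch based on the safetensors file layout (an 8-byte little-endian header length followed by a JSON header); the function name `list_safetensors_tensors` is hypothetical, and whether the resulting count corresponds to the `mtp_tensors_present` value is an assumption, since I don't know how inspect counts tensors.

```python
import json
import struct


def list_safetensors_tensors(path):
    """Return the sorted tensor names stored in a .safetensors header."""
    with open(path, "rb") as f:
        # First 8 bytes: unsigned 64-bit little-endian length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # "__metadata__" is an optional free-form entry, not a tensor.
    return sorted(name for name in header if name != "__metadata__")
```

For example, `len(list_safetensors_tensors("mtp.safetensors"))` gives the number of tensors in that one file.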

Runtime result
The model loads successfully in `sustained` profile and enters:
- `Generation mode: MTP`
- `Native-MTP speed path: draft-only LM head is active`

But on the first prompt, generation crashes with:

`AttributeError: 'LanguageModel' object has no attribute 'fa_idx'`

Stack trace points into:
- `generate_mtpk`
- `forward_ar_capture`
- `forward_with_gdn_capture`
- `gdn_capture.py`

where it accesses:
`cache[inner.fa_idx]`
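
To illustrate the failure mode, here is a minimal stand-alone reproduction. It is not MTPLX source code: the `LanguageModel` class below is a stand-in that simply lacks `fa_idx`, and `missing_attrs` is a hypothetical pre-flight check. I don't know what `fa_idx` represents in the real capture path, only that the GLM model object does not define it.

```python
class LanguageModel:
    """Stand-in for a model class that does not define fa_idx (hypothetical)."""

    def __init__(self):
        self.layers = []


def missing_attrs(inner, required=("fa_idx",)):
    """Return the names in `required` that the model object does not define."""
    return [name for name in required if not hasattr(inner, name)]


inner = LanguageModel()
cache = [None]

# Same access pattern as in the traceback: the attribute lookup fails
# before the cache is ever indexed.
try:
    cache[inner.fa_idx]
except AttributeError as exc:
    message = str(exc)

# A check like this before entering the MTP path would report the gap
# instead of crashing on the first prompt.
missing = missing_attrs(inner)
```

If the capture path is only meant for model families that define this attribute, a check of this kind in `inspect` or at load time might be enough to report the model as not runnable with MTP.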

Additional observation
If I run with `/mtp off`, the model does generate, but it is extremely slow and GPU usage is unstable compared with direct MLX inference.

Question
Is GLM-4.7 PRISM / Glm4MoeForCausalLM expected to work in MTPLX 0.3.5 with MTP, or is this currently a known issue in the GLM runtime path?

[doctor_output.json](https://github.com/user-attachments/files/27746309/doctor_output.json)

### Exact command

`mtplx start --model /Users/macstudio/.lmstudio/models/mlx-community/GLM-4.7-PRISM-8bit-gs64-mlx`

### Model path or repo id

`/Users/macstudio/.lmstudio/models/mlx-community/GLM-4.7-PRISM-8bit-gs64-mlx`

### Chip, RAM, macOS version

Apple M3 Ultra, 512 GB RAM, macOS 26.4.1

GLM-4.7 PRISM loads and passes inspect after fixing num_nextn_predict_layers, but MTP generation crashes with AttributeError: LanguageModel has no attribute fa_idx #65

mtplx doctor --json

Exact command

Model path or repo id

Chip, RAM, macOS version

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

GLM-4.7 PRISM loads and passes inspect after fixing num_nextn_predict_layers, but MTP generation crashes with AttributeError: LanguageModel has no attribute fa_idx #65

Description

mtplx doctor --json

Exact command

Model path or repo id

Chip, RAM, macOS version

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions