from mlx_lm import load, generate
model, tokenizer = load("mlx-community/gemma-4-31B-it-assistant-bf16")
messages = [{"role": "user", "content": "Explain quantum entanglement simply."}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
response = generate(model, tokenizer, prompt=prompt, max_tokens=512, verbose=True)
Resulting error: ValueError: Model type gemma4_assistant not supported.
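A possible way to narrow this down, as a sketch rather than a confirmed diagnosis: the error suggests that the installed mlx_lm has no implementation for the `gemma4_assistant` architecture named in the checkpoint's config.json. The helpers below assume that mlx_lm resolves an architecture by importing a module called `mlx_lm.models.<model_type>`. That is my assumption about its internals, and mlx_lm may also remap some model types, so a `False` result is a hint and not proof. The function names `read_model_type` and `has_model_module` are hypothetical, introduced only for this illustration.

```python
import importlib.util
import json


def read_model_type(config_path):
    """Return the `model_type` field from a Hugging Face style config.json."""
    with open(config_path, encoding="utf-8") as f:
        return json.load(f)["model_type"]


def has_model_module(model_type, package="mlx_lm.models"):
    """Report whether `<package>.<model_type>` can be found as a module.

    Assumption: the library looks up architectures by module name.
    """
    try:
        return importlib.util.find_spec(f"{package}.{model_type}") is not None
    except ModuleNotFoundError:
        # The parent package itself (e.g. mlx_lm) is not installed.
        return False
```

With mlx_lm installed, `has_model_module("gemma4_assistant")` returning `False` would be consistent with the ValueError above, and would point to a missing architecture in the installed version instead of a problem with the download or the prompt.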
Google's announcement of the new multi-token prediction (MTP) models: https://blog.google/innovation-and-ai/technology/developers-tools/multi-token-prediction-gemma-4/
Hugging Face collection with the models: https://huggingface.co/collections/mlx-community/gemma-4-assistant-mtp
The script at the top of this report is a minimal example that downloads and prompts the model.