Summary
Calling cactus_complete with the google/gemma-4-E4B-it model and an audio input (either via pcm_data or an "audio": ["path"] field on a user message) consistently fails with:
[WARN] [npu] [gemma4] model.mlpackage not found; using CPU prefill
[WARN] [npu] [gemma4-audio] unsupported NPU input shape; falling back to CPU audio encoder
[ERROR] [complete] Exception: unordered_map::at: key not found
Text-only cactus_complete on the same loaded handle works fine. The error is reproducible across two API shapes and multiple audio durations (1.64 s, 4.54 s, 30 s padded), all int16 mono 16 kHz PCM.
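For reference, at int16 mono 16 kHz those three durations correspond to the following raw PCM byte counts (plain arithmetic, nothing cactus-specific; `pcm_bytes` is just my own sanity-check helper):

```python
SAMPLE_RATE = 16_000      # Hz
BYTES_PER_SAMPLE = 2      # int16, mono

def pcm_bytes(seconds: float) -> int:
    """Raw PCM size in bytes for int16 mono audio at 16 kHz."""
    return round(seconds * SAMPLE_RATE) * BYTES_PER_SAMPLE

# 1.64 s, 4.54 s, and 30 s map to 52_480, 145_280, and 960_000 bytes
```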
Environment
- cactus CLI installed via brew install cactus-compute/cactus/cactus; version reported by brew: cactus 1.13_1
- macOS 26.3.1, Apple Silicon (M4 Pro), 64 GB RAM
- Xcode command line tools present; Python 3.12.x venv created via source ./setup
- Weights downloaded via cactus download google/gemma-4-E4B-it; source zip: Cactus-Compute/gemma-4-E4B-it/gemma-4-e4b-it-int4-apple.zip. The extracted directory contains 2088 files, including audio_encoder.mlpackage and audio_encoder.mlmodelc, but no model.mlpackage at the root of the weights dir.
- libcactus.dylib built from source with cactus build --python (HEAD of main, fetched 2026-04-14)
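To make the directory claim above easy to reproduce, here is the small stdlib snippet I used to list the Core ML bundles at the top of the weights dir (`coreml_bundles` is my own helper, not a cactus API; the path is what brew used on my machine):

```python
from pathlib import Path

def coreml_bundles(weights_dir: str) -> list:
    """Names of top-level .mlpackage / .mlmodelc bundles in weights_dir."""
    return sorted(p.name for p in Path(weights_dir).iterdir()
                  if p.suffix in {".mlpackage", ".mlmodelc"})

# On my machine:
# coreml_bundles("/opt/homebrew/Cellar/cactus/1.13_1/libexec/weights/gemma-4-e4b-it")
# lists audio_encoder.* and vision_encoder.* but no model.mlpackage
```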
Reproducer (Python, minimal)
import sys, json, wave, subprocess

sys.path.insert(0, "cactus/python/src")
from cactus import cactus_init, cactus_complete, cactus_destroy

# 1. Generate a test WAV using macOS `say`
subprocess.run(["say", "-v", "Samantha", "-o", "/tmp/q.aiff",
                "What is the capital of France?"], check=True)
subprocess.run(["afconvert", "-f", "WAVE", "-d", "LEI16@16000", "-c", "1",
                "/tmp/q.aiff", "/tmp/q.wav"], check=True)

# 2. Load Gemma 4 E4B
model = cactus_init("/opt/homebrew/Cellar/cactus/1.13_1/libexec/weights/gemma-4-e4b-it",
                    None, False)

# 3a. Try native audio-in via pcm_data (6th arg)
with wave.open("/tmp/q.wav", "rb") as w:
    pcm = w.readframes(w.getnframes())
msgs = json.dumps([{"role": "user",
                    "content": "Answer the question I just spoke in one short sentence."}])
opts = json.dumps({"max_tokens": 40})
try:
    raw = cactus_complete(model, msgs, opts, None, None, pcm)
    print("pcm_data path:", json.loads(raw).get("response"))
except Exception as e:
    print("pcm_data path FAILED:", e)

# 3b. Try native audio-in via "audio" field on the message
msgs_audio = json.dumps([{
    "role": "user",
    "content": "Answer the question I just spoke in one short sentence.",
    "audio": ["/tmp/q.wav"],
}])
try:
    raw = cactus_complete(model, msgs_audio, opts, None, None)
    print("audio-field path:", json.loads(raw).get("response"))
except Exception as e:
    print("audio-field path FAILED:", e)

# 4. Control: text-only call on the same handle succeeds
msgs_text = json.dumps([{"role": "user", "content": "What is the capital of France?"}])
raw = cactus_complete(model, msgs_text, opts, None, None)
print("text-only path:", json.loads(raw).get("response"))

cactus_destroy(model)
Output on my machine:
[WARN] [npu] [gemma4] model.mlpackage not found; using CPU prefill
[WARN] [npu] [gemma4-audio] unsupported NPU input shape; falling back to CPU audio encoder
[ERROR] [complete] Exception: unordered_map::at: key not found
pcm_data path FAILED: Completion failed
[ERROR] [complete] Exception: unordered_map::at: key not found
audio-field path FAILED: Completion failed
text-only path: Paris.
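Before blaming the engine, I also confirmed the file afconvert produces really is int16 mono 16 kHz. This is the stdlib-only check I used (`wav_format` is my own helper, not a cactus API):

```python
import wave

def wav_format(path: str) -> tuple:
    """Return (channels, sample_width_bytes, sample_rate, n_frames) of a WAV file."""
    with wave.open(path, "rb") as w:
        return (w.getnchannels(), w.getsampwidth(),
                w.getframerate(), w.getnframes())

# Expected for the reproducer's /tmp/q.wav: channels=1, width=2, rate=16_000
# ch, width, rate, _ = wav_format("/tmp/q.wav")
# assert (ch, width, rate) == (1, 2, 16_000)
```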
What I checked
- The weights dir contains audio_encoder.mlpackage, audio_encoder.mlmodelc, vision_encoder.mlpackage, vision_encoder.mlmodelc, and ~2088 per-tensor .weights files. It does not contain a top-level model.mlpackage, which I suspect is related to the first warning ([npu] [gemma4] model.mlpackage not found; using CPU prefill). Re-running cactus download google/gemma-4-E4B-it --reconvert did not add it.
- Padding the audio to exactly 30 × 16000 × 2 = 960000 bytes (30 s of int16 mono at 16 kHz) did not change the outcome.
- Changing the system/user prompt, adding or removing a system message, and adjusting temperature/max_tokens did not change the outcome.
- cactus_transcribe(whisper-small, ...) on the same WAV works perfectly and returns the expected text, and cactus_complete(gemma-4-e4b-it, ...) with that text also works, so the cascade (transcribe-then-complete) path is fine.
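For reproducibility, the 30 s padding mentioned above was plain zero-padding of the raw int16 buffer (`pad_pcm` is my own helper, not a cactus API):

```python
SAMPLE_RATE = 16_000      # Hz
BYTES_PER_SAMPLE = 2      # int16, mono

def pad_pcm(pcm: bytes, seconds: float = 30.0) -> bytes:
    """Right-pad raw int16 mono PCM with silence to a fixed duration."""
    target = round(seconds * SAMPLE_RATE) * BYTES_PER_SAMPLE
    if len(pcm) >= target:
        return pcm[:target]
    return pcm + b"\x00" * (target - len(pcm))

# 30 s at 16 kHz int16 mono -> 960_000 bytes, the size quoted above
```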
What I'd like help with
- Is native audio input to Gemma 4 via cactus_complete supported in 1.13_1, or only in a later release?
- If it is supported, is the "audio": [path] field the correct message shape, or is there a different prompt template (e.g., one with an <audio> placeholder token)?
- Is the missing model.mlpackage a known issue with the -apple zip, and is there a way to get the full NPU path for Gemma 4 E4B?
Happy to collect more logs, run with extra tracing, or try a different precision. I'm building a voice agent for the Gemma 4 Voice Agents Hackathon and the on-device audio-in latency story is central to the demo.
Thanks for the excellent engine — looking forward to hacking on it this weekend.