Summary
Calling cactus_complete with the google/gemma-4-E4B-it model and an audio input (either via pcm_data or an "audio": ["path"] field on a user message) consistently fails with:
[WARN] [npu] [gemma4] model.mlpackage not found; using CPU prefill
[WARN] [npu] [gemma4-audio] unsupported NPU input shape; falling back to CPU audio encoder
[ERROR] [complete] Exception: unordered_map::at: key not found
Text-only cactus_complete on the same loaded handle works fine. The error is reproducible across two API shapes and multiple audio durations (1.64 s, 4.54 s, 30 s padded), all int16 mono 16 kHz PCM.
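For reference, at int16 mono 16 kHz those three durations correspond to the following raw PCM byte counts (plain arithmetic, nothing cactus-specific; `pcm_bytes` is just my own sanity-check helper):

```python
SAMPLE_RATE = 16_000      # Hz
BYTES_PER_SAMPLE = 2      # int16, mono

def pcm_bytes(seconds: float) -> int:
    """Raw PCM size in bytes for int16 mono audio at 16 kHz."""
    return round(seconds * SAMPLE_RATE) * BYTES_PER_SAMPLE

# 1.64 s, 4.54 s, and 30 s map to 52_480, 145_280, and 960_000 bytes
```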
Environment
- cactus CLI installed via brew install cactus-compute/cactus/cactus; version reported by brew: cactus 1.13_1
- macOS 26.3.1, Apple Silicon (M4 Pro), 64 GB RAM
- Xcode command line tools present; Python 3.12.x venv created via source ./setup
- Weights downloaded via cactus download google/gemma-4-E4B-it; source zip: Cactus-Compute/gemma-4-E4B-it/gemma-4-e4b-it-int4-apple.zip. The extracted directory contains 2088 files, including audio_encoder.mlpackage and audio_encoder.mlmodelc, but no model.mlpackage at the root of the weights dir.
- libcactus.dylib built from source with cactus build --python (HEAD of main, fetched 2026-04-14)
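To make the directory claim above easy to reproduce, here is the small stdlib snippet I used to list the Core ML bundles at the top of the weights dir (`coreml_bundles` is my own helper, not a cactus API; the path is what brew used on my machine):

```python
from pathlib import Path

def coreml_bundles(weights_dir: str) -> list:
    """Names of top-level .mlpackage / .mlmodelc bundles in weights_dir."""
    return sorted(p.name for p in Path(weights_dir).iterdir()
                  if p.suffix in {".mlpackage", ".mlmodelc"})

# On my machine:
# coreml_bundles("/opt/homebrew/Cellar/cactus/1.13_1/libexec/weights/gemma-4-e4b-it")
# lists audio_encoder.* and vision_encoder.* but no model.mlpackage
```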
Reproducer (Python, minimal)
import sys, json, wave, subprocess

sys.path.insert(0, "cactus/python/src")
from cactus import cactus_init, cactus_complete, cactus_destroy

# 1. Generate a test WAV using macOS `say`
subprocess.run(["say", "-v", "Samantha", "-o", "/tmp/q.aiff",
                "What is the capital of France?"], check=True)
subprocess.run(["afconvert", "-f", "WAVE", "-d", "LEI16@16000", "-c", "1",
                "/tmp/q.aiff", "/tmp/q.wav"], check=True)

# 2. Load Gemma 4 E4B
model = cactus_init("/opt/homebrew/Cellar/cactus/1.13_1/libexec/weights/gemma-4-e4b-it",
                    None, False)

# 3a. Try native audio-in via pcm_data (6th arg)
with wave.open("/tmp/q.wav", "rb") as w:
    pcm = w.readframes(w.getnframes())
msgs = json.dumps([{"role": "user",
                    "content": "Answer the question I just spoke in one short sentence."}])
opts = json.dumps({"max_tokens": 40})
try:
    raw = cactus_complete(model, msgs, opts, None, None, pcm)
    print("pcm_data path:", json.loads(raw).get("response"))
except Exception as e:
    print("pcm_data path FAILED:", e)

# 3b. Try native audio-in via "audio" field on the message
msgs_audio = json.dumps([{
    "role": "user",
    "content": "Answer the question I just spoke in one short sentence.",
    "audio": ["/tmp/q.wav"],
}])
try:
    raw = cactus_complete(model, msgs_audio, opts, None, None)
    print("audio-field path:", json.loads(raw).get("response"))
except Exception as e:
    print("audio-field path FAILED:", e)

# 4. Control: text-only call on the same handle succeeds
msgs_text = json.dumps([{"role": "user", "content": "What is the capital of France?"}])
raw = cactus_complete(model, msgs_text, opts, None, None)
print("text-only path:", json.loads(raw).get("response"))

cactus_destroy(model)
Output on my machine:
[WARN] [npu] [gemma4] model.mlpackage not found; using CPU prefill
[WARN] [npu] [gemma4-audio] unsupported NPU input shape; falling back to CPU audio encoder
[ERROR] [complete] Exception: unordered_map::at: key not found
pcm_data path FAILED: Completion failed
[ERROR] [complete] Exception: unordered_map::at: key not found
audio-field path FAILED: Completion failed
text-only path: Paris.
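Before blaming the engine, I also confirmed the file afconvert produces really is int16 mono 16 kHz. This is the stdlib-only check I used (`wav_format` is my own helper, not a cactus API):

```python
import wave

def wav_format(path: str) -> tuple:
    """Return (channels, sample_width_bytes, sample_rate, n_frames) of a WAV file."""
    with wave.open(path, "rb") as w:
        return (w.getnchannels(), w.getsampwidth(),
                w.getframerate(), w.getnframes())

# Expected for the reproducer's /tmp/q.wav: channels=1, width=2, rate=16_000
# ch, width, rate, _ = wav_format("/tmp/q.wav")
# assert (ch, width, rate) == (1, 2, 16_000)
```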
What I checked
- The weights dir contains audio_encoder.mlpackage, audio_encoder.mlmodelc, vision_encoder.mlpackage, vision_encoder.mlmodelc, and ~2088 per-tensor .weights files. It does not contain a top-level model.mlpackage, which I suspect is related to the first warning ([npu] [gemma4] model.mlpackage not found; using CPU prefill). Re-running cactus download google/gemma-4-E4B-it --reconvert did not add it.
- Padding the audio to exactly 30 × 16000 × 2 = 960000 bytes (30 s of int16 mono at 16 kHz) did not change the outcome.
- Changing the system/user prompt, adding or removing a system message, and adjusting temperature/max_tokens did not change the outcome.
- cactus_transcribe(whisper-small, ...) on the same WAV works perfectly and returns the expected text, and cactus_complete(gemma-4-e4b-it, ...) with that text also works, so the cascade (transcribe-then-complete) path is fine.
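For reproducibility, the 30 s padding mentioned above was plain zero-padding of the raw int16 buffer (`pad_pcm` is my own helper, not a cactus API):

```python
SAMPLE_RATE = 16_000      # Hz
BYTES_PER_SAMPLE = 2      # int16, mono

def pad_pcm(pcm: bytes, seconds: float = 30.0) -> bytes:
    """Right-pad raw int16 mono PCM with silence to a fixed duration."""
    target = round(seconds * SAMPLE_RATE) * BYTES_PER_SAMPLE
    if len(pcm) >= target:
        return pcm[:target]
    return pcm + b"\x00" * (target - len(pcm))

# 30 s at 16 kHz int16 mono -> 960_000 bytes, the size quoted above
```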
What I'd like help with
- Is native audio input to Gemma 4 via cactus_complete supported in 1.13_1, or only in a later release?
- If it is supported, is the "audio": [path] field the correct message shape, or is there a different prompt template (e.g., one with an <audio> placeholder token)?
- Is the missing model.mlpackage a known issue with the -apple zip, and is there a way to get the full NPU path for Gemma 4 E4B?
Happy to collect more logs, run with extra tracing, or try a different precision. I'm building a voice agent for the Gemma 4 Voice Agents Hackathon and the on-device audio-in latency story is central to the demo.
Thanks for the excellent engine — looking forward to hacking on it this weekend.