Environment
- iPhone 17 Pro Max (iOS 26)
- Cactus iOS xcframework (from cactus-react-native v1.14)
- Model: gemma-4-e2b-it INT4 (Cactus-Compute/gemma-4-E2B-it)
- Swift, native app (not React Native)
Bug
cactus_complete returns -1 on the second real call to the same model handle. The warm-up and the first call after cactus_init always succeed. Every subsequent call fails.
cactus_transcribe with Parakeet works repeatedly on the same model handle, so only LLM completions are affected.
Minimal Repro
let model = try cactusInit(path, nil, false)
// Warm-up — works
let _ = try? cactusComplete(model, "[{\"role\":\"user\",\"content\":\"Hi\"}]", "{\"max_tokens\":1}", nil, nil)
// First real call — works
cactusReset(model)
let r1 = try cactusComplete(model, "[{\"role\":\"user\",\"content\":\"Hello\"}]", "{\"max_tokens\":256}", nil, nil)
// r1 = valid JSON response ✓
// Second call — FAILS (result < 0)
cactusReset(model)
let r2 = try cactusComplete(model, "[{\"role\":\"user\",\"content\":\"Who are you?\"}]", "{\"max_tokens\":256}", nil, nil)
// throws "Completion failed"
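To pin down exactly which call starts failing and what error comes back, the repro above could be wrapped in a small harness along these lines. This is only a sketch: `AttemptResult` and `runSequentialAttempts` are hypothetical names of ours, not part of the Cactus API, and the completion call is injected as a closure so the harness makes no assumptions about the wrapper's signature.

```swift
import Foundation

/// Outcome of one completion attempt (hypothetical helper type, not part of Cactus).
struct AttemptResult {
    let index: Int
    let succeeded: Bool
    let detail: String
}

/// Runs `attempt` once per prompt, in order, and records whether each call threw.
/// `attempt` is supplied by the caller; in our app it would reset the model and
/// call the completion wrapper exactly as in the repro.
func runSequentialAttempts(
    prompts: [String],
    attempt: (String) throws -> String
) -> [AttemptResult] {
    var results: [AttemptResult] = []
    for (index, prompt) in prompts.enumerated() {
        do {
            let response = try attempt(prompt)
            // Record only the response size so logs stay short.
            results.append(AttemptResult(index: index,
                                         succeeded: true,
                                         detail: "\(response.utf8.count) bytes"))
        } catch {
            // Keep the error description so the failing call can be identified.
            results.append(AttemptResult(index: index,
                                         succeeded: false,
                                         detail: String(describing: error)))
        }
    }
    return results
}
```

In the closure we would call cactusReset(model) followed by cactusComplete(model, ...) with the same arguments as in the repro, then print each AttemptResult to see the index of the first failing call.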
Workarounds Attempted (all fail on 2nd call)
- cactusReset(model) before each call
- cactusStop(model) + cactusReset(model)
- No reset at all
- cactusDestroy + cactusInit (full re-init)
- Running on Task.detached / serial DispatchQueue
- Running on @MainActor directly
- Reducing max_tokens to 1
- Different message content
Context
Building at the YC Gemma 4 Voice Agents Hackathon. This is blocking our demo — we can only get one inference per app launch. Happy to test a fix immediately if you have one.