cactus_complete returns -1 on second invocation (iOS xcframework, Gemma 4 E2B int4) #599

@benikigai

Environment

  • iPhone 17 Pro Max (iOS 26)
  • Cactus iOS xcframework (from cactus-react-native v1.14)
  • Model: gemma-4-e2b-it INT4 (Cactus-Compute/gemma-4-E2B-it)
  • Swift, native app (not React Native)

Bug

cactus_complete returns -1 on the second real call to the same model handle. The first call after cactus_init and the warm-up always succeeds; every subsequent call fails.

cactus_transcribe with Parakeet works repeatedly on the same model handle — only LLM completions are affected.

Minimal Repro

let model = try cactusInit(path, nil, false)

// Warm-up — works
let _ = try? cactusComplete(model, "[{\"role\":\"user\",\"content\":\"Hi\"}]", "{\"max_tokens\":1}", nil, nil)

// First real call — works
cactusReset(model)
let r1 = try cactusComplete(model, "[{\"role\":\"user\",\"content\":\"Hello\"}]", "{\"max_tokens\":256}", nil, nil)
// r1 = valid JSON response ✓

// Second call — FAILS (result < 0)
cactusReset(model)
let r2 = try cactusComplete(model, "[{\"role\":\"user\",\"content\":\"Who are you?\"}]", "{\"max_tokens\":256}", nil, nil)
// throws "Completion failed"
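
For anyone trying to reproduce this, the sequence above can be wrapped in a small harness that records which attempt fails first. This is only a sketch: `runRepro`, `AttemptResult`, `firstFailure` and `makeSimulatedCompleter` are hypothetical names, and the `complete` closure is a stand-in for the cactusReset + cactusComplete pair from the repro. The simulated completer does not call Cactus at all; it just mimics the reported pattern (warm-up and first real call succeed, the next one throws) so the harness can be checked without a device.

```swift
import Foundation

/// Outcome of one completion attempt in the repro loop.
struct AttemptResult {
    let index: Int
    let succeeded: Bool
    let detail: String
}

/// Runs one completion per prompt, back to back, and records each outcome.
/// `complete` is a hypothetical stand-in for cactusReset + cactusComplete.
func runRepro(prompts: [String],
              complete: (String) throws -> String) -> [AttemptResult] {
    var results: [AttemptResult] = []
    for (i, prompt) in prompts.enumerated() {
        do {
            let response = try complete(prompt)
            results.append(AttemptResult(index: i, succeeded: true,
                                         detail: "\(response.count) chars"))
        } catch {
            results.append(AttemptResult(index: i, succeeded: false,
                                         detail: String(describing: error)))
        }
    }
    return results
}

/// Index of the first failing attempt, or nil if every attempt succeeded.
func firstFailure(in results: [AttemptResult]) -> Int? {
    return results.first(where: { !$0.succeeded })?.index
}

/// Simulation only: succeeds `limit` times, then throws on every later call.
struct SimulatedFailure: Error {}

func makeSimulatedCompleter(succeedFor limit: Int) -> (String) throws -> String {
    var calls = 0
    return { _ in
        calls += 1
        if calls > limit { throw SimulatedFailure() }
        return "{}"
    }
}

// Warm-up, first real call, second real call, as in the repro above.
let results = runRepro(prompts: ["Hi", "Hello", "Who are you?"],
                       complete: makeSimulatedCompleter(succeedFor: 2))
for r in results {
    print("attempt \(r.index): \(r.succeeded ? "ok" : "failed") (\(r.detail))")
}
```

On a device, the simulated completer would be swapped for a closure that calls cactusReset(model) and then cactusComplete(model, ...) with the same arguments as in the repro; the index reported by `firstFailure` then shows whether the failure always lands on the same attempt.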

Workarounds Attempted (all fail on 2nd call)

  • cactusReset(model) before each call
  • cactusStop(model) + cactusReset(model)
  • No reset at all
  • cactusDestroy + cactusInit (full re-init)
  • Running on Task.detached / serial DispatchQueue
  • Running on @MainActor directly
  • Reducing max_tokens to 1
  • Different message content
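
When varying the message content and max_tokens as in the list above, it may help to rule out hand-escaping mistakes by generating the two JSON arguments with JSONEncoder instead of string literals. The sketch below assumes only the shapes visible in the repro (an array of role/content objects and an options object with max_tokens); `ChatMessage`, `CompletionOptions` and `jsonString` are hypothetical helper names, not part of the Cactus API.

```swift
import Foundation

/// One chat turn, matching the {"role": ..., "content": ...} shape in the repro.
struct ChatMessage: Codable, Equatable {
    let role: String
    let content: String
}

/// Options object; only max_tokens is modeled because it is the only
/// key that appears in the repro.
struct CompletionOptions: Codable, Equatable {
    let maxTokens: Int

    enum CodingKeys: String, CodingKey {
        case maxTokens = "max_tokens"
    }
}

/// Encodes any Encodable value to a compact JSON string.
/// Sorted keys keep the output stable between runs, which makes logs diffable.
func jsonString<T: Encodable>(_ value: T) throws -> String {
    let encoder = JSONEncoder()
    encoder.outputFormatting = [.sortedKeys]
    let data = try encoder.encode(value)
    return String(decoding: data, as: UTF8.self)
}

let messagesJSON = try! jsonString([ChatMessage(role: "user", content: "Who are you?")])
let optionsJSON = try! jsonString(CompletionOptions(maxTokens: 256))
print(messagesJSON)
print(optionsJSON)
```

The resulting strings would be passed in place of the literal second and third arguments to cactusComplete. If the second call still fails with encoder-generated input, malformed JSON can be excluded as a cause.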

Context

Building at the YC Gemma 4 Voice Agents Hackathon. This is blocking our demo — we can only get one inference per app launch. Happy to test a fix immediately if you have one.
