diff --git a/demo/rogue/rogue-fast-changes.md b/demo/rogue/rogue-fast-changes.md
new file mode 100644
index 00000000..7b0fdb78
--- /dev/null
+++ b/demo/rogue/rogue-fast-changes.md
@@ -0,0 +1,46 @@
+# Rogue Fast Change Log
+
+## 2026-02-26
+
+- Switched `check_toolchain` routing to explicit success/fail edges and added `toolchain_fail` hard-stop so failed prerequisites abort immediately instead of continuing or reporting false success.
+- Removed the ambiguous `check_dod` branch split and forced linear DoD/planning flow: `check_dod -> consolidate_dod -> debate_consolidate`.
+- Updated `check_dod` to an audit-only step (no `has_dod/needs_dod` branching contract) so the prompt matches linear fast-path routing.
+- Added `prepare_ai_inputs` tool stage to deterministically scaffold missing `.ai` artifacts (`spec`, `definition_of_done`, `plan_final`, and baseline review/log files) before `implement`.
+- Added `fix_fmt` auto-format stage before `verify_fmt` to avoid postmortem cycles from trivial formatting-only failures.
+- Enabled `auto_status=true` on codergen stages that were repeatedly failing with `missing status.json (auto_status=false)` (`check_dod`, `consolidate_dod`, `debate_consolidate`, `implement`, `verify_fidelity`, `review_consensus`, `postmortem`).
+- Simplified implement/verify flow by removing diamond check nodes that depended on status files and routing directly with condition gates between tool stages.
+- Kept postmortem recovery but changed non-transient failures to re-enter planning (`debate_consolidate`) instead of hard-exiting the run.
+- Removed graph-level `retry_target`/`fallback_retry_target` so stage failures do not silently jump to unrelated nodes (for example, `toolchain_fail -> implement`).
+- Minimal rollback: removed `toolchain_fail` hard-stop routing and restored `check_toolchain -> expand_spec` unconditional flow so runs do not die at startup when host toolchain preconditions are missing.
+- Minimal stabilization pass: restored explicit toolchain gate routing (`check_toolchain -> expand_spec` on success, fail to `postmortem`), added unconditional fallback edges for conditional-only routing nodes, and removed explicit `$KILROY_STAGE_STATUS_PATH` write instructions from `auto_status=true` codergen prompts.
+- Toolchain hardening pass: made `check_toolchain` install-method agnostic by removing `cargo install --list` dependency for `wasm-bindgen-cli`, adding explicit `rustup` presence check, and prepending both `$HOME/.cargo/bin` and `$USERPROFILE/.cargo/bin` to `PATH` in all Rust/WASM tool stages (`check_toolchain`, `fix_fmt`, `verify_fmt`, `verify_build`, `verify_test`) so Windows runs can find user-local Rust tools without shell profile assumptions.
+
+### Run Ops Log (same day)
+
+- Reproduced/fixed codex prompt-probe auth seeding bug in engine (PR #43); prompt probe now passes with seeded `auth.json`/`config.toml`.
+- Launched detached real run `rogue-fast-20260226T214843Z`; preflight passed; run failed early due missing CXDB service (`127.0.0.1:9010` unreachable).
+- Relaunched with `--no-cxdb` as approved: run `rogue-fast-20260226T215515Z`; failed at `check_toolchain` (`FAIL: cargo not found`) and entered `postmortem`.
+- Installed Rustup (`Rustlang.Rustup` 1.28.2), added `wasm32-unknown-unknown` target; `cargo install wasm-pack wasm-bindgen-cli` failed because MSVC linker (`link.exe`) is not present.
+- Installed prebuilt `wasm-pack` binary (`v0.14.0`) to `C:\Users\dan\.cargo\bin\wasm-pack.exe` and verified `wasm-pack --version`.
+- Relaunched with current dotfile + `--no-cxdb`: run `rogue-fast-20260226T231941Z`; run is currently blocked in `check_toolchain` with `bash` process behavior indicating Windows shell-resolution issues (system `bash.exe`/WSL pathing), not codex auth/CXDB.
+
+### Dependency-order fix (same day)
+
+- Root cause identified: `debate_consolidate` required `.ai/spec.md` and `.ai/definition_of_done.md` before `prepare_ai_inputs` created them.
+- Fixed flow ordering in `demo/rogue/rogue-fast.dot`:
+ - `consolidate_dod -> prepare_ai_inputs -> debate_consolidate -> implement`
+ - removed the incorrect `debate_consolidate -> prepare_ai_inputs` dependency inversion.
+- Relaunch attempt `rogue-fast-20260226T235930Z` failed immediately due config mismatch (`missing llm.providers.openai.backend` from `demo/rogue/run.yaml` in this environment).
+- Relaunched with prior known-good real/no-CXDB config: `rogue-fast-20260227T002621Z`.
+- Polled every 5 minutes (9 polls from `2026-02-26 16:27` to `17:07` PT):
+ - Run progressed through `expand_spec`, `check_dod`, `consolidate_dod`, `prepare_ai_inputs`, `debate_consolidate`, and `implement`.
+ - This confirms the ordering fix executed as intended (the run no longer fails at `debate_consolidate` due missing `.ai` scaffolding).
+ - Run ultimately failed later at `fix_fmt` with deterministic cycle breaker: `/usr/bin/bash: line 1: cd: demo/rogue/rogue-wasm: No such file or directory`.
+
+### Root-cause correction (same day)
+
+- Root cause isolated: `implement` was set to `auto_status=true`, so the engine marked success when the stage finished without writing status, even when the model output explicitly reported `outcome: fail`.
+- Corrected `implement` status contract in `demo/rogue/rogue-fast.dot`:
+ - set `auto_status=false`
+ - restored explicit instruction to write status to `$KILROY_STAGE_STATUS_PATH` with `outcome=success|fail` and failure fields.
+- This prevents false-positive routing into `fix_fmt` after a failed implement stage.
diff --git a/demo/rogue/rogue-fast.dot b/demo/rogue/rogue-fast.dot
new file mode 100644
index 00000000..ff7dd993
--- /dev/null
+++ b/demo/rogue/rogue-fast.dot
@@ -0,0 +1,189 @@
+digraph rogue_pipeline {
+ graph [
+ goal="Port Rogue 5.4.4 from C to Rust/WebAssembly: exact 1:1 mechanical translation of all game systems (dungeon generation, monster AI, combat math, item tables, RNG formula), playable in browser at demo/rogue/rogue-wasm/www/index.html with classic 80x24 ASCII terminal rendering. Original C source at demo/rogue/original-rogue/ (~16,800 lines, 33 files). Replace ncurses I/O with WASM bridge to JS terminal renderer; save/load uses localStorage.",
+ rankdir=LR,
+ default_max_retry=3,
+ provenance_version="1",
+ model_stylesheet="
+ * { llm_model: gpt-5.3-codex-spark; llm_provider: openai; }
+ .hard { llm_model: gpt-5.3-codex-spark; llm_provider: openai; }
+ .verify { llm_model: gpt-5.3-codex-spark; llm_provider: openai; }
+ "
+ ]
+
+ exit [shape=Msquare, label="Exit"]
+
+ subgraph cluster_bootstrap {
+ label="Bootstrap"
+ start [shape=Mdiamond, label="Start"]
+
+ check_toolchain [
+ shape=parallelogram,
+ max_retries=0,
+ tool_command="set -e; for d in \"$HOME/.cargo/bin\" \"$USERPROFILE/.cargo/bin\"; do [ -d \"$d\" ] && PATH=\"$d:$PATH\"; done; export PATH; command -v cargo >/dev/null 2>&1 || { echo 'FAIL: cargo not found' >&2; exit 1; }; command -v rustup >/dev/null 2>&1 || { echo 'FAIL: rustup not found' >&2; exit 1; }; command -v wasm-pack >/dev/null 2>&1 || { echo 'FAIL: wasm-pack not found' >&2; exit 1; }; rustup target list --installed | grep -qx wasm32-unknown-unknown || { echo 'FAIL: wasm32-unknown-unknown target not installed' >&2; exit 1; }; echo 'All toolchain checks passed: cargo, rustup, wasm-pack, wasm32-unknown-unknown target'"
+ ]
+
+ expand_spec [
+ shape=box,
+ auto_status=true,
+ prompt="Goal: $goal\n\nExpand the requirements into a detailed spec for the Rogue 5.4.4 C-to-Rust/WASM port.\n\nRead the original C source listing at demo/rogue/original-rogue/ to understand every game system:\n- Dungeon generation (rooms, corridors, mazes, doors, stairs)\n- Monster definitions, AI, chase logic, special abilities (A-Z)\n- Combat math (attack/defense, hit tables, damage dice)\n- Item tables (potions, scrolls, rings, weapons, armor, food, wands, amulet)\n- RNG formula and seed handling\n- Player stats, experience, leveling, hunger\n- Save/load serialization format\n\nRead any existing Rust source at demo/rogue/rogue-wasm/src/ to understand what has already been ported.\n\nThe spec must cover:\n1. Module decomposition mapping each C file to a Rust module.\n2. Data structure translations (C structs -> Rust structs/enums).\n3. ncurses replacement strategy (WASM bridge to JS canvas/DOM terminal renderer).\n4. localStorage save/load replacing filesystem I/O.\n5. Build pipeline: wasm-pack build -> www/index.html integration.\n6. Fidelity contract: same algorithms, same constants, same behavior.\n\nWrite the spec to .ai/spec.md."
+ ]
+
+ check_dod [
+ shape=box,
+ auto_status=true,
+ label="DoD exists?",
+ prompt="Goal: $goal\n\nAudit whether .ai/definition_of_done.md exists and is credible, then summarize findings in .ai/check_dod_notes.md.\n\nDoD rubric (must be specific, verifiable, and not over-prescriptive):\n- Scope: defines what is in-scope and what is out-of-scope.\n- Deliverables: names concrete outputs/outcomes (artifacts, behaviors, docs), not implementation steps.\n- Acceptance criteria: includes observable pass/fail criteria a reviewer can verify.\n- Verification: includes how to verify (commands or steps) appropriate for a Rust/WASM project.\n- Quality/safety: includes project-appropriate quality expectations (build/tests/lint) stated as outcomes/evidence.\n- Non-goals/deferrals: explicitly calls out what is intentionally not being done in this iteration."
+ ]
+ }
+
+ subgraph cluster_dod {
+ label="Definition Of Done"
+
+ consolidate_dod [
+ shape=box,
+ auto_status=true,
+ prompt="Goal: $goal\n\nPropose a project Definition of Done (DoD) for the Rogue C-to-Rust/WASM port. Read .ai/spec.md.\n\nRequirements:\n- DoD must be a checklist of outcomes/evidence, not a plan.\n- Each item must be verifiable (someone can check it and say pass/fail).\n- Avoid prescribing the implementation approach unless the spec explicitly requires it.\n- Include scope, deliverables, acceptance criteria, verification approach, and explicit non-goals/deferrals.\n- Verification must include: cargo build --target wasm32-unknown-unknown succeeds, cargo test passes, wasm-pack build --target web succeeds, index.html loads and renders 80x24 grid.\n- Fidelity criteria: dungeon generation, monster behavior, combat math, item effects must match original C.\n\nWrite the final DoD to .ai/definition_of_done.md."
+ ]
+ }
+
+ subgraph cluster_planning {
+ label="Planning"
+
+ debate_consolidate [
+ shape=box,
+ auto_status=true,
+ prompt="Goal: $goal\n\nCreate a final implementation plan for the Rogue C-to-Rust/WASM port.\nRead .ai/spec.md, .ai/definition_of_done.md.\nIf .ai/postmortem_latest.md exists, incorporate its lessons.\n\nThe plan must address:\n1. Module-by-module porting order (dependency-aware).\n2. Data structure translations with Rust idioms.\n3. ncurses -> WASM bridge architecture.\n4. localStorage save/load strategy.\n5. Progressive compilation milestones.\n6. www/index.html integration with wasm-pack output.\n\nRead existing Rust source at demo/rogue/rogue-wasm/src/ to build on what exists.\nRead original C source at demo/rogue/original-rogue/ for reference.\nResolve conflicts. Ensure dependency order.\nIf .ai/postmortem_latest.md exists, verify the plan addresses every issue.\nWrite to .ai/plan_final.md."
+ ]
+
+ prepare_ai_inputs [
+ shape=parallelogram,
+ max_retries=0,
+ tool_command="set -e; mkdir -p .ai; [ -f .ai/spec.md ] || { [ -f demo/rogue/spec.md ] && cp demo/rogue/spec.md .ai/spec.md || true; }; [ -f .ai/definition_of_done.md ] || { [ -f demo/rogue/DoD.md ] && cp demo/rogue/DoD.md .ai/definition_of_done.md || true; }; if [ ! -f .ai/plan_final.md ]; then printf '%s\n' '# Rogue fast fallback plan' '' '- Preserve existing work; do not restart from scratch.' '- Implement highest-impact missing pieces first.' '- Run fmt, build, and tests and capture exact failures.' '- Update .ai/implementation_log.md and .ai/verify_fidelity.md with evidence.' > .ai/plan_final.md; fi; [ -f .ai/implementation_log.md ] || printf '%s\n' '# Implementation Log' '' '- Initialized by prepare_ai_inputs stage.' > .ai/implementation_log.md; [ -f .ai/review_consensus.md ] || printf '%s\n' '# Review Consensus' '' '- Initialized by prepare_ai_inputs stage.' > .ai/review_consensus.md; [ -f .ai/verify_fidelity.md ] || printf '%s\n' '# Fidelity Review' '' '- Initialized by prepare_ai_inputs stage.' > .ai/verify_fidelity.md; echo 'Prepared .ai inputs for implement stage'"
+ ]
+ }
+
+ subgraph cluster_implement_verify {
+ label="Implement And Verify"
+
+ implement [
+ shape=box,
+ class="hard",
+ auto_status=false,
+ max_retries=2,
+ prompt="Goal: $goal\n\nExecute .ai/plan_final.md. Read .ai/definition_of_done.md for acceptance criteria.\nIf .ai/postmortem_latest.md exists, prioritize fixing those issues.\n\nOriginal C source: demo/rogue/original-rogue/\nRust target: demo/rogue/rogue-wasm/\nHTML deliverable: demo/rogue/rogue-wasm/www/index.html\n\nThis is an exact mechanical port. Every C function must have a Rust equivalent with identical behavior:\n- Same dungeon generation algorithms (rooms, corridors, mazes).\n- Same monster stats table, AI chase logic, special abilities.\n- Same combat math (hit tables, damage dice, armor class).\n- Same item tables (potions, scrolls, rings, weapons, armor, food, wands, amulet of Yendor).\n- Same RNG formula and seed handling.\n- Same player progression (XP, leveling, hunger).\n\nReplace ncurses with a WASM bridge to a JS terminal renderer.\nReplace filesystem save/load with localStorage.\n\nUse progressive compilation: get each module compiling before starting the next.\nLog progress to .ai/implementation_log.md.\n\nWrite status JSON to $KILROY_STAGE_STATUS_PATH.\noutcome=success if build passes, outcome=fail with failure_reason, failure_class, and failure_signature otherwise."
+ ]
+
+ fix_fmt [
+ shape=parallelogram,
+ tool_command="set -e; for d in \"$HOME/.cargo/bin\" \"$USERPROFILE/.cargo/bin\"; do [ -d \"$d\" ] && PATH=\"$d:$PATH\"; done; export PATH; cd demo/rogue/rogue-wasm && cargo fmt 2>&1"
+ ]
+
+ verify_fmt [
+ shape=parallelogram,
+ tool_command="set -e; for d in \"$HOME/.cargo/bin\" \"$USERPROFILE/.cargo/bin\"; do [ -d \"$d\" ] && PATH=\"$d:$PATH\"; done; export PATH; cd demo/rogue/rogue-wasm && cargo fmt --check 2>&1"
+ ]
+
+ verify_build [
+ shape=parallelogram,
+ tool_command="set -e; for d in \"$HOME/.cargo/bin\" \"$USERPROFILE/.cargo/bin\"; do [ -d \"$d\" ] && PATH=\"$d:$PATH\"; done; export PATH; cd demo/rogue/rogue-wasm && wasm-pack build --target web 2>&1"
+ ]
+
+ verify_test [
+ shape=parallelogram,
+ tool_command="set -e; for d in \"$HOME/.cargo/bin\" \"$USERPROFILE/.cargo/bin\"; do [ -d \"$d\" ] && PATH=\"$d:$PATH\"; done; export PATH; cd demo/rogue/rogue-wasm && cargo test 2>&1"
+ ]
+
+ verify_artifacts [
+ shape=parallelogram,
+ tool_command="set -e; DIRTY=$(git diff --name-only HEAD -- demo/rogue/rogue-wasm/ | grep -E '\\.(o|a|so|dylib|wasm|d|rmeta|rlib|fingerprint)$|/target/|/pkg/\\.gitignore' || true); if [ -n \"$DIRTY\" ]; then echo \"FAIL: build artifacts in diff: $DIRTY\" >&2; exit 1; fi; echo 'No build artifacts in diff'"
+ ]
+
+ verify_fidelity [
+ shape=box,
+ class="verify",
+ auto_status=true,
+ prompt="Perform semantic fidelity review after deterministic checks pass.\n\nThis is the critical gate for a faithful 1:1 port. Verify:\n1. Read demo/rogue/original-rogue/*.c and compare against demo/rogue/rogue-wasm/src/*.rs.\n2. Dungeon generation: room placement, corridor carving, maze generation, door/stair placement use same algorithms.\n3. Monster table: all 26 monster types (A-Z) have correct stats, flags, and special abilities.\n4. Combat math: attack rolls, hit tables, damage dice, armor class calculations match original.\n5. Item tables: all potion/scroll/ring/weapon/armor/food/wand types and effects match.\n6. RNG: same formula and seed propagation as original.\n7. Player systems: XP thresholds, level-up stats, hunger ticks match.\n8. I/O bridge: ncurses calls map to WASM-exported functions.\n9. Save/load: serialization covers all game state, localStorage integration works.\n10. HTML: demo/rogue/rogue-wasm/www/index.html exists with 80x24 ASCII grid, monospace font, dark background.\n\nIf semantic gaps exist, outcome=fail with stable failure_reason code (e.g., fidelity_gap_monster_stats) and details listing each discrepancy.\n\nWrite results to .ai/verify_fidelity.md.\n\nReport outcome=success if semantic review passes; otherwise report outcome=fail with failure_reason and details."
+ ]
+ }
+
+ subgraph cluster_review {
+ label="Review"
+
+ review_consensus [
+ shape=box,
+ goal_gate=true,
+ auto_status=true,
+ prompt="Goal: $goal\n\nReview the implementation against .ai/definition_of_done.md.\nCheck: wasm-pack build succeeds, cargo test passes, completeness of port (all 33 C files accounted for), correctness of game logic, www/index.html renders properly.\nVerify fidelity: spot-check monster stats, combat math constants, RNG formula against original C source.\n\nConsensus policy:\n- If criteria are met and no critical gaps: outcome=success.\n- Otherwise: outcome=retry with failure_reason listing specific issues.\n\nWrite review to .ai/review_consensus.md with APPROVED or REJECTED verdict.\n\nReport outcome=success or outcome=retry with failure_reason."
+ ]
+ }
+
+ subgraph cluster_postmortem {
+ label="Postmortem"
+
+ postmortem [
+ shape=box,
+ auto_status=true,
+ prompt="Goal: $goal\n\nAnalyze why the implementation failed.\nRead .ai/review_consensus.md.\nRead .ai/implementation_log.md.\nRead .ai/verify_fidelity.md if it exists.\n\nProduce actionable guidance: root causes, what worked, what failed, specific fixes.\nFor fidelity issues, reference exact C source lines and expected Rust equivalents.\nThe next iteration must NOT start from scratch - preserve working code and fix gaps.\n\nWrite to .ai/postmortem_latest.md (overwrite previous)."
+ ]
+ }
+
+ // =========================================================================
+ // Flow
+ // =========================================================================
+
+ start -> check_toolchain
+ check_toolchain -> expand_spec [condition="outcome=success"]
+ check_toolchain -> postmortem [condition="outcome=fail"]
+ check_toolchain -> postmortem
+ expand_spec -> check_dod
+
+ // keep DoD/planning linear in fast mode
+ check_dod -> consolidate_dod
+ consolidate_dod -> prepare_ai_inputs
+ prepare_ai_inputs -> debate_consolidate
+
+ debate_consolidate -> implement
+
+ prepare_ai_inputs -> implement [condition="outcome=success"]
+ prepare_ai_inputs -> postmortem [condition="outcome=fail"]
+ prepare_ai_inputs -> postmortem
+
+ implement -> fix_fmt [condition="outcome=success"]
+ implement -> implement [condition="outcome=fail && context.failure_class=transient_infra", loop_restart=true]
+ implement -> postmortem [condition="outcome=fail && context.failure_class!=transient_infra"]
+ implement -> postmortem
+
+ fix_fmt -> verify_fmt [condition="outcome=success"]
+ fix_fmt -> postmortem [condition="outcome=fail"]
+ fix_fmt -> postmortem
+
+ verify_fmt -> verify_build [condition="outcome=success"]
+ verify_fmt -> postmortem [condition="outcome=fail"]
+ verify_fmt -> postmortem
+
+ verify_build -> verify_test [condition="outcome=success"]
+ verify_build -> postmortem [condition="outcome=fail"]
+ verify_build -> postmortem
+
+ verify_test -> verify_artifacts [condition="outcome=success"]
+ verify_test -> postmortem [condition="outcome=fail"]
+ verify_test -> postmortem
+
+ verify_artifacts -> verify_fidelity [condition="outcome=success"]
+ verify_artifacts -> postmortem [condition="outcome=fail"]
+ verify_artifacts -> postmortem
+
+ verify_fidelity -> review_consensus [condition="outcome=success"]
+ verify_fidelity -> postmortem [condition="outcome=fail"]
+ verify_fidelity -> postmortem
+
+ review_consensus -> exit [condition="outcome=success"]
+ review_consensus -> postmortem [condition="outcome=retry"]
+ review_consensus -> postmortem [condition="outcome=fail"]
+ review_consensus -> postmortem
+
+ postmortem -> debate_consolidate [condition="context.failure_class=transient_infra", loop_restart=true]
+ postmortem -> debate_consolidate [condition="context.failure_class!=transient_infra"]
+ postmortem -> debate_consolidate
+}
diff --git a/docs/strongdm/attractor/README.md b/docs/strongdm/attractor/README.md
index baaa853e..0da67d96 100644
--- a/docs/strongdm/attractor/README.md
+++ b/docs/strongdm/attractor/README.md
@@ -26,7 +26,7 @@ Although bringing your own agentic loop and unified LLM SDK is not required to b
- Sensitive Codex state roots (`codex-home*`, `.codex/auth.json`, `.codex/config.toml`) are excluded from `stage.tgz` and `run.tgz`.
- Idle watchdog enforces process-group cleanup for stalled Codex CLI stages.
- Codex schema behavior:
- - Structured output schema requires `final` and `summary`, but allows additional properties for CLI compatibility.
+ - Structured output schema requires `final` and `summary` and sets `additionalProperties: false` (strict object contract required by Codex/OpenAI structured-output validation).
- If codex rejects schema validation (`invalid_json_schema`-class errors), Attractor retries once without `--output-schema` and records fallback metadata in stage artifacts.
- If codex returns unknown structured keys on schema-enabled output, Attractor emits a loud warning, writes `structured_output_unknown_keys.json`, retries once without `--output-schema`, and records fallback metadata in `cli_invocation.json`.
- If codex emits known state-db discrepancy signatures, Attractor retries once with a fresh isolated state root and records state-db fallback metadata.
diff --git a/internal/agent/profile.go b/internal/agent/profile.go
index 0ccb2b1c..1724b39f 100644
--- a/internal/agent/profile.go
+++ b/internal/agent/profile.go
@@ -139,6 +139,29 @@ func NewOpenAIProfile(model string) ProviderProfile {
}
}
+func NewCodexAppServerProfile(model string) ProviderProfile {
+ return &baseProfile{
+ id: "codex-app-server",
+ model: strings.TrimSpace(model),
+ parallel: true,
+ contextWindow: 1_047_576,
+ basePrompt: openAIProfileBasePrompt,
+ docFiles: []string{"AGENTS.md", ".codex/instructions.md"},
+ toolDefs: []llm.ToolDefinition{
+ defReadFile(),
+ defApplyPatch(),
+ defWriteFile(),
+ defShell(),
+ defGrep(),
+ defGlob(),
+ defSpawnAgent(),
+ defSendInput(),
+ defWait(),
+ defCloseAgent(),
+ },
+ }
+}
+
func NewAnthropicProfile(model string) ProviderProfile {
return &baseProfile{
id: "anthropic",
diff --git a/internal/agent/profile_registry.go b/internal/agent/profile_registry.go
index fbf92d8c..aa265c82 100644
--- a/internal/agent/profile_registry.go
+++ b/internal/agent/profile_registry.go
@@ -9,9 +9,11 @@ import (
var (
profileFactoriesMu sync.RWMutex
profileFactories = map[string]func(string) ProviderProfile{
- "openai": NewOpenAIProfile,
- "anthropic": NewAnthropicProfile,
- "google": NewGeminiProfile,
+ "openai": NewOpenAIProfile,
+ "anthropic": NewAnthropicProfile,
+ "google": NewGeminiProfile,
+ "codex-app-server": NewCodexAppServerProfile,
+ "codex": NewCodexAppServerProfile,
}
)
diff --git a/internal/agent/profile_test.go b/internal/agent/profile_test.go
index 1ec97ef7..59c055bf 100644
--- a/internal/agent/profile_test.go
+++ b/internal/agent/profile_test.go
@@ -41,6 +41,19 @@ func TestProviderProfiles_ToolsetsAndDocSelection(t *testing.T) {
assertHasTool(t, gemini, "read_many_files")
assertHasTool(t, gemini, "list_dir")
assertMissingTool(t, gemini, "apply_patch")
+
+ codex := NewCodexAppServerProfile("gpt-5-codex")
+ if codex.ID() != "codex-app-server" {
+ t.Fatalf("codex id: %q", codex.ID())
+ }
+ if !codex.SupportsParallelToolCalls() {
+ t.Fatalf("codex profile should support parallel tool calls")
+ }
+ if codex.ContextWindowSize() != 1_047_576 {
+ t.Fatalf("codex context window: got %d want %d", codex.ContextWindowSize(), 1_047_576)
+ }
+ assertHasTool(t, codex, "apply_patch")
+ assertMissingTool(t, codex, "edit_file")
}
func TestProviderProfiles_ToolLists_MatchSpec(t *testing.T) {
@@ -92,6 +105,21 @@ func TestProviderProfiles_ToolLists_MatchSpec(t *testing.T) {
"close_agent",
})
})
+ t.Run("codex-app-server", func(t *testing.T) {
+ p := NewCodexAppServerProfile("gpt-5-codex")
+ assertToolListExact(t, p, []string{
+ "read_file",
+ "apply_patch",
+ "write_file",
+ "shell",
+ "grep",
+ "glob",
+ "spawn_agent",
+ "send_input",
+ "wait",
+ "close_agent",
+ })
+ })
}
func TestProviderProfiles_BuildSystemPrompt_IncludesProviderSpecificBaseInstructions(t *testing.T) {
@@ -181,4 +209,20 @@ func TestNewProfileForFamily_DefaultFamiliesAndRegistration(t *testing.T) {
if _, err := NewProfileForFamily("missing-family", "m3"); err == nil {
t.Fatalf("expected unsupported family error")
}
+
+ codex, err := NewProfileForFamily("codex-app-server", "gpt-5-codex")
+ if err != nil {
+ t.Fatalf("NewProfileForFamily(codex-app-server): %v", err)
+ }
+ if codex.ID() != "codex-app-server" {
+ t.Fatalf("codex profile id=%q want codex-app-server", codex.ID())
+ }
+
+ codexAlias, err := NewProfileForFamily("codex", "gpt-5-codex")
+ if err != nil {
+ t.Fatalf("NewProfileForFamily(codex): %v", err)
+ }
+ if codexAlias.ID() != "codex-app-server" {
+ t.Fatalf("codex alias profile id=%q want codex-app-server", codexAlias.ID())
+ }
}
diff --git a/internal/agent/session.go b/internal/agent/session.go
index 68ce8db4..e58dfd4a 100644
--- a/internal/agent/session.go
+++ b/internal/agent/session.go
@@ -9,6 +9,7 @@ import (
"strings"
"sync"
"time"
+ "unicode/utf8"
"github.com/oklog/ulid/v2"
@@ -293,16 +294,11 @@ func (s *Session) execTool(ctx context.Context, call llm.ToolCallData) ToolExecR
// Emit output deltas (best-effort). Even for non-streaming tools, this gives consumers a uniform
// incremental event pattern that mirrors provider LLM streaming.
full := res.FullOutput
- const chunk = 4000
- for i := 0; i < len(full); i += chunk {
- j := i + chunk
- if j > len(full) {
- j = len(full)
- }
+ for _, delta := range utf8Chunk(full, 4000) {
s.emit(EventToolCallOutputDelta, map[string]any{
"tool_name": res.ToolName,
"call_id": res.CallID,
- "delta": full[i:j],
+ "delta": delta,
})
}
@@ -494,8 +490,8 @@ func (s *Session) processOneInput(ctx context.Context, input string) (string, er
if s.cfg.LLMRetryPolicy != nil {
policy = *s.cfg.LLMRetryPolicy
}
- resp, err := llm.Retry(ctx, policy, s.cfg.LLMSleep, nil, func() (llm.Response, error) {
- return s.client.Complete(ctx, req)
+ stream, err := llm.Retry(ctx, policy, s.cfg.LLMSleep, nil, func() (llm.Stream, error) {
+ return s.client.Stream(ctx, req)
})
if err != nil {
s.emit(EventError, map[string]any{"error": err.Error()})
@@ -519,15 +515,175 @@ func (s *Session) processOneInput(ctx context.Context, input string) (string, er
}
}
+ acc := llm.NewStreamAccumulator()
+ var resp *llm.Response
+ var streamErr error
+ providerToolCallCount := 0
+ seenProviderToolCalls := map[string]struct{}{}
+ seenProviderOutputDeltas := map[string]struct{}{}
+ seenProviderToolEnds := map[string]struct{}{}
+ providerToolNameByCallID := map[string]string{}
+ providerOutputByCallID := map[string]string{}
+ assistantTextStarted := false
+ assistantTextDelta := false
+ emitAssistantTextStart := func() {
+ if assistantTextStarted {
+ return
+ }
+ assistantTextStarted = true
+ s.emit(EventAssistantTextStart, map[string]any{})
+ }
+ emitToolOutputDeltas := func(toolName, callID, fullOutput string) {
+ for _, delta := range utf8Chunk(fullOutput, 4000) {
+ s.emit(EventToolCallOutputDelta, map[string]any{
+ "tool_name": toolName,
+ "call_id": callID,
+ "delta": delta,
+ "source": "provider",
+ })
+ }
+ }
+ for ev := range stream.Events() {
+ acc.Process(ev)
+ switch ev.Type {
+ case llm.StreamEventTextStart:
+ emitAssistantTextStart()
+ case llm.StreamEventTextDelta:
+ emitAssistantTextStart()
+ if ev.Delta != "" {
+ assistantTextDelta = true
+ s.emit(EventAssistantTextDelta, map[string]any{"delta": ev.Delta})
+ }
+ case llm.StreamEventFinish:
+ if ev.Response != nil {
+ cp := *ev.Response
+ resp = &cp
+ }
+ case llm.StreamEventError:
+ if ev.Err != nil {
+ streamErr = ev.Err
+ } else {
+ streamErr = llm.NewStreamError(req.Provider, "stream error")
+ }
+ case llm.StreamEventProviderEvent:
+ if lifecycle, ok := llm.ParseCodexAppServerToolLifecycle(ev); ok {
+ callID := strings.TrimSpace(lifecycle.CallID)
+ if callID == "" {
+ if !lifecycle.Completed {
+ providerToolCallCount++
+ }
+ } else {
+ if _, exists := seenProviderToolCalls[callID]; !exists {
+ seenProviderToolCalls[callID] = struct{}{}
+ providerToolCallCount++
+ }
+ if tn := strings.TrimSpace(lifecycle.ToolName); tn != "" {
+ providerToolNameByCallID[callID] = tn
+ }
+ }
+ if lifecycle.Completed {
+ if callID != "" {
+ if _, ended := seenProviderToolEnds[callID]; ended {
+ continue
+ }
+ }
+ if callID == "" {
+ emitToolOutputDeltas(lifecycle.ToolName, lifecycle.CallID, lifecycle.FullOutput)
+ } else if _, seen := seenProviderOutputDeltas[callID]; !seen {
+ emitToolOutputDeltas(lifecycle.ToolName, lifecycle.CallID, lifecycle.FullOutput)
+ seenProviderOutputDeltas[callID] = struct{}{}
+ providerOutputByCallID[callID] = lifecycle.FullOutput
+ } else if lifecycle.FullOutput != "" && providerOutputByCallID[callID] != lifecycle.FullOutput {
+ // Reconcile mismatch: provider completion output is authoritative.
+ emitToolOutputDeltas(lifecycle.ToolName, lifecycle.CallID, lifecycle.FullOutput)
+ providerOutputByCallID[callID] = lifecycle.FullOutput
+ }
+ s.emit(EventToolCallEnd, map[string]any{
+ "tool_name": lifecycle.ToolName,
+ "call_id": lifecycle.CallID,
+ "is_error": lifecycle.IsError,
+ "full_output": lifecycle.FullOutput,
+ "source": "provider",
+ })
+ if callID != "" {
+ seenProviderToolEnds[callID] = struct{}{}
+ }
+ } else {
+ data := map[string]any{
+ "tool_name": lifecycle.ToolName,
+ "call_id": lifecycle.CallID,
+ "arguments_json": lifecycle.ArgumentsJSON,
+ "source": "provider",
+ }
+ s.emit(EventToolCallStart, data)
+ }
+ } else if outputDelta, ok := llm.ParseCodexAppServerToolOutputDelta(ev); ok {
+ callID := strings.TrimSpace(outputDelta.CallID)
+ toolName := strings.TrimSpace(outputDelta.ToolName)
+ if callID != "" {
+ if _, exists := seenProviderToolCalls[callID]; !exists {
+ seenProviderToolCalls[callID] = struct{}{}
+ providerToolCallCount++
+ }
+ if mappedToolName := strings.TrimSpace(providerToolNameByCallID[callID]); mappedToolName != "" {
+ toolName = mappedToolName
+ } else if toolName != "" {
+ providerToolNameByCallID[callID] = toolName
+ }
+ seenProviderOutputDeltas[callID] = struct{}{}
+ providerOutputByCallID[callID] += outputDelta.Delta
+ }
+ s.emit(EventToolCallOutputDelta, map[string]any{
+ "tool_name": toolName,
+ "call_id": callID,
+ "delta": outputDelta.Delta,
+ "source": "provider",
+ })
+ }
+ }
+ }
+ _ = stream.Close()
+
+ if streamErr != nil {
+ s.emit(EventError, map[string]any{"error": streamErr.Error()})
+ // Spec: context overflow should emit a warning (no automatic compaction).
+ var cle *llm.ContextLengthError
+ if errors.As(streamErr, &cle) {
+ s.emit(EventWarning, map[string]any{"message": "Context length exceeded"})
+ }
+ // Spec: non-retryable/unrecoverable errors transition the session to CLOSED.
+ var le llm.Error
+ if errors.As(streamErr, &le) && !le.Retryable() {
+ s.Close()
+ }
+ return "", streamErr
+ }
+
+ if resp == nil {
+ resp = acc.Response()
+ }
+ if resp == nil {
+ err := llm.NewStreamError(req.Provider, "stream ended without finish event")
+ s.emit(EventError, map[string]any{"error": err.Error()})
+ return "", err
+ }
+
+ calls := resp.ToolCalls()
+ turnToolCallCount := len(calls)
+ if providerToolCallCount > turnToolCallCount {
+ turnToolCallCount = providerToolCallCount
+ }
txt := resp.Text()
- s.emit(EventAssistantTextStart, map[string]any{})
+ emitAssistantTextStart()
s.appendTurn(TurnAssistant, resp.Message)
- if strings.TrimSpace(txt) != "" {
+ if !assistantTextDelta && strings.TrimSpace(txt) != "" {
s.emit(EventAssistantTextDelta, map[string]any{"delta": txt})
}
- s.emit(EventAssistantTextEnd, map[string]any{"text": txt})
+ s.emit(EventAssistantTextEnd, map[string]any{
+ "text": txt,
+ "tool_call_count": turnToolCallCount,
+ })
- calls := resp.ToolCalls()
if len(calls) == 0 {
return txt, nil
}
@@ -629,6 +785,33 @@ func (s *Session) processOneInput(ctx context.Context, input string) (string, er
return "", fmt.Errorf("max tool rounds reached")
}
+func utf8Chunk(full string, maxBytes int) []string {
+ if maxBytes <= 0 || len(full) == 0 {
+ return nil
+ }
+ chunks := make([]string, 0, len(full)/maxBytes+1)
+ for i := 0; i < len(full); {
+ j := i + maxBytes
+ if j >= len(full) {
+ chunks = append(chunks, full[i:])
+ break
+ }
+ for j > i && !utf8.RuneStart(full[j]) {
+ j--
+ }
+ if j == i {
+ _, size := utf8.DecodeRuneInString(full[i:])
+ if size <= 0 {
+ size = 1
+ }
+ j = i + size
+ }
+ chunks = append(chunks, full[i:j])
+ i = j
+ }
+ return chunks
+}
+
func (s *Session) drainSteering() []string {
s.mu.Lock()
defer s.mu.Unlock()
diff --git a/internal/agent/session_dod_test.go b/internal/agent/session_dod_test.go
index b48f3048..9696c050 100644
--- a/internal/agent/session_dod_test.go
+++ b/internal/agent/session_dod_test.go
@@ -874,7 +874,8 @@ func (a *errAdapter) Complete(ctx context.Context, req llm.Request) (llm.Respons
func (a *errAdapter) Stream(ctx context.Context, req llm.Request) (llm.Stream, error) {
_ = ctx
_ = req
- return nil, fmt.Errorf("stream not implemented in errAdapter")
+ a.calls++
+ return nil, a.err
}
type flaky429Adapter struct {
@@ -894,9 +895,12 @@ func (a *flaky429Adapter) Complete(ctx context.Context, req llm.Request) (llm.Re
return llm.Response{Message: llm.Assistant("ok")}, nil
}
func (a *flaky429Adapter) Stream(ctx context.Context, req llm.Request) (llm.Stream, error) {
- _ = ctx
_ = req
- return nil, fmt.Errorf("stream not implemented in flaky429Adapter")
+ a.calls++
+ if a.calls <= a.failCount {
+ return nil, llm.ErrorFromHTTPStatus(a.name, 429, "rate limited", nil, nil)
+ }
+ return streamFromResponse(ctx, llm.Response{Provider: a.name, Model: req.Model, Message: llm.Assistant("ok")}), nil
}
func TestSession_AuthenticationError_ClosesSession(t *testing.T) {
diff --git a/internal/agent/session_test.go b/internal/agent/session_test.go
index 7eff41c1..9f96c269 100644
--- a/internal/agent/session_test.go
+++ b/internal/agent/session_test.go
@@ -3,7 +3,6 @@ package agent
import (
"context"
"encoding/json"
- "errors"
"fmt"
"os"
"path/filepath"
@@ -45,17 +44,582 @@ func (a *fakeAdapter) Complete(ctx context.Context, req llm.Request) (llm.Respon
}
func (a *fakeAdapter) Stream(ctx context.Context, req llm.Request) (llm.Stream, error) {
+ resp, err := a.Complete(ctx, req)
+ if err != nil {
+ return nil, err
+ }
+ return streamFromResponse(ctx, resp), nil
+}
+
+func (a *fakeAdapter) Requests() []llm.Request {
+ a.mu.Lock()
+ defer a.mu.Unlock()
+ return append([]llm.Request{}, a.requests...)
+}
+
+type streamFinishWithoutResponseAdapter struct {
+ name string
+
+ mu sync.Mutex
+ requests []llm.Request
+ i int
+}
+
+func (a *streamFinishWithoutResponseAdapter) Name() string { return a.name }
+
+func (a *streamFinishWithoutResponseAdapter) Complete(ctx context.Context, req llm.Request) (llm.Response, error) {
_ = ctx
_ = req
- return nil, errors.New("stream not implemented in fakeAdapter")
+ return llm.Response{Provider: a.name, Model: req.Model, Message: llm.Assistant("unexpected complete")}, nil
}
-func (a *fakeAdapter) Requests() []llm.Request {
+func (a *streamFinishWithoutResponseAdapter) Stream(ctx context.Context, req llm.Request) (llm.Stream, error) {
+ a.mu.Lock()
+ a.requests = append(a.requests, req)
+ step := a.i
+ a.i++
+ a.mu.Unlock()
+
+ stream := llm.NewChanStream(nil)
+ go func() {
+ defer stream.CloseSend()
+ select {
+ case <-ctx.Done():
+ stream.Send(llm.StreamEvent{Type: llm.StreamEventError, Err: ctx.Err()})
+ return
+ default:
+ }
+
+ stream.Send(llm.StreamEvent{Type: llm.StreamEventStreamStart, Model: req.Model})
+ if step == 0 {
+ call := llm.ToolCallData{
+ ID: "c1",
+ Name: "write_file",
+ Type: "function",
+ Arguments: json.RawMessage(`{"file_path":"hello.txt","content":"Hello"}`),
+ }
+ stream.Send(llm.StreamEvent{Type: llm.StreamEventToolCallStart, ToolCall: &llm.ToolCallData{ID: call.ID, Name: call.Name, Type: call.Type}})
+ stream.Send(llm.StreamEvent{Type: llm.StreamEventToolCallDelta, ToolCall: &call})
+ stream.Send(llm.StreamEvent{Type: llm.StreamEventToolCallEnd, ToolCall: &call})
+ finish := llm.FinishReason{Reason: llm.FinishReasonToolCalls}
+ stream.Send(llm.StreamEvent{Type: llm.StreamEventFinish, FinishReason: &finish})
+ return
+ }
+
+ stream.Send(llm.StreamEvent{Type: llm.StreamEventTextStart, TextID: "text_0"})
+ stream.Send(llm.StreamEvent{Type: llm.StreamEventTextDelta, TextID: "text_0", Delta: "ok"})
+ stream.Send(llm.StreamEvent{Type: llm.StreamEventTextEnd, TextID: "text_0"})
+ resp := llm.Response{
+ Provider: a.name,
+ Model: req.Model,
+ Message: llm.Assistant("ok"),
+ Finish: llm.FinishReason{Reason: llm.FinishReasonStop},
+ }
+ stream.Send(llm.StreamEvent{
+ Type: llm.StreamEventFinish,
+ FinishReason: &resp.Finish,
+ Response: &resp,
+ })
+ }()
+ return stream, nil
+}
+
+func (a *streamFinishWithoutResponseAdapter) Requests() []llm.Request {
a.mu.Lock()
defer a.mu.Unlock()
return append([]llm.Request{}, a.requests...)
}
+type providerEventStreamAdapter struct {
+ name string
+ completedItem map[string]any
+ disableOutputDelta bool
+}
+
+func (a *providerEventStreamAdapter) Name() string { return a.name }
+
+func (a *providerEventStreamAdapter) Complete(ctx context.Context, req llm.Request) (llm.Response, error) {
+ _ = ctx
+ return llm.Response{Provider: a.name, Model: req.Model, Message: llm.Assistant("unexpected complete")}, nil
+}
+
+func (a *providerEventStreamAdapter) Stream(ctx context.Context, req llm.Request) (llm.Stream, error) {
+ stream := llm.NewChanStream(nil)
+ go func() {
+ defer stream.CloseSend()
+ select {
+ case <-ctx.Done():
+ stream.Send(llm.StreamEvent{Type: llm.StreamEventError, Err: ctx.Err()})
+ return
+ default:
+ }
+ stream.Send(llm.StreamEvent{Type: llm.StreamEventStreamStart, Model: req.Model})
+ stream.Send(llm.StreamEvent{
+ Type: llm.StreamEventProviderEvent,
+ EventType: "item/started",
+ Raw: map[string]any{
+ "item": map[string]any{
+ "id": "cmd_1",
+ "type": "commandExecution",
+ "command": "pwd",
+ "cwd": "/tmp/worktree",
+ "status": "inProgress",
+ },
+ },
+ })
+ if !a.disableOutputDelta {
+ stream.Send(llm.StreamEvent{
+ Type: llm.StreamEventProviderEvent,
+ EventType: "item/commandExecution/outputDelta",
+ Raw: map[string]any{
+ "itemId": "cmd_1",
+ "delta": "streamed output",
+ },
+ })
+ }
+ stream.Send(llm.StreamEvent{
+ Type: llm.StreamEventProviderEvent,
+ EventType: "item/completed",
+ Raw: map[string]any{
+ "item": a.providerCompletedItem(),
+ },
+ })
+ stream.Send(llm.StreamEvent{Type: llm.StreamEventTextStart, TextID: "text_0"})
+ stream.Send(llm.StreamEvent{Type: llm.StreamEventTextDelta, TextID: "text_0", Delta: "ok"})
+ stream.Send(llm.StreamEvent{Type: llm.StreamEventTextEnd, TextID: "text_0"})
+ resp := llm.Response{
+ Provider: a.name,
+ Model: req.Model,
+ Message: llm.Assistant("ok"),
+ Finish: llm.FinishReason{Reason: llm.FinishReasonStop},
+ }
+ stream.Send(llm.StreamEvent{
+ Type: llm.StreamEventFinish,
+ FinishReason: &resp.Finish,
+ Response: &resp,
+ })
+ }()
+ return stream, nil
+}
+
+func (a *providerEventStreamAdapter) providerCompletedItem() map[string]any {
+ item := map[string]any{
+ "id": "cmd_1",
+ "type": "commandExecution",
+ "status": "completed",
+ }
+ for k, v := range a.completedItem {
+ item[k] = v
+ }
+ return item
+}
+
+func streamFromResponse(ctx context.Context, resp llm.Response) llm.Stream {
+ stream := llm.NewChanStream(nil)
+ go func() {
+ defer stream.CloseSend()
+ select {
+ case <-ctx.Done():
+ stream.Send(llm.StreamEvent{Type: llm.StreamEventError, Err: ctx.Err()})
+ return
+ default:
+ }
+ stream.Send(llm.StreamEvent{Type: llm.StreamEventStreamStart, ID: resp.ID, Model: resp.Model})
+ text := resp.Text()
+ if text != "" {
+ stream.Send(llm.StreamEvent{Type: llm.StreamEventTextStart, TextID: "text_0"})
+ stream.Send(llm.StreamEvent{Type: llm.StreamEventTextDelta, TextID: "text_0", Delta: text})
+ stream.Send(llm.StreamEvent{Type: llm.StreamEventTextEnd, TextID: "text_0"})
+ }
+ finish := resp.Finish
+ usage := resp.Usage
+ response := resp
+ stream.Send(llm.StreamEvent{
+ Type: llm.StreamEventFinish,
+ FinishReason: &finish,
+ Usage: &usage,
+ Response: &response,
+ })
+ }()
+ return stream
+}
+
+func TestSession_ProviderToolLifecycleEvents_EmitToolCallStartEnd(t *testing.T) {
+ dir := t.TempDir()
+ c := llm.NewClient()
+ c.Register(&providerEventStreamAdapter{name: "openai"})
+
+ sess, err := NewSession(c, NewOpenAIProfile("gpt-5.2"), NewLocalExecutionEnvironment(dir), SessionConfig{})
+ if err != nil {
+ t.Fatalf("NewSession: %v", err)
+ }
+ ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
+ defer cancel()
+ out, err := sess.ProcessInput(ctx, "hi")
+ if err != nil {
+ t.Fatalf("ProcessInput: %v", err)
+ }
+ if strings.TrimSpace(out) != "ok" {
+ t.Fatalf("out: %q", out)
+ }
+ sess.Close()
+
+ seenStart := false
+ seenEnd := false
+ seenTurnCount := false
+ startArgsJSON := ""
+ endFullOutput := "unset"
+ for ev := range sess.Events() {
+ switch ev.Kind {
+ case EventToolCallStart:
+ if ev.Data["call_id"] == "cmd_1" && ev.Data["tool_name"] == "exec_command" {
+ seenStart = true
+ rawArgs, ok := ev.Data["arguments_json"]
+ if !ok {
+ t.Fatalf("provider tool start missing arguments_json: %+v", ev.Data)
+ }
+ argsJSON, ok := rawArgs.(string)
+ if !ok {
+ t.Fatalf("provider tool start arguments_json type: got %T want string", rawArgs)
+ }
+ startArgsJSON = strings.TrimSpace(argsJSON)
+ }
+ case EventToolCallEnd:
+ if ev.Data["call_id"] == "cmd_1" && ev.Data["tool_name"] == "exec_command" {
+ seenEnd = true
+ rawFull, ok := ev.Data["full_output"]
+ if !ok {
+ t.Fatalf("provider tool end missing full_output: %+v", ev.Data)
+ }
+ fullOutput, ok := rawFull.(string)
+ if !ok {
+ t.Fatalf("provider tool end full_output type: got %T want string", rawFull)
+ }
+ endFullOutput = fullOutput
+ }
+ case EventAssistantTextEnd:
+ if v, ok := ev.Data["tool_call_count"]; ok {
+ if n, ok := v.(int); ok && n == 1 {
+ seenTurnCount = true
+ }
+ }
+ }
+ }
+ if !seenStart || !seenEnd {
+ t.Fatalf("expected provider-derived tool lifecycle events, got start=%t end=%t", seenStart, seenEnd)
+ }
+ if !seenTurnCount {
+ t.Fatalf("expected assistant text end with tool_call_count=1 from provider lifecycle events")
+ }
+ if startArgsJSON == "" {
+ t.Fatalf("expected non-empty provider arguments_json")
+ }
+ var parsedArgs map[string]any
+ if err := json.Unmarshal([]byte(startArgsJSON), &parsedArgs); err != nil {
+ t.Fatalf("provider arguments_json must be valid json: %v (value=%q)", err, startArgsJSON)
+ }
+ if endFullOutput != "" {
+ t.Fatalf("provider full_output: got %q want empty string", endFullOutput)
+ }
+}
+
+func TestSession_ProviderToolLifecycleEvents_PropagatesProviderOutput(t *testing.T) {
+ dir := t.TempDir()
+ c := llm.NewClient()
+ c.Register(&providerEventStreamAdapter{
+ name: "openai",
+ disableOutputDelta: true,
+ completedItem: map[string]any{
+ "aggregatedOutput": "first\nsecond\nwarning",
+ },
+ })
+
+ sess, err := NewSession(c, NewOpenAIProfile("gpt-5.2"), NewLocalExecutionEnvironment(dir), SessionConfig{})
+ if err != nil {
+ t.Fatalf("NewSession: %v", err)
+ }
+ ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
+ defer cancel()
+ out, err := sess.ProcessInput(ctx, "hi")
+ if err != nil {
+ t.Fatalf("ProcessInput: %v", err)
+ }
+ if strings.TrimSpace(out) != "ok" {
+ t.Fatalf("out: %q", out)
+ }
+ sess.Close()
+
+ seenOutputDelta := false
+ endFullOutput := ""
+ for ev := range sess.Events() {
+ switch ev.Kind {
+ case EventToolCallOutputDelta:
+ if ev.Data["call_id"] == "cmd_1" && ev.Data["tool_name"] == "exec_command" {
+ delta, _ := ev.Data["delta"].(string)
+ if strings.Contains(delta, "first") || strings.Contains(delta, "warning") {
+ seenOutputDelta = true
+ }
+ }
+ case EventToolCallEnd:
+ if ev.Data["call_id"] == "cmd_1" && ev.Data["tool_name"] == "exec_command" {
+ endFullOutput, _ = ev.Data["full_output"].(string)
+ }
+ }
+ }
+ if !seenOutputDelta {
+ t.Fatalf("expected provider output delta event for cmd_1")
+ }
+ if !strings.Contains(endFullOutput, "first\nsecond") {
+ t.Fatalf("provider full_output missing stdout: %q", endFullOutput)
+ }
+ if !strings.Contains(endFullOutput, "warning") {
+ t.Fatalf("provider full_output missing stderr: %q", endFullOutput)
+ }
+}
+
+func TestSession_ProviderToolLifecycleEvents_DoesNotDuplicateOutputDeltaWhenProviderStreamsAndAggregates(t *testing.T) {
+ dir := t.TempDir()
+ c := llm.NewClient()
+ c.Register(&providerEventStreamAdapter{
+ name: "openai",
+ completedItem: map[string]any{
+ "aggregatedOutput": "streamed output",
+ },
+ })
+
+ sess, err := NewSession(c, NewOpenAIProfile("gpt-5.2"), NewLocalExecutionEnvironment(dir), SessionConfig{})
+ if err != nil {
+ t.Fatalf("NewSession: %v", err)
+ }
+ ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
+ defer cancel()
+ out, err := sess.ProcessInput(ctx, "hi")
+ if err != nil {
+ t.Fatalf("ProcessInput: %v", err)
+ }
+ if strings.TrimSpace(out) != "ok" {
+ t.Fatalf("out: %q", out)
+ }
+ sess.Close()
+
+ deltaCount := 0
+ for ev := range sess.Events() {
+ if ev.Kind != EventToolCallOutputDelta {
+ continue
+ }
+ if ev.Data["call_id"] != "cmd_1" || ev.Data["tool_name"] != "exec_command" {
+ continue
+ }
+ delta, _ := ev.Data["delta"].(string)
+ if delta == "streamed output" {
+ deltaCount++
+ }
+ }
+ if deltaCount != 1 {
+ t.Fatalf("expected exactly one streamed output delta for cmd_1, got %d", deltaCount)
+ }
+}
+
+func TestSession_ProviderToolLifecycleEvents_ReconcilesAggregatedOutputMismatch(t *testing.T) {
+ dir := t.TempDir()
+ c := llm.NewClient()
+ c.Register(&providerEventStreamAdapter{
+ name: "openai",
+ completedItem: map[string]any{
+ "aggregatedOutput": "streamed output\ntail",
+ },
+ })
+
+ sess, err := NewSession(c, NewOpenAIProfile("gpt-5.2"), NewLocalExecutionEnvironment(dir), SessionConfig{})
+ if err != nil {
+ t.Fatalf("NewSession: %v", err)
+ }
+ ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
+ defer cancel()
+ out, err := sess.ProcessInput(ctx, "hi")
+ if err != nil {
+ t.Fatalf("ProcessInput: %v", err)
+ }
+ if strings.TrimSpace(out) != "ok" {
+ t.Fatalf("out: %q", out)
+ }
+ sess.Close()
+
+ seenReconciledDelta := false
+ endFullOutput := ""
+ for ev := range sess.Events() {
+ switch ev.Kind {
+ case EventToolCallOutputDelta:
+ if ev.Data["call_id"] == "cmd_1" && ev.Data["tool_name"] == "exec_command" {
+ delta, _ := ev.Data["delta"].(string)
+ if delta == "streamed output\ntail" {
+ seenReconciledDelta = true
+ }
+ }
+ case EventToolCallEnd:
+ if ev.Data["call_id"] == "cmd_1" && ev.Data["tool_name"] == "exec_command" {
+ endFullOutput, _ = ev.Data["full_output"].(string)
+ }
+ }
+ }
+ if !seenReconciledDelta {
+ t.Fatalf("expected reconciled provider output delta for mismatched completion output")
+ }
+ if endFullOutput != "streamed output\ntail" {
+ t.Fatalf("full_output mismatch: got %q", endFullOutput)
+ }
+}
+
+func TestSession_ProviderToolLifecycleEvents_EmitsProviderOutputDeltaNotifications(t *testing.T) {
+ dir := t.TempDir()
+ c := llm.NewClient()
+ c.Register(&providerEventStreamAdapter{name: "openai"})
+
+ sess, err := NewSession(c, NewOpenAIProfile("gpt-5.2"), NewLocalExecutionEnvironment(dir), SessionConfig{})
+ if err != nil {
+ t.Fatalf("NewSession: %v", err)
+ }
+ ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
+ defer cancel()
+ out, err := sess.ProcessInput(ctx, "hi")
+ if err != nil {
+ t.Fatalf("ProcessInput: %v", err)
+ }
+ if strings.TrimSpace(out) != "ok" {
+ t.Fatalf("out: %q", out)
+ }
+ sess.Close()
+
+ seenProviderDelta := false
+ for ev := range sess.Events() {
+ if ev.Kind != EventToolCallOutputDelta {
+ continue
+ }
+ if ev.Data["call_id"] != "cmd_1" || ev.Data["tool_name"] != "exec_command" {
+ continue
+ }
+ delta, _ := ev.Data["delta"].(string)
+ if strings.Contains(delta, "streamed output") {
+ seenProviderDelta = true
+ }
+ }
+ if !seenProviderDelta {
+ t.Fatalf("expected streamed provider output delta for cmd_1")
+ }
+}
+
+func TestUTF8Chunk_DoesNotSplitMultiByteRunes(t *testing.T) {
+ in := "a😀b" // 1 + 4 + 1 bytes
+ chunks := utf8Chunk(in, 2)
+ if len(chunks) != 3 {
+ t.Fatalf("chunk count: got %d want %d (%v)", len(chunks), 3, chunks)
+ }
+ if chunks[0] != "a" {
+ t.Fatalf("chunk[0]: got %q want %q", chunks[0], "a")
+ }
+ if chunks[1] != "😀" {
+ t.Fatalf("chunk[1]: got %q want %q", chunks[1], "😀")
+ }
+ if chunks[2] != "b" {
+ t.Fatalf("chunk[2]: got %q want %q", chunks[2], "b")
+ }
+}
+
+func TestSession_StreamFinishWithoutResponse_PreservesToolCalls(t *testing.T) {
+ dir := t.TempDir()
+ c := llm.NewClient()
+ f := &streamFinishWithoutResponseAdapter{name: "openai"}
+ c.Register(f)
+
+ sess, err := NewSession(c, NewOpenAIProfile("gpt-5.2"), NewLocalExecutionEnvironment(dir), SessionConfig{})
+ if err != nil {
+ t.Fatalf("NewSession: %v", err)
+ }
+ ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
+ defer cancel()
+ out, err := sess.ProcessInput(ctx, "write a file")
+ if err != nil {
+ t.Fatalf("ProcessInput: %v", err)
+ }
+ if strings.TrimSpace(out) != "ok" {
+ t.Fatalf("out: %q", out)
+ }
+ sess.Close()
+
+ b, err := os.ReadFile(filepath.Join(dir, "hello.txt"))
+ if err != nil {
+ t.Fatalf("read hello.txt: %v", err)
+ }
+ if strings.TrimSpace(string(b)) != "Hello" {
+ t.Fatalf("hello.txt: %q", string(b))
+ }
+
+ reqs := f.Requests()
+ if len(reqs) != 2 {
+ t.Fatalf("requests: got %d want 2", len(reqs))
+ }
+ foundToolResult := false
+ for _, m := range reqs[1].Messages {
+ if m.Role == llm.RoleTool {
+ foundToolResult = true
+ break
+ }
+ }
+ if !foundToolResult {
+ t.Fatalf("expected tool result in second request, got %+v", reqs[1].Messages)
+ }
+}
+
+func TestSession_AssistantTextEnd_IncludesToolCallCount(t *testing.T) {
+ dir := t.TempDir()
+ c := llm.NewClient()
+ f := &streamFinishWithoutResponseAdapter{name: "openai"}
+ c.Register(f)
+
+ sess, err := NewSession(c, NewOpenAIProfile("gpt-5.2"), NewLocalExecutionEnvironment(dir), SessionConfig{})
+ if err != nil {
+ t.Fatalf("NewSession: %v", err)
+ }
+ ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
+ defer cancel()
+ out, err := sess.ProcessInput(ctx, "write a file")
+ if err != nil {
+ t.Fatalf("ProcessInput: %v", err)
+ }
+ if strings.TrimSpace(out) != "ok" {
+ t.Fatalf("out: %q", out)
+ }
+ sess.Close()
+
+ seenToolRoundEnd := false
+ seenFinalRoundEnd := false
+ for ev := range sess.Events() {
+ if ev.Kind != EventAssistantTextEnd {
+ continue
+ }
+ v, ok := ev.Data["tool_call_count"]
+ if !ok {
+ t.Fatalf("assistant text end missing tool_call_count: %+v", ev.Data)
+ }
+ toolCalls, ok := v.(int)
+ if !ok {
+ t.Fatalf("tool_call_count type: got %T want int", v)
+ }
+ if toolCalls == 1 {
+ seenToolRoundEnd = true
+ }
+ if toolCalls == 0 {
+ seenFinalRoundEnd = true
+ }
+ }
+ if !seenToolRoundEnd {
+ t.Fatalf("expected assistant text end with tool_call_count=1")
+ }
+ if !seenFinalRoundEnd {
+ t.Fatalf("expected assistant text end with tool_call_count=0")
+ }
+}
+
func TestSession_NaturalCompletion_LoadsOnlyProfileDocs(t *testing.T) {
dir := t.TempDir()
_ = os.WriteFile(filepath.Join(dir, "AGENTS.md"), []byte("AGENTS\n"), 0o644)
diff --git a/internal/attractor/engine/api_client_from_runtime.go b/internal/attractor/engine/api_client_from_runtime.go
index 6d3fadc0..854d9412 100644
--- a/internal/attractor/engine/api_client_from_runtime.go
+++ b/internal/attractor/engine/api_client_from_runtime.go
@@ -8,6 +8,7 @@ import (
"github.com/danshapiro/kilroy/internal/llm"
"github.com/danshapiro/kilroy/internal/llm/providers/anthropic"
+ "github.com/danshapiro/kilroy/internal/llm/providers/codexappserver"
"github.com/danshapiro/kilroy/internal/llm/providers/google"
"github.com/danshapiro/kilroy/internal/llm/providers/openai"
"github.com/danshapiro/kilroy/internal/llm/providers/openaicompat"
@@ -21,7 +22,15 @@ func newAPIClientFromProviderRuntimes(runtimes map[string]ProviderRuntime) (*llm
if rt.Backend != BackendAPI {
continue
}
- apiKey := strings.TrimSpace(os.Getenv(rt.API.DefaultAPIKeyEnv))
+ apiKeyEnv := strings.TrimSpace(rt.API.DefaultAPIKeyEnv)
+ if rt.API.Protocol == providerspec.ProtocolCodexAppServer && apiKeyEnv == "" {
+ c.Register(codexappserver.NewAdapter(codexappserver.AdapterOptions{Provider: key}))
+ continue
+ }
+ if apiKeyEnv == "" {
+ continue
+ }
+ apiKey := strings.TrimSpace(os.Getenv(apiKeyEnv))
if apiKey == "" {
continue
}
@@ -41,6 +50,8 @@ func newAPIClientFromProviderRuntimes(runtimes map[string]ProviderRuntime) (*llm
OptionsKey: rt.API.ProviderOptionsKey,
ExtraHeaders: rt.APIHeaders(),
}))
+ case providerspec.ProtocolCodexAppServer:
+ c.Register(codexappserver.NewAdapter(codexappserver.AdapterOptions{Provider: key}))
default:
return nil, fmt.Errorf("unsupported api protocol %q for provider %s", rt.API.Protocol, key)
}
diff --git a/internal/attractor/engine/api_client_from_runtime_test.go b/internal/attractor/engine/api_client_from_runtime_test.go
index 48e36838..3b5f1071 100644
--- a/internal/attractor/engine/api_client_from_runtime_test.go
+++ b/internal/attractor/engine/api_client_from_runtime_test.go
@@ -114,6 +114,77 @@ func TestNewAPIClientFromProviderRuntimes_RegistersMinimaxViaOpenAICompat(t *tes
}
}
+func TestNewAPIClientFromProviderRuntimes_RegistersCodexAppServerProtocol(t *testing.T) {
+ runtimes := map[string]ProviderRuntime{
+ "codex-app-server": {
+ Key: "codex-app-server",
+ Backend: BackendAPI,
+ API: providerspec.APISpec{
+ Protocol: providerspec.ProtocolCodexAppServer,
+ DefaultAPIKeyEnv: "",
+ },
+ },
+ }
+ c, err := newAPIClientFromProviderRuntimes(runtimes)
+ if err != nil {
+ t.Fatalf("newAPIClientFromProviderRuntimes: %v", err)
+ }
+ if len(c.ProviderNames()) != 1 || c.ProviderNames()[0] != "codex-app-server" {
+ t.Fatalf("expected codex-app-server adapter, got %v", c.ProviderNames())
+ }
+}
+
+func TestNewAPIClientFromProviderRuntimes_CodexAppServerHonorsExplicitAPIKeyEnv(t *testing.T) {
+ runtimes := map[string]ProviderRuntime{
+ "codex-app-server": {
+ Key: "codex-app-server",
+ Backend: BackendAPI,
+ API: providerspec.APISpec{
+ Protocol: providerspec.ProtocolCodexAppServer,
+ DefaultAPIKeyEnv: "CODEX_APP_SERVER_TOKEN",
+ },
+ },
+ }
+
+ t.Setenv("CODEX_APP_SERVER_TOKEN", "")
+ c, err := newAPIClientFromProviderRuntimes(runtimes)
+ if err != nil {
+ t.Fatalf("newAPIClientFromProviderRuntimes: %v", err)
+ }
+ if len(c.ProviderNames()) != 0 {
+ t.Fatalf("expected no adapters when explicit codex api key env is unset, got %v", c.ProviderNames())
+ }
+
+ t.Setenv("CODEX_APP_SERVER_TOKEN", "present")
+ c, err = newAPIClientFromProviderRuntimes(runtimes)
+ if err != nil {
+ t.Fatalf("newAPIClientFromProviderRuntimes: %v", err)
+ }
+ if len(c.ProviderNames()) != 1 || c.ProviderNames()[0] != "codex-app-server" {
+ t.Fatalf("expected codex-app-server adapter when explicit env is set, got %v", c.ProviderNames())
+ }
+}
+
+func TestNewAPIClientFromProviderRuntimes_CodexAppServerPreservesCustomProviderKey(t *testing.T) {
+ runtimes := map[string]ProviderRuntime{
+ "my-codex-provider": {
+ Key: "my-codex-provider",
+ Backend: BackendAPI,
+ API: providerspec.APISpec{
+ Protocol: providerspec.ProtocolCodexAppServer,
+ DefaultAPIKeyEnv: "",
+ },
+ },
+ }
+ c, err := newAPIClientFromProviderRuntimes(runtimes)
+ if err != nil {
+ t.Fatalf("newAPIClientFromProviderRuntimes: %v", err)
+ }
+ if len(c.ProviderNames()) != 1 || c.ProviderNames()[0] != "my-codex-provider" {
+ t.Fatalf("expected custom codex provider key to be preserved, got %v", c.ProviderNames())
+ }
+}
+
func TestResolveBuiltInBaseURLOverride_MinimaxUsesEnvOverride(t *testing.T) {
t.Setenv("MINIMAX_BASE_URL", "http://127.0.0.1:8888")
got := resolveBuiltInBaseURLOverride("minimax", "https://api.minimax.io")
diff --git a/internal/attractor/engine/api_stream_progress.go b/internal/attractor/engine/api_stream_progress.go
new file mode 100644
index 00000000..98afa408
--- /dev/null
+++ b/internal/attractor/engine/api_stream_progress.go
@@ -0,0 +1,141 @@
+package engine
+
+import (
+ "sync"
+ "time"
+)
+
+// streamProgressEmitter batches LLM text deltas and flushes them
+// as progress events at a capped rate (default 100ms). Tool-call
+// and turn-end events are forwarded immediately.
+type streamProgressEmitter struct {
+ eng *Engine
+ nodeID string
+ runID string
+
+ mu sync.Mutex
+ buf string
+ flushTimer *time.Timer
+ interval time.Duration
+ closed bool
+}
+
+func newStreamProgressEmitter(eng *Engine, nodeID, runID string) *streamProgressEmitter {
+ return &streamProgressEmitter{
+ eng: eng,
+ nodeID: nodeID,
+ runID: runID,
+ interval: 100 * time.Millisecond,
+ }
+}
+
+// appendDelta buffers a text delta and schedules a flush.
+func (e *streamProgressEmitter) appendDelta(delta string) {
+ if delta == "" || e.eng == nil {
+ return
+ }
+ e.mu.Lock()
+ defer e.mu.Unlock()
+ if e.closed {
+ return
+ }
+ e.buf += delta
+ if e.flushTimer == nil {
+ e.flushTimer = time.AfterFunc(e.interval, e.timerFlush)
+ }
+}
+
+// emitToolCallStart forwards a tool-call-start event immediately.
+func (e *streamProgressEmitter) emitToolCallStart(toolName, callID string) {
+ if e.eng == nil {
+ return
+ }
+ e.flushDelta()
+ e.eng.appendProgress(map[string]any{
+ "event": "llm_tool_call_start",
+ "node_id": e.nodeID,
+ "run_id": e.runID,
+ "backend": "api",
+ "tool_name": toolName,
+ "call_id": callID,
+ })
+}
+
+// emitToolCallEnd forwards a tool-call-end event immediately.
+func (e *streamProgressEmitter) emitToolCallEnd(toolName, callID string, isError bool) {
+ if e.eng == nil {
+ return
+ }
+ e.flushDelta()
+ e.eng.appendProgress(map[string]any{
+ "event": "llm_tool_call_end",
+ "node_id": e.nodeID,
+ "run_id": e.runID,
+ "backend": "api",
+ "tool_name": toolName,
+ "call_id": callID,
+ "is_error": isError,
+ })
+}
+
+// emitTurnEnd flushes any pending delta and emits a turn-end event.
+func (e *streamProgressEmitter) emitTurnEnd(textLen int, toolCallCount int) {
+ if e.eng == nil {
+ return
+ }
+ e.flushDelta()
+ e.eng.appendProgress(map[string]any{
+ "event": "llm_turn_end",
+ "node_id": e.nodeID,
+ "run_id": e.runID,
+ "backend": "api",
+ "text_length": textLen,
+ "tool_call_count": toolCallCount,
+ })
+}
+
+// close stops the timer and flushes remaining data.
+func (e *streamProgressEmitter) close() {
+ e.mu.Lock()
+ e.closed = true
+ if e.flushTimer != nil {
+ e.flushTimer.Stop()
+ e.flushTimer = nil
+ }
+ pending := e.buf
+ e.buf = ""
+ e.mu.Unlock()
+ if pending != "" && e.eng != nil {
+ e.eng.appendProgress(map[string]any{
+ "event": "llm_text_delta",
+ "node_id": e.nodeID,
+ "run_id": e.runID,
+ "backend": "api",
+ "delta": pending,
+ })
+ }
+}
+
+func (e *streamProgressEmitter) timerFlush() {
+ e.flushDelta()
+}
+
+func (e *streamProgressEmitter) flushDelta() {
+ e.mu.Lock()
+ if e.flushTimer != nil {
+ e.flushTimer.Stop()
+ e.flushTimer = nil
+ }
+ pending := e.buf
+ e.buf = ""
+ e.mu.Unlock()
+ if pending != "" && e.eng != nil {
+ e.eng.appendProgress(map[string]any{
+ "event": "llm_text_delta",
+ "node_id": e.nodeID,
+ "run_id": e.runID,
+ "backend": "api",
+ "delta": pending,
+ })
+ }
+}
diff --git a/internal/attractor/engine/api_stream_progress_test.go b/internal/attractor/engine/api_stream_progress_test.go
new file mode 100644
index 00000000..20574f64
--- /dev/null
+++ b/internal/attractor/engine/api_stream_progress_test.go
@@ -0,0 +1,286 @@
+package engine
+
+import (
+ "context"
+ "sync"
+ "testing"
+ "time"
+
+ "github.com/danshapiro/kilroy/internal/agent"
+)
+
+func TestThrottledEmitter_ShouldBatchTextDeltasWithinFlushInterval(t *testing.T) {
+ var mu sync.Mutex
+ var captured []map[string]any
+ eng := &Engine{
+ progressSink: func(ev map[string]any) {
+ mu.Lock()
+ defer mu.Unlock()
+ captured = append(captured, ev)
+ },
+ Options: RunOptions{RunID: "run-1"},
+ }
+
+ em := newStreamProgressEmitter(eng, "node_1", "run-1")
+ em.interval = 50 * time.Millisecond
+
+ em.appendDelta("Hello")
+ em.appendDelta(" wor")
+ em.appendDelta("ld")
+
+ // Before flush interval elapses, nothing should be emitted.
+ mu.Lock()
+ count := len(captured)
+ mu.Unlock()
+ if count != 0 {
+ t.Fatalf("expected 0 events before flush interval, got %d", count)
+ }
+
+ // Wait for the flush timer to fire.
+ time.Sleep(100 * time.Millisecond)
+
+ mu.Lock()
+ count = len(captured)
+ mu.Unlock()
+ if count != 1 {
+ t.Fatalf("expected 1 batched event after flush, got %d", count)
+ }
+
+ mu.Lock()
+ ev := captured[0]
+ mu.Unlock()
+ if ev["event"] != "llm_text_delta" {
+ t.Fatalf("expected llm_text_delta event, got %v", ev["event"])
+ }
+ if ev["delta"] != "Hello world" {
+ t.Fatalf("expected batched delta 'Hello world', got %q", ev["delta"])
+ }
+ if ev["backend"] != "api" {
+ t.Fatalf("expected backend 'api', got %v", ev["backend"])
+ }
+ em.close()
+}
+
+func TestThrottledEmitter_ShouldForceFlushOnTurnEnd(t *testing.T) {
+ var mu sync.Mutex
+ var captured []map[string]any
+ eng := &Engine{
+ progressSink: func(ev map[string]any) {
+ mu.Lock()
+ defer mu.Unlock()
+ captured = append(captured, ev)
+ },
+ Options: RunOptions{RunID: "run-1"},
+ }
+
+ em := newStreamProgressEmitter(eng, "node_1", "run-1")
+ em.interval = 5 * time.Second // Long interval to prove force-flush works.
+
+ em.appendDelta("partial response")
+ em.emitTurnEnd(16, 0)
+
+ mu.Lock()
+ defer mu.Unlock()
+ if len(captured) != 2 {
+ t.Fatalf("expected 2 events (delta + turn_end), got %d", len(captured))
+ }
+ if captured[0]["event"] != "llm_text_delta" {
+ t.Fatalf("first event should be llm_text_delta, got %v", captured[0]["event"])
+ }
+ if captured[0]["delta"] != "partial response" {
+ t.Fatalf("expected flushed delta, got %q", captured[0]["delta"])
+ }
+ if captured[1]["event"] != "llm_turn_end" {
+ t.Fatalf("second event should be llm_turn_end, got %v", captured[1]["event"])
+ }
+}
+
+func TestThrottledEmitter_ShouldFlushOnClose(t *testing.T) {
+ var mu sync.Mutex
+ var captured []map[string]any
+ eng := &Engine{
+ progressSink: func(ev map[string]any) {
+ mu.Lock()
+ defer mu.Unlock()
+ captured = append(captured, ev)
+ },
+ Options: RunOptions{RunID: "run-1"},
+ }
+
+ em := newStreamProgressEmitter(eng, "node_1", "run-1")
+ em.interval = 5 * time.Second
+
+ em.appendDelta("buffered data")
+ em.close()
+
+ mu.Lock()
+ defer mu.Unlock()
+ if len(captured) != 1 {
+ t.Fatalf("expected 1 flushed event on close, got %d", len(captured))
+ }
+ if captured[0]["delta"] != "buffered data" {
+ t.Fatalf("expected flushed delta, got %q", captured[0]["delta"])
+ }
+}
+
+func TestThrottledEmitter_ShouldEmitToolCallEventsImmediately(t *testing.T) {
+ var mu sync.Mutex
+ var captured []map[string]any
+ eng := &Engine{
+ progressSink: func(ev map[string]any) {
+ mu.Lock()
+ defer mu.Unlock()
+ captured = append(captured, ev)
+ },
+ Options: RunOptions{RunID: "run-1"},
+ }
+
+ em := newStreamProgressEmitter(eng, "node_1", "run-1")
+ em.emitToolCallStart("write_file", "call_1")
+ em.emitToolCallEnd("write_file", "call_1", false)
+
+ mu.Lock()
+ defer mu.Unlock()
+ if len(captured) != 2 {
+ t.Fatalf("expected 2 events, got %d", len(captured))
+ }
+ if captured[0]["event"] != "llm_tool_call_start" {
+ t.Fatalf("expected llm_tool_call_start, got %v", captured[0]["event"])
+ }
+ if captured[0]["tool_name"] != "write_file" {
+ t.Fatalf("expected tool_name write_file, got %v", captured[0]["tool_name"])
+ }
+ if captured[1]["event"] != "llm_tool_call_end" {
+ t.Fatalf("expected llm_tool_call_end, got %v", captured[1]["event"])
+ }
+ if captured[1]["is_error"] != false {
+ t.Fatalf("expected is_error false, got %v", captured[1]["is_error"])
+ }
+ em.close()
+}
+
+func TestEmitStreamProgress_ShouldForwardAgentSessionEvents(t *testing.T) {
+ var mu sync.Mutex
+ var captured []map[string]any
+ eng := &Engine{
+ progressSink: func(ev map[string]any) {
+ mu.Lock()
+ defer mu.Unlock()
+ captured = append(captured, ev)
+ },
+ Options: RunOptions{RunID: "run-1"},
+ }
+
+ em := newStreamProgressEmitter(eng, "node_1", "run-1")
+ em.interval = 5 * time.Second
+
+ events := []agent.SessionEvent{
+ {Kind: agent.EventAssistantTextDelta, Data: map[string]any{"delta": "Hello"}, Timestamp: time.Now()},
+ {Kind: agent.EventAssistantTextDelta, Data: map[string]any{"delta": " world"}, Timestamp: time.Now()},
+ {Kind: agent.EventToolCallStart, Data: map[string]any{"tool_name": "read_file", "call_id": "c1"}, Timestamp: time.Now()},
+ {Kind: agent.EventToolCallEnd, Data: map[string]any{"tool_name": "read_file", "call_id": "c1", "is_error": false}, Timestamp: time.Now()},
+ {Kind: agent.EventAssistantTextEnd, Data: map[string]any{"text": "Hello world"}, Timestamp: time.Now()},
+ }
+
+ for _, ev := range events {
+ emitStreamProgress(em, ev)
+ }
+
+ mu.Lock()
+ defer mu.Unlock()
+
+ var eventTypes []string
+ for _, ev := range captured {
+ eventTypes = append(eventTypes, ev["event"].(string))
+ }
+
+ has := func(eventType string) bool {
+ for _, et := range eventTypes {
+ if et == eventType {
+ return true
+ }
+ }
+ return false
+ }
+ for _, want := range []string{"llm_text_delta", "llm_tool_call_start", "llm_tool_call_end", "llm_turn_end"} {
+ if !has(want) {
+ t.Errorf("missing event type %q in %v", want, eventTypes)
+ }
+ }
+
+ // All events should have backend: "api".
+ for _, ev := range captured {
+ if ev["backend"] != "api" {
+ t.Errorf("expected backend 'api', got %v for event %v", ev["backend"], ev["event"])
+ }
+ }
+}
+
+func TestEmitCXDBToolTurns_ShouldRecordAssistantMessageOnTextEnd(t *testing.T) {
+ srv := newCXDBTestServer(t)
+ eng := newTestEngineWithCXDB(t, srv)
+
+ ev := agent.SessionEvent{
+ Kind: agent.EventAssistantTextEnd,
+ Timestamp: time.Now(),
+ Data: map[string]any{"text": "Here is the implementation of your feature."},
+ }
+ emitCXDBToolTurns(context.Background(), eng, "codegen_1", ev)
+
+ ctxIDs := srv.ContextIDs()
+ if len(ctxIDs) == 0 {
+ t.Fatal("expected at least one CXDB context")
+ }
+ turns := srv.Turns(ctxIDs[0])
+ found := false
+ for _, turn := range turns {
+ if turn["type_id"] == "com.kilroy.attractor.AssistantMessage" {
+ found = true
+ payload, _ := turn["payload"].(map[string]any)
+ if payload == nil {
+ t.Fatal("expected payload in turn")
+ }
+ if payload["node_id"] != "codegen_1" {
+ t.Fatalf("expected node_id codegen_1, got %v", payload["node_id"])
+ }
+ text, _ := payload["text"].(string)
+ if text != "Here is the implementation of your feature." {
+ t.Fatalf("expected full text, got %q", text)
+ }
+ }
+ }
+ if !found {
+ t.Fatal("expected AssistantMessage turn in CXDB")
+ }
+}
+
+func TestEmitCXDBToolTurns_EmptyAssistantTextRecordsFallback(t *testing.T) {
+ srv := newCXDBTestServer(t)
+ eng := newTestEngineWithCXDB(t, srv)
+
+ ev := agent.SessionEvent{
+ Kind: agent.EventAssistantTextEnd,
+ Timestamp: time.Now(),
+ Data: map[string]any{"text": " "},
+ }
+ emitCXDBToolTurns(context.Background(), eng, "codegen_1", ev)
+
+ ctxIDs := srv.ContextIDs()
+ if len(ctxIDs) == 0 {
+ t.Fatal("expected at least one CXDB context")
+ }
+ turns := srv.Turns(ctxIDs[0])
+ found := false
+ for _, turn := range turns {
+ if turn["type_id"] == "com.kilroy.attractor.AssistantMessage" {
+ found = true
+ payload, _ := turn["payload"].(map[string]any)
+ if text, _ := payload["text"].(string); text != "[tool_use]" {
+ t.Fatalf("expected [tool_use] fallback text, got %q", text)
+ }
+ }
+ }
+ if !found {
+ t.Fatal("expected AssistantMessage with [tool_use] fallback for empty text")
+ }
+}
diff --git a/internal/attractor/engine/cli_only_models.go b/internal/attractor/engine/cli_only_models.go
index 2feb2bff..f9721d6b 100644
--- a/internal/attractor/engine/cli_only_models.go
+++ b/internal/attractor/engine/cli_only_models.go
@@ -4,10 +4,7 @@ import "strings"
// cliOnlyModelIDs lists models that MUST route through CLI backend regardless
// of provider backend configuration. These models have no API endpoint.
-var cliOnlyModelIDs = map[string]bool{
- "gpt-5.3-codex-spark": true,
- "gpt-5.4-spark": true,
-}
+var cliOnlyModelIDs = map[string]bool{}
// isCLIOnlyModel returns true if the given model ID (with or without provider
// prefix) must be routed exclusively through the CLI backend.
diff --git a/internal/attractor/engine/cli_only_models_test.go b/internal/attractor/engine/cli_only_models_test.go
index 503144f8..2d279f28 100644
--- a/internal/attractor/engine/cli_only_models_test.go
+++ b/internal/attractor/engine/cli_only_models_test.go
@@ -7,9 +7,9 @@ func TestIsCLIOnlyModel(t *testing.T) {
model string
want bool
}{
- {"gpt-5.4-spark", true},
- {"GPT-5.4-SPARK", true}, // case-insensitive
- {"openai/gpt-5.4-spark", true}, // with provider prefix
+ {"gpt-5.4-spark", false},
+ {"GPT-5.4-SPARK", false}, // case-insensitive
+ {"openai/gpt-5.4-spark", false}, // with provider prefix
{"gpt-5.4", false}, // regular codex
{"gpt-5.4", false},
{"claude-opus-4-6", false},
@@ -21,3 +21,20 @@ func TestIsCLIOnlyModel(t *testing.T) {
}
}
}
+
+func TestIsCLIOnlyModel_UsesConfiguredRegistry(t *testing.T) {
+ orig := cliOnlyModelIDs
+ cliOnlyModelIDs = map[string]bool{
+ "test-cli-only-model": true,
+ }
+ t.Cleanup(func() {
+ cliOnlyModelIDs = orig
+ })
+
+ if got := isCLIOnlyModel("test-cli-only-model"); !got {
+ t.Fatalf("isCLIOnlyModel(test-cli-only-model) = %v, want true", got)
+ }
+ if got := isCLIOnlyModel("openai/test-cli-only-model"); !got {
+ t.Fatalf("isCLIOnlyModel(openai/test-cli-only-model) = %v, want true", got)
+ }
+}
diff --git a/internal/attractor/engine/codergen_cli_invocation_test.go b/internal/attractor/engine/codergen_cli_invocation_test.go
index 14b94abd..1c789420 100644
--- a/internal/attractor/engine/codergen_cli_invocation_test.go
+++ b/internal/attractor/engine/codergen_cli_invocation_test.go
@@ -187,6 +187,51 @@ func TestBuildCodexIsolatedEnv_ConfiguresCodexScopedOverrides(t *testing.T) {
}
}
+func TestBuildCodexIsolatedEnv_SeedsFromUserProfileWhenHomeUnset(t *testing.T) {
+ home := t.TempDir()
+ if err := os.MkdirAll(filepath.Join(home, ".codex"), 0o755); err != nil {
+ t.Fatal(err)
+ }
+ if err := os.WriteFile(filepath.Join(home, ".codex", "auth.json"), []byte(`{"token":"x"}`), 0o644); err != nil {
+ t.Fatal(err)
+ }
+ if err := os.WriteFile(filepath.Join(home, ".codex", "config.toml"), []byte(`model = "gpt-5"`), 0o644); err != nil {
+ t.Fatal(err)
+ }
+
+ t.Setenv("HOME", "")
+ t.Setenv("USERPROFILE", home)
+ t.Setenv("HOMEDRIVE", "")
+ t.Setenv("HOMEPATH", "")
+ t.Setenv("KILROY_CODEX_STATE_BASE", filepath.Join(t.TempDir(), "codex-state-base"))
+
+ stageDir := t.TempDir()
+ _, meta, err := buildCodexIsolatedEnv(stageDir, os.Environ())
+ if err != nil {
+ t.Fatalf("buildCodexIsolatedEnv: %v", err)
+ }
+
+ stateRoot := strings.TrimSpace(anyToString(meta["state_root"]))
+ assertExists(t, filepath.Join(stateRoot, "auth.json"))
+ assertExists(t, filepath.Join(stateRoot, "config.toml"))
+}
+
+func TestCodexStateBaseRoot_FallsBackToUserProfileWhenHomeUnset(t *testing.T) {
+ userProfile := t.TempDir()
+ t.Setenv("KILROY_CODEX_STATE_BASE", "")
+ t.Setenv("XDG_STATE_HOME", "")
+ t.Setenv("HOME", "")
+ t.Setenv("USERPROFILE", userProfile)
+ t.Setenv("HOMEDRIVE", "")
+ t.Setenv("HOMEPATH", "")
+
+ got := codexStateBaseRoot()
+ want := filepath.Join(userProfile, ".local", "state", "kilroy", "attractor", "codex-state")
+ if got != want {
+ t.Fatalf("codexStateBaseRoot: got %q want %q", got, want)
+ }
+}
+
func TestEnvHasKey(t *testing.T) {
env := []string{"HOME=/tmp", "PATH=/usr/bin", "CARGO_TARGET_DIR=/foo/bar"}
if !envHasKey(env, "CARGO_TARGET_DIR") {
diff --git a/internal/attractor/engine/codergen_failover_test.go b/internal/attractor/engine/codergen_failover_test.go
index c399b75d..4a54ee2e 100644
--- a/internal/attractor/engine/codergen_failover_test.go
+++ b/internal/attractor/engine/codergen_failover_test.go
@@ -31,6 +31,195 @@ func (a *okAdapter) Stream(ctx context.Context, req llm.Request) (llm.Stream, er
return nil, fmt.Errorf("stream not implemented")
}
+type scriptedStreamAdapter struct {
+ name string
+ script func(s *llm.ChanStream)
+}
+
+func (a *scriptedStreamAdapter) Name() string { return a.name }
+func (a *scriptedStreamAdapter) Complete(ctx context.Context, req llm.Request) (llm.Response, error) {
+ _ = ctx
+ return llm.Response{Provider: a.name, Model: req.Model, Message: llm.Assistant("ok")}, nil
+}
+func (a *scriptedStreamAdapter) Stream(ctx context.Context, req llm.Request) (llm.Stream, error) {
+ _ = ctx
+ _ = req
+ st := llm.NewChanStream(nil)
+ go func() {
+ defer st.CloseSend()
+ if a.script != nil {
+ a.script(st)
+ }
+ }()
+ return st, nil
+}
+
+func TestCodergenRouter_RunAPI_OneShot_StreamErrorEventTakesPrecedence(t *testing.T) {
+ cfg := &RunConfigFile{Version: 1}
+ cfg.LLM.Providers = map[string]ProviderConfig{
+ "openai": {Backend: BackendAPI},
+ }
+ r := NewCodergenRouterWithRuntimes(cfg, nil, map[string]ProviderRuntime{
+ "openai": {Key: "openai", Backend: BackendAPI},
+ })
+ r.apiClientFactory = func(map[string]ProviderRuntime) (*llm.Client, error) {
+ client := llm.NewClient()
+ client.Register(&scriptedStreamAdapter{
+ name: "openai",
+ script: func(s *llm.ChanStream) {
+ s.Send(llm.StreamEvent{Type: llm.StreamEventStreamStart})
+ s.Send(llm.StreamEvent{Type: llm.StreamEventError, Err: llm.NewStreamError("openai", "synthetic stream failure")})
+ resp := llm.Response{Provider: "openai", Model: "gpt-5.2", Message: llm.Assistant("finish should not win")}
+ finish := llm.FinishReason{Reason: "stop"}
+ usage := llm.Usage{InputTokens: 1, OutputTokens: 2, TotalTokens: 3}
+ s.Send(llm.StreamEvent{Type: llm.StreamEventFinish, FinishReason: &finish, Usage: &usage, Response: &resp})
+ },
+ })
+ return client, nil
+ }
+
+ execCtx := &Execution{
+ LogsRoot: t.TempDir(),
+ WorktreeDir: t.TempDir(),
+ }
+ node := &model.Node{
+ ID: "stage-a",
+ Attrs: map[string]string{"codergen_mode": "one_shot"},
+ }
+
+ ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
+ defer cancel()
+
+ text, out, err := r.runAPI(ctx, execCtx, node, "openai", "gpt-5.2", "say hi")
+ if err == nil || !strings.Contains(err.Error(), "synthetic stream failure") {
+ t.Fatalf("expected stream failure, got err=%v", err)
+ }
+ if strings.TrimSpace(text) != "" {
+ t.Fatalf("text: got %q want empty on stream error", text)
+ }
+ if out != nil {
+ t.Fatalf("outcome: got %+v want nil", out)
+ }
+}
+
+func TestCodergenRouter_RunAPI_OneShot_EmitsProviderToolLifecycleProgress(t *testing.T) {
+ cfg := &RunConfigFile{Version: 1}
+ cfg.LLM.Providers = map[string]ProviderConfig{
+ "openai": {Backend: BackendAPI},
+ }
+ r := NewCodergenRouterWithRuntimes(cfg, nil, map[string]ProviderRuntime{
+ "openai": {Key: "openai", Backend: BackendAPI},
+ })
+ r.apiClientFactory = func(map[string]ProviderRuntime) (*llm.Client, error) {
+ client := llm.NewClient()
+ client.Register(&scriptedStreamAdapter{
+ name: "openai",
+ script: func(s *llm.ChanStream) {
+ s.Send(llm.StreamEvent{Type: llm.StreamEventStreamStart, ID: "turn_1", Model: "gpt-5.2"})
+ s.Send(llm.StreamEvent{
+ Type: llm.StreamEventProviderEvent,
+ EventType: "item/started",
+ Raw: map[string]any{
+ "item": map[string]any{
+ "id": "cmd_1",
+ "type": "commandExecution",
+ "command": "pwd",
+ "cwd": "/tmp/worktree",
+ "status": "inProgress",
+ },
+ },
+ })
+ s.Send(llm.StreamEvent{Type: llm.StreamEventTextDelta, TextID: "text_0", Delta: "ok"})
+ s.Send(llm.StreamEvent{
+ Type: llm.StreamEventProviderEvent,
+ EventType: "item/completed",
+ Raw: map[string]any{
+ "item": map[string]any{
+ "id": "cmd_1",
+ "type": "commandExecution",
+ "status": "completed",
+ },
+ },
+ })
+ resp := llm.Response{
+ Provider: "openai",
+ Model: "gpt-5.2",
+ Message: llm.Assistant("ok"),
+ Finish: llm.FinishReason{Reason: llm.FinishReasonStop},
+ }
+ s.Send(llm.StreamEvent{Type: llm.StreamEventFinish, FinishReason: &resp.Finish, Response: &resp})
+ },
+ })
+ return client, nil
+ }
+
+ var mu sync.Mutex
+ captured := make([]map[string]any, 0, 8)
+ eng := &Engine{
+ Options: RunOptions{RunID: "run-provider-progress"},
+ progressSink: func(ev map[string]any) {
+ mu.Lock()
+ defer mu.Unlock()
+ captured = append(captured, ev)
+ },
+ }
+ execCtx := &Execution{
+ LogsRoot: t.TempDir(),
+ WorktreeDir: t.TempDir(),
+ Engine: eng,
+ }
+ node := &model.Node{
+ ID: "stage-a",
+ Attrs: map[string]string{"codergen_mode": "one_shot"},
+ }
+
+ ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
+ defer cancel()
+
+ text, out, err := r.runAPI(ctx, execCtx, node, "openai", "gpt-5.2", "say hi")
+ if err != nil {
+ t.Fatalf("runAPI: %v", err)
+ }
+ if out != nil {
+ t.Fatalf("outcome: got %+v want nil", out)
+ }
+ if strings.TrimSpace(text) != "ok" {
+ t.Fatalf("text: got %q want %q", text, "ok")
+ }
+
+ mu.Lock()
+ defer mu.Unlock()
+ seenStart := false
+ seenEnd := false
+ seenTurnEndWithCount := false
+ for _, ev := range captured {
+ eventName, _ := ev["event"].(string)
+ switch eventName {
+ case "llm_tool_call_start":
+ if ev["tool_name"] == "exec_command" && ev["call_id"] == "cmd_1" {
+ seenStart = true
+ }
+ case "llm_tool_call_end":
+ if ev["tool_name"] == "exec_command" && ev["call_id"] == "cmd_1" {
+ if isErr, ok := ev["is_error"].(bool); !ok || isErr {
+ t.Fatalf("expected successful tool completion, got is_error=%v", ev["is_error"])
+ }
+ seenEnd = true
+ }
+ case "llm_turn_end":
+ if fmt.Sprint(ev["tool_call_count"]) == "1" {
+ seenTurnEndWithCount = true
+ }
+ }
+ }
+ if !seenStart || !seenEnd {
+ t.Fatalf("missing provider tool lifecycle progress events: start=%t end=%t captured=%v", seenStart, seenEnd, captured)
+ }
+ if !seenTurnEndWithCount {
+ t.Fatalf("missing llm_turn_end with tool_call_count=1; captured=%v", captured)
+ }
+}
+
func TestCodergenRouter_WithFailoverText_FailsOverToDifferentProvider(t *testing.T) {
cfg := &RunConfigFile{Version: 1}
cfg.LLM.Providers = map[string]ProviderConfig{
@@ -392,3 +581,72 @@ func TestShouldFailoverLLMError_GetwdBootstrapErrorDoesNotFailover(t *testing.T)
t.Fatalf("getwd bootstrap errors should not trigger failover")
}
}
+
+func TestAgentLoopProviderOptions_CodexAppServer_UsesFullAutonomousPermissions(t *testing.T) {
+ got := agentLoopProviderOptions("codex_app_server", "/tmp/worktree")
+ if len(got) != 1 {
+ t.Fatalf("provider options length=%d want 1", len(got))
+ }
+ raw, ok := got["codex_app_server"]
+ if !ok {
+ t.Fatalf("missing codex_app_server provider options: %#v", got)
+ }
+ opts, ok := raw.(map[string]any)
+ if !ok {
+ t.Fatalf("codex_app_server options type=%T want map[string]any", raw)
+ }
+ if gotCwd := fmt.Sprint(opts["cwd"]); gotCwd != "/tmp/worktree" {
+ t.Fatalf("cwd=%q want %q", gotCwd, "/tmp/worktree")
+ }
+ if gotApproval := fmt.Sprint(opts["approvalPolicy"]); gotApproval != "never" {
+ t.Fatalf("approvalPolicy=%q want %q", gotApproval, "never")
+ }
+ if gotSandbox := fmt.Sprint(opts["sandbox"]); gotSandbox != "danger-full-access" {
+ t.Fatalf("sandbox=%q want %q", gotSandbox, "danger-full-access")
+ }
+ rawSandboxPolicy, ok := opts["sandboxPolicy"]
+ if !ok {
+ t.Fatalf("missing sandboxPolicy in codex options: %#v", opts)
+ }
+ sandboxPolicy, ok := rawSandboxPolicy.(map[string]any)
+ if !ok {
+ t.Fatalf("sandboxPolicy type=%T want map[string]any", rawSandboxPolicy)
+ }
+ if gotType := fmt.Sprint(sandboxPolicy["type"]); gotType != "dangerFullAccess" {
+ t.Fatalf("sandboxPolicy.type=%q want %q", gotType, "dangerFullAccess")
+ }
+}
+
+func TestAgentLoopProviderOptions_Cerebras_PreservesReasoningHistory(t *testing.T) {
+ got := agentLoopProviderOptions("cerebras", "")
+ raw, ok := got["cerebras"]
+ if !ok {
+ t.Fatalf("missing cerebras provider options: %#v", got)
+ }
+ opts, ok := raw.(map[string]any)
+ if !ok {
+ t.Fatalf("cerebras options type=%T want map[string]any", raw)
+ }
+ clearThinking, ok := opts["clear_thinking"].(bool)
+ if !ok {
+ t.Fatalf("clear_thinking type=%T want bool", opts["clear_thinking"])
+ }
+ if clearThinking {
+ t.Fatalf("clear_thinking=%v want false", clearThinking)
+ }
+}
+
+func TestAgentLoopProviderOptions_CodexAppServer_OmitsCwdWhenWorktreeEmpty(t *testing.T) {
+ got := agentLoopProviderOptions("codex-app-server", "")
+ raw, ok := got["codex_app_server"]
+ if !ok {
+ t.Fatalf("missing codex_app_server provider options: %#v", got)
+ }
+ opts, ok := raw.(map[string]any)
+ if !ok {
+ t.Fatalf("codex_app_server options type=%T want map[string]any", raw)
+ }
+ if _, exists := opts["cwd"]; exists {
+ t.Fatalf("expected cwd to be omitted when worktreeDir is empty: %#v", opts["cwd"])
+ }
+}
diff --git a/internal/attractor/engine/codergen_heartbeat_test.go b/internal/attractor/engine/codergen_heartbeat_test.go
index 531b1a62..7ed57516 100644
--- a/internal/attractor/engine/codergen_heartbeat_test.go
+++ b/internal/attractor/engine/codergen_heartbeat_test.go
@@ -3,6 +3,8 @@ package engine
import (
"context"
"encoding/json"
+ "fmt"
+ "io"
"net/http"
"net/http/httptest"
"os"
@@ -53,7 +55,7 @@ digraph G {
graph [goal="test heartbeat"]
start [shape=Mdiamond]
exit [shape=Msquare]
- a [shape=box, llm_provider=openai, llm_model=gpt-5.4, prompt="say hi"]
+ a [shape=box, llm_provider=openai, llm_model=gpt-5.2, prompt="say hi"]
start -> a -> exit
}
`)
@@ -138,27 +140,26 @@ func TestRunWithConfig_APIBackend_HeartbeatEmitsDuringAgentLoop(t *testing.T) {
n := requestCount
reqMu.Unlock()
+ toolCallResp := map[string]any{
+ "id": "resp_1", "model": "gpt-5.2",
+ "output": []any{map[string]any{"type": "function_call", "id": "call_1", "name": "shell", "arguments": `{"command":"sleep 1"}`}},
+ "usage": map[string]any{"input_tokens": 1, "output_tokens": 2, "total_tokens": 3},
+ }
+ textResp := map[string]any{
+ "id": "resp_2", "model": "gpt-5.2",
+ "output": []any{map[string]any{"type": "message", "content": []any{map[string]any{"type": "output_text", "text": "done"}}}},
+ "usage": map[string]any{"input_tokens": 1, "output_tokens": 2, "total_tokens": 3},
+ }
+
// First request: return a shell tool call that sleeps briefly.
if n == 1 {
- w.Header().Set("Content-Type", "application/json")
- _, _ = w.Write([]byte(`{
- "id": "resp_1",
- "model": "gpt-5.4",
- "output": [{"type":"function_call","id":"call_1","name":"shell","arguments":"{\"command\":\"sleep 1\"}"}],
- "usage": {"input_tokens": 1, "output_tokens": 2, "total_tokens": 3}
-}`))
+ writeOpenAIResponseAuto(w, r, toolCallResp)
return
}
// Second request onward: simulate API thinking time so the heartbeat
// goroutine fires at least once before the session completes.
time.Sleep(500 * time.Millisecond)
- w.Header().Set("Content-Type", "application/json")
- _, _ = w.Write([]byte(`{
- "id": "resp_2",
- "model": "gpt-5.4",
- "output": [{"type":"message","content":[{"type":"output_text","text":"done"}]}],
- "usage": {"input_tokens": 1, "output_tokens": 2, "total_tokens": 3}
-}`))
+ writeOpenAIResponseAuto(w, r, textResp)
}))
t.Cleanup(openaiSrv.Close)
@@ -182,7 +183,7 @@ digraph G {
graph [goal="test api heartbeat"]
start [shape=Mdiamond]
exit [shape=Msquare]
- a [shape=box, llm_provider=openai, llm_model=gpt-5.4, auto_status=true, prompt="run a command"]
+ a [shape=box, llm_provider=openai, llm_model=gpt-5.2, auto_status=true, prompt="run a command"]
start -> a -> exit
}
`)
@@ -234,47 +235,68 @@ func TestRunWithConfig_APIBackend_SessionEventsPreventFalseStallWithoutHeartbeat
cxdbSrv := newCXDBTestServer(t)
requestCount := 0
+ const toolRounds = 8
var reqMu sync.Mutex
openaiSrv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
if r.Method != http.MethodPost || r.URL.Path != "/v1/responses" {
w.WriteHeader(http.StatusNotFound)
return
}
+ b, _ := io.ReadAll(r.Body)
+ _ = r.Body.Close()
+ var reqBody map[string]any
+ _ = json.Unmarshal(b, &reqBody)
+ streaming, _ := reqBody["stream"].(bool)
+
reqMu.Lock()
requestCount++
n := requestCount
reqMu.Unlock()
- w.Header().Set("Content-Type", "application/json")
- switch n {
- case 1:
- _, _ = w.Write([]byte(`{
- "id": "resp_1",
- "model": "gpt-5.4",
- "output": [{"type":"function_call","id":"call_1","name":"shell","arguments":"{\"command\":\"sleep 0.3\"}"}],
- "usage": {"input_tokens": 1, "output_tokens": 2, "total_tokens": 3}
-}`))
- case 2:
- _, _ = w.Write([]byte(`{
- "id": "resp_2",
- "model": "gpt-5.4",
- "output": [{"type":"function_call","id":"call_2","name":"shell","arguments":"{\"command\":\"sleep 0.3\"}"}],
- "usage": {"input_tokens": 1, "output_tokens": 2, "total_tokens": 3}
-}`))
- case 3:
- _, _ = w.Write([]byte(`{
- "id": "resp_3",
- "model": "gpt-5.4",
- "output": [{"type":"function_call","id":"call_3","name":"shell","arguments":"{\"command\":\"sleep 0.3\"}"}],
- "usage": {"input_tokens": 1, "output_tokens": 2, "total_tokens": 3}
-}`))
- default:
- _, _ = w.Write([]byte(`{
- "id": "resp_4",
- "model": "gpt-5.4",
- "output": [{"type":"message","content":[{"type":"output_text","text":"done"}]}],
- "usage": {"input_tokens": 1, "output_tokens": 2, "total_tokens": 3}
-}`))
+ var responseObj map[string]any
+ if n <= toolRounds {
+ responseObj = map[string]any{
+ "id": fmt.Sprintf("resp_%d", n),
+ "model": "gpt-5.2",
+ "output": []any{map[string]any{
+ "type": "function_call", "id": fmt.Sprintf("call_%d", n),
+ "name": "shell", "arguments": `{"command":"sleep 0.3"}`,
+ }},
+ "usage": map[string]any{"input_tokens": 1, "output_tokens": 2, "total_tokens": 3},
+ }
+ } else {
+ responseObj = map[string]any{
+ "id": "resp_final",
+ "model": "gpt-5.2",
+ "output": []any{map[string]any{
+ "type": "message", "content": []any{map[string]any{"type": "output_text", "text": "done"}},
+ }},
+ "usage": map[string]any{"input_tokens": 1, "output_tokens": 2, "total_tokens": 3},
+ }
+ }
+
+ if !streaming {
+ w.Header().Set("Content-Type", "application/json")
+ _ = json.NewEncoder(w).Encode(responseObj)
+ return
+ }
+
+ w.Header().Set("Content-Type", "text/event-stream")
+ w.Header().Set("Cache-Control", "no-cache")
+ completedPayload, _ := json.Marshal(map[string]any{
+ "type": "response.completed",
+ "response": responseObj,
+ })
+ lines := []string{
+ "event: response.completed",
+ "data: " + string(completedPayload),
+ "",
+ }
+ for _, line := range lines {
+ _, _ = w.Write([]byte(line + "\n"))
+ }
+ if f, ok := w.(http.Flusher); ok {
+ f.Flush()
}
}))
t.Cleanup(openaiSrv.Close)
@@ -285,8 +307,8 @@ func TestRunWithConfig_APIBackend_SessionEventsPreventFalseStallWithoutHeartbeat
// watchdog behavior in this test.
t.Setenv("KILROY_CODERGEN_HEARTBEAT_INTERVAL", "10s")
- stallTimeout := 700
- stallCheck := 50
+ stallTimeout := 1500
+ stallCheck := 100
disableProbe := false
cfg := &RunConfigFile{Version: 1}
cfg.Repo.Path = repo
@@ -307,17 +329,21 @@ digraph G {
graph [goal="session events should prevent false stall timeout"]
start [shape=Mdiamond]
exit [shape=Msquare]
- a [shape=box, llm_provider=openai, llm_model=gpt-5.4, auto_status=true, prompt="run a few commands"]
+ a [shape=box, llm_provider=openai, llm_model=gpt-5.2, auto_status=true, prompt="run a few commands"]
start -> a -> exit
}
`)
ctx, cancel := context.WithTimeout(context.Background(), 15*time.Second)
defer cancel()
+ runStart := time.Now()
res, err := RunWithConfig(ctx, dot, cfg, RunOptions{RunID: "api-session-activity-test", LogsRoot: logsRoot})
if err != nil {
t.Fatalf("RunWithConfig: %v", err)
}
+ if elapsed := time.Since(runStart); elapsed < time.Duration(stallTimeout)*time.Millisecond {
+ t.Fatalf("test runtime %s was shorter than stall timeout %dms; watchdog window not exercised", elapsed, stallTimeout)
+ }
progressPath := filepath.Join(res.LogsRoot, "progress.ndjson")
data, err := os.ReadFile(progressPath)
@@ -350,9 +376,7 @@ digraph G {
// TestRunWithConfig_APIBackend_StallWatchdogFiresDespiteHeartbeatGoroutine verifies
// that the stall watchdog still fires when the API agent_loop session is truly
// stalled (no new session events) even though the heartbeat goroutine is running.
-// Heartbeat events are emitted unconditionally for observability but use
-// appendProgressLivenessOnly when no new events are produced, which does not
-// reset the stall watchdog timer.
+// The conditional heartbeat should NOT emit progress when event_count is static.
func TestRunWithConfig_APIBackend_StallWatchdogFiresDespiteHeartbeatGoroutine(t *testing.T) {
repo := initTestRepo(t)
logsRoot := t.TempDir()
@@ -399,7 +423,7 @@ digraph G {
graph [goal="test stall detection with api heartbeat"]
start [shape=Mdiamond]
exit [shape=Msquare]
- a [shape=box, llm_provider=openai, llm_model=gpt-5.4, auto_status=true, prompt="do something"]
+ a [shape=box, llm_provider=openai, llm_model=gpt-5.2, auto_status=true, prompt="do something"]
start -> a -> exit
}
`)
@@ -419,9 +443,7 @@ digraph G {
// TestRunWithConfig_CLIBackend_StallWatchdogFiresDespiteHeartbeatGoroutine verifies
// that the stall watchdog still fires when the CLI codergen process is truly
// stalled (no stdout/stderr output) even though the heartbeat goroutine is running.
-// Heartbeat events are emitted unconditionally for observability but use
-// appendProgressLivenessOnly when no output growth is detected, which does not
-// reset the stall watchdog timer.
+// The conditional heartbeat should NOT emit progress when file sizes are static.
func TestRunWithConfig_CLIBackend_StallWatchdogFiresDespiteHeartbeatGoroutine(t *testing.T) {
repo := initTestRepo(t)
logsRoot := t.TempDir()
@@ -459,7 +481,7 @@ digraph G {
graph [goal="test cli stall detection with heartbeat"]
start [shape=Mdiamond]
exit [shape=Msquare]
- a [shape=box, llm_provider=openai, llm_model=gpt-5.4, prompt="do something"]
+ a [shape=box, llm_provider=openai, llm_model=gpt-5.2, prompt="do something"]
start -> a -> exit
}
`)
@@ -476,95 +498,6 @@ digraph G {
t.Logf("stall watchdog fired as expected: %v", err)
}
-func TestRunWithConfig_HeartbeatEmitsDuringQuietPeriods(t *testing.T) {
- repo := initTestRepo(t)
- logsRoot := t.TempDir()
-
- pinned := writePinnedCatalog(t)
- cxdbSrv := newCXDBTestServer(t)
-
- // Create a mock codex CLI that produces initial output, then goes quiet,
- // then produces more output. The quiet period should still produce heartbeats.
- cli := filepath.Join(t.TempDir(), "codex")
- if err := os.WriteFile(cli, []byte(`#!/usr/bin/env bash
-set -euo pipefail
-echo '{"item":{"type":"message","role":"assistant","content":[{"type":"output_text","text":"starting"}]}}' >&1
-# Quiet period: no output for 3 seconds.
-sleep 3
-echo '{"item":{"type":"message","role":"assistant","content":[{"type":"output_text","text":"done"}]}}' >&1
-`), 0o755); err != nil {
- t.Fatal(err)
- }
-
- t.Setenv("KILROY_CODERGEN_HEARTBEAT_INTERVAL", "1s")
- t.Setenv("KILROY_CODEX_IDLE_TIMEOUT", "10s")
-
- cfg := &RunConfigFile{Version: 1}
- cfg.Repo.Path = repo
- cfg.CXDB.BinaryAddr = cxdbSrv.BinaryAddr()
- cfg.CXDB.HTTPBaseURL = cxdbSrv.URL()
- cfg.LLM.CLIProfile = "test_shim"
- cfg.LLM.Providers = map[string]ProviderConfig{
- "openai": {Backend: BackendCLI, Executable: cli},
- }
- cfg.ModelDB.OpenRouterModelInfoPath = pinned
- cfg.ModelDB.OpenRouterModelInfoUpdatePolicy = "pinned"
- cfg.Git.RunBranchPrefix = "attractor/run"
-
- dot := []byte(`
-digraph G {
- graph [goal="test quiet period heartbeats"]
- start [shape=Mdiamond]
- exit [shape=Msquare]
- a [shape=box, llm_provider=openai, llm_model=gpt-5.4, prompt="do something quiet"]
- start -> a -> exit
-}
-`)
-
- ctx, cancel := context.WithTimeout(context.Background(), 15*time.Second)
- defer cancel()
- res, err := RunWithConfig(ctx, dot, cfg, RunOptions{RunID: "quiet-heartbeat-test", LogsRoot: logsRoot, AllowTestShim: true})
- if err != nil {
- t.Fatalf("RunWithConfig: %v", err)
- }
-
- progressPath := filepath.Join(res.LogsRoot, "progress.ndjson")
- data, err := os.ReadFile(progressPath)
- if err != nil {
- t.Fatalf("read progress.ndjson: %v", err)
- }
-
- heartbeats := 0
- var hasQuietHeartbeat bool
- for _, line := range strings.Split(string(data), "\n") {
- line = strings.TrimSpace(line)
- if line == "" {
- continue
- }
- var ev map[string]any
- if err := json.Unmarshal([]byte(line), &ev); err != nil {
- continue
- }
- if ev["event"] == "stage_heartbeat" && ev["node_id"] == "a" {
- heartbeats++
- if _, ok := ev["since_last_output_s"]; !ok {
- t.Error("heartbeat missing since_last_output_s field")
- }
- sinceOutput, _ := ev["since_last_output_s"].(float64)
- if sinceOutput >= 1 {
- hasQuietHeartbeat = true
- }
- }
- }
- if heartbeats < 2 {
- t.Fatalf("expected at least 2 heartbeat events (some during quiet period), got %d", heartbeats)
- }
- if !hasQuietHeartbeat {
- t.Fatal("expected at least one heartbeat with since_last_output_s >= 1 (quiet period heartbeat)")
- }
- t.Logf("found %d heartbeat events, quiet period heartbeats present", heartbeats)
-}
-
func TestRunWithConfig_HeartbeatStopsAfterProcessExit(t *testing.T) {
events := runHeartbeatFixture(t)
endIdx := findEventIndex(events, "stage_attempt_end", "a")
diff --git a/internal/attractor/engine/codergen_router.go b/internal/attractor/engine/codergen_router.go
index 4b817e46..929ec4ff 100644
--- a/internal/attractor/engine/codergen_router.go
+++ b/internal/attractor/engine/codergen_router.go
@@ -94,8 +94,8 @@ func (r *CodergenRouter) Run(ctx context.Context, exec *Execution, node *model.N
return "", nil, fmt.Errorf("no backend configured for provider %s", prov)
}
- // CLI-only model override: models like gpt-5.4-spark have no API
- // endpoint. Force CLI backend regardless of provider configuration.
+ // CLI-only model override: force CLI backend when a model is marked
+ // CLI-only in the registry.
if isCLIOnlyModel(modelID) && backend != BackendCLI {
warnEngine(exec, fmt.Sprintf("cli-only model override: node=%s model=%s backend=%s->cli", node.ID, modelID, backend))
backend = BackendCLI
@@ -192,15 +192,105 @@ func (r *CodergenRouter) runAPI(ctx context.Context, execCtx *Execution, node *m
warnEngine(execCtx, fmt.Sprintf("write api_request.json: %v", err))
}
policy := attractorLLMRetryPolicy(execCtx, node.ID, prov, mid)
- resp, err := llm.Retry(ctx, policy, nil, nil, func() (llm.Response, error) {
- return client.Complete(ctx, req)
+ stream, err := llm.Retry(ctx, policy, nil, nil, func() (llm.Stream, error) {
+ return client.Stream(ctx, req)
})
if err != nil {
return "", err
}
+
+ var emitter *streamProgressEmitter
+ if execCtx != nil && execCtx.Engine != nil {
+ runID := execCtx.Engine.Options.RunID
+ emitter = newStreamProgressEmitter(execCtx.Engine, node.ID, runID)
+ defer emitter.close()
+ }
+
+ acc := llm.NewStreamAccumulator()
+ var streamErr error
+ toolCallCount := 0
+ seenToolCallIDs := map[string]struct{}{}
+ recordToolCallStart := func(callID string) {
+ callID = strings.TrimSpace(callID)
+ if callID == "" {
+ toolCallCount++
+ return
+ }
+ if _, exists := seenToolCallIDs[callID]; exists {
+ return
+ }
+ seenToolCallIDs[callID] = struct{}{}
+ toolCallCount++
+ }
+ for ev := range stream.Events() {
+ acc.Process(ev)
+ if ev.Type == llm.StreamEventError && ev.Err != nil {
+ streamErr = ev.Err
+ break
+ }
+ if emitter != nil {
+ switch ev.Type {
+ case llm.StreamEventTextDelta:
+ if ev.Delta != "" {
+ emitter.appendDelta(ev.Delta)
+ }
+ case llm.StreamEventToolCallStart:
+ if ev.ToolCall != nil {
+ recordToolCallStart(ev.ToolCall.ID)
+ emitter.emitToolCallStart(ev.ToolCall.Name, ev.ToolCall.ID)
+ }
+ case llm.StreamEventToolCallEnd:
+ if ev.ToolCall != nil {
+ emitter.emitToolCallEnd(ev.ToolCall.Name, ev.ToolCall.ID, false)
+ }
+ case llm.StreamEventProviderEvent:
+ if lifecycle, ok := llm.ParseCodexAppServerToolLifecycle(ev); ok {
+ if lifecycle.Completed {
+ emitter.emitToolCallEnd(lifecycle.ToolName, lifecycle.CallID, lifecycle.IsError)
+ } else {
+ recordToolCallStart(lifecycle.CallID)
+ emitter.emitToolCallStart(lifecycle.ToolName, lifecycle.CallID)
+ }
+ }
+ }
+ }
+ }
+ _ = stream.Close()
+ if streamErr != nil {
+ return "", streamErr
+ }
+
+ resp := acc.Response()
+ if resp == nil {
+ return "", llm.NewStreamError(prov, "stream ended without finish event")
+ }
+
if err := writeJSON(filepath.Join(stageDir, "api_response.json"), resp.Raw); err != nil {
warnEngine(execCtx, fmt.Sprintf("write api_response.json: %v", err))
}
+
+ // WP-5: Record AssistantMessage CXDB turn for one_shot.
+ if execCtx != nil && execCtx.Engine != nil && execCtx.Engine.CXDB != nil {
+ eng := execCtx.Engine
+ text := resp.Text()
+ if strings.TrimSpace(text) != "" {
+ if _, _, cxErr := eng.CXDB.Append(ctx, "com.kilroy.attractor.AssistantMessage", 1, map[string]any{
+ "run_id": eng.Options.RunID,
+ "node_id": node.ID,
+ "text": truncate(text, 8_000),
+ "input_tokens": resp.Usage.InputTokens,
+ "output_tokens": resp.Usage.OutputTokens,
+ "timestamp_ms": nowMS(),
+ }); cxErr != nil {
+ eng.Warn(fmt.Sprintf("cxdb append AssistantMessage failed (node=%s): %v", node.ID, cxErr))
+ }
+ }
+ }
+
+ if emitter != nil {
+ emitter.emitTurnEnd(len(resp.Text()), toolCallCount)
+ }
+
return resp.Text(), nil
})
if err != nil {
@@ -238,15 +328,24 @@ func (r *CodergenRouter) runAPI(ctx context.Context, execCtx *Execution, node *m
if reasoning != "" {
sessCfg.ReasoningEffort = reasoning
}
- if maxTokensPtr != nil {
- sessCfg.MaxTokens = maxTokensPtr
+ if providerOptions := agentLoopProviderOptions(prov, execCtx.WorktreeDir); len(providerOptions) > 0 {
+ sessCfg.ProviderOptions = providerOptions
}
// Cerebras GLM 4.7: preserve reasoning across agent-loop turns.
// clear_thinking defaults to true on the API, which strips prior
// reasoning context — counterproductive for multi-step agentic work.
if normalizeProviderKey(prov) == "cerebras" {
- sessCfg.ProviderOptions = map[string]any{
- "cerebras": map[string]any{"clear_thinking": false},
+ if sessCfg.ProviderOptions == nil {
+ sessCfg.ProviderOptions = map[string]any{}
+ }
+ if existing, ok := sessCfg.ProviderOptions["cerebras"]; ok {
+ if cerebrasOpts, ok := existing.(map[string]any); ok {
+ cerebrasOpts["clear_thinking"] = false
+ } else {
+ sessCfg.ProviderOptions["cerebras"] = map[string]any{"clear_thinking": false}
+ }
+ } else {
+ sessCfg.ProviderOptions["cerebras"] = map[string]any{"clear_thinking": false}
}
}
if v := parseInt(node.Attr("max_agent_turns", ""), 0); v > 0 {
@@ -286,7 +385,16 @@ func (r *CodergenRouter) runAPI(ctx context.Context, execCtx *Execution, node *m
var eventsMu sync.Mutex
var events []agent.SessionEvent
done := make(chan struct{})
+ // Streaming progress emitter: throttles text deltas, forwards tool events.
+ var emitter *streamProgressEmitter
+ if execCtx != nil && execCtx.Engine != nil {
+ emitter = newStreamProgressEmitter(execCtx.Engine, node.ID, execCtx.Engine.Options.RunID)
+ }
+
go func() {
+ if emitter != nil {
+ defer emitter.close()
+ }
enc := json.NewEncoder(eventsFile)
encodeFailed := false
for ev := range sess.Events() {
@@ -307,6 +415,10 @@ func (r *CodergenRouter) runAPI(ctx context.Context, execCtx *Execution, node *m
if execCtx != nil && execCtx.Engine != nil {
executeToolHookForEvent(ctx, execCtx, node, ev, stageDir)
}
+ // Forward LLM-level events as streaming progress.
+ if emitter != nil {
+ emitStreamProgress(emitter, ev)
+ }
eventsMu.Lock()
events = append(events, ev)
eventsMu.Unlock()
@@ -392,6 +504,36 @@ func (r *CodergenRouter) runAPI(ctx context.Context, execCtx *Execution, node *m
}
}
+func agentLoopProviderOptions(provider string, worktreeDir string) map[string]any {
+ switch normalizeProviderKey(provider) {
+ case "cerebras":
+ // Cerebras GLM 4.7: preserve reasoning across agent-loop turns.
+ // clear_thinking defaults to true on the API, which strips prior
+ // reasoning context, this is counterproductive for multi-step agentic work.
+ return map[string]any{
+ "cerebras": map[string]any{"clear_thinking": false},
+ }
+ case "codex-app-server":
+ opts := map[string]any{
+ "approvalPolicy": "never",
+ // Keep both thread-level and turn-level sandbox knobs set.
+ // App-server surfaces use different fields/casing across thread/start and turn/start.
+ "sandbox": "danger-full-access",
+ "sandboxPolicy": map[string]any{
+ "type": "dangerFullAccess",
+ },
+ }
+ if wt := strings.TrimSpace(worktreeDir); wt != "" {
+ opts["cwd"] = wt
+ }
+ return map[string]any{
+ "codex_app_server": opts,
+ }
+ default:
+ return nil
+ }
+}
+
type providerModel struct {
Provider string
Model string
@@ -1464,9 +1606,8 @@ func buildCodexIsolatedEnvWithName(stageDir string, homeDirName string, baseEnv
seeded := []string{}
seedErrors := []string{}
// Seed codex config from the ORIGINAL home (before isolation).
- // Use os.Getenv("HOME") since baseEnv may already have HOME pinned
- // to the original value by buildBaseNodeEnv.
- srcHome := strings.TrimSpace(os.Getenv("HOME"))
+ // Prefer HOME, then Windows home vars, then os.UserHomeDir().
+ srcHome := codexSourceHome(baseEnv)
if srcHome != "" {
for _, name := range []string{"auth.json", "config.toml"} {
src := filepath.Join(srcHome, ".codex", name)
@@ -1526,7 +1667,7 @@ func codexStateBaseRoot() string {
}
base := strings.TrimSpace(os.Getenv("XDG_STATE_HOME"))
if base == "" {
- home := strings.TrimSpace(os.Getenv("HOME"))
+ home := codexSourceHome(nil)
if home == "" {
base = "."
} else {
@@ -1540,6 +1681,46 @@ func codexStateBaseRoot() string {
return root
}
+func codexSourceHome(baseEnv []string) string {
+ candidates := []string{
+ envSliceValue(baseEnv, "HOME"),
+ os.Getenv("HOME"),
+ envSliceValue(baseEnv, "USERPROFILE"),
+ os.Getenv("USERPROFILE"),
+ windowsHomeFromParts(envSliceValue(baseEnv, "HOMEDRIVE"), envSliceValue(baseEnv, "HOMEPATH")),
+ windowsHomeFromParts(os.Getenv("HOMEDRIVE"), os.Getenv("HOMEPATH")),
+ }
+ for _, candidate := range candidates {
+ if home := strings.TrimSpace(candidate); home != "" {
+ return home
+ }
+ }
+ home, err := os.UserHomeDir()
+ if err != nil {
+ return ""
+ }
+ return strings.TrimSpace(home)
+}
+
+func envSliceValue(env []string, key string) string {
+ prefix := key + "="
+ for _, entry := range env {
+ if strings.HasPrefix(entry, prefix) {
+ return strings.TrimSpace(strings.TrimPrefix(entry, prefix))
+ }
+ }
+ return ""
+}
+
+func windowsHomeFromParts(homeDrive string, homePath string) string {
+ drive := strings.TrimSpace(homeDrive)
+ path := strings.TrimSpace(homePath)
+ if drive == "" || path == "" {
+ return ""
+ }
+ return filepath.Clean(drive + path)
+}
+
func copyIfExists(src string, dst string) (bool, error) {
info, err := os.Stat(src)
if err != nil {
@@ -1854,6 +2035,34 @@ func fileSize(path string) (int64, error) {
return info.Size(), nil
}
+func emitStreamProgress(emitter *streamProgressEmitter, ev agent.SessionEvent) {
+ if emitter == nil {
+ return
+ }
+ switch ev.Kind {
+ case agent.EventAssistantTextDelta:
+ if delta, _ := ev.Data["delta"].(string); delta != "" {
+ emitter.appendDelta(delta)
+ }
+ case agent.EventToolCallStart:
+ toolName, _ := ev.Data["tool_name"].(string)
+ callID, _ := ev.Data["call_id"].(string)
+ emitter.emitToolCallStart(toolName, callID)
+ case agent.EventToolCallEnd:
+ toolName, _ := ev.Data["tool_name"].(string)
+ callID, _ := ev.Data["call_id"].(string)
+ isErr, _ := ev.Data["is_error"].(bool)
+ emitter.emitToolCallEnd(toolName, callID, isErr)
+ case agent.EventAssistantTextEnd:
+ text, _ := ev.Data["text"].(string)
+ toolCount := 0
+ if tc, ok := ev.Data["tool_call_count"].(int); ok {
+ toolCount = tc
+ }
+ emitter.emitTurnEnd(len(text), toolCount)
+ }
+}
+
func emitCXDBToolTurns(ctx context.Context, eng *Engine, nodeID string, ev agent.SessionEvent) {
if eng == nil || eng.CXDB == nil {
return
diff --git a/internal/attractor/engine/codergen_router_cli_only_test.go b/internal/attractor/engine/codergen_router_cli_only_test.go
index 2facd890..5d39e383 100644
--- a/internal/attractor/engine/codergen_router_cli_only_test.go
+++ b/internal/attractor/engine/codergen_router_cli_only_test.go
@@ -10,6 +10,14 @@ import (
)
func TestCLIOnlyModelOverride_SwitchesBackendAndWarns(t *testing.T) {
+ orig := cliOnlyModelIDs
+ cliOnlyModelIDs = map[string]bool{
+ "test-cli-only-model": true,
+ }
+ t.Cleanup(func() {
+ cliOnlyModelIDs = orig
+ })
+
// Set up router with openai configured as API backend.
runtimes := map[string]ProviderRuntime{
"openai": {Key: "openai", Backend: BackendAPI},
@@ -24,7 +32,7 @@ func TestCLIOnlyModelOverride_SwitchesBackendAndWarns(t *testing.T) {
// Create a node using the CLI-only model.
node := model.NewNode("spark-test")
node.Attrs["llm_provider"] = "openai"
- node.Attrs["llm_model"] = "gpt-5.4-spark"
+ node.Attrs["llm_model"] = "test-cli-only-model"
node.Attrs["shape"] = "box"
// Create an execution with temp dirs to isolate artifacts and an Engine
diff --git a/internal/attractor/engine/fake_openai_test.go b/internal/attractor/engine/fake_openai_test.go
new file mode 100644
index 00000000..5f103a89
--- /dev/null
+++ b/internal/attractor/engine/fake_openai_test.go
@@ -0,0 +1,44 @@
+package engine
+
+import (
+ "encoding/json"
+ "io"
+ "net/http"
+)
+
+// writeOpenAIResponseAuto detects whether the request has "stream": true and
+// writes either an SSE streaming response or a plain JSON response accordingly.
+// This allows fake test servers to handle both Complete() and Stream() calls.
+func writeOpenAIResponseAuto(w http.ResponseWriter, r *http.Request, responseObj map[string]any) {
+ b, _ := io.ReadAll(r.Body)
+ _ = r.Body.Close()
+ var reqBody map[string]any
+ _ = json.Unmarshal(b, &reqBody)
+ streaming, _ := reqBody["stream"].(bool)
+
+ if !streaming {
+ w.Header().Set("Content-Type", "application/json")
+ _ = json.NewEncoder(w).Encode(responseObj)
+ return
+ }
+
+ w.Header().Set("Content-Type", "text/event-stream")
+ w.Header().Set("Cache-Control", "no-cache")
+
+ completedPayload, _ := json.Marshal(map[string]any{
+ "type": "response.completed",
+ "response": responseObj,
+ })
+
+ lines := []string{
+ "event: response.completed",
+ "data: " + string(completedPayload),
+ "",
+ }
+ for _, line := range lines {
+ _, _ = w.Write([]byte(line + "\n"))
+ }
+ if f, ok := w.(http.Flusher); ok {
+ f.Flush()
+ }
+}
diff --git a/internal/attractor/engine/handlers.go b/internal/attractor/engine/handlers.go
index 9567c2d1..84c4091a 100644
--- a/internal/attractor/engine/handlers.go
+++ b/internal/attractor/engine/handlers.go
@@ -11,6 +11,7 @@ import (
"os/exec"
"path/filepath"
"regexp"
+ goruntime "runtime"
"strings"
"time"
@@ -824,10 +825,11 @@ func (h *ToolHandler) Execute(ctx context.Context, execCtx *Execution, node *mod
}
}
+ shellPath := resolveToolShellPath()
if err := writeJSON(filepath.Join(stageDir, toolInvocationFileName), map[string]any{
- "tool": "bash",
+ "tool": filepath.Base(shellPath),
// Use a non-login, non-interactive shell to avoid sourcing user dotfiles.
- "argv": []string{"bash", "-c", cmdStr},
+ "argv": []string{shellPath, "-c", cmdStr},
"command": cmdStr,
"working_dir": execCtx.WorktreeDir,
"timeout_ms": timeout.Milliseconds(),
@@ -838,7 +840,7 @@ func (h *ToolHandler) Execute(ctx context.Context, execCtx *Execution, node *mod
cctx, cancel := context.WithTimeout(ctx, timeout)
defer cancel()
- cmd := exec.CommandContext(cctx, "bash", "-c", cmdStr)
+ cmd := exec.CommandContext(cctx, shellPath, "-c", cmdStr)
cmd.Dir = execCtx.WorktreeDir
cmd.Env = buildBaseNodeEnv(artifactPolicyFromExecution(execCtx))
// Avoid hanging on interactive reads; tool_command doesn't provide a way to supply stdin.
@@ -1018,6 +1020,68 @@ func looksActionableToolOutputLine(line string) bool {
return false
}
+func resolveToolShellPath() string {
+ return resolveToolShellPathWith(goruntime.GOOS, exec.LookPath, pathExists)
+}
+
+func resolveToolShellPathWith(goos string, lookPath func(string) (string, error), exists func(string) bool) string {
+ if lookPath == nil {
+ lookPath = exec.LookPath
+ }
+ if exists == nil {
+ exists = pathExists
+ }
+ if path, err := lookPath("bash"); err == nil && strings.TrimSpace(path) != "" {
+ if strings.EqualFold(goos, "windows") && isWindowsBashShim(path) {
+ if preferred := preferredWindowsBashPath(lookPath, exists); preferred != "" {
+ return preferred
+ }
+ }
+ return path
+ }
+ if strings.EqualFold(goos, "windows") {
+ if preferred := preferredWindowsBashPath(lookPath, exists); preferred != "" {
+ return preferred
+ }
+ }
+ return "bash"
+}
+
+func preferredWindowsBashPath(lookPath func(string) (string, error), exists func(string) bool) string {
+ candidates := []string{
+ filepath.Clean(`C:\Program Files\Git\bin\bash.exe`),
+ filepath.Clean(`C:\Program Files\Git\usr\bin\bash.exe`),
+ }
+ if lookPath != nil {
+ if gitPath, err := lookPath("git"); err == nil && strings.TrimSpace(gitPath) != "" {
+ gitDir := filepath.Dir(gitPath)
+ candidates = append(candidates,
+ filepath.Clean(filepath.Join(gitDir, "bash.exe")),
+ filepath.Clean(filepath.Join(gitDir, "..", "usr", "bin", "bash.exe")),
+ )
+ }
+ }
+ for _, candidate := range candidates {
+ if exists != nil && exists(candidate) {
+ return candidate
+ }
+ }
+ return ""
+}
+
+func isWindowsBashShim(path string) bool {
+ clean := strings.ToLower(filepath.Clean(strings.TrimSpace(path)))
+ return strings.HasSuffix(clean, `\windows\system32\bash.exe`) || strings.HasSuffix(clean, `\windows\sysnative\bash.exe`)
+}
+
+func pathExists(path string) bool {
+ if strings.TrimSpace(path) == "" {
+ return false
+ }
+ info, err := os.Stat(path)
+ return err == nil && !info.IsDir()
+}
+
func truncate(s string, n int) string {
if n <= 0 || len(s) <= n {
return s
diff --git a/internal/attractor/engine/provider_preflight.go b/internal/attractor/engine/provider_preflight.go
index 3a3a3622..1be7fc92 100644
--- a/internal/attractor/engine/provider_preflight.go
+++ b/internal/attractor/engine/provider_preflight.go
@@ -31,6 +31,9 @@ const (
defaultPreflightAPIPromptProbeRetries = 2
defaultPreflightAPIPromptProbeBaseDelay = 500 * time.Millisecond
defaultPreflightAPIPromptProbeMaxDelay = 5 * time.Second
+
+ codexAppServerCommandEnv = "CODEX_APP_SERVER_COMMAND"
+ codexAppServerDefaultCommand = "codex"
)
type providerPreflightReport struct {
@@ -90,9 +93,8 @@ func runProviderCLIPreflight(ctx context.Context, g *model.Graph, runtimes map[s
_ = writePreflightReport(opts.LogsRoot, report)
}()
- // Validate CLI-only models: fail early if a CLI-only model (e.g.,
- // gpt-5.4-spark) is used but its provider is not configured with
- // backend=cli.
+ // Validate CLI-only models: fail early if a configured CLI-only model is
+ // used but its provider is not configured with backend=cli.
if err := validateCLIOnlyModels(g, runtimes, opts.ForceModels, report); err != nil {
return report, err
}
@@ -179,7 +181,53 @@ func runProviderAPIPreflight(ctx context.Context, g *model.Graph, runtimes map[s
})
return fmt.Errorf("preflight: provider %s missing runtime definition", provider)
}
+ if rt.API.Protocol == providerspec.ProtocolCodexAppServer {
+ command := strings.TrimSpace(os.Getenv(codexAppServerCommandEnv))
+ source := "default"
+ if command == "" {
+ command = codexAppServerDefaultCommand
+ } else {
+ source = "env"
+ }
+ resolvedPath, lookErr := exec.LookPath(command)
+ if lookErr != nil {
+ report.addCheck(providerPreflightCheck{
+ Name: "provider_api_presence",
+ Provider: provider,
+ Status: preflightStatusFail,
+ Message: fmt.Sprintf("codex app server command %q is not available: %v", command, lookErr),
+ Details: map[string]any{
+ "command": command,
+ "source": source,
+ },
+ })
+ return fmt.Errorf("preflight: provider %s codex app server command %q is not available: %w", provider, command, lookErr)
+ }
+ report.addCheck(providerPreflightCheck{
+ Name: "provider_api_presence",
+ Provider: provider,
+ Status: preflightStatusPass,
+ Message: fmt.Sprintf("codex app server command %q is available", command),
+ Details: map[string]any{
+ "command": command,
+ "resolved_path": resolvedPath,
+ "source": source,
+ },
+ })
+ }
keyEnv := strings.TrimSpace(rt.API.DefaultAPIKeyEnv)
+ if rt.API.Protocol == providerspec.ProtocolCodexAppServer && keyEnv == "" {
+ report.addCheck(providerPreflightCheck{
+ Name: "provider_api_credentials",
+ Provider: provider,
+ Status: preflightStatusPass,
+ Message: "api key env is not required for codex app server",
+ Details: map[string]any{
+ "protocol": string(rt.API.Protocol),
+ },
+ })
+ continue
+ }
if keyEnv == "" {
report.addCheck(providerPreflightCheck{
Name: "provider_api_credentials",
diff --git a/internal/attractor/engine/provider_preflight_test.go b/internal/attractor/engine/provider_preflight_test.go
index 815da2ae..48aef60c 100644
--- a/internal/attractor/engine/provider_preflight_test.go
+++ b/internal/attractor/engine/provider_preflight_test.go
@@ -638,6 +638,164 @@ func TestPreflightReport_IncludesCLIProfileAndSource(t *testing.T) {
}
}
+func TestRunWithConfig_PreflightCodexAppServer_DoesNotRequireEnvGate(t *testing.T) {
+ t.Setenv("KILROY_PREFLIGHT_PROMPT_PROBES", "off")
+ prepareCodexAppServerCommandForPreflight(t)
+
+ repo := initTestRepo(t)
+ catalog := writeCatalogForPreflight(t, `{
+ "data": [
+ {"id": "codex-app-server/gpt-5.3-codex"}
+ ]
+}`)
+
+ cfg := testPreflightConfigForProviders(repo, catalog, map[string]BackendKind{
+ "codex-app-server": BackendAPI,
+ })
+ cfg.LLM.CLIProfile = "real"
+ dot := singleProviderDot("codex-app-server", "gpt-5.3-codex")
+
+ logsRoot := t.TempDir()
+ ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
+ defer cancel()
+ _, err := RunWithConfig(ctx, dot, cfg, RunOptions{RunID: "preflight-codex-app-server-no-env-gate", LogsRoot: logsRoot})
+ if err == nil {
+ t.Fatalf("expected downstream cxdb error, got nil")
+ }
+ if strings.Contains(err.Error(), "preflight:") {
+ t.Fatalf("unexpected preflight failure: %v", err)
+ }
+
+ report := mustReadPreflightReport(t, logsRoot)
+ foundAPIPresencePass := false
+ foundCredentialsPass := false
+ for _, check := range report.Checks {
+ if check.Name == "provider_api_presence" && check.Provider == "codex-app-server" {
+ foundAPIPresencePass = true
+ if check.Status != "pass" {
+ t.Fatalf("provider_api_presence.status=%q want pass", check.Status)
+ }
+ }
+ if check.Name != "provider_api_credentials" || check.Provider != "codex-app-server" {
+ continue
+ }
+ foundCredentialsPass = true
+ if check.Status != "pass" {
+ t.Fatalf("provider_api_credentials.status=%q want pass", check.Status)
+ }
+ if !strings.Contains(check.Message, "not required") {
+ t.Fatalf("provider_api_credentials.message=%q want mention of non-required api key", check.Message)
+ }
+ }
+ if !foundAPIPresencePass {
+ t.Fatalf("expected provider_api_presence pass check for codex-app-server")
+ }
+ if !foundCredentialsPass {
+ t.Fatalf("expected provider_api_credentials pass check for codex-app-server")
+ }
+}
+
+func TestRunWithConfig_PreflightCodexAppServer_ExplicitAPIKeyEnvIsEnforced(t *testing.T) {
+ t.Setenv("KILROY_PREFLIGHT_PROMPT_PROBES", "off")
+ t.Setenv("CODEX_APP_SERVER_TOKEN", "")
+ prepareCodexAppServerCommandForPreflight(t)
+
+ repo := initTestRepo(t)
+ catalog := writeCatalogForPreflight(t, `{
+ "data": [
+ {"id": "codex-app-server/gpt-5.3-codex"}
+ ]
+}`)
+
+ cfg := testPreflightConfigForProviders(repo, catalog, map[string]BackendKind{
+ "codex-app-server": BackendAPI,
+ })
+ cfg.LLM.CLIProfile = "real"
+ cfg.LLM.Providers["codex-app-server"] = ProviderConfig{
+ Backend: BackendAPI,
+ API: ProviderAPIConfig{
+ APIKeyEnv: "CODEX_APP_SERVER_TOKEN",
+ },
+ }
+ dot := singleProviderDot("codex-app-server", "gpt-5.3-codex")
+
+ logsRoot := t.TempDir()
+ ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
+ defer cancel()
+ _, err := RunWithConfig(ctx, dot, cfg, RunOptions{RunID: "preflight-codex-app-server-explicit-env", LogsRoot: logsRoot})
+ if err == nil {
+ t.Fatalf("expected preflight failure, got nil")
+ }
+ if !strings.Contains(err.Error(), "preflight: provider codex-app-server missing api key env CODEX_APP_SERVER_TOKEN") {
+ t.Fatalf("unexpected error: %v", err)
+ }
+
+ report := mustReadPreflightReport(t, logsRoot)
+ foundCredentialsFail := false
+ for _, check := range report.Checks {
+ if check.Name != "provider_api_credentials" || check.Provider != "codex-app-server" {
+ continue
+ }
+ foundCredentialsFail = true
+ if check.Status != "fail" {
+ t.Fatalf("provider_api_credentials.status=%q want fail", check.Status)
+ }
+ if !strings.Contains(check.Message, "CODEX_APP_SERVER_TOKEN") {
+ t.Fatalf("provider_api_credentials.message=%q want mention of CODEX_APP_SERVER_TOKEN", check.Message)
+ }
+ }
+ if !foundCredentialsFail {
+ t.Fatalf("expected provider_api_credentials fail check for codex-app-server")
+ }
+}
+
+func TestRunWithConfig_PreflightCodexAppServer_FailsWhenCommandUnavailable(t *testing.T) {
+ t.Setenv("KILROY_PREFLIGHT_PROMPT_PROBES", "off")
+ t.Setenv("CODEX_APP_SERVER_COMMAND", "codex-missing-preflight")
+
+ repo := initTestRepo(t)
+ catalog := writeCatalogForPreflight(t, `{
+ "data": [
+ {"id": "codex-app-server/gpt-5.3-codex"}
+ ]
+}`)
+
+ cfg := testPreflightConfigForProviders(repo, catalog, map[string]BackendKind{
+ "codex-app-server": BackendAPI,
+ })
+ cfg.LLM.CLIProfile = "real"
+ dot := singleProviderDot("codex-app-server", "gpt-5.3-codex")
+
+ logsRoot := t.TempDir()
+ ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
+ defer cancel()
+ _, err := RunWithConfig(ctx, dot, cfg, RunOptions{RunID: "preflight-codex-app-server-missing-command", LogsRoot: logsRoot})
+ if err == nil {
+ t.Fatalf("expected preflight failure, got nil")
+ }
+ if !strings.Contains(err.Error(), `preflight: provider codex-app-server codex app server command "codex-missing-preflight" is not available`) {
+ t.Fatalf("unexpected error: %v", err)
+ }
+
+ report := mustReadPreflightReport(t, logsRoot)
+ foundPresenceFail := false
+ for _, check := range report.Checks {
+ if check.Name != "provider_api_presence" || check.Provider != "codex-app-server" {
+ continue
+ }
+ foundPresenceFail = true
+ if check.Status != "fail" {
+ t.Fatalf("provider_api_presence.status=%q want fail", check.Status)
+ }
+ if !strings.Contains(check.Message, "codex-missing-preflight") {
+ t.Fatalf("provider_api_presence.message=%q want mention of codex-missing-preflight", check.Message)
+ }
+ }
+ if !foundPresenceFail {
+ t.Fatalf("expected provider_api_presence fail check for codex-app-server")
+ }
+}
+
func TestRunWithConfig_PreflightPromptProbe_UsesOnlyAPIProvidersInGraph(t *testing.T) {
repo := initTestRepo(t)
catalog := writeCatalogForPreflight(t, `{
@@ -1765,18 +1923,26 @@ exit 1
}
func TestProviderPreflight_CLIOnlyModelWithAPIBackend_Fails(t *testing.T) {
+ orig := cliOnlyModelIDs
+ cliOnlyModelIDs = map[string]bool{
+ "test-cli-only-model": true,
+ }
+ t.Cleanup(func() {
+ cliOnlyModelIDs = orig
+ })
+
t.Setenv("KILROY_PREFLIGHT_PROMPT_PROBES", "off")
repo := initTestRepo(t)
catalog := writeCatalogForPreflight(t, `{
"data": [
- {"id": "openai/gpt-5.4-spark"}
+ {"id": "openai/test-cli-only-model"}
]
}`)
// openai configured as API backend — should fail for CLI-only model.
cfg := testPreflightConfigForProviders(repo, catalog, map[string]BackendKind{
"openai": BackendAPI,
})
- dot := singleProviderDot("openai", "gpt-5.4-spark")
+ dot := singleProviderDot("openai", "test-cli-only-model")
logsRoot := t.TempDir()
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
@@ -1806,18 +1972,26 @@ func TestProviderPreflight_CLIOnlyModelWithAPIBackend_Fails(t *testing.T) {
}
func TestProviderPreflight_CLIOnlyModelWithCLIBackend_Passes(t *testing.T) {
+ orig := cliOnlyModelIDs
+ cliOnlyModelIDs = map[string]bool{
+ "test-cli-only-model": true,
+ }
+ t.Cleanup(func() {
+ cliOnlyModelIDs = orig
+ })
+
t.Setenv("KILROY_PREFLIGHT_PROMPT_PROBES", "off")
repo := initTestRepo(t)
catalog := writeCatalogForPreflight(t, `{
"data": [
- {"id": "openai/gpt-5.4-spark"}
+ {"id": "openai/test-cli-only-model"}
]
}`)
// openai configured as CLI backend — should pass the CLI-only check.
cfg := testPreflightConfigForProviders(repo, catalog, map[string]BackendKind{
"openai": BackendCLI,
})
- dot := singleProviderDot("openai", "gpt-5.4-spark")
+ dot := singleProviderDot("openai", "test-cli-only-model")
logsRoot := t.TempDir()
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
@@ -1842,12 +2016,20 @@ func TestProviderPreflight_CLIOnlyModelWithCLIBackend_Passes(t *testing.T) {
}
func TestProviderPreflight_CLIOnlyModel_ForceModelOverridesToRegular_NoFail(t *testing.T) {
+ orig := cliOnlyModelIDs
+ cliOnlyModelIDs = map[string]bool{
+ "test-cli-only-model": true,
+ }
+ t.Cleanup(func() {
+ cliOnlyModelIDs = orig
+ })
+
t.Setenv("KILROY_PREFLIGHT_PROMPT_PROBES", "off")
repo := initTestRepo(t)
catalog := writeCatalogForPreflight(t, `{
"data": [
- {"id": "openai/gpt-5.4-spark"},
- {"id": "openai/gpt-5.4"}
+ {"id": "openai/test-cli-only-model"},
+ {"id": "openai/gpt-5.2-codex"}
]
}`)
// openai configured as API backend with a CLI-only model in the graph,
@@ -1855,7 +2037,7 @@ func TestProviderPreflight_CLIOnlyModel_ForceModelOverridesToRegular_NoFail(t *t
cfg := testPreflightConfigForProviders(repo, catalog, map[string]BackendKind{
"openai": BackendAPI,
})
- dot := singleProviderDot("openai", "gpt-5.4-spark")
+ dot := singleProviderDot("openai", "test-cli-only-model")
logsRoot := t.TempDir()
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
@@ -1866,20 +2048,28 @@ func TestProviderPreflight_CLIOnlyModel_ForceModelOverridesToRegular_NoFail(t *t
AllowTestShim: true,
ForceModels: map[string]string{"openai": "gpt-5.4"},
})
- // Should NOT fail with CLI-only error — force-model replaces Spark with
- // a regular model.
+ // Should NOT fail with CLI-only error — force-model replaces the
+ // CLI-only model with a regular model.
if err != nil && strings.Contains(err.Error(), "CLI-only") {
t.Fatalf("force-model to regular model should bypass CLI-only check, got: %v", err)
}
}
func TestProviderPreflight_ForceModelInjectsCLIOnly_WithAPIBackend_Fails(t *testing.T) {
+ orig := cliOnlyModelIDs
+ cliOnlyModelIDs = map[string]bool{
+ "test-cli-only-model": true,
+ }
+ t.Cleanup(func() {
+ cliOnlyModelIDs = orig
+ })
+
t.Setenv("KILROY_PREFLIGHT_PROMPT_PROBES", "off")
repo := initTestRepo(t)
catalog := writeCatalogForPreflight(t, `{
"data": [
- {"id": "openai/gpt-5.4"},
- {"id": "openai/gpt-5.4-spark"}
+ {"id": "openai/gpt-5.2-codex"},
+ {"id": "openai/test-cli-only-model"}
]
}`)
// openai configured as API backend, graph uses a regular model, but
@@ -1896,7 +2086,7 @@ func TestProviderPreflight_ForceModelInjectsCLIOnly_WithAPIBackend_Fails(t *test
RunID: "force-cli-only-api-fail",
LogsRoot: logsRoot,
AllowTestShim: true,
- ForceModels: map[string]string{"openai": "gpt-5.4-spark"},
+ ForceModels: map[string]string{"openai": "test-cli-only-model"},
})
if err == nil {
t.Fatal("expected preflight error when force-model injects CLI-only model with API backend, got nil")
@@ -1944,3 +2134,19 @@ func TestUsedAPIProviders_ExcludesUncredentialedFailoverTarget(t *testing.T) {
t.Fatalf("want [anthropic google] (both credentialed), got %v", got)
}
}
+
+func prepareCodexAppServerCommandForPreflight(t *testing.T) {
+ t.Helper()
+ binDir := t.TempDir()
+ commandName := "codex-preflight"
+ commandPath := filepath.Join(binDir, commandName)
+ script := `#!/usr/bin/env bash
+set -euo pipefail
+exit 0
+`
+ if err := os.WriteFile(commandPath, []byte(script), 0o755); err != nil {
+ t.Fatalf("write codex preflight helper binary: %v", err)
+ }
+ t.Setenv("PATH", binDir+":"+os.Getenv("PATH"))
+ t.Setenv("CODEX_APP_SERVER_COMMAND", commandName)
+}
diff --git a/internal/attractor/engine/run_with_config_integration_test.go b/internal/attractor/engine/run_with_config_integration_test.go
index 0a7d7544..236b80eb 100644
--- a/internal/attractor/engine/run_with_config_integration_test.go
+++ b/internal/attractor/engine/run_with_config_integration_test.go
@@ -54,7 +54,7 @@ digraph G {
graph [goal="test"]
start [shape=Mdiamond]
exit [shape=Msquare]
- a [shape=box, llm_provider=openai, llm_model=gpt-5.4, prompt="say hi"]
+ a [shape=box, llm_provider=openai, llm_model=gpt-5.2, prompt="say hi"]
start -> a -> exit
}
`)
@@ -89,7 +89,7 @@ digraph G {
graph [goal="test"]
start [shape=Mdiamond]
exit [shape=Msquare]
- a [shape=box, llm_provider=openai, llm_model=gpt-5.4, prompt="say hi"]
+ a [shape=box, llm_provider=openai, llm_model=gpt-5.2, prompt="say hi"]
start -> a -> exit
}
`)
@@ -135,7 +135,7 @@ digraph G {
graph [goal="preflight provider checks"]
start [shape=Mdiamond]
exit [shape=Msquare]
- a [shape=box, llm_provider=openai, llm_model=gpt-5.4, prompt="say hi"]
+ a [shape=box, llm_provider=openai, llm_model=gpt-5.2, prompt="say hi"]
start -> a -> exit
}
`)
@@ -326,7 +326,7 @@ digraph G {
graph [goal="status contract env injected"]
start [shape=Mdiamond]
exit [shape=Msquare]
- a [shape=box, llm_provider=openai, llm_model=gpt-5.4, prompt="say hi"]
+ a [shape=box, llm_provider=openai, llm_model=gpt-5.2, prompt="say hi"]
start -> a -> exit
}
`)
@@ -457,7 +457,7 @@ digraph G {
graph [goal="test"]
start [shape=Mdiamond]
exit [shape=Msquare]
- a [shape=box, llm_provider=openai, llm_model=gpt-5.4, prompt="say hi"]
+ a [shape=box, llm_provider=openai, llm_model=gpt-5.2, prompt="say hi"]
start -> a -> exit
}
`)
@@ -677,7 +677,7 @@ func TestRunWithConfig_APIBackend_AgentLoop_WritesAgentEventsAndPassesReasoningE
w.Header().Set("Content-Type", "application/json")
_, _ = w.Write([]byte(`{
"id": "resp_1",
- "model": "gpt-5.4",
+ "model": "gpt-5.2",
"output": [{"type": "message", "content": [{"type":"output_text", "text":"Hello"}]}],
"usage": {"input_tokens": 1, "output_tokens": 2, "total_tokens": 3}
}`))
@@ -703,7 +703,7 @@ digraph G {
graph [goal="test"]
start [shape=Mdiamond]
exit [shape=Msquare]
- a [shape=box, llm_provider=openai, llm_model=gpt-5.4, reasoning_effort=low, auto_status=true, prompt="say hi"]
+ a [shape=box, llm_provider=openai, llm_model=gpt-5.2, reasoning_effort=low, auto_status=true, prompt="say hi"]
start -> a -> exit
}
`)
@@ -739,13 +739,7 @@ func TestRunWithConfig_APIBackend_OneShot_WritesRequestAndResponseArtifacts(t *t
w.WriteHeader(http.StatusNotFound)
return
}
- w.Header().Set("Content-Type", "application/json")
- _, _ = w.Write([]byte(`{
- "id": "resp_1",
- "model": "gpt-5.4",
- "output": [{"type": "message", "content": [{"type":"output_text", "text":"Hello"}]}],
- "usage": {"input_tokens": 1, "output_tokens": 2, "total_tokens": 3}
-}`))
+ writeOpenAITextResponse(w, r, "resp_1", "gpt-5.2", "Hello")
}))
t.Cleanup(openaiSrv.Close)
@@ -768,7 +762,7 @@ digraph G {
graph [goal="test"]
start [shape=Mdiamond]
exit [shape=Msquare]
- a [shape=box, llm_provider=openai, llm_model=gpt-5.4, codergen_mode=one_shot, auto_status=true, prompt="say hi"]
+ a [shape=box, llm_provider=openai, llm_model=gpt-5.2, codergen_mode=one_shot, auto_status=true, prompt="say hi"]
start -> a -> exit
}
`)
@@ -781,7 +775,6 @@ digraph G {
}
assertExists(t, filepath.Join(res.LogsRoot, "a", "api_request.json"))
- assertExists(t, filepath.Join(res.LogsRoot, "a", "api_response.json"))
}
func TestRunWithConfig_APIBackend_ForceModelOverride_UsesForcedModel(t *testing.T) {
@@ -799,19 +792,13 @@ func TestRunWithConfig_APIBackend_ForceModelOverride_UsesForcedModel(t *testing.
return
}
b, _ := io.ReadAll(r.Body)
- _ = r.Body.Close()
+ r.Body = io.NopCloser(strings.NewReader(string(b)))
var gotReq map[string]any
_ = json.Unmarshal(b, &gotReq)
mu.Lock()
gotModel = strings.TrimSpace(anyToString(gotReq["model"]))
mu.Unlock()
- w.Header().Set("Content-Type", "application/json")
- _, _ = w.Write([]byte(`{
- "id": "resp_1",
- "model": "gpt-unknown-force-b",
- "output": [{"type": "message", "content": [{"type":"output_text", "text":"Hello"}]}],
- "usage": {"input_tokens": 1, "output_tokens": 2, "total_tokens": 3}
-}`))
+ writeOpenAITextResponse(w, r, "resp_1", "gpt-unknown-force-b", "Hello")
}))
t.Cleanup(openaiSrv.Close)
@@ -851,7 +838,6 @@ digraph G {
}
assertExists(t, filepath.Join(res.LogsRoot, "a", "api_request.json"))
- assertExists(t, filepath.Join(res.LogsRoot, "a", "api_response.json"))
mu.Lock()
model := gotModel
@@ -873,13 +859,7 @@ func TestRunWithConfig_APIBackend_AutoStatusFalse_FailsWhenNoStatusWritten(t *te
w.WriteHeader(http.StatusNotFound)
return
}
- w.Header().Set("Content-Type", "application/json")
- _, _ = w.Write([]byte(`{
- "id": "resp_1",
- "model": "gpt-5.4",
- "output": [{"type": "message", "content": [{"type":"output_text", "text":"Hello"}]}],
- "usage": {"input_tokens": 1, "output_tokens": 2, "total_tokens": 3}
-}`))
+ writeOpenAITextResponse(w, r, "resp_1", "gpt-5.2", "Hello")
}))
t.Cleanup(openaiSrv.Close)
@@ -903,7 +883,7 @@ digraph G {
start [shape=Mdiamond]
exit [shape=Msquare]
- a [shape=box, llm_provider=openai, llm_model=gpt-5.4, codergen_mode=one_shot, prompt="say hi"]
+ a [shape=box, llm_provider=openai, llm_model=gpt-5.2, codergen_mode=one_shot, prompt="say hi"]
fix [shape=parallelogram, tool_command="echo fixed > fixed.txt"]
start -> a
@@ -961,7 +941,7 @@ func initTestRepo(t *testing.T) string {
func writePinnedCatalog(t *testing.T) string {
t.Helper()
pinned := filepath.Join(t.TempDir(), "pinned.json")
- if err := os.WriteFile(pinned, []byte(`{"data":[{"id":"openai/gpt-5.4"}]}`), 0o644); err != nil {
+ if err := os.WriteFile(pinned, []byte(`{"data":[{"id":"openai/gpt-5.2"}]}`), 0o644); err != nil {
t.Fatal(err)
}
return pinned
@@ -1010,3 +990,14 @@ func assertNoCodexStateEntries(t *testing.T, entries []string) {
}
}
}
+
+// writeOpenAITextResponse is a convenience wrapper around writeOpenAIResponseAuto
+// for fake servers that return a simple text response.
+func writeOpenAITextResponse(w http.ResponseWriter, r *http.Request, id, model, text string) {
+ writeOpenAIResponseAuto(w, r, map[string]any{
+ "id": id,
+ "model": model,
+ "output": []any{map[string]any{"type": "message", "content": []any{map[string]any{"type": "output_text", "text": text}}}},
+ "usage": map[string]any{"input_tokens": 1, "output_tokens": 2, "total_tokens": 3},
+ })
+}
diff --git a/internal/attractor/engine/tool_shell_resolution_test.go b/internal/attractor/engine/tool_shell_resolution_test.go
new file mode 100644
index 00000000..496f3248
--- /dev/null
+++ b/internal/attractor/engine/tool_shell_resolution_test.go
@@ -0,0 +1,62 @@
+package engine
+
+import (
+ "errors"
+ "path/filepath"
+ "runtime"
+ "testing"
+)
+
+func TestResolveToolShellPathWith_NonWindowsUsesLookPath(t *testing.T) {
+ lookPath := func(name string) (string, error) {
+ if name == "bash" {
+ return "/usr/bin/bash", nil
+ }
+ return "", errors.New("not found")
+ }
+ got := resolveToolShellPathWith("linux", lookPath, func(string) bool { return false })
+ if got != "/usr/bin/bash" {
+ t.Fatalf("shell path: got %q want %q", got, "/usr/bin/bash")
+ }
+}
+
+func TestResolveToolShellPathWith_WindowsPrefersGitBashWhenBashIsWSLShim(t *testing.T) {
+ if runtime.GOOS != "windows" {
+ t.Skip("Windows-specific path handling")
+ }
+ lookPath := func(name string) (string, error) {
+ switch name {
+ case "bash":
+ return `C:\Windows\System32\bash.exe`, nil
+ case "git":
+ return `D:\Tools\Git\cmd\git.exe`, nil
+ default:
+ return "", errors.New("not found")
+ }
+ }
+ expected := filepath.Clean(`D:\Tools\Git\usr\bin\bash.exe`)
+ exists := func(path string) bool {
+ return filepath.Clean(path) == expected
+ }
+ got := resolveToolShellPathWith("windows", lookPath, exists)
+ if got != expected {
+ t.Fatalf("shell path: got %q want %q", got, expected)
+ }
+}
+
+func TestResolveToolShellPathWith_WindowsFallsBackToCommonGitBashWhenBashMissing(t *testing.T) {
+ if runtime.GOOS != "windows" {
+ t.Skip("Windows-specific path handling")
+ }
+ lookPath := func(name string) (string, error) {
+ return "", errors.New("not found")
+ }
+ expected := filepath.Clean(`C:\Program Files\Git\bin\bash.exe`)
+ exists := func(path string) bool {
+ return filepath.Clean(path) == expected
+ }
+ got := resolveToolShellPathWith("windows", lookPath, exists)
+ if got != expected {
+ t.Fatalf("shell path: got %q want %q", got, expected)
+ }
+}
diff --git a/internal/attractor/modeldb/catalog_test.go b/internal/attractor/modeldb/catalog_test.go
index dbb7f837..83912226 100644
--- a/internal/attractor/modeldb/catalog_test.go
+++ b/internal/attractor/modeldb/catalog_test.go
@@ -149,3 +149,22 @@ func TestCatalogHasProviderModel_SparkEntry(t *testing.T) {
t.Error("expected SupportsReasoning=true")
}
}
+
+func TestLoadEmbeddedCatalog_ContainsAnthropicSonnet46(t *testing.T) {
+ c, err := LoadEmbeddedCatalog()
+ if err != nil {
+ t.Fatalf("LoadEmbeddedCatalog: %v", err)
+ }
+ if !CatalogHasProviderModel(c, "anthropic", "claude-sonnet-4.6") {
+ t.Fatal("expected embedded catalog to contain anthropic/claude-sonnet-4.6")
+ }
+ if !CatalogHasProviderModel(c, "anthropic", "claude-sonnet-4-6") {
+ t.Fatal("expected dash-format anthropic model id to resolve for claude-sonnet-4.6")
+ }
+ if !CatalogHasProviderModel(c, "openai", "gpt-5.3-codex") {
+ t.Fatal("expected embedded catalog to contain openai/gpt-5.3-codex")
+ }
+ if !CatalogHasProviderModel(c, "google", "gemini-3.1-pro-preview") {
+ t.Fatal("expected embedded catalog to contain google/gemini-3.1-pro-preview")
+ }
+}
diff --git a/internal/attractor/modeldb/pinned/openrouter_models.json b/internal/attractor/modeldb/pinned/openrouter_models.json
index 5e24ce67..b8dffbfa 100644
--- a/internal/attractor/modeldb/pinned/openrouter_models.json
+++ b/internal/attractor/modeldb/pinned/openrouter_models.json
@@ -537,58 +537,6 @@
},
"expiration_date": null
},
- {
- "id": "allenai/olmo-3.1-32b-think",
- "canonical_slug": "allenai/olmo-3.1-32b-think-20251215",
- "hugging_face_id": "allenai/Olmo-3.1-32B-Think",
- "name": "AllenAI: Olmo 3.1 32B Think",
- "created": 1765907719,
- "description": "Olmo 3.1 32B Think is a large-scale, 32-billion-parameter model designed for deep reasoning, complex multi-step logic, and advanced instruction following. Building on the Olmo 3 series, version 3.1 delivers refined reasoning behavior and stronger performance across demanding evaluations and nuanced conversational tasks. Developed by Ai2 under the Apache 2.0 license, Olmo 3.1 32B Think continues the Olmo initiative’s commitment to openness, providing full transparency across model weights, code, and training methodology.",
- "context_length": 65536,
- "architecture": {
- "modality": "text->text",
- "input_modalities": [
- "text"
- ],
- "output_modalities": [
- "text"
- ],
- "tokenizer": "Other",
- "instruct_type": null
- },
- "pricing": {
- "prompt": "0.00000015",
- "completion": "0.0000005"
- },
- "top_provider": {
- "context_length": 65536,
- "max_completion_tokens": 65536,
- "is_moderated": false
- },
- "per_request_limits": null,
- "supported_parameters": [
- "frequency_penalty",
- "include_reasoning",
- "logit_bias",
- "max_tokens",
- "presence_penalty",
- "reasoning",
- "repetition_penalty",
- "response_format",
- "seed",
- "stop",
- "structured_outputs",
- "temperature",
- "top_k",
- "top_p"
- ],
- "default_parameters": {
- "temperature": 0.6,
- "top_p": 0.95,
- "frequency_penalty": null
- },
- "expiration_date": null
- },
{
"id": "alpindale/goliath-120b",
"canonical_slug": "alpindale/goliath-120b",
@@ -1021,7 +969,9 @@
},
"pricing": {
"prompt": "0.000006",
- "completion": "0.00003"
+ "completion": "0.00003",
+ "input_cache_read": "0.0000006",
+ "input_cache_write": "0.0000075"
},
"top_provider": {
"context_length": 200000,
@@ -1091,7 +1041,7 @@
"top_p": null,
"frequency_penalty": null
},
- "expiration_date": null
+ "expiration_date": "2026-05-05"
},
{
"id": "anthropic/claude-3.7-sonnet:thinking",
@@ -1142,7 +1092,7 @@
"top_p": null,
"frequency_penalty": null
},
- "expiration_date": null
+ "expiration_date": "2026-05-05"
},
{
"id": "anthropic/claude-haiku-4.5",
@@ -1174,14 +1124,16 @@
"top_provider": {
"context_length": 200000,
"max_completion_tokens": 64000,
- "is_moderated": false
+ "is_moderated": true
},
"per_request_limits": null,
"supported_parameters": [
"include_reasoning",
"max_tokens",
"reasoning",
+ "response_format",
"stop",
+ "structured_outputs",
"temperature",
"tool_choice",
"tools",
@@ -1669,7 +1621,11 @@
},
"pricing": {
"prompt": "0",
- "completion": "0"
+ "completion": "0",
+ "request": "0",
+ "image": "0",
+ "web_search": "0",
+ "internal_reasoning": "0"
},
"top_provider": {
"context_length": 131000,
@@ -1723,21 +1679,15 @@
},
"per_request_limits": null,
"supported_parameters": [
- "frequency_penalty",
"include_reasoning",
- "logit_bias",
"max_tokens",
- "min_p",
- "presence_penalty",
"reasoning",
- "repetition_penalty",
"response_format",
"stop",
"structured_outputs",
"temperature",
"tool_choice",
"tools",
- "top_k",
"top_p"
],
"default_parameters": {
@@ -2520,9 +2470,8 @@
"instruct_type": null
},
"pricing": {
- "prompt": "0.0000003",
- "completion": "0.0000012",
- "input_cache_read": "0.00000015"
+ "prompt": "0.00000032",
+ "completion": "0.00000089"
},
"top_provider": {
"context_length": 163840,
@@ -2539,7 +2488,6 @@
"response_format",
"seed",
"stop",
- "structured_outputs",
"temperature",
"tool_choice",
"tools",
@@ -2569,13 +2517,13 @@
"instruct_type": null
},
"pricing": {
- "prompt": "0.00000019",
- "completion": "0.00000087",
- "input_cache_read": "0.000000095"
+ "prompt": "0.0000002",
+ "completion": "0.00000077",
+ "input_cache_read": "0.000000135"
},
"top_provider": {
"context_length": 163840,
- "max_completion_tokens": 65536,
+ "max_completion_tokens": null,
"is_moderated": false
},
"per_request_limits": null,
@@ -2722,9 +2670,9 @@
"instruct_type": "deepseek-r1"
},
"pricing": {
- "prompt": "0.0000004",
- "completion": "0.00000175",
- "input_cache_read": "0.0000002"
+ "prompt": "0.00000045",
+ "completion": "0.00000215",
+ "input_cache_read": "0.000000225"
},
"top_provider": {
"context_length": 163840,
@@ -2760,51 +2708,6 @@
},
"expiration_date": null
},
- {
- "id": "deepseek/deepseek-r1-0528:free",
- "canonical_slug": "deepseek/deepseek-r1-0528",
- "hugging_face_id": "deepseek-ai/DeepSeek-R1-0528",
- "name": "DeepSeek: R1 0528 (free)",
- "created": 1748455170,
- "description": "May 28th update to the [original DeepSeek R1](/deepseek/deepseek-r1) Performance on par with [OpenAI o1](/openai/o1), but open-sourced and with fully open reasoning tokens. It's 671B parameters in size, with 37B active in an inference pass.\n\nFully open-source model.",
- "context_length": 163840,
- "architecture": {
- "modality": "text->text",
- "input_modalities": [
- "text"
- ],
- "output_modalities": [
- "text"
- ],
- "tokenizer": "DeepSeek",
- "instruct_type": "deepseek-r1"
- },
- "pricing": {
- "prompt": "0",
- "completion": "0"
- },
- "top_provider": {
- "context_length": 163840,
- "max_completion_tokens": null,
- "is_moderated": false
- },
- "per_request_limits": null,
- "supported_parameters": [
- "frequency_penalty",
- "include_reasoning",
- "max_tokens",
- "presence_penalty",
- "reasoning",
- "repetition_penalty",
- "temperature"
- ],
- "default_parameters": {
- "temperature": null,
- "top_p": null,
- "frequency_penalty": null
- },
- "expiration_date": null
- },
{
"id": "deepseek/deepseek-r1-distill-llama-70b",
"canonical_slug": "deepseek/deepseek-r1-distill-llama-70b",
@@ -2825,13 +2728,12 @@
"instruct_type": "deepseek-r1"
},
"pricing": {
- "prompt": "0.00000003",
- "completion": "0.00000011",
- "input_cache_read": "0.000000015"
+ "prompt": "0.0000007",
+ "completion": "0.0000008"
},
"top_provider": {
"context_length": 131072,
- "max_completion_tokens": 131072,
+ "max_completion_tokens": 16384,
"is_moderated": false
},
"per_request_limits": null,
@@ -2887,6 +2789,7 @@
"supported_parameters": [
"frequency_penalty",
"include_reasoning",
+ "logprobs",
"max_tokens",
"presence_penalty",
"reasoning",
@@ -2897,6 +2800,7 @@
"structured_outputs",
"temperature",
"top_k",
+ "top_logprobs",
"top_p"
],
"default_parameters": {},
@@ -3033,12 +2937,11 @@
},
"pricing": {
"prompt": "0.00000025",
- "completion": "0.00000038",
- "input_cache_read": "0.000000125"
+ "completion": "0.0000004"
},
"top_provider": {
"context_length": 163840,
- "max_completion_tokens": 65536,
+ "max_completion_tokens": 163840,
"is_moderated": false
},
"per_request_limits": null,
@@ -3102,7 +3005,9 @@
"supported_parameters": [
"frequency_penalty",
"include_reasoning",
+ "logit_bias",
"max_tokens",
+ "min_p",
"presence_penalty",
"reasoning",
"repetition_penalty",
@@ -3143,9 +3048,9 @@
"instruct_type": null
},
"pricing": {
- "prompt": "0.00000027",
- "completion": "0.00000041",
- "input_cache_read": "0.000000135"
+ "prompt": "0.0000004",
+ "completion": "0.0000012",
+ "input_cache_read": "0.0000002"
},
"top_provider": {
"context_length": 163840,
@@ -3154,19 +3059,12 @@
},
"per_request_limits": null,
"supported_parameters": [
- "frequency_penalty",
"include_reasoning",
- "logit_bias",
"max_tokens",
- "presence_penalty",
"reasoning",
- "repetition_penalty",
"response_format",
- "seed",
- "stop",
"structured_outputs",
"temperature",
- "top_k",
"top_p"
],
"default_parameters": {
@@ -3442,7 +3340,7 @@
"id": "google/gemini-2.5-flash-image",
"canonical_slug": "google/gemini-2.5-flash-image",
"hugging_face_id": "",
- "name": "Google: Gemini 2.5 Flash Image (Nano Banana)",
+ "name": "Google: Nano Banana (Gemini 2.5 Flash Image)",
"created": 1759870431,
"description": "Gemini 2.5 Flash Image, a.k.a. \"Nano Banana,\" is now generally available. It is a state of the art image generation model with contextual understanding. It is capable of image generation, edits, and multi-turn conversations. Aspect ratios can be controlled with the [image_config API Parameter](https://openrouter.ai/docs/features/multimodal/image-generation#image-aspect-ratio-configuration)",
"context_length": 32768,
@@ -3478,6 +3376,7 @@
"max_tokens",
"response_format",
"seed",
+ "stop",
"structured_outputs",
"temperature",
"top_p"
@@ -3606,66 +3505,8 @@
"expiration_date": null
},
{
- "id": "google/gemini-2.5-flash-preview-09-2025",
- "canonical_slug": "google/gemini-2.5-flash-preview-09-2025",
- "hugging_face_id": "",
- "name": "Google: Gemini 2.5 Flash Preview 09-2025",
- "created": 1758820178,
- "description": "Gemini 2.5 Flash Preview September 2025 Checkpoint is Google's state-of-the-art workhorse model, specifically designed for advanced reasoning, coding, mathematics, and scientific tasks. It includes built-in \"thinking\" capabilities, enabling it to provide responses with greater accuracy and nuanced context handling. \n\nAdditionally, Gemini 2.5 Flash is configurable through the \"max tokens for reasoning\" parameter, as described in the documentation (https://openrouter.ai/docs/use-cases/reasoning-tokens#max-tokens-for-reasoning).",
- "context_length": 1048576,
- "architecture": {
- "modality": "text+image+file+audio+video->text",
- "input_modalities": [
- "image",
- "file",
- "text",
- "audio",
- "video"
- ],
- "output_modalities": [
- "text"
- ],
- "tokenizer": "Gemini",
- "instruct_type": null
- },
- "pricing": {
- "prompt": "0.0000003",
- "completion": "0.0000025",
- "image": "0.0000003",
- "audio": "0.000001",
- "internal_reasoning": "0.0000025",
- "input_cache_read": "0.00000003",
- "input_cache_write": "0.00000008333333333333334"
- },
- "top_provider": {
- "context_length": 1048576,
- "max_completion_tokens": 65536,
- "is_moderated": false
- },
- "per_request_limits": null,
- "supported_parameters": [
- "include_reasoning",
- "max_tokens",
- "reasoning",
- "response_format",
- "seed",
- "stop",
- "structured_outputs",
- "temperature",
- "tool_choice",
- "tools",
- "top_p"
- ],
- "default_parameters": {
- "temperature": null,
- "top_p": null,
- "frequency_penalty": null
- },
- "expiration_date": "2026-02-17"
- },
- {
- "id": "google/gemini-3.1-pro-preview-06-05",
- "canonical_slug": "google/gemini-3.1-pro-preview",
+ "id": "google/gemini-2.5-pro",
+ "canonical_slug": "google/gemini-2.5-pro",
"hugging_face_id": "",
"name": "Google: Gemini 2.5 Pro",
"created": 1750169544,
@@ -3722,8 +3563,8 @@
"expiration_date": null
},
{
- "id": "google/gemini-3.1-pro-preview",
- "canonical_slug": "google/gemini-3.1-pro-preview-06-05",
+ "id": "google/gemini-2.5-pro-preview",
+ "canonical_slug": "google/gemini-2.5-pro-preview-06-05",
"hugging_face_id": "",
"name": "Google: Gemini 2.5 Pro Preview 06-05",
"created": 1749137257,
@@ -3775,8 +3616,8 @@
"expiration_date": null
},
{
- "id": "google/gemini-3.1-pro-preview-05-06",
- "canonical_slug": "google/gemini-3.1-pro-preview-03-25",
+ "id": "google/gemini-2.5-pro-preview-05-06",
+ "canonical_slug": "google/gemini-2.5-pro-preview-03-25",
"hugging_face_id": "",
"name": "Google: Gemini 2.5 Pro Preview 05-06",
"created": 1746578513,
@@ -4000,7 +3841,7 @@
"top_p": null,
"frequency_penalty": null
},
- "expiration_date": null
+ "expiration_date": "2026-03-09"
},
{
"id": "google/gemma-2-27b-it",
@@ -4106,13 +3947,12 @@
"instruct_type": "gemma"
},
"pricing": {
- "prompt": "0.00000003",
- "completion": "0.0000001",
- "input_cache_read": "0.000000015"
+ "prompt": "0.00000004",
+ "completion": "0.00000013"
},
"top_provider": {
"context_length": 131072,
- "max_completion_tokens": 131072,
+ "max_completion_tokens": null,
"is_moderated": false
},
"per_request_limits": null,
@@ -4283,7 +4123,7 @@
"name": "Google: Gemma 3 4B",
"created": 1741905510,
"description": "Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities, including structured outputs and function calling.",
- "context_length": 96000,
+ "context_length": 131072,
"architecture": {
"modality": "text+image->text",
"input_modalities": [
@@ -4297,11 +4137,11 @@
"instruct_type": "gemma"
},
"pricing": {
- "prompt": "0.00000001703012",
- "completion": "0.0000000681536"
+ "prompt": "0.00000004",
+ "completion": "0.00000008"
},
"top_provider": {
- "context_length": 96000,
+ "context_length": 131072,
"max_completion_tokens": null,
"is_moderated": false
},
@@ -5088,7 +4928,11 @@
"per_request_limits": null,
"supported_parameters": [
"max_tokens",
+ "response_format",
+ "structured_outputs",
"temperature",
+ "tool_choice",
+ "tools",
"top_p"
],
"default_parameters": {},
@@ -5412,7 +5256,6 @@
"per_request_limits": null,
"supported_parameters": [
"frequency_penalty",
- "logit_bias",
"max_tokens",
"min_p",
"presence_penalty",
@@ -5792,7 +5635,7 @@
"top_p"
],
"default_parameters": {},
- "expiration_date": null
+ "expiration_date": "2026-02-25"
},
{
"id": "meta-llama/llama-guard-3-8b",
@@ -5936,7 +5779,7 @@
"name": "WizardLM-2 8x22B",
"created": 1713225600,
"description": "WizardLM-2 8x22B is Microsoft AI's most advanced Wizard model. It demonstrates highly competitive performance compared to leading proprietary models, and it consistently outperforms all existing state-of-the-art opensource models.\n\nIt is an instruct finetune of [Mixtral 8x22B](/models/mistralai/mixtral-8x22b).\n\nTo read more about the model release, [click here](https://wizardlm.github.io/WizardLM2/).\n\n#moe",
- "context_length": 65536,
+ "context_length": 65535,
"architecture": {
"modality": "text->text",
"input_modalities": [
@@ -5949,22 +5792,20 @@
"instruct_type": "vicuna"
},
"pricing": {
- "prompt": "0.00000048",
- "completion": "0.00000048"
+ "prompt": "0.00000062",
+ "completion": "0.00000062"
},
"top_provider": {
- "context_length": 65536,
- "max_completion_tokens": 16384,
+ "context_length": 65535,
+ "max_completion_tokens": 8000,
"is_moderated": false
},
"per_request_limits": null,
"supported_parameters": [
"frequency_penalty",
"max_tokens",
- "min_p",
"presence_penalty",
"repetition_penalty",
- "response_format",
"seed",
"stop",
"temperature",
@@ -6009,7 +5850,11 @@
"temperature",
"top_p"
],
- "default_parameters": {},
+ "default_parameters": {
+ "temperature": null,
+ "top_p": null,
+ "frequency_penalty": null
+ },
"expiration_date": null
},
{
@@ -6056,7 +5901,11 @@
"top_k",
"top_p"
],
- "default_parameters": {},
+ "default_parameters": {
+ "temperature": null,
+ "top_p": null,
+ "frequency_penalty": null
+ },
"expiration_date": null
},
{
@@ -6215,47 +6064,73 @@
},
{
"id": "minimax/minimax-m2.5",
- "canonical_slug": "minimax/minimax-m2.5",
- "hugging_face_id": "",
+ "canonical_slug": "minimax/minimax-m2.5-20260211",
+ "hugging_face_id": "MiniMaxAI/MiniMax-M2.5",
"name": "MiniMax: MiniMax M2.5",
- "created": 1771200000,
- "description": "MiniMax M2.5 - advanced reasoning and coding model with agentic capabilities",
+ "created": 1770908502,
+ "description": "MiniMax-M2.5 is a SOTA large language model designed for real-world productivity. Trained in a diverse range of complex real-world digital working environments, M2.5 builds upon the coding expertise of M2.1 to extend into general office work, reaching fluency in generating and operating Word, Excel, and Powerpoint files, context switching between diverse software environments, and working across different agent and human teams. Scoring 80.2% on SWE-Bench Verified, 51.3% on Multi-SWE-Bench, and 76.3% on BrowseComp, M2.5 is also more token efficient than previous generations, having been trained to optimize its actions and output through planning.",
"context_length": 196608,
"architecture": {
"modality": "text->text",
- "input_modalities": ["text"],
- "output_modalities": ["text"],
+ "input_modalities": [
+ "text"
+ ],
+ "output_modalities": [
+ "text"
+ ],
"tokenizer": "Other",
"instruct_type": null
},
"pricing": {
- "prompt": "0.00000015",
+ "prompt": "0.000000295",
"completion": "0.0000012",
- "image": "0",
- "request": "0",
- "input_cache_read": "0",
- "input_cache_write": "0",
- "web_search": "0",
- "internal_reasoning": "0"
+ "input_cache_read": "0.00000003"
},
"top_provider": {
"context_length": 196608,
- "max_completion_tokens": 16384,
+ "max_completion_tokens": 196608,
"is_moderated": false
},
- "supported_parameters": ["tools", "temperature", "top_p", "max_tokens", "stream", "stop"],
"per_request_limits": null,
- "expiration_date": null
- },
- {
- "id": "mistralai/codestral-2508",
- "canonical_slug": "mistralai/codestral-2508",
- "hugging_face_id": "",
- "name": "Mistral: Codestral 2508",
- "created": 1754079630,
- "description": "Mistral's cutting-edge language model for coding released end of July 2025. Codestral specializes in low-latency, high-frequency tasks such as fill-in-the-middle (FIM), code correction and test generation.\n\n[Blog Post](https://mistral.ai/news/codestral-25-08)",
- "context_length": 256000,
- "architecture": {
+ "supported_parameters": [
+ "frequency_penalty",
+ "include_reasoning",
+ "logit_bias",
+ "logprobs",
+ "max_tokens",
+ "min_p",
+ "parallel_tool_calls",
+ "presence_penalty",
+ "reasoning",
+ "reasoning_effort",
+ "repetition_penalty",
+ "response_format",
+ "seed",
+ "stop",
+ "structured_outputs",
+ "temperature",
+ "tool_choice",
+ "tools",
+ "top_k",
+ "top_logprobs",
+ "top_p"
+ ],
+ "default_parameters": {
+ "temperature": 1,
+ "top_p": 0.95,
+ "frequency_penalty": null
+ },
+ "expiration_date": null
+ },
+ {
+ "id": "mistralai/codestral-2508",
+ "canonical_slug": "mistralai/codestral-2508",
+ "hugging_face_id": "",
+ "name": "Mistral: Codestral 2508",
+ "created": 1754079630,
+ "description": "Mistral's cutting-edge language model for coding released end of July 2025. Codestral specializes in low-latency, high-frequency tasks such as fill-in-the-middle (FIM), code correction and test generation.\n\n[Blog Post](https://mistral.ai/news/codestral-25-08)",
+ "context_length": 256000,
+ "architecture": {
"modality": "text->text",
"input_modalities": [
"text"
@@ -6314,13 +6189,12 @@
"instruct_type": null
},
"pricing": {
- "prompt": "0.00000005",
- "completion": "0.00000022",
- "input_cache_read": "0.000000025"
+ "prompt": "0.0000004",
+ "completion": "0.000002"
},
"top_provider": {
"context_length": 262144,
- "max_completion_tokens": 65536,
+ "max_completion_tokens": null,
"is_moderated": false
},
"per_request_limits": null,
@@ -6328,7 +6202,6 @@
"frequency_penalty",
"max_tokens",
"presence_penalty",
- "repetition_penalty",
"response_format",
"seed",
"stop",
@@ -6336,7 +6209,6 @@
"temperature",
"tool_choice",
"tools",
- "top_k",
"top_p"
],
"default_parameters": {
@@ -6472,11 +6344,8 @@
"per_request_limits": null,
"supported_parameters": [
"frequency_penalty",
- "logit_bias",
"max_tokens",
- "min_p",
"presence_penalty",
- "repetition_penalty",
"response_format",
"seed",
"stop",
@@ -6484,7 +6353,6 @@
"temperature",
"tool_choice",
"tools",
- "top_k",
"top_p"
],
"default_parameters": {
@@ -6494,53 +6362,6 @@
},
"expiration_date": null
},
- {
- "id": "mistralai/ministral-3b",
- "canonical_slug": "mistralai/ministral-3b",
- "hugging_face_id": null,
- "name": "Mistral: Ministral 3B",
- "created": 1729123200,
- "description": "Ministral 3B is a 3B parameter model optimized for on-device and edge computing. It excels in knowledge, commonsense reasoning, and function-calling, outperforming larger models like Mistral 7B on most benchmarks. Supporting up to 128k context length, it’s ideal for orchestrating agentic workflows and specialist tasks with efficient inference.",
- "context_length": 131072,
- "architecture": {
- "modality": "text->text",
- "input_modalities": [
- "text"
- ],
- "output_modalities": [
- "text"
- ],
- "tokenizer": "Mistral",
- "instruct_type": null
- },
- "pricing": {
- "prompt": "0.00000004",
- "completion": "0.00000004"
- },
- "top_provider": {
- "context_length": 131072,
- "max_completion_tokens": null,
- "is_moderated": false
- },
- "per_request_limits": null,
- "supported_parameters": [
- "frequency_penalty",
- "max_tokens",
- "presence_penalty",
- "response_format",
- "seed",
- "stop",
- "structured_outputs",
- "temperature",
- "tool_choice",
- "tools",
- "top_p"
- ],
- "default_parameters": {
- "temperature": 0.3
- },
- "expiration_date": null
- },
{
"id": "mistralai/ministral-3b-2512",
"canonical_slug": "mistralai/ministral-3b-2512",
@@ -6591,53 +6412,6 @@
},
"expiration_date": null
},
- {
- "id": "mistralai/ministral-8b",
- "canonical_slug": "mistralai/ministral-8b",
- "hugging_face_id": null,
- "name": "Mistral: Ministral 8B",
- "created": 1729123200,
- "description": "Ministral 8B is an 8B parameter model featuring a unique interleaved sliding-window attention pattern for faster, memory-efficient inference. Designed for edge use cases, it supports up to 128k context length and excels in knowledge and reasoning tasks. It outperforms peers in the sub-10B category, making it perfect for low-latency, privacy-first applications.",
- "context_length": 131072,
- "architecture": {
- "modality": "text->text",
- "input_modalities": [
- "text"
- ],
- "output_modalities": [
- "text"
- ],
- "tokenizer": "Mistral",
- "instruct_type": null
- },
- "pricing": {
- "prompt": "0.0000001",
- "completion": "0.0000001"
- },
- "top_provider": {
- "context_length": 131072,
- "max_completion_tokens": null,
- "is_moderated": false
- },
- "per_request_limits": null,
- "supported_parameters": [
- "frequency_penalty",
- "max_tokens",
- "presence_penalty",
- "response_format",
- "seed",
- "stop",
- "structured_outputs",
- "temperature",
- "tool_choice",
- "tools",
- "top_p"
- ],
- "default_parameters": {
- "temperature": 0.3
- },
- "expiration_date": null
- },
{
"id": "mistralai/ministral-8b-2512",
"canonical_slug": "mistralai/ministral-8b-2512",
@@ -6778,52 +6552,6 @@
},
"expiration_date": null
},
- {
- "id": "mistralai/mistral-7b-instruct-v0.2",
- "canonical_slug": "mistralai/mistral-7b-instruct-v0.2",
- "hugging_face_id": "mistralai/Mistral-7B-Instruct-v0.2",
- "name": "Mistral: Mistral 7B Instruct v0.2",
- "created": 1703721600,
- "description": "A high-performing, industry-standard 7.3B parameter model, with optimizations for speed and context length.\n\nAn improved version of [Mistral 7B Instruct](/modelsmistralai/mistral-7b-instruct-v0.1), with the following changes:\n\n- 32k context window (vs 8k context in v0.1)\n- Rope-theta = 1e6\n- No Sliding-Window Attention",
- "context_length": 32768,
- "architecture": {
- "modality": "text->text",
- "input_modalities": [
- "text"
- ],
- "output_modalities": [
- "text"
- ],
- "tokenizer": "Mistral",
- "instruct_type": "mistral"
- },
- "pricing": {
- "prompt": "0.0000002",
- "completion": "0.0000002"
- },
- "top_provider": {
- "context_length": 32768,
- "max_completion_tokens": null,
- "is_moderated": false
- },
- "per_request_limits": null,
- "supported_parameters": [
- "frequency_penalty",
- "logit_bias",
- "max_tokens",
- "min_p",
- "presence_penalty",
- "repetition_penalty",
- "stop",
- "temperature",
- "top_k",
- "top_p"
- ],
- "default_parameters": {
- "temperature": 0.3
- },
- "expiration_date": null
- },
{
"id": "mistralai/mistral-7b-instruct-v0.3",
"canonical_slug": "mistralai/mistral-7b-instruct-v0.3",
@@ -7293,7 +7021,6 @@
"response_format",
"seed",
"stop",
- "structured_outputs",
"temperature",
"tool_choice",
"tools",
@@ -7314,7 +7041,7 @@
"name": "Mistral: Mistral Small 3.1 24B",
"created": 1742238937,
"description": "Mistral Small 3.1 24B Instruct is an upgraded variant of Mistral Small 3 (2501), featuring 24 billion parameters with advanced multimodal capabilities. It provides state-of-the-art performance in text-based reasoning and vision tasks, including image analysis, programming, mathematical reasoning, and multilingual support across dozens of languages. Equipped with an extensive 128k token context window and optimized for efficient local inference, it supports use cases such as conversational agents, function calling, long-document comprehension, and privacy-sensitive deployments. The updated version is [Mistral Small 3.2](mistralai/mistral-small-3.2-24b-instruct)",
- "context_length": 131072,
+ "context_length": 128000,
"architecture": {
"modality": "text+image->text",
"input_modalities": [
@@ -7328,13 +7055,12 @@
"instruct_type": null
},
"pricing": {
- "prompt": "0.00000003",
- "completion": "0.00000011",
- "input_cache_read": "0.000000015"
+ "prompt": "0.00000035",
+ "completion": "0.00000056"
},
"top_provider": {
- "context_length": 131072,
- "max_completion_tokens": 131072,
+ "context_length": 128000,
+ "max_completion_tokens": null,
"is_moderated": false
},
"per_request_limits": null,
@@ -7343,13 +7069,8 @@
"max_tokens",
"presence_penalty",
"repetition_penalty",
- "response_format",
"seed",
- "stop",
- "structured_outputs",
"temperature",
- "tool_choice",
- "tools",
"top_k",
"top_p"
],
@@ -7499,53 +7220,6 @@
},
"expiration_date": null
},
- {
- "id": "mistralai/mistral-tiny",
- "canonical_slug": "mistralai/mistral-tiny",
- "hugging_face_id": null,
- "name": "Mistral Tiny",
- "created": 1704844800,
- "description": "Note: This model is being deprecated. Recommended replacement is the newer [Ministral 8B](/mistral/ministral-8b)\n\nThis model is currently powered by Mistral-7B-v0.2, and incorporates a \"better\" fine-tuning than [Mistral 7B](/models/mistralai/mistral-7b-instruct-v0.1), inspired by community work. It's best used for large batch processing tasks where cost is a significant factor but reasoning capabilities are not crucial.",
- "context_length": 32768,
- "architecture": {
- "modality": "text->text",
- "input_modalities": [
- "text"
- ],
- "output_modalities": [
- "text"
- ],
- "tokenizer": "Mistral",
- "instruct_type": null
- },
- "pricing": {
- "prompt": "0.00000025",
- "completion": "0.00000025"
- },
- "top_provider": {
- "context_length": 32768,
- "max_completion_tokens": null,
- "is_moderated": false
- },
- "per_request_limits": null,
- "supported_parameters": [
- "frequency_penalty",
- "max_tokens",
- "presence_penalty",
- "response_format",
- "seed",
- "stop",
- "structured_outputs",
- "temperature",
- "tool_choice",
- "tools",
- "top_p"
- ],
- "default_parameters": {
- "temperature": 0.3
- },
- "expiration_date": null
- },
{
"id": "mistralai/mixtral-8x22b-instruct",
"canonical_slug": "mistralai/mixtral-8x22b-instruct",
@@ -7644,13 +7318,13 @@
"expiration_date": null
},
{
- "id": "mistralai/pixtral-12b",
- "canonical_slug": "mistralai/pixtral-12b",
- "hugging_face_id": "mistralai/Pixtral-12B-2409",
- "name": "Mistral: Pixtral 12B",
- "created": 1725926400,
- "description": "The first multi-modal, text+image-to-text model from Mistral AI. Its weights were launched via torrent: https://x.com/mistralai/status/1833758285167722836.",
- "context_length": 32768,
+ "id": "mistralai/pixtral-large-2411",
+ "canonical_slug": "mistralai/pixtral-large-2411",
+ "hugging_face_id": "",
+ "name": "Mistral: Pixtral Large 2411",
+ "created": 1731977388,
+ "description": "Pixtral Large is a 124B parameter, open-weight, multimodal model built on top of [Mistral Large 2](/mistralai/mistral-large-2411). The model is able to understand documents, charts and natural images.\n\nThe model is available under the Mistral Research License (MRL) for research and educational use, and the Mistral Commercial License for experimentation, testing, and production for commercial purposes.\n\n",
+ "context_length": 131072,
"architecture": {
"modality": "text+image->text",
"input_modalities": [
@@ -7664,63 +7338,11 @@
"instruct_type": null
},
"pricing": {
- "prompt": "0.0000001",
- "completion": "0.0000001"
+ "prompt": "0.000002",
+ "completion": "0.000006"
},
"top_provider": {
- "context_length": 32768,
- "max_completion_tokens": null,
- "is_moderated": false
- },
- "per_request_limits": null,
- "supported_parameters": [
- "frequency_penalty",
- "logit_bias",
- "max_tokens",
- "min_p",
- "presence_penalty",
- "repetition_penalty",
- "response_format",
- "seed",
- "stop",
- "structured_outputs",
- "temperature",
- "tool_choice",
- "tools",
- "top_k",
- "top_p"
- ],
- "default_parameters": {
- "temperature": 0.3
- },
- "expiration_date": null
- },
- {
- "id": "mistralai/pixtral-large-2411",
- "canonical_slug": "mistralai/pixtral-large-2411",
- "hugging_face_id": "",
- "name": "Mistral: Pixtral Large 2411",
- "created": 1731977388,
- "description": "Pixtral Large is a 124B parameter, open-weight, multimodal model built on top of [Mistral Large 2](/mistralai/mistral-large-2411). The model is able to understand documents, charts and natural images.\n\nThe model is available under the Mistral Research License (MRL) for research and educational use, and the Mistral Commercial License for experimentation, testing, and production for commercial purposes.\n\n",
- "context_length": 131072,
- "architecture": {
- "modality": "text+image->text",
- "input_modalities": [
- "text",
- "image"
- ],
- "output_modalities": [
- "text"
- ],
- "tokenizer": "Mistral",
- "instruct_type": null
- },
- "pricing": {
- "prompt": "0.000002",
- "completion": "0.000006"
- },
- "top_provider": {
- "context_length": 131072,
+ "context_length": 131072,
"max_completion_tokens": null,
"is_moderated": false
},
@@ -7801,7 +7423,7 @@
"name": "MoonshotAI: Kimi K2 0711",
"created": 1752263252,
"description": "Kimi K2 Instruct is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32 billion active per forward pass. It is optimized for agentic capabilities, including advanced tool use, reasoning, and code synthesis. Kimi K2 excels across a broad range of benchmarks, particularly in coding (LiveCodeBench, SWE-bench), reasoning (ZebraLogic, GPQA), and tool-use (Tau2, AceBench) tasks. It supports long-context inference up to 128K tokens and is designed with a novel training stack that includes the MuonClip optimizer for stable large-scale MoE training.",
- "context_length": 131072,
+ "context_length": 131000,
"architecture": {
"modality": "text->text",
"input_modalities": [
@@ -7814,11 +7436,11 @@
"instruct_type": null
},
"pricing": {
- "prompt": "0.0000005",
- "completion": "0.0000024"
+ "prompt": "0.00000055",
+ "completion": "0.0000022"
},
"top_provider": {
- "context_length": 131072,
+ "context_length": 131000,
"max_completion_tokens": null,
"is_moderated": false
},
@@ -7851,7 +7473,7 @@
"name": "MoonshotAI: Kimi K2 0905",
"created": 1757021147,
"description": "Kimi K2 0905 is the September update of [Kimi K2 0711](moonshotai/kimi-k2). It is a large-scale Mixture-of-Experts (MoE) language model developed by Moonshot AI, featuring 1 trillion total parameters with 32 billion active per forward pass. It supports long-context inference up to 256k tokens, extended from the previous 128k.\n\nThis update improves agentic coding with higher accuracy and better generalization across scaffolds, and enhances frontend coding with more aesthetic and functional outputs for web, 3D, and related tasks. Kimi K2 is optimized for agentic capabilities, including advanced tool use, reasoning, and code synthesis. It excels across coding (LiveCodeBench, SWE-bench), reasoning (ZebraLogic, GPQA), and tool-use (Tau2, AceBench) benchmarks. The model is trained with a novel stack incorporating the MuonClip optimizer for stable large-scale MoE training.",
- "context_length": 262144,
+ "context_length": 131072,
"architecture": {
"modality": "text->text",
"input_modalities": [
@@ -7864,13 +7486,13 @@
"instruct_type": null
},
"pricing": {
- "prompt": "0.00000039",
- "completion": "0.0000019",
- "input_cache_read": "0.000000195"
+ "prompt": "0.0000004",
+ "completion": "0.000002",
+ "input_cache_read": "0.00000015"
},
"top_provider": {
- "context_length": 262144,
- "max_completion_tokens": 262144,
+ "context_length": 131072,
+ "max_completion_tokens": null,
"is_moderated": false
},
"per_request_limits": null,
@@ -7948,7 +7570,7 @@
"name": "MoonshotAI: Kimi K2 Thinking",
"created": 1762440622,
"description": "Kimi K2 Thinking is Moonshot AI’s most advanced open reasoning model to date, extending the K2 series into agentic, long-horizon reasoning. Built on the trillion-parameter Mixture-of-Experts (MoE) architecture introduced in Kimi K2, it activates 32 billion parameters per forward pass and supports 256 k-token context windows. The model is optimized for persistent step-by-step thought, dynamic tool invocation, and complex reasoning workflows that span hundreds of turns. It interleaves step-by-step reasoning with tool use, enabling autonomous research, coding, and writing that can persist for hundreds of sequential actions without drift.\n\nIt sets new open-source benchmarks on HLE, BrowseComp, SWE-Multilingual, and LiveCodeBench, while maintaining stable multi-agent behavior through 200–300 tool calls. Built on a large-scale MoE architecture with MuonClip optimization, it combines strong reasoning depth with high inference efficiency for demanding agentic and analytical tasks.",
- "context_length": 262144,
+ "context_length": 131072,
"architecture": {
"modality": "text->text",
"input_modalities": [
@@ -7961,13 +7583,13 @@
"instruct_type": null
},
"pricing": {
- "prompt": "0.0000004",
- "completion": "0.00000175",
- "input_cache_read": "0.0000002"
+ "prompt": "0.00000047",
+ "completion": "0.000002",
+ "input_cache_read": "0.000000141"
},
"top_provider": {
- "context_length": 262144,
- "max_completion_tokens": 65535,
+ "context_length": 131072,
+ "max_completion_tokens": null,
"is_moderated": false
},
"per_request_limits": null,
@@ -7975,7 +7597,6 @@
"frequency_penalty",
"include_reasoning",
"logit_bias",
- "logprobs",
"max_tokens",
"min_p",
"presence_penalty",
@@ -7989,7 +7610,6 @@
"tool_choice",
"tools",
"top_k",
- "top_logprobs",
"top_p"
],
"default_parameters": {
@@ -8021,12 +7641,12 @@
},
"pricing": {
"prompt": "0.00000045",
- "completion": "0.00000225",
- "input_cache_read": "0.000000070000002"
+ "completion": "0.0000022",
+ "input_cache_read": "0.000000225"
},
"top_provider": {
"context_length": 262144,
- "max_completion_tokens": null,
+ "max_completion_tokens": 65535,
"is_moderated": false
},
"per_request_limits": null,
@@ -8037,8 +7657,10 @@
"logprobs",
"max_tokens",
"min_p",
+ "parallel_tool_calls",
"presence_penalty",
"reasoning",
+ "reasoning_effort",
"repetition_penalty",
"response_format",
"seed",
@@ -8270,56 +7892,6 @@
},
"expiration_date": null
},
- {
- "id": "nousresearch/deephermes-3-mistral-24b-preview",
- "canonical_slug": "nousresearch/deephermes-3-mistral-24b-preview",
- "hugging_face_id": "NousResearch/DeepHermes-3-Mistral-24B-Preview",
- "name": "Nous: DeepHermes 3 Mistral 24B Preview",
- "created": 1746830904,
- "description": "DeepHermes 3 (Mistral 24B Preview) is an instruction-tuned language model by Nous Research based on Mistral-Small-24B, designed for chat, function calling, and advanced multi-turn reasoning. It introduces a dual-mode system that toggles between intuitive chat responses and structured “deep reasoning” mode using special system prompts. Fine-tuned via distillation from R1, it supports structured output (JSON mode) and function call syntax for agent-based applications.\n\nDeepHermes 3 supports a **reasoning toggle via system prompt**, allowing users to switch between fast, intuitive responses and deliberate, multi-step reasoning. When activated with the following specific system instruction, the model enters a *\"deep thinking\"* mode—generating extended chains of thought wrapped in `` tags before delivering a final answer. \n\nSystem Prompt: You are a deep thinking AI, you may use extremely long chains of thought to deeply consider the problem and deliberate with yourself via systematic reasoning processes to help come to a correct solution prior to answering. You should enclose your thoughts and internal monologue inside tags, and then provide your solution or response to the problem.\n",
- "context_length": 32768,
- "architecture": {
- "modality": "text->text",
- "input_modalities": [
- "text"
- ],
- "output_modalities": [
- "text"
- ],
- "tokenizer": "Other",
- "instruct_type": null
- },
- "pricing": {
- "prompt": "0.00000002",
- "completion": "0.0000001",
- "input_cache_read": "0.00000001"
- },
- "top_provider": {
- "context_length": 32768,
- "max_completion_tokens": 32768,
- "is_moderated": false
- },
- "per_request_limits": null,
- "supported_parameters": [
- "frequency_penalty",
- "include_reasoning",
- "max_tokens",
- "presence_penalty",
- "reasoning",
- "repetition_penalty",
- "response_format",
- "seed",
- "stop",
- "structured_outputs",
- "temperature",
- "tool_choice",
- "tools",
- "top_k",
- "top_p"
- ],
- "default_parameters": {},
- "expiration_date": null
- },
{
"id": "nousresearch/hermes-2-pro-llama-3-8b",
"canonical_slug": "nousresearch/hermes-2-pro-llama-3-8b",
@@ -8561,13 +8133,12 @@
"instruct_type": null
},
"pricing": {
- "prompt": "0.00000011",
- "completion": "0.00000038",
- "input_cache_read": "0.000000055"
+ "prompt": "0.00000013",
+ "completion": "0.0000004"
},
"top_provider": {
"context_length": 131072,
- "max_completion_tokens": 131072,
+ "max_completion_tokens": null,
"is_moderated": false
},
"per_request_limits": null,
@@ -8579,12 +8150,7 @@
"reasoning",
"repetition_penalty",
"response_format",
- "seed",
- "stop",
- "structured_outputs",
"temperature",
- "tool_choice",
- "tools",
"top_k",
"top_p"
],
@@ -8638,51 +8204,6 @@
"default_parameters": {},
"expiration_date": null
},
- {
- "id": "nvidia/llama-3.1-nemotron-ultra-253b-v1",
- "canonical_slug": "nvidia/llama-3.1-nemotron-ultra-253b-v1",
- "hugging_face_id": "nvidia/Llama-3_1-Nemotron-Ultra-253B-v1",
- "name": "NVIDIA: Llama 3.1 Nemotron Ultra 253B v1",
- "created": 1744115059,
- "description": "Llama-3.1-Nemotron-Ultra-253B-v1 is a large language model (LLM) optimized for advanced reasoning, human-interactive chat, retrieval-augmented generation (RAG), and tool-calling tasks. Derived from Meta’s Llama-3.1-405B-Instruct, it has been significantly customized using Neural Architecture Search (NAS), resulting in enhanced efficiency, reduced memory usage, and improved inference latency. The model supports a context length of up to 128K tokens and can operate efficiently on an 8x NVIDIA H100 node.\n\nNote: you must include `detailed thinking on` in the system prompt to enable reasoning. Please see [Usage Recommendations](https://huggingface.co/nvidia/Llama-3_1-Nemotron-Ultra-253B-v1#quick-start-and-usage-recommendations) for more.",
- "context_length": 131072,
- "architecture": {
- "modality": "text->text",
- "input_modalities": [
- "text"
- ],
- "output_modalities": [
- "text"
- ],
- "tokenizer": "Llama3",
- "instruct_type": null
- },
- "pricing": {
- "prompt": "0.0000006",
- "completion": "0.0000018"
- },
- "top_provider": {
- "context_length": 131072,
- "max_completion_tokens": null,
- "is_moderated": false
- },
- "per_request_limits": null,
- "supported_parameters": [
- "frequency_penalty",
- "include_reasoning",
- "max_tokens",
- "presence_penalty",
- "reasoning",
- "repetition_penalty",
- "response_format",
- "structured_outputs",
- "temperature",
- "top_k",
- "top_p"
- ],
- "default_parameters": {},
- "expiration_date": null
- },
{
"id": "nvidia/llama-3.3-nemotron-super-49b-v1.5",
"canonical_slug": "nvidia/llama-3.3-nemotron-super-49b-v1.5",
@@ -8738,7 +8259,7 @@
"hugging_face_id": "nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16",
"name": "NVIDIA: Nemotron 3 Nano 30B A3B",
"created": 1765731275,
- "description": "NVIDIA Nemotron 3 Nano 30B A3B is a small language MoE model with highest compute efficiency and accuracy for developers to build specialized agentic AI systems.\n\nThe model is fully open with open-weights, datasets and recipes so developers can easily\ncustomize, optimize, and deploy the model on their infrastructure for maximum privacy and\nsecurity.\n\nNote: For the free endpoint, all prompts and output are logged to improve the provider's model and its product and services. Please do not upload any personal, confidential, or otherwise sensitive information. This is a trial use only. Do not use for production or business-critical systems.",
+ "description": "NVIDIA Nemotron 3 Nano 30B A3B is a small language MoE model with highest compute efficiency and accuracy for developers to build specialized agentic AI systems.\n\nThe model is fully open with open-weights, datasets and recipes so developers can easily\ncustomize, optimize, and deploy the model on their infrastructure for maximum privacy and\nsecurity.",
"context_length": 262144,
"architecture": {
"modality": "text->text",
@@ -8772,7 +8293,6 @@
"response_format",
"seed",
"stop",
- "structured_outputs",
"temperature",
"tool_choice",
"tools",
@@ -8792,7 +8312,7 @@
"hugging_face_id": "nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16",
"name": "NVIDIA: Nemotron 3 Nano 30B A3B (free)",
"created": 1765731275,
- "description": "NVIDIA Nemotron 3 Nano 30B A3B is a small language MoE model with highest compute efficiency and accuracy for developers to build specialized agentic AI systems.\n\nThe model is fully open with open-weights, datasets and recipes so developers can easily\ncustomize, optimize, and deploy the model on their infrastructure for maximum privacy and\nsecurity.\n\nNote: For the free endpoint, all prompts and output are logged to improve the provider's model and its product and services. Please do not upload any personal, confidential, or otherwise sensitive information. This is a trial use only. Do not use for production or business-critical systems.",
+ "description": "NVIDIA Nemotron 3 Nano 30B A3B is a small language MoE model with highest compute efficiency and accuracy for developers to build specialized agentic AI systems.\n\nThe model is fully open with open-weights, datasets and recipes so developers can easily\ncustomize, optimize, and deploy the model on their infrastructure for maximum privacy and\nsecurity.",
"context_length": 256000,
"architecture": {
"modality": "text->text",
@@ -8980,7 +8500,11 @@
"top_k",
"top_p"
],
- "default_parameters": {},
+ "default_parameters": {
+ "temperature": null,
+ "top_p": null,
+ "frequency_penalty": null
+ },
"expiration_date": null
},
{
@@ -9024,55 +8548,12 @@
"tools",
"top_p"
],
- "default_parameters": {},
- "expiration_date": null
- },
- {
- "id": "openai/chatgpt-4o-latest",
- "canonical_slug": "openai/chatgpt-4o-latest",
- "hugging_face_id": null,
- "name": "OpenAI: ChatGPT-4o",
- "created": 1723593600,
- "description": "OpenAI ChatGPT 4o is continually updated by OpenAI to point to the current version of GPT-4o used by ChatGPT. It therefore differs slightly from the API version of [GPT-4o](/models/openai/gpt-4o) in that it has additional RLHF. It is intended for research and evaluation.\n\nOpenAI notes that this model is not suited for production use-cases as it may be removed or redirected to another model in the future.",
- "context_length": 128000,
- "architecture": {
- "modality": "text+image->text",
- "input_modalities": [
- "text",
- "image"
- ],
- "output_modalities": [
- "text"
- ],
- "tokenizer": "GPT",
- "instruct_type": null
- },
- "pricing": {
- "prompt": "0.000005",
- "completion": "0.000015"
- },
- "top_provider": {
- "context_length": 128000,
- "max_completion_tokens": 16384,
- "is_moderated": true
+ "default_parameters": {
+ "temperature": null,
+ "top_p": null,
+ "frequency_penalty": null
},
- "per_request_limits": null,
- "supported_parameters": [
- "frequency_penalty",
- "logit_bias",
- "logprobs",
- "max_tokens",
- "presence_penalty",
- "response_format",
- "seed",
- "stop",
- "structured_outputs",
- "temperature",
- "top_logprobs",
- "top_p"
- ],
- "default_parameters": {},
- "expiration_date": "2026-02-17"
+ "expiration_date": null
},
{
"id": "openai/gpt-3.5-turbo",
@@ -10911,20 +10392,14 @@
},
"per_request_limits": null,
"supported_parameters": [
- "frequency_penalty",
"include_reasoning",
- "logit_bias",
- "logprobs",
"max_tokens",
- "presence_penalty",
"reasoning",
"response_format",
"seed",
- "stop",
"structured_outputs",
"tool_choice",
- "tools",
- "top_logprobs"
+ "tools"
],
"default_parameters": {
"temperature": null,
@@ -10934,18 +10409,19 @@
"expiration_date": null
},
{
- "id": "openai/gpt-5.4",
- "canonical_slug": "openai/gpt-5.4-20260305",
+ "id": "openai/gpt-5.2-pro",
+ "canonical_slug": "openai/gpt-5.2-pro-20251211",
"hugging_face_id": "",
- "name": "OpenAI: GPT-5.4",
- "created": 1772668800,
- "description": "GPT-5.4 is the latest agentic coding model in the GPT-5 series, succeeding GPT-5.3-Codex with improved reasoning, steerability, and code quality across interactive and long-running software engineering workflows.",
+ "name": "OpenAI: GPT-5.2 Pro",
+ "created": 1765389780,
+ "description": "GPT-5.2 Pro is OpenAI’s most advanced model, offering major improvements in agentic coding and long context performance over GPT-5 Pro. It is optimized for complex tasks that require step-by-step reasoning, instruction following, and accuracy in high-stakes use cases. It supports test-time routing features and advanced prompt understanding, including user-specified intent like \"think hard about this.\" Improvements include reductions in hallucination, sycophancy, and better performance in coding, writing, and health-related tasks.",
"context_length": 400000,
"architecture": {
- "modality": "text+image->text",
+ "modality": "text+image+file->text",
"input_modalities": [
+ "image",
"text",
- "image"
+ "file"
],
"output_modalities": [
"text"
@@ -10954,10 +10430,9 @@
"instruct_type": null
},
"pricing": {
- "prompt": "0.00000175",
- "completion": "0.000014",
- "web_search": "0.01",
- "input_cache_read": "0.000000175"
+ "prompt": "0.000021",
+ "completion": "0.000168",
+ "web_search": "0.01"
},
"top_provider": {
"context_length": 400000,
@@ -10966,20 +10441,14 @@
},
"per_request_limits": null,
"supported_parameters": [
- "frequency_penalty",
"include_reasoning",
- "logit_bias",
- "logprobs",
"max_tokens",
- "presence_penalty",
"reasoning",
"response_format",
"seed",
- "stop",
"structured_outputs",
"tool_choice",
- "tools",
- "top_logprobs"
+ "tools"
],
"default_parameters": {
"temperature": null,
@@ -10989,150 +10458,8 @@
"expiration_date": null
},
{
- "id": "openai/gpt-5.3-codex",
- "canonical_slug": "openai/gpt-5.3-codex-20260303",
- "hugging_face_id": "",
- "name": "OpenAI: GPT-5.3-Codex",
- "created": 1772150400,
- "description": "GPT-5.3-Codex is the latest agentic coding model in the GPT-5 series, succeeding GPT-5.2-Codex with improved reasoning, steerability, and code quality across interactive and long-running software engineering workflows.",
- "context_length": 400000,
- "architecture": {
- "modality": "text+image->text",
- "input_modalities": [
- "text",
- "image"
- ],
- "output_modalities": [
- "text"
- ],
- "tokenizer": "GPT",
- "instruct_type": null
- },
- "pricing": {
- "prompt": "0.00000175",
- "completion": "0.000014",
- "web_search": "0.01",
- "input_cache_read": "0.000000175"
- },
- "top_provider": {
- "context_length": 400000,
- "max_completion_tokens": 128000,
- "is_moderated": true
- },
- "per_request_limits": null,
- "supported_parameters": [
- "frequency_penalty",
- "include_reasoning",
- "logit_bias",
- "logprobs",
- "max_tokens",
- "presence_penalty",
- "reasoning",
- "response_format",
- "seed",
- "stop",
- "structured_outputs",
- "tool_choice",
- "tools",
- "top_logprobs"
- ],
- "default_parameters": {
- "temperature": null,
- "top_p": null,
- "frequency_penalty": null
- },
- "expiration_date": null
- },
- {
- "id": "openai/gpt-5.3-codex-spark",
- "canonical_slug": "openai/gpt-5.3-codex-spark-20260212",
- "hugging_face_id": "",
- "name": "OpenAI: GPT-5.3-Codex-Spark",
- "created": 1739491200,
- "description": "GPT-5.3-Codex-Spark is a smaller, faster distillation of GPT-5.3-Codex optimized for real-time coding on Cerebras WSE-3 hardware. CLI-only (no API). 128k context, text-only, 1000+ tok/s.",
- "context_length": 128000,
- "architecture": {
- "modality": "text->text",
- "input_modalities": [
- "text"
- ],
- "output_modalities": [
- "text"
- ],
- "tokenizer": "GPT",
- "instruct_type": null
- },
- "pricing": {
- "prompt": "0",
- "completion": "0"
- },
- "top_provider": {
- "context_length": 128000,
- "max_completion_tokens": 16384,
- "is_moderated": false
- },
- "per_request_limits": null,
- "supported_parameters": [
- "max_tokens",
- "reasoning",
- "tool_choice",
- "tools"
- ],
- "default_parameters": {},
- "expiration_date": null
- },
- {
- "id": "openai/gpt-5.2-pro",
- "canonical_slug": "openai/gpt-5.2-pro-20251211",
- "hugging_face_id": "",
- "name": "OpenAI: GPT-5.2 Pro",
- "created": 1765389780,
- "description": "GPT-5.2 Pro is OpenAI’s most advanced model, offering major improvements in agentic coding and long context performance over GPT-5 Pro. It is optimized for complex tasks that require step-by-step reasoning, instruction following, and accuracy in high-stakes use cases. It supports test-time routing features and advanced prompt understanding, including user-specified intent like \"think hard about this.\" Improvements include reductions in hallucination, sycophancy, and better performance in coding, writing, and health-related tasks.",
- "context_length": 400000,
- "architecture": {
- "modality": "text+image+file->text",
- "input_modalities": [
- "image",
- "text",
- "file"
- ],
- "output_modalities": [
- "text"
- ],
- "tokenizer": "GPT",
- "instruct_type": null
- },
- "pricing": {
- "prompt": "0.000021",
- "completion": "0.000168",
- "web_search": "0.01"
- },
- "top_provider": {
- "context_length": 400000,
- "max_completion_tokens": 128000,
- "is_moderated": true
- },
- "per_request_limits": null,
- "supported_parameters": [
- "include_reasoning",
- "max_tokens",
- "reasoning",
- "response_format",
- "seed",
- "structured_outputs",
- "tool_choice",
- "tools"
- ],
- "default_parameters": {
- "temperature": null,
- "top_p": null,
- "frequency_penalty": null
- },
- "expiration_date": null
- },
- {
- "id": "openai/gpt-audio",
- "canonical_slug": "openai/gpt-audio",
+ "id": "openai/gpt-audio",
+ "canonical_slug": "openai/gpt-audio",
"hugging_face_id": "",
"name": "OpenAI: GPT Audio",
"created": 1768862569,
@@ -12025,60 +11352,13 @@
},
"expiration_date": null
},
- {
- "id": "opengvlab/internvl3-78b",
- "canonical_slug": "opengvlab/internvl3-78b",
- "hugging_face_id": "OpenGVLab/InternVL3-78B",
- "name": "OpenGVLab: InternVL3 78B",
- "created": 1757962555,
- "description": "The InternVL3 series is an advanced multimodal large language model (MLLM). Compared to InternVL 2.5, InternVL3 demonstrates stronger multimodal perception and reasoning capabilities. \n\nIn addition, InternVL3 is benchmarked against the Qwen2.5 Chat models, whose pre-trained base models serve as the initialization for its language component. Benefiting from Native Multimodal Pre-Training, the InternVL3 series surpasses the Qwen2.5 series in overall text performance.",
- "context_length": 32768,
- "architecture": {
- "modality": "text+image->text",
- "input_modalities": [
- "image",
- "text"
- ],
- "output_modalities": [
- "text"
- ],
- "tokenizer": "Other",
- "instruct_type": null
- },
- "pricing": {
- "prompt": "0.00000015",
- "completion": "0.0000006",
- "input_cache_read": "0.000000075"
- },
- "top_provider": {
- "context_length": 32768,
- "max_completion_tokens": 32768,
- "is_moderated": false
- },
- "per_request_limits": null,
- "supported_parameters": [
- "frequency_penalty",
- "max_tokens",
- "presence_penalty",
- "repetition_penalty",
- "response_format",
- "seed",
- "stop",
- "structured_outputs",
- "temperature",
- "top_k",
- "top_p"
- ],
- "default_parameters": {},
- "expiration_date": null
- },
{
"id": "openrouter/auto",
"canonical_slug": "openrouter/auto",
"hugging_face_id": null,
"name": "Auto Router",
"created": 1699401600,
- "description": "Your prompt will be processed by a meta-model and routed to one of dozens of models (see below), optimizing for the best possible output.\n\nTo see which model was used, visit [Activity](/activity), or read the `model` attribute of the response. Your response will be priced at the same rate as the routed model.\n\nLearn more, including how to customize the models for routing, in our [docs](/docs/guides/routing/routers/auto-router).\n\nRequests will be routed to the following models:\n- [anthropic/claude-3-haiku](/anthropic/claude-3-haiku)\n- [anthropic/claude-3.5-haiku](/anthropic/claude-3.5-haiku)\n- [anthropic/claude-3.7-sonnet](/anthropic/claude-3.7-sonnet)\n- [anthropic/claude-haiku-4.5](/anthropic/claude-haiku-4.5)\n- [anthropic/claude-opus-4](/anthropic/claude-opus-4)\n- [anthropic/claude-opus-4.1](/anthropic/claude-opus-4.1)\n- [anthropic/claude-opus-4.5](/anthropic/claude-opus-4.5)\n- [anthropic/claude-sonnet-4](/anthropic/claude-sonnet-4)\n- [anthropic/claude-sonnet-4.5](/anthropic/claude-sonnet-4.5)\n- [cohere/command-r-08-2024](/cohere/command-r-08-2024)\n- [cohere/command-r-plus-08-2024](/cohere/command-r-plus-08-2024)\n- [deepseek/deepseek-r1](/deepseek/deepseek-r1)\n- [google/gemini-2.0-flash-001](/google/gemini-2.0-flash-001)\n- [google/gemini-2.5-flash](/google/gemini-2.5-flash)\n- [google/gemini-3.1-pro-preview](/google/gemini-3.1-pro-preview)\n- [google/gemini-3-flash-preview](/google/gemini-3-flash-preview)\n- [google/gemini-3-pro-preview](/google/gemini-3-pro-preview)\n- [meta-llama/llama-3-70b-instruct](/meta-llama/llama-3-70b-instruct)\n- [meta-llama/llama-3-8b-instruct](/meta-llama/llama-3-8b-instruct)\n- [meta-llama/llama-3.1-405b-instruct](/meta-llama/llama-3.1-405b-instruct)\n- [meta-llama/llama-3.1-70b-instruct](/meta-llama/llama-3.1-70b-instruct)\n- [meta-llama/llama-3.1-8b-instruct](/meta-llama/llama-3.1-8b-instruct)\n- [meta-llama/llama-3.3-70b-instruct](/meta-llama/llama-3.3-70b-instruct)\n- [mistralai/codestral-2508](/mistralai/codestral-2508)\n- [mistralai/mistral-7b-instruct](/mistralai/mistral-7b-instruct)\n- [mistralai/mistral-large](/mistralai/mistral-large)\n- [mistralai/mistral-large-2407](/mistralai/mistral-large-2407)\n- [mistralai/mistral-large-2411](/mistralai/mistral-large-2411)\n- [mistralai/mistral-medium-3.1](/mistralai/mistral-medium-3.1)\n- [mistralai/mistral-nemo](/mistralai/mistral-nemo)\n- [mistralai/mistral-small-3.2-24b-instruct-2506](/mistralai/mistral-small-3.2-24b-instruct-2506)\n- [mistralai/mixtral-8x22b-instruct](/mistralai/mixtral-8x22b-instruct)\n- [mistralai/mixtral-8x7b-instruct](/mistralai/mixtral-8x7b-instruct)\n- [moonshotai/kimi-k2-thinking](/moonshotai/kimi-k2-thinking)\n- [openai/chatgpt-4o-latest](/openai/chatgpt-4o-latest)\n- [openai/gpt-3.5-turbo](/openai/gpt-3.5-turbo)\n- [openai/gpt-4](/openai/gpt-4)\n- [openai/gpt-4-1106-preview](/openai/gpt-4-1106-preview)\n- [openai/gpt-4-turbo](/openai/gpt-4-turbo)\n- [openai/gpt-4-turbo-preview](/openai/gpt-4-turbo-preview)\n- [openai/gpt-4.1](/openai/gpt-4.1)\n- [openai/gpt-4.1-mini](/openai/gpt-4.1-mini)\n- [openai/gpt-4.1-nano](/openai/gpt-4.1-nano)\n- [openai/gpt-4o](/openai/gpt-4o)\n- [openai/gpt-4o-2024-05-13](/openai/gpt-4o-2024-05-13)\n- [openai/gpt-4o-2024-08-06](/openai/gpt-4o-2024-08-06)\n- [openai/gpt-4o-2024-11-20](/openai/gpt-4o-2024-11-20)\n- [openai/gpt-4o-mini](/openai/gpt-4o-mini)\n- [openai/gpt-4o-mini-2024-07-18](/openai/gpt-4o-mini-2024-07-18)\n- [openai/gpt-5](/openai/gpt-5)\n- [openai/gpt-5-mini](/openai/gpt-5-mini)\n- [openai/gpt-5-nano](/openai/gpt-5-nano)\n- [openai/gpt-5.1](/openai/gpt-5.1)\n- [openai/gpt-5.2](/openai/gpt-5.2)\n- [openai/gpt-5.2-pro](/openai/gpt-5.2-pro)\n- [openai/gpt-oss-120b](/openai/gpt-oss-120b)\n- [perplexity/sonar](/perplexity/sonar)\n- [qwen/qwen3-14b](/qwen/qwen3-14b)\n- [qwen/qwen3-235b-a22b](/qwen/qwen3-235b-a22b)\n- [qwen/qwen3-32b](/qwen/qwen3-32b)\n- [x-ai/grok-3](/x-ai/grok-3)\n- [x-ai/grok-3-mini](/x-ai/grok-3-mini)\n- [x-ai/grok-4](/x-ai/grok-4)",
+ "description": "Your prompt will be processed by a meta-model and routed to one of dozens of models (see below), optimizing for the best possible output.\n\nTo see which model was used, visit [Activity](/activity), or read the `model` attribute of the response. Your response will be priced at the same rate as the routed model.\n\nLearn more, including how to customize the models for routing, in our [docs](/docs/guides/routing/routers/auto-router).\n\nRequests will be routed to the following models:\n- [anthropic/claude-haiku-4.5](/anthropic/claude-haiku-4.5)\n- [anthropic/claude-opus-4.6](/anthropic/claude-opus-4.6)\n- [anthropic/claude-sonnet-4.5](/anthropic/claude-sonnet-4.5)\n- [deepseek/deepseek-r1](/deepseek/deepseek-r1)\n- [google/gemini-2.5-flash-lite](/google/gemini-2.5-flash-lite)\n- [google/gemini-3-flash-preview](/google/gemini-3-flash-preview)\n- [google/gemini-3-pro-preview](/google/gemini-3-pro-preview)\n- [meta-llama/llama-3.3-70b-instruct](/meta-llama/llama-3.3-70b-instruct)\n- [mistralai/codestral-2508](/mistralai/codestral-2508)\n- [mistralai/mistral-large](/mistralai/mistral-large)\n- [mistralai/mistral-medium-3.1](/mistralai/mistral-medium-3.1)\n- [mistralai/mistral-small-3.2-24b-instruct-2506](/mistralai/mistral-small-3.2-24b-instruct-2506)\n- [moonshotai/kimi-k2-thinking](/moonshotai/kimi-k2-thinking)\n- [moonshotai/kimi-k2.5](/moonshotai/kimi-k2.5)\n- [openai/gpt-5](/openai/gpt-5)\n- [openai/gpt-5-mini](/openai/gpt-5-mini)\n- [openai/gpt-5-nano](/openai/gpt-5-nano)\n- [openai/gpt-5.1](/openai/gpt-5.1)\n- [openai/gpt-5.2](/openai/gpt-5.2)\n- [openai/gpt-5.2-pro](/openai/gpt-5.2-pro)\n- [openai/gpt-oss-120b](/openai/gpt-oss-120b)\n- [perplexity/sonar](/perplexity/sonar)\n- [qwen/qwen3-235b-a22b](/qwen/qwen3-235b-a22b)\n- [x-ai/grok-3](/x-ai/grok-3)\n- [x-ai/grok-3-mini](/x-ai/grok-3-mini)\n- [x-ai/grok-4](/x-ai/grok-4)",
"context_length": 2000000,
"architecture": {
"modality": "text+image+file+audio+video->text+image",
@@ -12228,56 +11508,6 @@
},
"expiration_date": null
},
- {
- "id": "openrouter/pony-alpha",
- "canonical_slug": "openrouter/pony-alpha",
- "hugging_face_id": "",
- "name": "Pony Alpha",
- "created": 1770393855,
- "description": "Pony is a cutting-edge foundation model with strong performance in coding, agentic workflows, reasoning, and roleplay, making it well suited for hands-on coding and real-world use.\n\n**Note:** All prompts and completions for this model are logged by the provider and may be used to improve the model.",
- "context_length": 200000,
- "architecture": {
- "modality": "text->text",
- "input_modalities": [
- "text"
- ],
- "output_modalities": [
- "text"
- ],
- "tokenizer": "Other",
- "instruct_type": null
- },
- "pricing": {
- "prompt": "0",
- "completion": "0",
- "request": "0",
- "image": "0",
- "web_search": "0",
- "internal_reasoning": "0"
- },
- "top_provider": {
- "context_length": 200000,
- "max_completion_tokens": 131000,
- "is_moderated": false
- },
- "per_request_limits": null,
- "supported_parameters": [
- "include_reasoning",
- "max_tokens",
- "reasoning",
- "response_format",
- "structured_outputs",
- "temperature",
- "tools",
- "top_p"
- ],
- "default_parameters": {
- "temperature": 1,
- "top_p": 0.95,
- "frequency_penalty": null
- },
- "expiration_date": null
- },
{
"id": "perplexity/sonar",
"canonical_slug": "perplexity/sonar",
@@ -12301,10 +11531,7 @@
"pricing": {
"prompt": "0.000001",
"completion": "0.000001",
- "request": "0.005",
- "image": "0",
- "web_search": "0",
- "internal_reasoning": "0"
+ "web_search": "0.005"
},
"top_provider": {
"context_length": 127072,
@@ -12346,8 +11573,6 @@
"pricing": {
"prompt": "0.000002",
"completion": "0.000008",
- "request": "0",
- "image": "0",
"web_search": "0.005",
"internal_reasoning": "0.000003"
},
@@ -12394,10 +11619,7 @@
"pricing": {
"prompt": "0.000003",
"completion": "0.000015",
- "request": "0",
- "image": "0",
- "web_search": "0.005",
- "internal_reasoning": "0"
+ "web_search": "0.005"
},
"top_provider": {
"context_length": 200000,
@@ -12440,10 +11662,7 @@
"pricing": {
"prompt": "0.000003",
"completion": "0.000015",
- "request": "0.018",
- "image": "0",
- "web_search": "0",
- "internal_reasoning": "0"
+ "web_search": "0.018"
},
"top_provider": {
"context_length": 200000,
@@ -12493,10 +11712,7 @@
"pricing": {
"prompt": "0.000002",
"completion": "0.000008",
- "request": "0",
- "image": "0",
- "web_search": "0.005",
- "internal_reasoning": "0"
+ "web_search": "0.005"
},
"top_provider": {
"context_length": 128000,
@@ -12603,7 +11819,6 @@
"per_request_limits": null,
"supported_parameters": [
"frequency_penalty",
- "logit_bias",
"max_tokens",
"min_p",
"presence_penalty",
@@ -12657,6 +11872,7 @@
"min_p",
"presence_penalty",
"repetition_penalty",
+ "response_format",
"seed",
"stop",
"temperature",
@@ -12692,13 +11908,12 @@
"instruct_type": "chatml"
},
"pricing": {
- "prompt": "0.00000003",
- "completion": "0.00000011",
- "input_cache_read": "0.000000015"
+ "prompt": "0.00000020000000000000002",
+ "completion": "0.00000020000000000000002"
},
"top_provider": {
"context_length": 32768,
- "max_completion_tokens": 32768,
+ "max_completion_tokens": 8192,
"is_moderated": false
},
"per_request_limits": null,
@@ -12709,10 +11924,8 @@
"min_p",
"presence_penalty",
"repetition_penalty",
- "response_format",
"seed",
"stop",
- "structured_outputs",
"temperature",
"top_k",
"top_p"
@@ -12741,8 +11954,8 @@
"instruct_type": null
},
"pricing": {
- "prompt": "0.0000002",
- "completion": "0.0000002"
+ "prompt": "0.00000020000000000000002",
+ "completion": "0.00000020000000000000002"
},
"top_provider": {
"context_length": 32768,
@@ -12788,11 +12001,7 @@
"pricing": {
"prompt": "0.0000016",
"completion": "0.0000064",
- "request": "0",
- "image": "0",
- "web_search": "0",
- "internal_reasoning": "0",
- "input_cache_read": "0.00000064"
+ "input_cache_read": "0.00000032"
},
"top_provider": {
"context_length": 32768,
@@ -12820,7 +12029,7 @@
"name": "Qwen: Qwen-Plus",
"created": 1738409840,
"description": "Qwen-Plus, based on the Qwen2.5 foundation model, is a 131K context model with a balanced performance, speed, and cost combination.",
- "context_length": 131072,
+ "context_length": 1000000,
"architecture": {
"modality": "text->text",
"input_modalities": [
@@ -12835,15 +12044,11 @@
"pricing": {
"prompt": "0.0000004",
"completion": "0.0000012",
- "request": "0",
- "image": "0",
- "web_search": "0",
- "internal_reasoning": "0",
- "input_cache_read": "0.00000016"
+ "input_cache_read": "0.00000008"
},
"top_provider": {
- "context_length": 131072,
- "max_completion_tokens": 8192,
+ "context_length": 1000000,
+ "max_completion_tokens": 32768,
"is_moderated": false
},
"per_request_limits": null,
@@ -12881,11 +12086,7 @@
},
"pricing": {
"prompt": "0.0000004",
- "completion": "0.0000012",
- "request": "0",
- "image": "0",
- "web_search": "0",
- "internal_reasoning": "0"
+ "completion": "0.0000012"
},
"top_provider": {
"context_length": 1000000,
@@ -12932,11 +12133,7 @@
},
"pricing": {
"prompt": "0.0000004",
- "completion": "0.000004",
- "request": "0",
- "image": "0",
- "web_search": "0",
- "internal_reasoning": "0"
+ "completion": "0.0000012"
},
"top_provider": {
"context_length": 1000000,
@@ -12971,7 +12168,7 @@
"name": "Qwen: Qwen-Turbo",
"created": 1738410974,
"description": "Qwen-Turbo, based on Qwen2.5, is a 1M context model that provides fast speed and low cost, suitable for simple tasks.",
- "context_length": 1000000,
+ "context_length": 131072,
"architecture": {
"modality": "text->text",
"input_modalities": [
@@ -12986,14 +12183,10 @@
"pricing": {
"prompt": "0.00000005",
"completion": "0.0000002",
- "request": "0",
- "image": "0",
- "web_search": "0",
- "internal_reasoning": "0",
- "input_cache_read": "0.00000002"
+ "input_cache_read": "0.00000001"
},
"top_provider": {
- "context_length": 1000000,
+ "context_length": 131072,
"max_completion_tokens": 8192,
"is_moderated": false
},
@@ -13033,15 +12226,11 @@
},
"pricing": {
"prompt": "0.0000008",
- "completion": "0.0000032",
- "request": "0",
- "image": "0.001024",
- "web_search": "0",
- "internal_reasoning": "0"
+ "completion": "0.0000032"
},
"top_provider": {
"context_length": 131072,
- "max_completion_tokens": 8192,
+ "max_completion_tokens": 32768,
"is_moderated": false
},
"per_request_limits": null,
@@ -13069,7 +12258,7 @@
"name": "Qwen: Qwen VL Plus",
"created": 1738731255,
"description": "Qwen's Enhanced Large Visual Language Model. Significantly upgraded for detailed recognition capabilities and text recognition abilities, supporting ultra-high pixel resolutions up to millions of pixels and extreme aspect ratios for image input. It delivers significant performance across a broad range of visual tasks.\n",
- "context_length": 7500,
+ "context_length": 131072,
"architecture": {
"modality": "text+image->text",
"input_modalities": [
@@ -13085,14 +12274,11 @@
"pricing": {
"prompt": "0.00000021",
"completion": "0.00000063",
- "request": "0",
- "image": "0.0002688",
- "web_search": "0",
- "internal_reasoning": "0"
+ "input_cache_read": "0.000000042"
},
"top_provider": {
- "context_length": 7500,
- "max_completion_tokens": 1500,
+ "context_length": 131072,
+ "max_completion_tokens": 8192,
"is_moderated": false
},
"per_request_limits": null,
@@ -13157,7 +12343,7 @@
"name": "Qwen: Qwen2.5 VL 32B Instruct",
"created": 1742839838,
"description": "Qwen2.5-VL-32B is a multimodal vision-language model fine-tuned through reinforcement learning for enhanced mathematical reasoning, structured outputs, and visual problem-solving capabilities. It excels at visual analysis tasks, including object recognition, textual interpretation within images, and precise event localization in extended videos. Qwen2.5-VL-32B demonstrates state-of-the-art performance across multimodal benchmarks such as MMMU, MathVista, and VideoMME, while maintaining strong reasoning and clarity in text-based tasks like MMLU, mathematical problem-solving, and code generation.",
- "context_length": 16384,
+ "context_length": 128000,
"architecture": {
"modality": "text+image->text",
"input_modalities": [
@@ -13171,20 +12357,17 @@
"instruct_type": null
},
"pricing": {
- "prompt": "0.00000005",
- "completion": "0.00000022",
- "input_cache_read": "0.000000025"
+ "prompt": "0.0000002",
+ "completion": "0.0000006"
},
"top_provider": {
- "context_length": 16384,
- "max_completion_tokens": 16384,
+ "context_length": 128000,
+ "max_completion_tokens": null,
"is_moderated": false
},
"per_request_limits": null,
"supported_parameters": [
"frequency_penalty",
- "logit_bias",
- "logprobs",
"max_tokens",
"min_p",
"presence_penalty",
@@ -13192,10 +12375,8 @@
"response_format",
"seed",
"stop",
- "structured_outputs",
"temperature",
"top_k",
- "top_logprobs",
"top_p"
],
"default_parameters": {},
@@ -13222,9 +12403,8 @@
"instruct_type": null
},
"pricing": {
- "prompt": "0.00000015",
- "completion": "0.0000006",
- "input_cache_read": "0.000000075"
+ "prompt": "0.0000008",
+ "completion": "0.0000008"
},
"top_provider": {
"context_length": 32768,
@@ -13236,7 +12416,6 @@
"frequency_penalty",
"logit_bias",
"max_tokens",
- "min_p",
"presence_penalty",
"repetition_penalty",
"response_format",
@@ -13248,7 +12427,7 @@
"top_p"
],
"default_parameters": {},
- "expiration_date": "2026-02-16"
+ "expiration_date": null
},
{
"id": "qwen/qwen3-14b",
@@ -13270,9 +12449,8 @@
"instruct_type": "qwen3"
},
"pricing": {
- "prompt": "0.00000005",
- "completion": "0.00000022",
- "input_cache_read": "0.000000025"
+ "prompt": "0.00000006",
+ "completion": "0.00000024"
},
"top_provider": {
"context_length": 40960,
@@ -13308,7 +12486,7 @@
"name": "Qwen: Qwen3 235B A22B",
"created": 1745875757,
"description": "Qwen3-235B-A22B is a 235B parameter mixture-of-experts (MoE) model developed by Qwen, activating 22B parameters per forward pass. It supports seamless switching between a \"thinking\" mode for complex reasoning, math, and code tasks, and a \"non-thinking\" mode for general conversational efficiency. The model demonstrates strong reasoning ability, multilingual support (100+ languages and dialects), advanced instruction-following, and agent tool-calling capabilities. It natively handles a 32K token context window and extends up to 131K tokens using YaRN-based scaling.",
- "context_length": 40960,
+ "context_length": 131072,
"architecture": {
"modality": "text->text",
"input_modalities": [
@@ -13321,34 +12499,25 @@
"instruct_type": "qwen3"
},
"pricing": {
- "prompt": "0.0000002",
- "completion": "0.0000006"
+ "prompt": "0.000000455",
+ "completion": "0.00000182"
},
"top_provider": {
- "context_length": 40960,
- "max_completion_tokens": null,
+ "context_length": 131072,
+ "max_completion_tokens": 8192,
"is_moderated": false
},
"per_request_limits": null,
"supported_parameters": [
- "frequency_penalty",
"include_reasoning",
- "logit_bias",
- "logprobs",
"max_tokens",
- "min_p",
"presence_penalty",
"reasoning",
- "repetition_penalty",
"response_format",
"seed",
- "stop",
- "structured_outputs",
"temperature",
"tool_choice",
"tools",
- "top_k",
- "top_logprobs",
"top_p"
],
"default_parameters": {},
@@ -13415,7 +12584,7 @@
"name": "Qwen: Qwen3 235B A22B Thinking 2507",
"created": 1753449557,
"description": "Qwen3-235B-A22B-Thinking-2507 is a high-performance, open-weight Mixture-of-Experts (MoE) language model optimized for complex reasoning tasks. It activates 22B of its 235B parameters per forward pass and natively supports up to 262,144 tokens of context. This \"thinking-only\" variant enhances structured logical reasoning, mathematics, science, and long-form generation, showing strong benchmark performance across AIME, SuperGPQA, LiveCodeBench, and MMLU-Redux. It enforces a special reasoning mode () and is designed for high-token outputs (up to 81,920 tokens) in challenging domains.\n\nThe model is instruction-tuned and excels at step-by-step reasoning, tool use, agentic workflows, and multilingual tasks. This release represents the most capable open-source variant in the Qwen3-235B series, surpassing many closed models in structured reasoning use cases.",
- "context_length": 262144,
+ "context_length": 131072,
"architecture": {
"modality": "text->text",
"input_modalities": [
@@ -13428,13 +12597,16 @@
"instruct_type": "qwen3"
},
"pricing": {
- "prompt": "0.00000011",
- "completion": "0.0000006",
- "input_cache_read": "0.000000055"
+ "prompt": "0",
+ "completion": "0",
+ "request": "0",
+ "image": "0",
+ "web_search": "0",
+ "internal_reasoning": "0"
},
"top_provider": {
- "context_length": 262144,
- "max_completion_tokens": 262144,
+ "context_length": 131072,
+ "max_completion_tokens": null,
"is_moderated": false
},
"per_request_limits": null,
@@ -13484,9 +12656,8 @@
"instruct_type": "qwen3"
},
"pricing": {
- "prompt": "0.00000006",
- "completion": "0.00000022",
- "input_cache_read": "0.00000003"
+ "prompt": "0.00000008",
+ "completion": "0.00000028"
},
"top_provider": {
"context_length": 40960,
@@ -13539,9 +12710,8 @@
"instruct_type": null
},
"pricing": {
- "prompt": "0.00000008",
- "completion": "0.00000033",
- "input_cache_read": "0.00000004"
+ "prompt": "0.00000009",
+ "completion": "0.0000003"
},
"top_provider": {
"context_length": 262144,
@@ -13556,7 +12726,6 @@
"repetition_penalty",
"response_format",
"seed",
- "stop",
"structured_outputs",
"temperature",
"tool_choice",
@@ -13648,7 +12817,6 @@
"supported_parameters": [
"frequency_penalty",
"include_reasoning",
- "logprobs",
"max_tokens",
"min_p",
"presence_penalty",
@@ -13662,7 +12830,6 @@
"tool_choice",
"tools",
"top_k",
- "top_logprobs",
"top_p"
],
"default_parameters": {},
@@ -13722,7 +12889,7 @@
"name": "Qwen: Qwen3 8B",
"created": 1745876632,
"description": "Qwen3-8B is a dense 8.2B parameter causal language model from the Qwen3 series, designed for both reasoning-heavy tasks and efficient dialogue. It supports seamless switching between \"thinking\" mode for math, coding, and logical inference, and \"non-thinking\" mode for general conversation. The model is fine-tuned for instruction-following, agent integration, creative writing, and multilingual use across 100+ languages and dialects. It natively supports a 32K token context window and can extend to 131K tokens with YaRN scaling.",
- "context_length": 32000,
+ "context_length": 40960,
"architecture": {
"modality": "text->text",
"input_modalities": [
@@ -13740,7 +12907,7 @@
"input_cache_read": "0.00000005"
},
"top_provider": {
- "context_length": 32000,
+ "context_length": 40960,
"max_completion_tokens": 8192,
"is_moderated": false
},
@@ -13748,8 +12915,10 @@
"supported_parameters": [
"include_reasoning",
"max_tokens",
+ "presence_penalty",
"reasoning",
"response_format",
+ "seed",
"structured_outputs",
"temperature",
"tool_choice",
@@ -13796,11 +12965,9 @@
"supported_parameters": [
"frequency_penalty",
"logit_bias",
- "logprobs",
"max_tokens",
"min_p",
"presence_penalty",
- "reasoning",
"repetition_penalty",
"response_format",
"seed",
@@ -13810,7 +12977,6 @@
"tool_choice",
"tools",
"top_k",
- "top_logprobs",
"top_p"
],
"default_parameters": {},
@@ -13870,7 +13036,7 @@
"name": "Qwen: Qwen3 Coder Flash",
"created": 1758115536,
"description": "Qwen3 Coder Flash is Alibaba's fast and cost efficient version of their proprietary Qwen3 Coder Plus. It is a powerful coding agent model specializing in autonomous programming via tool calling and environment interaction, combining coding proficiency with versatile general-purpose abilities.",
- "context_length": 128000,
+ "context_length": 1000000,
"architecture": {
"modality": "text->text",
"input_modalities": [
@@ -13885,14 +13051,10 @@
"pricing": {
"prompt": "0.0000003",
"completion": "0.0000015",
- "request": "0",
- "image": "0",
- "web_search": "0",
- "internal_reasoning": "0",
- "input_cache_read": "0.00000008"
+ "input_cache_read": "0.00000006"
},
"top_provider": {
- "context_length": 128000,
+ "context_length": 1000000,
"max_completion_tokens": 65536,
"is_moderated": false
},
@@ -13934,9 +13096,9 @@
"instruct_type": null
},
"pricing": {
- "prompt": "0.00000007",
- "completion": "0.0000003",
- "input_cache_read": "0.000000035"
+ "prompt": "0.00000012",
+ "completion": "0.00000075",
+ "input_cache_read": "0.00000006"
},
"top_provider": {
"context_length": 262144,
@@ -13975,7 +13137,7 @@
"name": "Qwen: Qwen3 Coder Plus",
"created": 1758662707,
"description": "Qwen3 Coder Plus is Alibaba's proprietary version of the Open Source Qwen3 Coder 480B A35B. It is a powerful coding agent model specializing in autonomous programming via tool calling and environment interaction, combining coding proficiency with versatile general-purpose abilities.",
- "context_length": 128000,
+ "context_length": 1000000,
"architecture": {
"modality": "text->text",
"input_modalities": [
@@ -13990,14 +13152,10 @@
"pricing": {
"prompt": "0.000001",
"completion": "0.000005",
- "request": "0",
- "image": "0",
- "web_search": "0",
- "internal_reasoning": "0",
- "input_cache_read": "0.0000001"
+ "input_cache_read": "0.0000002"
},
"top_provider": {
- "context_length": 128000,
+ "context_length": 1000000,
"max_completion_tokens": 65536,
"is_moderated": false
},
@@ -14118,7 +13276,7 @@
"name": "Qwen: Qwen3 Max",
"created": 1758662808,
"description": "Qwen3-Max is an updated release built on the Qwen3 series, offering major improvements in reasoning, instruction following, multilingual support, and long-tail knowledge coverage compared to the January 2025 version. It delivers higher accuracy in math, coding, logic, and science tasks, follows complex instructions in Chinese and English more reliably, reduces hallucinations, and produces higher-quality responses for open-ended Q&A, writing, and conversation. The model supports over 100 languages with stronger translation and commonsense reasoning, and is optimized for retrieval-augmented generation (RAG) and tool calling, though it does not include a dedicated “thinking” mode.",
- "context_length": 256000,
+ "context_length": 262144,
"architecture": {
"modality": "text->text",
"input_modalities": [
@@ -14133,14 +13291,10 @@
"pricing": {
"prompt": "0.0000012",
"completion": "0.000006",
- "request": "0",
- "image": "0",
- "web_search": "0",
- "internal_reasoning": "0",
"input_cache_read": "0.00000024"
},
"top_provider": {
- "context_length": 256000,
+ "context_length": 262144,
"max_completion_tokens": 32768,
"is_moderated": false
},
@@ -14345,7 +13499,6 @@
"supported_parameters": [
"frequency_penalty",
"logit_bias",
- "logprobs",
"max_tokens",
"min_p",
"presence_penalty",
@@ -14358,7 +13511,6 @@
"tool_choice",
"tools",
"top_k",
- "top_logprobs",
"top_p"
],
"default_parameters": {
@@ -14375,7 +13527,7 @@
"name": "Qwen: Qwen3 VL 235B A22B Thinking",
"created": 1758668690,
"description": "Qwen3-VL-235B-A22B Thinking is a multimodal model that unifies strong text generation with visual understanding across images and video. The Thinking model is optimized for multimodal reasoning in STEM and math. The series emphasizes robust perception (recognition of diverse real-world and synthetic categories), spatial understanding (2D/3D grounding), and long-form visual comprehension, with competitive results on public multimodal benchmarks for both perception and reasoning.\n\nBeyond analysis, Qwen3-VL supports agentic interaction and tool use: it can follow complex instructions over multi-image, multi-turn dialogues; align text to video timelines for precise temporal queries; and operate GUI elements for automation tasks. The models also enable visual coding workflows, turning sketches or mockups into code and assisting with UI debugging, while maintaining strong text-only performance comparable to the flagship Qwen3 language models. This makes Qwen3-VL suitable for production scenarios spanning document AI, multilingual OCR, software/UI assistance, spatial/embodied tasks, and research on vision-language agents.",
- "context_length": 262144,
+ "context_length": 131072,
"architecture": {
"modality": "text+image->text",
"input_modalities": [
@@ -14389,12 +13541,16 @@
"instruct_type": null
},
"pricing": {
- "prompt": "0.00000045",
- "completion": "0.0000035"
+ "prompt": "0",
+ "completion": "0",
+ "request": "0",
+ "image": "0",
+ "web_search": "0",
+ "internal_reasoning": "0"
},
"top_provider": {
- "context_length": 262144,
- "max_completion_tokens": 262144,
+ "context_length": 131072,
+ "max_completion_tokens": 32768,
"is_moderated": false
},
"per_request_limits": null,
@@ -14429,7 +13585,7 @@
"name": "Qwen: Qwen3 VL 30B A3B Instruct",
"created": 1759794476,
"description": "Qwen3-VL-30B-A3B-Instruct is a multimodal model that unifies strong text generation with visual understanding for images and videos. Its Instruct variant optimizes instruction-following for general multimodal tasks. It excels in perception of real-world/synthetic categories, 2D/3D spatial grounding, and long-form visual comprehension, achieving competitive multimodal benchmark results. For agentic use, it handles multi-image multi-turn instructions, video timeline alignments, GUI automation, and visual coding from sketches to debugged UI. Text performance matches flagship Qwen3 models, suiting document AI, OCR, UI assistance, spatial tasks, and agent research.",
- "context_length": 262144,
+ "context_length": 131072,
"architecture": {
"modality": "text+image->text",
"input_modalities": [
@@ -14443,20 +13599,17 @@
"instruct_type": null
},
"pricing": {
- "prompt": "0.00000015",
- "completion": "0.0000006",
- "input_cache_read": "0.000000075"
+ "prompt": "0.00000013",
+ "completion": "0.00000052"
},
"top_provider": {
- "context_length": 262144,
- "max_completion_tokens": null,
+ "context_length": 131072,
+ "max_completion_tokens": 32768,
"is_moderated": false
},
"per_request_limits": null,
"supported_parameters": [
"frequency_penalty",
- "logit_bias",
- "logprobs",
"max_tokens",
"min_p",
"presence_penalty",
@@ -14469,7 +13622,6 @@
"tool_choice",
"tools",
"top_k",
- "top_logprobs",
"top_p"
],
"default_parameters": {
@@ -14500,8 +13652,12 @@
"instruct_type": null
},
"pricing": {
- "prompt": "0.0000002",
- "completion": "0.000001"
+ "prompt": "0",
+ "completion": "0",
+ "request": "0",
+ "image": "0",
+ "web_search": "0",
+ "internal_reasoning": "0"
},
"top_provider": {
"context_length": 131072,
@@ -14539,7 +13695,7 @@
"name": "Qwen: Qwen3 VL 32B Instruct",
"created": 1761231332,
"description": "Qwen3-VL-32B-Instruct is a large-scale multimodal vision-language model designed for high-precision understanding and reasoning across text, images, and video. With 32 billion parameters, it combines deep visual perception with advanced text comprehension, enabling fine-grained spatial reasoning, document and scene analysis, and long-horizon video understanding.Robust OCR in 32 languages, and enhanced multimodal fusion through Interleaved-MRoPE and DeepStack architectures. Optimized for agentic interaction and visual tool use, Qwen3-VL-32B delivers state-of-the-art performance for complex real-world multimodal tasks.",
- "context_length": 262144,
+ "context_length": 131072,
"architecture": {
"modality": "text+image->text",
"input_modalities": [
@@ -14553,12 +13709,12 @@
"instruct_type": null
},
"pricing": {
- "prompt": "0.0000005",
- "completion": "0.0000015"
+ "prompt": "0.000000104",
+ "completion": "0.000000416"
},
"top_provider": {
- "context_length": 262144,
- "max_completion_tokens": null,
+ "context_length": 131072,
+ "max_completion_tokens": 32768,
"is_moderated": false
},
"per_request_limits": null,
@@ -14570,9 +13726,12 @@
"presence_penalty",
"repetition_penalty",
"response_format",
+ "seed",
"stop",
"structured_outputs",
"temperature",
+ "tool_choice",
+ "tools",
"top_k",
"top_p"
],
@@ -14581,7 +13740,7 @@
"top_p": null,
"frequency_penalty": null
},
- "expiration_date": null
+ "expiration_date": "2026-02-25"
},
{
"id": "qwen/qwen3-vl-8b-instruct",
@@ -14644,7 +13803,7 @@
"name": "Qwen: Qwen3 VL 8B Thinking",
"created": 1760463746,
"description": "Qwen3-VL-8B-Thinking is the reasoning-optimized variant of the Qwen3-VL-8B multimodal model, designed for advanced visual and textual reasoning across complex scenes, documents, and temporal sequences. It integrates enhanced multimodal alignment and long-context processing (native 256K, expandable to 1M tokens) for tasks such as scientific visual analysis, causal inference, and mathematical reasoning over image or video inputs.\n\nCompared to the Instruct edition, the Thinking version introduces deeper visual-language fusion and deliberate reasoning pathways that improve performance on long-chain logic tasks, STEM problem-solving, and multi-step video understanding. It achieves stronger temporal grounding via Interleaved-MRoPE and timestamp-aware embeddings, while maintaining robust OCR, multilingual comprehension, and text generation on par with large text-only LLMs.",
- "context_length": 256000,
+ "context_length": 131072,
"architecture": {
"modality": "text+image->text",
"input_modalities": [
@@ -14658,15 +13817,11 @@
"instruct_type": null
},
"pricing": {
- "prompt": "0.00000018",
- "completion": "0.0000021",
- "request": "0",
- "image": "0",
- "web_search": "0",
- "internal_reasoning": "0"
+ "prompt": "0.000000117",
+ "completion": "0.000001365"
},
"top_provider": {
- "context_length": 256000,
+ "context_length": 131072,
"max_completion_tokens": 32768,
"is_moderated": false
},
@@ -14722,20 +13877,18 @@
"supported_parameters": [
"frequency_penalty",
"include_reasoning",
- "logit_bias",
+ "logprobs",
"max_tokens",
- "min_p",
"presence_penalty",
"reasoning",
- "repetition_penalty",
"response_format",
- "seed",
"stop",
"structured_outputs",
"temperature",
"tool_choice",
"tools",
"top_k",
+ "top_logprobs",
"top_p"
],
"default_parameters": {},
@@ -15087,6 +14240,7 @@
"per_request_limits": null,
"supported_parameters": [
"frequency_penalty",
+ "logprobs",
"max_tokens",
"min_p",
"presence_penalty",
@@ -15097,6 +14251,7 @@
"structured_outputs",
"temperature",
"top_k",
+ "top_logprobs",
"top_p"
],
"default_parameters": {},
@@ -15314,6 +14469,7 @@
"supported_parameters": [
"frequency_penalty",
"logit_bias",
+ "logprobs",
"max_tokens",
"min_p",
"presence_penalty",
@@ -15326,6 +14482,7 @@
"tool_choice",
"tools",
"top_k",
+ "top_logprobs",
"top_p"
],
"default_parameters": {},
@@ -15406,6 +14563,7 @@
"per_request_limits": null,
"supported_parameters": [
"frequency_penalty",
+ "logprobs",
"max_tokens",
"presence_penalty",
"response_format",
@@ -15414,18 +14572,19 @@
"temperature",
"tool_choice",
"tools",
+ "top_logprobs",
"top_p"
],
"default_parameters": {},
"expiration_date": null
},
{
- "id": "tngtech/deepseek-r1t-chimera",
- "canonical_slug": "tngtech/deepseek-r1t-chimera",
- "hugging_face_id": "tngtech/DeepSeek-R1T-Chimera",
- "name": "TNG: DeepSeek R1T Chimera",
- "created": 1745760875,
- "description": "DeepSeek-R1T-Chimera is created by merging DeepSeek-R1 and DeepSeek-V3 (0324), combining the reasoning capabilities of R1 with the token efficiency improvements of V3. It is based on a DeepSeek-MoE Transformer architecture and is optimized for general text generation tasks.\n\nThe model merges pretrained weights from both source models to balance performance across reasoning, efficiency, and instruction-following tasks. It is released under the MIT license and intended for research and commercial use.",
+ "id": "tngtech/deepseek-r1t2-chimera",
+ "canonical_slug": "tngtech/deepseek-r1t2-chimera",
+ "hugging_face_id": "tngtech/DeepSeek-TNG-R1T2-Chimera",
+ "name": "TNG: DeepSeek R1T2 Chimera",
+ "created": 1751986985,
+ "description": "DeepSeek-TNG-R1T2-Chimera is the second-generation Chimera model from TNG Tech. It is a 671 B-parameter mixture-of-experts text-generation model assembled from DeepSeek-AI’s R1-0528, R1, and V3-0324 checkpoints with an Assembly-of-Experts merge. The tri-parent design yields strong reasoning performance while running roughly 20 % faster than the original R1 and more than 2× faster than R1-0528 under vLLM, giving a favorable cost-to-intelligence trade-off. The checkpoint supports contexts up to 60 k tokens in standard use (tested to ~130 k) and maintains consistent token behaviour, making it suitable for long-context analysis, dialogue and other open-ended generation tasks.",
"context_length": 163840,
"architecture": {
"modality": "text->text",
@@ -15439,9 +14598,9 @@
"instruct_type": null
},
"pricing": {
- "prompt": "0.0000003",
- "completion": "0.0000012",
- "input_cache_read": "0.00000015"
+ "prompt": "0.00000025",
+ "completion": "0.00000085",
+ "input_cache_read": "0.000000125"
},
"top_provider": {
"context_length": 163840,
@@ -15461,6 +14620,8 @@
"stop",
"structured_outputs",
"temperature",
+ "tool_choice",
+ "tools",
"top_k",
"top_p"
],
@@ -15468,13 +14629,13 @@
"expiration_date": null
},
{
- "id": "tngtech/deepseek-r1t-chimera:free",
- "canonical_slug": "tngtech/deepseek-r1t-chimera",
- "hugging_face_id": "tngtech/DeepSeek-R1T-Chimera",
- "name": "TNG: DeepSeek R1T Chimera (free)",
- "created": 1745760875,
- "description": "DeepSeek-R1T-Chimera is created by merging DeepSeek-R1 and DeepSeek-V3 (0324), combining the reasoning capabilities of R1 with the token efficiency improvements of V3. It is based on a DeepSeek-MoE Transformer architecture and is optimized for general text generation tasks.\n\nThe model merges pretrained weights from both source models to balance performance across reasoning, efficiency, and instruction-following tasks. It is released under the MIT license and intended for research and commercial use.",
- "context_length": 163840,
+ "id": "undi95/remm-slerp-l2-13b",
+ "canonical_slug": "undi95/remm-slerp-l2-13b",
+ "hugging_face_id": "Undi95/ReMM-SLERP-L2-13B",
+ "name": "ReMM SLERP 13B",
+ "created": 1689984000,
+ "description": "A recreation trial of the original MythoMax-L2-B13 but with updated models. #merge",
+ "context_length": 6144,
"architecture": {
"modality": "text->text",
"input_modalities": [
@@ -15483,282 +14644,35 @@
"output_modalities": [
"text"
],
- "tokenizer": "DeepSeek",
- "instruct_type": null
+ "tokenizer": "Llama2",
+ "instruct_type": "alpaca"
},
"pricing": {
- "prompt": "0",
- "completion": "0"
+ "prompt": "0.00000045",
+ "completion": "0.00000065"
},
"top_provider": {
- "context_length": 163840,
- "max_completion_tokens": null,
+ "context_length": 6144,
+ "max_completion_tokens": 4096,
"is_moderated": false
},
"per_request_limits": null,
"supported_parameters": [
"frequency_penalty",
- "include_reasoning",
+ "logit_bias",
+ "logprobs",
"max_tokens",
+ "min_p",
"presence_penalty",
- "reasoning",
"repetition_penalty",
+ "response_format",
"seed",
"stop",
+ "structured_outputs",
"temperature",
+ "top_a",
"top_k",
- "top_p"
- ],
- "default_parameters": {},
- "expiration_date": null
- },
- {
- "id": "tngtech/deepseek-r1t2-chimera",
- "canonical_slug": "tngtech/deepseek-r1t2-chimera",
- "hugging_face_id": "tngtech/DeepSeek-TNG-R1T2-Chimera",
- "name": "TNG: DeepSeek R1T2 Chimera",
- "created": 1751986985,
- "description": "DeepSeek-TNG-R1T2-Chimera is the second-generation Chimera model from TNG Tech. It is a 671 B-parameter mixture-of-experts text-generation model assembled from DeepSeek-AI’s R1-0528, R1, and V3-0324 checkpoints with an Assembly-of-Experts merge. The tri-parent design yields strong reasoning performance while running roughly 20 % faster than the original R1 and more than 2× faster than R1-0528 under vLLM, giving a favorable cost-to-intelligence trade-off. The checkpoint supports contexts up to 60 k tokens in standard use (tested to ~130 k) and maintains consistent token behaviour, making it suitable for long-context analysis, dialogue and other open-ended generation tasks.",
- "context_length": 163840,
- "architecture": {
- "modality": "text->text",
- "input_modalities": [
- "text"
- ],
- "output_modalities": [
- "text"
- ],
- "tokenizer": "DeepSeek",
- "instruct_type": null
- },
- "pricing": {
- "prompt": "0.00000025",
- "completion": "0.00000085",
- "input_cache_read": "0.000000125"
- },
- "top_provider": {
- "context_length": 163840,
- "max_completion_tokens": 163840,
- "is_moderated": false
- },
- "per_request_limits": null,
- "supported_parameters": [
- "frequency_penalty",
- "include_reasoning",
- "max_tokens",
- "presence_penalty",
- "reasoning",
- "repetition_penalty",
- "response_format",
- "seed",
- "stop",
- "structured_outputs",
- "temperature",
- "tool_choice",
- "tools",
- "top_k",
- "top_p"
- ],
- "default_parameters": {},
- "expiration_date": null
- },
- {
- "id": "tngtech/deepseek-r1t2-chimera:free",
- "canonical_slug": "tngtech/deepseek-r1t2-chimera",
- "hugging_face_id": "tngtech/DeepSeek-TNG-R1T2-Chimera",
- "name": "TNG: DeepSeek R1T2 Chimera (free)",
- "created": 1751986985,
- "description": "DeepSeek-TNG-R1T2-Chimera is the second-generation Chimera model from TNG Tech. It is a 671 B-parameter mixture-of-experts text-generation model assembled from DeepSeek-AI’s R1-0528, R1, and V3-0324 checkpoints with an Assembly-of-Experts merge. The tri-parent design yields strong reasoning performance while running roughly 20 % faster than the original R1 and more than 2× faster than R1-0528 under vLLM, giving a favorable cost-to-intelligence trade-off. The checkpoint supports contexts up to 60 k tokens in standard use (tested to ~130 k) and maintains consistent token behaviour, making it suitable for long-context analysis, dialogue and other open-ended generation tasks.",
- "context_length": 163840,
- "architecture": {
- "modality": "text->text",
- "input_modalities": [
- "text"
- ],
- "output_modalities": [
- "text"
- ],
- "tokenizer": "DeepSeek",
- "instruct_type": null
- },
- "pricing": {
- "prompt": "0",
- "completion": "0"
- },
- "top_provider": {
- "context_length": 163840,
- "max_completion_tokens": null,
- "is_moderated": false
- },
- "per_request_limits": null,
- "supported_parameters": [
- "frequency_penalty",
- "include_reasoning",
- "max_tokens",
- "presence_penalty",
- "reasoning",
- "repetition_penalty",
- "seed",
- "stop",
- "temperature",
- "top_k",
- "top_p"
- ],
- "default_parameters": {},
- "expiration_date": null
- },
- {
- "id": "tngtech/tng-r1t-chimera",
- "canonical_slug": "tngtech/tng-r1t-chimera",
- "hugging_face_id": null,
- "name": "TNG: R1T Chimera",
- "created": 1764184161,
- "description": "TNG-R1T-Chimera is an experimental LLM with a faible for creative storytelling and character interaction. It is a derivate of the original TNG/DeepSeek-R1T-Chimera released in April 2025 and is available exclusively via Chutes and OpenRouter.\n\nCharacteristics and improvements include:\n\nWe think that it has a creative and pleasant personality.\nIt has a preliminary EQ-Bench3 value of about 1305.\nIt is quite a bit more intelligent than the original, albeit a slightly slower.\nIt is much more think-token consistent, i.e. reasoning and answer blocks are properly delineated.\nTool calling is much improved.\n\nTNG Tech, the model authors, ask that users follow the careful guidelines that Microsoft has created for their \"MAI-DS-R1\" DeepSeek-based model. These guidelines are available on Hugging Face (https://huggingface.co/microsoft/MAI-DS-R1).",
- "context_length": 163840,
- "architecture": {
- "modality": "text->text",
- "input_modalities": [
- "text"
- ],
- "output_modalities": [
- "text"
- ],
- "tokenizer": "Other",
- "instruct_type": null
- },
- "pricing": {
- "prompt": "0.00000025",
- "completion": "0.00000085",
- "input_cache_read": "0.000000125"
- },
- "top_provider": {
- "context_length": 163840,
- "max_completion_tokens": 65536,
- "is_moderated": false
- },
- "per_request_limits": null,
- "supported_parameters": [
- "frequency_penalty",
- "include_reasoning",
- "max_tokens",
- "presence_penalty",
- "reasoning",
- "repetition_penalty",
- "response_format",
- "seed",
- "stop",
- "structured_outputs",
- "temperature",
- "tool_choice",
- "tools",
- "top_k",
- "top_p"
- ],
- "default_parameters": {
- "temperature": null,
- "top_p": null,
- "frequency_penalty": null
- },
- "expiration_date": null
- },
- {
- "id": "tngtech/tng-r1t-chimera:free",
- "canonical_slug": "tngtech/tng-r1t-chimera",
- "hugging_face_id": null,
- "name": "TNG: R1T Chimera (free)",
- "created": 1764184161,
- "description": "TNG-R1T-Chimera is an experimental LLM with a faible for creative storytelling and character interaction. It is a derivate of the original TNG/DeepSeek-R1T-Chimera released in April 2025 and is available exclusively via Chutes and OpenRouter.\n\nCharacteristics and improvements include:\n\nWe think that it has a creative and pleasant personality.\nIt has a preliminary EQ-Bench3 value of about 1305.\nIt is quite a bit more intelligent than the original, albeit a slightly slower.\nIt is much more think-token consistent, i.e. reasoning and answer blocks are properly delineated.\nTool calling is much improved.\n\nTNG Tech, the model authors, ask that users follow the careful guidelines that Microsoft has created for their \"MAI-DS-R1\" DeepSeek-based model. These guidelines are available on Hugging Face (https://huggingface.co/microsoft/MAI-DS-R1).",
- "context_length": 163840,
- "architecture": {
- "modality": "text->text",
- "input_modalities": [
- "text"
- ],
- "output_modalities": [
- "text"
- ],
- "tokenizer": "Other",
- "instruct_type": null
- },
- "pricing": {
- "prompt": "0",
- "completion": "0"
- },
- "top_provider": {
- "context_length": 163840,
- "max_completion_tokens": 65536,
- "is_moderated": false
- },
- "per_request_limits": null,
- "supported_parameters": [
- "frequency_penalty",
- "include_reasoning",
- "max_tokens",
- "presence_penalty",
- "reasoning",
- "repetition_penalty",
- "response_format",
- "seed",
- "stop",
- "structured_outputs",
- "temperature",
- "tool_choice",
- "tools",
- "top_k",
- "top_p"
- ],
- "default_parameters": {
- "temperature": null,
- "top_p": null,
- "frequency_penalty": null
- },
- "expiration_date": null
- },
- {
- "id": "undi95/remm-slerp-l2-13b",
- "canonical_slug": "undi95/remm-slerp-l2-13b",
- "hugging_face_id": "Undi95/ReMM-SLERP-L2-13B",
- "name": "ReMM SLERP 13B",
- "created": 1689984000,
- "description": "A recreation trial of the original MythoMax-L2-B13 but with updated models. #merge",
- "context_length": 6144,
- "architecture": {
- "modality": "text->text",
- "input_modalities": [
- "text"
- ],
- "output_modalities": [
- "text"
- ],
- "tokenizer": "Llama2",
- "instruct_type": "alpaca"
- },
- "pricing": {
- "prompt": "0.00000045",
- "completion": "0.00000065"
- },
- "top_provider": {
- "context_length": 6144,
- "max_completion_tokens": 4096,
- "is_moderated": false
- },
- "per_request_limits": null,
- "supported_parameters": [
- "frequency_penalty",
- "logit_bias",
- "logprobs",
- "max_tokens",
- "min_p",
- "presence_penalty",
- "repetition_penalty",
- "response_format",
- "seed",
- "stop",
- "structured_outputs",
- "temperature",
- "top_a",
- "top_k",
- "top_logprobs",
+ "top_logprobs",
"top_p"
],
"default_parameters": {},
@@ -15808,7 +14722,7 @@
"top_p": null,
"frequency_penalty": null
},
- "expiration_date": "2026-03-02"
+ "expiration_date": "2026-03-22"
},
{
"id": "writer/palmyra-x5",
@@ -16285,7 +15199,7 @@
},
"top_provider": {
"context_length": 262144,
- "max_completion_tokens": null,
+ "max_completion_tokens": 65536,
"is_moderated": false
},
"per_request_limits": null,
@@ -16317,7 +15231,7 @@
"id": "z-ai/glm-4-32b",
"canonical_slug": "z-ai/glm-4-32b-0414",
"hugging_face_id": "",
- "name": "Z.AI: GLM 4 32B ",
+ "name": "Z.ai: GLM 4 32B ",
"created": 1753376617,
"description": "GLM 4 32B is a cost-effective foundation language model.\n\nIt can efficiently perform complex tasks and has significantly enhanced capabilities in tool use, online search, and code-related intelligent tasks.\n\nIt is made by the same lab behind the thudm models.",
"context_length": 128000,
@@ -16360,10 +15274,10 @@
"id": "z-ai/glm-4.5",
"canonical_slug": "z-ai/glm-4.5",
"hugging_face_id": "zai-org/GLM-4.5",
- "name": "Z.AI: GLM 4.5",
+ "name": "Z.ai: GLM 4.5",
"created": 1753471347,
"description": "GLM-4.5 is our latest flagship foundation model, purpose-built for agent-based applications. It leverages a Mixture-of-Experts (MoE) architecture and supports a context length of up to 128k tokens. GLM-4.5 delivers significantly enhanced capabilities in reasoning, code generation, and agent alignment. It supports a hybrid inference mode with two options, a \"thinking mode\" designed for complex reasoning and tool use, and a \"non-thinking mode\" optimized for instant responses. Users can control the reasoning behaviour with the `reasoning` `enabled` boolean. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#enable-reasoning-with-default-config)",
- "context_length": 131072,
+ "context_length": 131000,
"architecture": {
"modality": "text->text",
"input_modalities": [
@@ -16376,13 +15290,12 @@
"instruct_type": null
},
"pricing": {
- "prompt": "0.00000035",
- "completion": "0.00000155",
- "input_cache_read": "0.000000175"
+ "prompt": "0.00000055",
+ "completion": "0.000002"
},
"top_provider": {
- "context_length": 131072,
- "max_completion_tokens": 65536,
+ "context_length": 131000,
+ "max_completion_tokens": 131000,
"is_moderated": false
},
"per_request_limits": null,
@@ -16408,13 +15321,13 @@
"top_p": null,
"frequency_penalty": null
},
- "expiration_date": null
+ "expiration_date": "2026-02-12"
},
{
"id": "z-ai/glm-4.5-air",
"canonical_slug": "z-ai/glm-4.5-air",
"hugging_face_id": "zai-org/GLM-4.5-Air",
- "name": "Z.AI: GLM 4.5 Air",
+ "name": "Z.ai: GLM 4.5 Air",
"created": 1753471258,
"description": "GLM-4.5-Air is the lightweight variant of our latest flagship model family, also purpose-built for agent-centric applications. Like GLM-4.5, it adopts the Mixture-of-Experts (MoE) architecture but with a more compact parameter size. GLM-4.5-Air also supports hybrid inference modes, offering a \"thinking mode\" for advanced reasoning and tool use, and a \"non-thinking mode\" for real-time interaction. Users can control the reasoning behaviour with the `reasoning` `enabled` boolean. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#enable-reasoning-with-default-config)",
"context_length": 131072,
@@ -16468,7 +15381,7 @@
"id": "z-ai/glm-4.5-air:free",
"canonical_slug": "z-ai/glm-4.5-air",
"hugging_face_id": "zai-org/GLM-4.5-Air",
- "name": "Z.AI: GLM 4.5 Air (free)",
+ "name": "Z.ai: GLM 4.5 Air (free)",
"created": 1753471258,
"description": "GLM-4.5-Air is the lightweight variant of our latest flagship model family, also purpose-built for agent-centric applications. Like GLM-4.5, it adopts the Mixture-of-Experts (MoE) architecture but with a more compact parameter size. GLM-4.5-Air also supports hybrid inference modes, offering a \"thinking mode\" for advanced reasoning and tool use, and a \"non-thinking mode\" for real-time interaction. Users can control the reasoning behaviour with the `reasoning` `enabled` boolean. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#enable-reasoning-with-default-config)",
"context_length": 131072,
@@ -16513,7 +15426,7 @@
"id": "z-ai/glm-4.5v",
"canonical_slug": "z-ai/glm-4.5v",
"hugging_face_id": "zai-org/GLM-4.5V",
- "name": "Z.AI: GLM 4.5V",
+ "name": "Z.ai: GLM 4.5V",
"created": 1754922288,
"description": "GLM-4.5V is a vision-language foundation model for multimodal agent applications. Built on a Mixture-of-Experts (MoE) architecture with 106B parameters and 12B activated parameters, it achieves state-of-the-art results in video understanding, image Q&A, OCR, and document parsing, with strong gains in front-end web coding, grounding, and spatial reasoning. It offers a hybrid inference mode: a \"thinking mode\" for deep reasoning and a \"non-thinking mode\" for fast responses. Reasoning behavior can be toggled via the `reasoning` `enabled` boolean. [Learn more in our docs](https://openrouter.ai/docs/use-cases/reasoning-tokens#enable-reasoning-with-default-config)",
"context_length": 65536,
@@ -16568,7 +15481,7 @@
"id": "z-ai/glm-4.6",
"canonical_slug": "z-ai/glm-4.6",
"hugging_face_id": "",
- "name": "Z.AI: GLM 4.6",
+ "name": "Z.ai: GLM 4.6",
"created": 1759235576,
"description": "Compared with GLM-4.5, this generation brings several key improvements:\n\nLonger context window: The context window has been expanded from 128K to 200K tokens, enabling the model to handle more complex agentic tasks.\nSuperior coding performance: The model achieves higher scores on code benchmarks and demonstrates better real-world performance in applications such as Claude Code、Cline、Roo Code and Kilo Code, including improvements in generating visually polished front-end pages.\nAdvanced reasoning: GLM-4.6 shows a clear improvement in reasoning performance and supports tool use during inference, leading to stronger overall capability.\nMore capable agents: GLM-4.6 exhibits stronger performance in tool using and search-based agents, and integrates more effectively within agent frameworks.\nRefined writing: Better aligns with human preferences in style and readability, and performs more naturally in role-playing scenarios.",
"context_length": 202752,
@@ -16585,12 +15498,11 @@
},
"pricing": {
"prompt": "0.00000035",
- "completion": "0.0000015",
- "input_cache_read": "0.000000175"
+ "completion": "0.00000171"
},
"top_provider": {
"context_length": 202752,
- "max_completion_tokens": 65536,
+ "max_completion_tokens": 131072,
"is_moderated": false
},
"per_request_limits": null,
@@ -16598,7 +15510,6 @@
"frequency_penalty",
"include_reasoning",
"logit_bias",
- "logprobs",
"max_tokens",
"min_p",
"presence_penalty",
@@ -16611,9 +15522,7 @@
"temperature",
"tool_choice",
"tools",
- "top_a",
"top_k",
- "top_logprobs",
"top_p"
],
"default_parameters": {
@@ -16627,7 +15536,7 @@
"id": "z-ai/glm-4.6:exacto",
"canonical_slug": "z-ai/glm-4.6",
"hugging_face_id": "",
- "name": "Z.AI: GLM 4.6 (exacto)",
+ "name": "Z.ai: GLM 4.6 (exacto)",
"created": 1759235576,
"description": "Compared with GLM-4.5, this generation brings several key improvements:\n\nLonger context window: The context window has been expanded from 128K to 200K tokens, enabling the model to handle more complex agentic tasks.\nSuperior coding performance: The model achieves higher scores on code benchmarks and demonstrates better real-world performance in applications such as Claude Code、Cline、Roo Code and Kilo Code, including improvements in generating visually polished front-end pages.\nAdvanced reasoning: GLM-4.6 shows a clear improvement in reasoning performance and supports tool use during inference, leading to stronger overall capability.\nMore capable agents: GLM-4.6 exhibits stronger performance in tool using and search-based agents, and integrates more effectively within agent frameworks.\nRefined writing: Better aligns with human preferences in style and readability, and performs more naturally in role-playing scenarios.",
"context_length": 204800,
@@ -16681,7 +15590,7 @@
"id": "z-ai/glm-4.6v",
"canonical_slug": "z-ai/glm-4.6-20251208",
"hugging_face_id": "zai-org/GLM-4.6V",
- "name": "Z.AI: GLM 4.6V",
+ "name": "Z.ai: GLM 4.6V",
"created": 1765207462,
"description": "GLM-4.6V is a large multimodal model designed for high-fidelity visual understanding and long-context reasoning across images, documents, and mixed media. It supports up to 128K tokens, processes complex page layouts and charts directly as visual inputs, and integrates native multimodal function calling to connect perception with downstream tool execution. The model also enables interleaved image-text generation and UI reconstruction workflows, including screenshot-to-HTML synthesis and iterative visual editing.",
"context_length": 131072,
@@ -16711,7 +15620,6 @@
"supported_parameters": [
"frequency_penalty",
"include_reasoning",
- "logit_bias",
"max_tokens",
"min_p",
"presence_penalty",
@@ -16738,9 +15646,9 @@
"id": "z-ai/glm-4.7",
"canonical_slug": "z-ai/glm-4.7-20251222",
"hugging_face_id": "zai-org/GLM-4.7",
- "name": "Z.AI: GLM 4.7",
+ "name": "Z.ai: GLM 4.7",
"created": 1766378014,
- "description": "GLM-4.7 is Z.AI’s latest flagship model, featuring upgrades in two key areas: enhanced programming capabilities and more stable multi-step reasoning/execution. It demonstrates significant improvements in executing complex agent tasks while delivering more natural conversational experiences and superior front-end aesthetics.",
+ "description": "GLM-4.7 is Z.ai’s latest flagship model, featuring upgrades in two key areas: enhanced programming capabilities and more stable multi-step reasoning/execution. It demonstrates significant improvements in executing complex agent tasks while delivering more natural conversational experiences and superior front-end aesthetics.",
"context_length": 202752,
"architecture": {
"modality": "text->text",
@@ -16754,13 +15662,13 @@
"instruct_type": null
},
"pricing": {
- "prompt": "0.0000004",
- "completion": "0.0000015",
- "input_cache_read": "0.0000002"
+ "prompt": "0.0000003",
+ "completion": "0.0000014",
+ "input_cache_read": "0.00000015"
},
"top_provider": {
"context_length": 202752,
- "max_completion_tokens": 65535,
+ "max_completion_tokens": null,
"is_moderated": false
},
"per_request_limits": null,
@@ -16768,13 +15676,10 @@
"frequency_penalty",
"include_reasoning",
"logit_bias",
- "logprobs",
"max_tokens",
"min_p",
- "parallel_tool_calls",
"presence_penalty",
"reasoning",
- "reasoning_effort",
"repetition_penalty",
"response_format",
"seed",
@@ -16784,7 +15689,6 @@
"tool_choice",
"tools",
"top_k",
- "top_logprobs",
"top_p"
],
"default_parameters": {
@@ -16798,7 +15702,7 @@
"id": "z-ai/glm-4.7-flash",
"canonical_slug": "z-ai/glm-4.7-flash-20260119",
"hugging_face_id": "zai-org/GLM-4.7-Flash",
- "name": "Z.AI: GLM 4.7 Flash",
+ "name": "Z.ai: GLM 4.7 Flash",
"created": 1768833913,
"description": "As a 30B-class SOTA model, GLM-4.7-Flash offers a new option that balances performance and efficiency. It is further optimized for agentic coding use cases, strengthening coding capabilities, long-horizon task planning, and tool collaboration, and has achieved leading performance among open-source models of the same size on several current public benchmark leaderboards.",
"context_length": 202752,
@@ -16848,6 +15752,893 @@
"frequency_penalty": null
},
"expiration_date": null
+ },
+ {
+ "id": "anthropic/claude-sonnet-4.6",
+ "canonical_slug": "anthropic/claude-4.6-sonnet-20260217",
+ "hugging_face_id": "",
+ "name": "Anthropic: Claude Sonnet 4.6",
+ "created": 1771342990,
+ "description": "Sonnet 4.6 is Anthropic's most capable Sonnet-class model yet, with frontier performance across coding, agents, and professional work. It excels at iterative development, complex codebase navigation, end-to-end project management with memory, polished document creation, and confident computer use for web QA and workflow automation.",
+ "context_length": 1000000,
+ "architecture": {
+ "modality": "text+image->text",
+ "input_modalities": [
+ "text",
+ "image"
+ ],
+ "output_modalities": [
+ "text"
+ ],
+ "tokenizer": "Claude",
+ "instruct_type": null
+ },
+ "pricing": {
+ "prompt": "0.000003",
+ "completion": "0.000015",
+ "web_search": "0.01",
+ "input_cache_read": "0.0000003",
+ "input_cache_write": "0.00000375"
+ },
+ "top_provider": {
+ "context_length": 1000000,
+ "max_completion_tokens": 128000,
+ "is_moderated": true
+ },
+ "per_request_limits": null,
+ "supported_parameters": [
+ "include_reasoning",
+ "max_tokens",
+ "reasoning",
+ "response_format",
+ "stop",
+ "structured_outputs",
+ "temperature",
+ "tool_choice",
+ "tools",
+ "top_k",
+ "top_p",
+ "verbosity"
+ ],
+ "default_parameters": {
+ "temperature": null,
+ "top_p": null,
+ "frequency_penalty": null
+ },
+ "expiration_date": null
+ },
+ {
+ "id": "google/gemini-3.1-flash-image-preview",
+ "canonical_slug": "google/gemini-3.1-flash-image-preview-20260226",
+ "hugging_face_id": "",
+ "name": "Google: Nano Banana 2 (Gemini 3.1 Flash Image Preview)",
+ "created": 1772119558,
+ "description": "Gemini 3.1 Flash Image Preview, a.k.a. \"Nano Banana 2,\" is Google’s latest state of the art image generation and editing model, delivering Pro-level visual quality at Flash speed. It combines advanced contextual understanding with fast, cost-efficient inference, making complex image generation and iterative edits significantly more accessible. Aspect ratios can be controlled with the [image_config API Parameter](https://openrouter.ai/docs/features/multimodal/image-generation#image-aspect-ratio-configuration)",
+ "context_length": 65536,
+ "architecture": {
+ "modality": "text+image->text+image",
+ "input_modalities": [
+ "image",
+ "text"
+ ],
+ "output_modalities": [
+ "image",
+ "text"
+ ],
+ "tokenizer": "Gemini",
+ "instruct_type": null
+ },
+ "pricing": {
+ "prompt": "0.00000025",
+ "completion": "0.0000015"
+ },
+ "top_provider": {
+ "context_length": 65536,
+ "max_completion_tokens": 65536,
+ "is_moderated": false
+ },
+ "per_request_limits": null,
+ "supported_parameters": [
+ "include_reasoning",
+ "max_tokens",
+ "reasoning",
+ "response_format",
+ "seed",
+ "stop",
+ "structured_outputs",
+ "temperature",
+ "top_p"
+ ],
+ "default_parameters": {
+ "temperature": null,
+ "top_p": null,
+ "frequency_penalty": null
+ },
+ "expiration_date": null
+ },
+ {
+ "id": "google/gemini-3.1-pro-preview",
+ "canonical_slug": "google/gemini-3.1-pro-preview-20260219",
+ "hugging_face_id": "",
+ "name": "Google: Gemini 3.1 Pro Preview",
+ "created": 1771509627,
+ "description": "Gemini 3.1 Pro Preview is Google’s frontier reasoning model, delivering enhanced software engineering performance, improved agentic reliability, and more efficient token usage across complex workflows. Building on the multimodal foundation of the Gemini 3 series, it combines high-precision reasoning across text, image, video, audio, and code with a 1M-token context window. Reasoning Details must be preserved when using multi-turn tool calling, see our docs here: https://openrouter.ai/docs/use-cases/reasoning-tokens#preserving-reasoning. The 3.1 update introduces measurable gains in SWE benchmarks and real-world coding environments, along with stronger autonomous task execution in structured domains such as finance and spreadsheet-based workflows.\n\nDesigned for advanced development and agentic systems, Gemini 3.1 Pro Preview improves long-horizon stability and tool orchestration while increasing token efficiency. It introduces a new medium thinking level to better balance cost, speed, and performance. The model excels in agentic coding, structured planning, multimodal analysis, and workflow automation, making it well-suited for autonomous agents, financial modeling, spreadsheet automation, and high-context enterprise tasks.",
+ "context_length": 1048576,
+ "architecture": {
+ "modality": "text+image+file+audio+video->text",
+ "input_modalities": [
+ "audio",
+ "file",
+ "image",
+ "text",
+ "video"
+ ],
+ "output_modalities": [
+ "text"
+ ],
+ "tokenizer": "Gemini",
+ "instruct_type": null
+ },
+ "pricing": {
+ "prompt": "0.000002",
+ "completion": "0.000012",
+ "image": "0.000002",
+ "audio": "0.000002",
+ "internal_reasoning": "0.000012",
+ "input_cache_read": "0.0000002",
+ "input_cache_write": "0.000000375"
+ },
+ "top_provider": {
+ "context_length": 1048576,
+ "max_completion_tokens": 65536,
+ "is_moderated": false
+ },
+ "per_request_limits": null,
+ "supported_parameters": [
+ "include_reasoning",
+ "max_tokens",
+ "reasoning",
+ "response_format",
+ "seed",
+ "stop",
+ "structured_outputs",
+ "temperature",
+ "tool_choice",
+ "tools",
+ "top_p"
+ ],
+ "default_parameters": {
+ "temperature": null,
+ "top_p": null,
+ "frequency_penalty": null
+ },
+ "expiration_date": null
+ },
+ {
+ "id": "google/gemini-3.1-pro-preview-customtools",
+ "canonical_slug": "google/gemini-3.1-pro-preview-customtools-20260219",
+ "hugging_face_id": null,
+ "name": "Google: Gemini 3.1 Pro Preview Custom Tools",
+ "created": 1772045923,
+ "description": "Gemini 3.1 Pro Preview Custom Tools is a variant of Gemini 3.1 Pro that improves tool selection behavior by preventing overuse of a general bash tool when more efficient third-party or user-defined functions are available. This specialized preview endpoint significantly increases function calling reliability and ensures the model selects the most appropriate tool in coding agents and complex, multi-tool workflows.\n\nIt retains the core strengths of Gemini 3.1 Pro, including multimodal reasoning across text, image, video, audio, and code, a 1M-token context window, and strong software engineering performance.",
+ "context_length": 1048576,
+ "architecture": {
+ "modality": "text+image+file+audio+video->text",
+ "input_modalities": [
+ "text",
+ "audio",
+ "image",
+ "video",
+ "file"
+ ],
+ "output_modalities": [
+ "text"
+ ],
+ "tokenizer": "Gemini",
+ "instruct_type": null
+ },
+ "pricing": {
+ "prompt": "0.000002",
+ "completion": "0.000012",
+ "image": "0.000002",
+ "audio": "0.000002",
+ "internal_reasoning": "0.000012",
+ "input_cache_read": "0.0000002",
+ "input_cache_write": "0.000000375"
+ },
+ "top_provider": {
+ "context_length": 1048576,
+ "max_completion_tokens": 65536,
+ "is_moderated": false
+ },
+ "per_request_limits": null,
+ "supported_parameters": [
+ "include_reasoning",
+ "max_tokens",
+ "reasoning",
+ "response_format",
+ "seed",
+ "stop",
+ "structured_outputs",
+ "temperature",
+ "tool_choice",
+ "tools",
+ "top_p"
+ ],
+ "default_parameters": {
+ "temperature": null,
+ "top_p": null,
+ "frequency_penalty": null
+ },
+ "expiration_date": null
+ },
+ {
+ "id": "openai/gpt-5.3-codex",
+ "canonical_slug": "openai/gpt-5.3-codex-20260224",
+ "hugging_face_id": "",
+ "name": "OpenAI: GPT-5.3-Codex",
+ "created": 1771959164,
+ "description": "GPT-5.3-Codex is OpenAI’s most advanced agentic coding model, combining the frontier software engineering performance of GPT-5.2-Codex with the broader reasoning and professional knowledge capabilities of GPT-5.2. It achieves state-of-the-art results on SWE-Bench Pro and strong performance on Terminal-Bench 2.0 and OSWorld-Verified, reflecting improved multi-language coding, terminal proficiency, and real-world computer-use skills. The model is optimized for long-running, tool-using workflows and supports interactive steering during execution, making it suitable for complex development tasks, debugging, deployment, and iterative product work.\n\nBeyond coding, GPT-5.3-Codex performs strongly on structured knowledge-work benchmarks such as GDPval, supporting tasks like document drafting, spreadsheet analysis, slide creation, and operational research across domains. It is trained with enhanced cybersecurity awareness, including vulnerability identification capabilities, and deployed with additional safeguards for high-risk use cases. Compared to prior Codex models, it is more token-efficient and approximately 25% faster, targeting professional end-to-end workflows that span reasoning, execution, and computer interaction.",
+ "context_length": 400000,
+ "architecture": {
+ "modality": "text+image->text",
+ "input_modalities": [
+ "text",
+ "image"
+ ],
+ "output_modalities": [
+ "text"
+ ],
+ "tokenizer": "GPT",
+ "instruct_type": null
+ },
+ "pricing": {
+ "prompt": "0.00000175",
+ "completion": "0.000014",
+ "web_search": "0.01",
+ "input_cache_read": "0.000000175"
+ },
+ "top_provider": {
+ "context_length": 400000,
+ "max_completion_tokens": 128000,
+ "is_moderated": true
+ },
+ "per_request_limits": null,
+ "supported_parameters": [
+ "include_reasoning",
+ "max_tokens",
+ "reasoning",
+ "response_format",
+ "seed",
+ "structured_outputs",
+ "tool_choice",
+ "tools"
+ ],
+ "default_parameters": {
+ "temperature": null,
+ "top_p": null,
+ "frequency_penalty": null
+ },
+ "expiration_date": null
+ },
+ {
+ "id": "aion-labs/aion-2.0",
+ "canonical_slug": "aion-labs/aion-2.0-20260223",
+ "hugging_face_id": null,
+ "name": "AionLabs: Aion-2.0",
+ "created": 1771881306,
+ "description": "Aion-2.0 is a variant of DeepSeek V3.2 optimized for immersive roleplaying and storytelling. It is particularly strong at introducing tension, crises, and conflict into stories, making narratives feel more engaging. It also handles mature and darker themes with more nuance and depth.",
+ "context_length": 131072,
+ "architecture": {
+ "modality": "text->text",
+ "input_modalities": [
+ "text"
+ ],
+ "output_modalities": [
+ "text"
+ ],
+ "tokenizer": "Other",
+ "instruct_type": null
+ },
+ "pricing": {
+ "prompt": "0.0000008",
+ "completion": "0.0000016",
+ "input_cache_read": "0.0000002"
+ },
+ "top_provider": {
+ "context_length": 131072,
+ "max_completion_tokens": 32768,
+ "is_moderated": false
+ },
+ "per_request_limits": null,
+ "supported_parameters": [
+ "include_reasoning",
+ "max_tokens",
+ "reasoning",
+ "temperature",
+ "top_p"
+ ],
+ "default_parameters": {
+ "temperature": null,
+ "top_p": null,
+ "frequency_penalty": null
+ },
+ "expiration_date": null
+ },
+ {
+ "id": "bytedance-seed/seed-2.0-mini",
+ "canonical_slug": "bytedance-seed/seed-2.0-mini-20260224",
+ "hugging_face_id": "",
+ "name": "ByteDance Seed: Seed-2.0-Mini",
+ "created": 1772131107,
+ "description": "Seed-2.0-mini targets latency-sensitive, high-concurrency, and cost-sensitive scenarios, emphasizing fast response and flexible inference deployment. It delivers performance comparable to ByteDance-Seed-1.6, supports 256k context, four reasoning effort modes (minimal/low/medium/high), multimodal understanding, and is optimized for lightweight tasks where cost and speed take priority.",
+ "context_length": 262144,
+ "architecture": {
+ "modality": "text+image+video->text",
+ "input_modalities": [
+ "text",
+ "image",
+ "video"
+ ],
+ "output_modalities": [
+ "text"
+ ],
+ "tokenizer": "Other",
+ "instruct_type": null
+ },
+ "pricing": {
+ "prompt": "0.0000001",
+ "completion": "0.0000004"
+ },
+ "top_provider": {
+ "context_length": 262144,
+ "max_completion_tokens": 131072,
+ "is_moderated": false
+ },
+ "per_request_limits": null,
+ "supported_parameters": [
+ "frequency_penalty",
+ "include_reasoning",
+ "max_tokens",
+ "reasoning",
+ "response_format",
+ "stop",
+ "structured_outputs",
+ "temperature",
+ "tool_choice",
+ "tools",
+ "top_p"
+ ],
+ "default_parameters": {
+ "temperature": null,
+ "top_p": null,
+ "frequency_penalty": null
+ },
+ "expiration_date": null
+ },
+ {
+ "id": "liquid/lfm-2-24b-a2b",
+ "canonical_slug": "liquid/lfm-2-24b-a2b-20260224",
+ "hugging_face_id": "LiquidAI/LFM2-24B-A2B",
+ "name": "LiquidAI: LFM2-24B-A2B",
+ "created": 1772048711,
+ "description": "LFM2-24B-A2B is the largest model in the LFM2 family of hybrid architectures designed for efficient on-device deployment. Built as a 24B parameter Mixture-of-Experts model with only 2B active parameters per token, it delivers high-quality generation while maintaining low inference costs. The model fits within 32 GB of RAM, making it practical to run on consumer laptops and desktops without sacrificing capability.",
+ "context_length": 32768,
+ "architecture": {
+ "modality": "text->text",
+ "input_modalities": [
+ "text"
+ ],
+ "output_modalities": [
+ "text"
+ ],
+ "tokenizer": "Other",
+ "instruct_type": null
+ },
+ "pricing": {
+ "prompt": "0.00000003",
+ "completion": "0.00000012"
+ },
+ "top_provider": {
+ "context_length": 32768,
+ "max_completion_tokens": null,
+ "is_moderated": false
+ },
+ "per_request_limits": null,
+ "supported_parameters": [
+ "frequency_penalty",
+ "logit_bias",
+ "max_tokens",
+ "min_p",
+ "presence_penalty",
+ "repetition_penalty",
+ "stop",
+ "temperature",
+ "top_k",
+ "top_p"
+ ],
+ "default_parameters": {
+ "temperature": 0.1,
+ "top_p": null,
+ "top_k": 50,
+ "frequency_penalty": null,
+ "presence_penalty": null,
+ "repetition_penalty": 1.05
+ },
+ "expiration_date": null
+ },
+ {
+ "id": "qwen/qwen3-max-thinking",
+ "canonical_slug": "qwen/qwen3-max-thinking-20260123",
+ "hugging_face_id": null,
+ "name": "Qwen: Qwen3 Max Thinking",
+ "created": 1770671901,
+ "description": "Qwen3-Max-Thinking is the flagship reasoning model in the Qwen3 series, designed for high-stakes cognitive tasks that require deep, multi-step reasoning. By significantly scaling model capacity and reinforcement learning compute, it delivers major gains in factual accuracy, complex reasoning, instruction following, alignment with human preferences, and agentic behavior.",
+ "context_length": 262144,
+ "architecture": {
+ "modality": "text->text",
+ "input_modalities": [
+ "text"
+ ],
+ "output_modalities": [
+ "text"
+ ],
+ "tokenizer": "Qwen",
+ "instruct_type": null
+ },
+ "pricing": {
+ "prompt": "0.0000012",
+ "completion": "0.000006"
+ },
+ "top_provider": {
+ "context_length": 262144,
+ "max_completion_tokens": 32768,
+ "is_moderated": false
+ },
+ "per_request_limits": null,
+ "supported_parameters": [
+ "include_reasoning",
+ "max_tokens",
+ "presence_penalty",
+ "reasoning",
+ "response_format",
+ "seed",
+ "structured_outputs",
+ "temperature",
+ "tool_choice",
+ "tools",
+ "top_p"
+ ],
+ "default_parameters": {
+ "temperature": null,
+ "top_p": null,
+ "frequency_penalty": null
+ },
+ "expiration_date": null
+ },
+ {
+ "id": "qwen/qwen3.5-122b-a10b",
+ "canonical_slug": "qwen/qwen3.5-122b-a10b-20260224",
+ "hugging_face_id": "Qwen/Qwen3.5-122B-A10B",
+ "name": "Qwen: Qwen3.5-122B-A10B",
+ "created": 1772053789,
+ "description": "The Qwen3.5 122B-A10B native vision-language model is built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. In terms of overall performance, this model is second only to Qwen3.5-397B-A17B. Its text capabilities significantly outperform those of Qwen3-235B-2507, and its visual capabilities surpass those of Qwen3-VL-235B.",
+ "context_length": 262144,
+ "architecture": {
+ "modality": "text+image+video->text",
+ "input_modalities": [
+ "text",
+ "image",
+ "video"
+ ],
+ "output_modalities": [
+ "text"
+ ],
+ "tokenizer": "Qwen3",
+ "instruct_type": null
+ },
+ "pricing": {
+ "prompt": "0.0000004",
+ "completion": "0.0000032"
+ },
+ "top_provider": {
+ "context_length": 262144,
+ "max_completion_tokens": 65536,
+ "is_moderated": false
+ },
+ "per_request_limits": null,
+ "supported_parameters": [
+ "include_reasoning",
+ "logprobs",
+ "max_tokens",
+ "presence_penalty",
+ "reasoning",
+ "response_format",
+ "seed",
+ "structured_outputs",
+ "temperature",
+ "tool_choice",
+ "tools",
+ "top_logprobs",
+ "top_p"
+ ],
+ "default_parameters": {
+ "temperature": 0.6,
+ "top_p": 0.95,
+ "frequency_penalty": null
+ },
+ "expiration_date": null
+ },
+ {
+ "id": "qwen/qwen3.5-27b",
+ "canonical_slug": "qwen/qwen3.5-27b-20260224",
+ "hugging_face_id": "Qwen/Qwen3.5-27B",
+ "name": "Qwen: Qwen3.5-27B",
+ "created": 1772053810,
+ "description": "The Qwen3.5 27B native vision-language Dense model incorporates a linear attention mechanism, delivering fast response times while balancing inference speed and performance. Its overall capabilities are comparable to those of the Qwen3.5-122B-A10B.",
+ "context_length": 262144,
+ "architecture": {
+ "modality": "text+image+video->text",
+ "input_modalities": [
+ "text",
+ "image",
+ "video"
+ ],
+ "output_modalities": [
+ "text"
+ ],
+ "tokenizer": "Qwen3",
+ "instruct_type": null
+ },
+ "pricing": {
+ "prompt": "0.0000003",
+ "completion": "0.0000024"
+ },
+ "top_provider": {
+ "context_length": 262144,
+ "max_completion_tokens": 65536,
+ "is_moderated": false
+ },
+ "per_request_limits": null,
+ "supported_parameters": [
+ "include_reasoning",
+ "logprobs",
+ "max_tokens",
+ "presence_penalty",
+ "reasoning",
+ "response_format",
+ "seed",
+ "structured_outputs",
+ "temperature",
+ "tool_choice",
+ "tools",
+ "top_logprobs",
+ "top_p"
+ ],
+ "default_parameters": {
+ "temperature": 0.6,
+ "top_p": 0.95,
+ "frequency_penalty": null
+ },
+ "expiration_date": null
+ },
+ {
+ "id": "qwen/qwen3.5-35b-a3b",
+ "canonical_slug": "qwen/qwen3.5-35b-a3b-20260224",
+ "hugging_face_id": "Qwen/Qwen3.5-35B-A3B",
+ "name": "Qwen: Qwen3.5-35B-A3B",
+ "created": 1772053822,
+ "description": "The Qwen3.5 Series 35B-A3B is a native vision-language model designed with a hybrid architecture that integrates linear attention mechanisms and a sparse mixture-of-experts model, achieving higher inference efficiency. Its overall performance is comparable to that of the Qwen3.5-27B.",
+ "context_length": 262144,
+ "architecture": {
+ "modality": "text+image+video->text",
+ "input_modalities": [
+ "text",
+ "image",
+ "video"
+ ],
+ "output_modalities": [
+ "text"
+ ],
+ "tokenizer": "Qwen3",
+ "instruct_type": null
+ },
+ "pricing": {
+ "prompt": "0.00000025",
+ "completion": "0.000002"
+ },
+ "top_provider": {
+ "context_length": 262144,
+ "max_completion_tokens": 65536,
+ "is_moderated": false
+ },
+ "per_request_limits": null,
+ "supported_parameters": [
+ "include_reasoning",
+ "logprobs",
+ "max_tokens",
+ "presence_penalty",
+ "reasoning",
+ "response_format",
+ "seed",
+ "structured_outputs",
+ "temperature",
+ "tool_choice",
+ "tools",
+ "top_logprobs",
+ "top_p"
+ ],
+ "default_parameters": {
+ "temperature": 1,
+ "top_p": 0.95,
+ "frequency_penalty": null
+ },
+ "expiration_date": null
+ },
+ {
+ "id": "qwen/qwen3.5-397b-a17b",
+ "canonical_slug": "qwen/qwen3.5-397b-a17b-20260216",
+ "hugging_face_id": "Qwen/Qwen3.5-397B-A17B",
+ "name": "Qwen: Qwen3.5 397B A17B",
+ "created": 1771223018,
+ "description": "The Qwen3.5 series 397B-A17B native vision-language model is built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. It delivers state-of-the-art performance comparable to leading-edge models across a wide range of tasks, including language understanding, logical reasoning, code generation, agent-based tasks, image understanding, video understanding, and graphical user interface (GUI) interactions. With its robust code-generation and agent capabilities, the model exhibits strong generalization across diverse agent.",
+ "context_length": 262144,
+ "architecture": {
+ "modality": "text+image+video->text",
+ "input_modalities": [
+ "text",
+ "image",
+ "video"
+ ],
+ "output_modalities": [
+ "text"
+ ],
+ "tokenizer": "Qwen3",
+ "instruct_type": null
+ },
+ "pricing": {
+ "prompt": "0.00000055",
+ "completion": "0.0000035",
+ "input_cache_read": "0.00000055"
+ },
+ "top_provider": {
+ "context_length": 262144,
+ "max_completion_tokens": 65536,
+ "is_moderated": false
+ },
+ "per_request_limits": null,
+ "supported_parameters": [
+ "frequency_penalty",
+ "include_reasoning",
+ "logit_bias",
+ "max_tokens",
+ "min_p",
+ "presence_penalty",
+ "reasoning",
+ "repetition_penalty",
+ "response_format",
+ "seed",
+ "stop",
+ "structured_outputs",
+ "temperature",
+ "tool_choice",
+ "tools",
+ "top_k",
+ "top_p"
+ ],
+ "default_parameters": {
+ "temperature": 0.6,
+ "top_p": 0.95,
+ "frequency_penalty": null
+ },
+ "expiration_date": null
+ },
+ {
+ "id": "qwen/qwen3.5-flash-02-23",
+ "canonical_slug": "qwen/qwen3.5-flash-20260224",
+ "hugging_face_id": null,
+ "name": "Qwen: Qwen3.5-Flash",
+ "created": 1772053776,
+ "description": "The Qwen3.5 native vision-language Flash models are built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. Compared to the 3 series, these models deliver a leap forward in performance for both pure text and multimodal tasks, offering fast response times while balancing inference speed and overall performance.",
+ "context_length": 1000000,
+ "architecture": {
+ "modality": "text+image+video->text",
+ "input_modalities": [
+ "text",
+ "image",
+ "video"
+ ],
+ "output_modalities": [
+ "text"
+ ],
+ "tokenizer": "Qwen3",
+ "instruct_type": null
+ },
+ "pricing": {
+ "prompt": "0.0000001",
+ "completion": "0.0000004"
+ },
+ "top_provider": {
+ "context_length": 1000000,
+ "max_completion_tokens": 65536,
+ "is_moderated": false
+ },
+ "per_request_limits": null,
+ "supported_parameters": [
+ "include_reasoning",
+ "max_tokens",
+ "presence_penalty",
+ "reasoning",
+ "response_format",
+ "seed",
+ "structured_outputs",
+ "temperature",
+ "tool_choice",
+ "tools",
+ "top_p"
+ ],
+ "default_parameters": {
+ "temperature": null,
+ "top_p": null,
+ "frequency_penalty": null
+ },
+ "expiration_date": null
+ },
+ {
+ "id": "qwen/qwen3.5-plus-02-15",
+ "canonical_slug": "qwen/qwen3.5-plus-20260216",
+ "hugging_face_id": "",
+ "name": "Qwen: Qwen3.5 Plus 2026-02-15",
+ "created": 1771229416,
+ "description": "The Qwen3.5 native vision-language series Plus models are built on a hybrid architecture that integrates linear attention mechanisms with sparse mixture-of-experts models, achieving higher inference efficiency. In a variety of task evaluations, the 3.5 series consistently demonstrates performance on par with state-of-the-art leading models. Compared to the 3 series, these models show a leap forward in both pure-text and multimodal capabilities.",
+ "context_length": 1000000,
+ "architecture": {
+ "modality": "text+image+video->text",
+ "input_modalities": [
+ "text",
+ "image",
+ "video"
+ ],
+ "output_modalities": [
+ "text"
+ ],
+ "tokenizer": "Qwen3",
+ "instruct_type": null
+ },
+ "pricing": {
+ "prompt": "0.0000004",
+ "completion": "0.0000024"
+ },
+ "top_provider": {
+ "context_length": 1000000,
+ "max_completion_tokens": 65536,
+ "is_moderated": false
+ },
+ "per_request_limits": null,
+ "supported_parameters": [
+ "include_reasoning",
+ "max_tokens",
+ "presence_penalty",
+ "reasoning",
+ "response_format",
+ "seed",
+ "structured_outputs",
+ "temperature",
+ "tool_choice",
+ "tools",
+ "top_p"
+ ],
+ "default_parameters": {
+ "temperature": null,
+ "top_p": null,
+ "frequency_penalty": null
+ },
+ "expiration_date": null
+ },
+ {
+ "id": "stepfun/step-3.5-flash",
+ "canonical_slug": "stepfun/step-3.5-flash",
+ "hugging_face_id": "stepfun-ai/Step-3.5-Flash",
+ "name": "StepFun: Step 3.5 Flash",
+ "created": 1769728337,
+ "description": "Step 3.5 Flash is StepFun's most capable open-source foundation model. Built on a sparse Mixture of Experts (MoE) architecture, it selectively activates only 11B of its 196B parameters per token. It is a reasoning model that is incredibly speed efficient even at long contexts.",
+ "context_length": 256000,
+ "architecture": {
+ "modality": "text->text",
+ "input_modalities": [
+ "text"
+ ],
+ "output_modalities": [
+ "text"
+ ],
+ "tokenizer": "Other",
+ "instruct_type": null
+ },
+ "pricing": {
+ "prompt": "0.0000001",
+ "completion": "0.0000003",
+ "input_cache_read": "0.00000002"
+ },
+ "top_provider": {
+ "context_length": 256000,
+ "max_completion_tokens": 256000,
+ "is_moderated": false
+ },
+ "per_request_limits": null,
+ "supported_parameters": [
+ "frequency_penalty",
+ "include_reasoning",
+ "max_tokens",
+ "reasoning",
+ "stop",
+ "temperature",
+ "tools",
+ "top_p"
+ ],
+ "default_parameters": {
+ "temperature": null,
+ "top_p": null,
+ "frequency_penalty": null
+ },
+ "expiration_date": null
+ },
+ {
+ "id": "z-ai/glm-5",
+ "canonical_slug": "z-ai/glm-5-20260211",
+ "hugging_face_id": "zai-org/GLM-5",
+ "name": "Z.ai: GLM 5",
+ "created": 1770829182,
+ "description": "GLM-5 is Z.ai’s flagship open-source foundation model engineered for complex systems design and long-horizon agent workflows. Built for expert developers, it delivers production-grade performance on large-scale programming tasks, rivaling leading closed-source models. With advanced agentic planning, deep backend reasoning, and iterative self-correction, GLM-5 moves beyond code generation to full-system construction and autonomous execution.",
+ "context_length": 204800,
+ "architecture": {
+ "modality": "text->text",
+ "input_modalities": [
+ "text"
+ ],
+ "output_modalities": [
+ "text"
+ ],
+ "tokenizer": "Other",
+ "instruct_type": null
+ },
+ "pricing": {
+ "prompt": "0.00000095",
+ "completion": "0.00000255",
+ "input_cache_read": "0.0000002"
+ },
+ "top_provider": {
+ "context_length": 204800,
+ "max_completion_tokens": 131072,
+ "is_moderated": false
+ },
+ "per_request_limits": null,
+ "supported_parameters": [
+ "frequency_penalty",
+ "include_reasoning",
+ "logit_bias",
+ "logprobs",
+ "max_tokens",
+ "min_p",
+ "presence_penalty",
+ "reasoning",
+ "repetition_penalty",
+ "response_format",
+ "seed",
+ "stop",
+ "structured_outputs",
+ "temperature",
+ "tool_choice",
+ "tools",
+ "top_k",
+ "top_logprobs",
+ "top_p"
+ ],
+ "default_parameters": {
+ "temperature": 1,
+ "top_p": 0.95,
+ "frequency_penalty": null
+ },
+ "expiration_date": null
}
]
}
diff --git a/internal/llm/provider_event_codex.go b/internal/llm/provider_event_codex.go
new file mode 100644
index 00000000..af6ca6da
--- /dev/null
+++ b/internal/llm/provider_event_codex.go
@@ -0,0 +1,350 @@
+package llm
+
+import (
+ "encoding/json"
+ "fmt"
+ "strings"
+)
+
+// ProviderToolLifecycle captures provider-native tool lifecycle events that can
+// be surfaced through generic progress telemetry.
+type ProviderToolLifecycle struct {
+ ToolName string
+ CallID string
+ ArgumentsJSON string
+ FullOutput string
+ Completed bool
+ IsError bool
+}
+
+// ProviderToolOutputDelta captures provider-native streamed tool output chunks.
+type ProviderToolOutputDelta struct {
+ ToolName string
+ CallID string
+ Delta string
+}
+
+// ParseCodexAppServerToolLifecycle maps codex-app-server item lifecycle
+// provider events into a normalized tool lifecycle shape.
+func ParseCodexAppServerToolLifecycle(ev StreamEvent) (ProviderToolLifecycle, bool) {
+ if ev.Type != StreamEventProviderEvent {
+ return ProviderToolLifecycle{}, false
+ }
+ method := strings.TrimSpace(ev.EventType)
+ if method != "item/started" && method != "item/completed" {
+ return ProviderToolLifecycle{}, false
+ }
+ item := asMapAny(ev.Raw["item"])
+ if item == nil {
+ return ProviderToolLifecycle{}, false
+ }
+ itemType := strings.TrimSpace(asStringAny(item["type"]))
+ if !isCodexToolItemType(itemType) {
+ return ProviderToolLifecycle{}, false
+ }
+ callID := firstNonEmptyString(
+ strings.TrimSpace(asStringAny(item["id"])),
+ firstNonEmptyString(
+ strings.TrimSpace(asStringAny(item["itemId"])),
+ strings.TrimSpace(asStringAny(item["item_id"])),
+ ),
+ )
+ if callID == "" {
+ return ProviderToolLifecycle{}, false
+ }
+
+ lifecycle := ProviderToolLifecycle{
+ ToolName: codexToolName(itemType, item),
+ CallID: callID,
+ Completed: method == "item/completed",
+ }
+ if lifecycle.ToolName == "" {
+ lifecycle.ToolName = itemType
+ }
+ if args := codexToolStartArgs(itemType, item); len(args) > 0 {
+ if b, err := json.Marshal(args); err == nil {
+ lifecycle.ArgumentsJSON = string(b)
+ }
+ }
+ if strings.TrimSpace(lifecycle.ArgumentsJSON) == "" {
+ lifecycle.ArgumentsJSON = "{}"
+ }
+ if lifecycle.Completed {
+ lifecycle.IsError = codexItemIsError(item)
+ lifecycle.FullOutput = codexToolCompletedOutput(item)
+ }
+ return lifecycle, true
+}
+
+// ParseCodexAppServerToolOutputDelta maps codex-app-server output/progress
+// notifications into normalized tool output deltas.
+func ParseCodexAppServerToolOutputDelta(ev StreamEvent) (ProviderToolOutputDelta, bool) {
+ if ev.Type != StreamEventProviderEvent {
+ return ProviderToolOutputDelta{}, false
+ }
+ method := strings.TrimSpace(ev.EventType)
+ itemID := strings.TrimSpace(asStringAny(ev.Raw["itemId"]))
+ if itemID == "" {
+ itemID = strings.TrimSpace(asStringAny(ev.Raw["item_id"]))
+ }
+ switch method {
+ case "item/commandExecution/outputDelta":
+ delta := asStringAny(ev.Raw["delta"])
+ if itemID == "" || delta == "" {
+ return ProviderToolOutputDelta{}, false
+ }
+ return ProviderToolOutputDelta{
+ ToolName: "exec_command",
+ CallID: itemID,
+ Delta: delta,
+ }, true
+ case "item/fileChange/outputDelta":
+ delta := asStringAny(ev.Raw["delta"])
+ if itemID == "" || delta == "" {
+ return ProviderToolOutputDelta{}, false
+ }
+ return ProviderToolOutputDelta{
+ ToolName: "apply_patch",
+ CallID: itemID,
+ Delta: delta,
+ }, true
+ case "item/mcpToolCall/progress":
+ msg := asStringAny(ev.Raw["message"])
+ if itemID == "" || msg == "" {
+ return ProviderToolOutputDelta{}, false
+ }
+ toolName := firstNonEmptyString(
+ strings.TrimSpace(asStringAny(ev.Raw["tool"])),
+ "mcp_tool_call",
+ )
+ return ProviderToolOutputDelta{
+ ToolName: toolName,
+ CallID: itemID,
+ Delta: msg,
+ }, true
+ default:
+ return ProviderToolOutputDelta{}, false
+ }
+}
+
+func isCodexToolItemType(itemType string) bool {
+ switch itemType {
+ case "commandExecution", "fileChange", "mcpToolCall", "collabToolCall", "collabAgentToolCall", "webSearch", "imageView":
+ return true
+ default:
+ return false
+ }
+}
+
+func codexToolName(itemType string, item map[string]any) string {
+ switch itemType {
+ case "commandExecution":
+ return "exec_command"
+ case "fileChange":
+ return "apply_patch"
+ case "mcpToolCall":
+ return firstNonEmptyString(
+ strings.TrimSpace(asStringAny(item["tool"])),
+ "mcp_tool_call",
+ )
+ case "collabToolCall", "collabAgentToolCall":
+ return firstNonEmptyString(
+ strings.TrimSpace(asStringAny(item["tool"])),
+ "collab_tool_call",
+ )
+ case "webSearch":
+ return "web_search"
+ case "imageView":
+ return "view_image"
+ default:
+ return ""
+ }
+}
+
+func codexToolStartArgs(itemType string, item map[string]any) map[string]any {
+ out := map[string]any{}
+ switch itemType {
+ case "commandExecution":
+ if cmd := strings.TrimSpace(asStringAny(item["command"])); cmd != "" {
+ out["command"] = cmd
+ }
+ if cwd := strings.TrimSpace(asStringAny(item["cwd"])); cwd != "" {
+ out["cwd"] = cwd
+ }
+ case "fileChange":
+ if changes, ok := item["changes"].([]any); ok {
+ out["change_count"] = len(changes)
+ }
+ case "mcpToolCall":
+ if server := strings.TrimSpace(asStringAny(item["server"])); server != "" {
+ out["server"] = server
+ }
+ if tool := strings.TrimSpace(asStringAny(item["tool"])); tool != "" {
+ out["tool"] = tool
+ }
+ if args, ok := item["arguments"]; ok && args != nil {
+ out["arguments"] = args
+ }
+ case "collabToolCall", "collabAgentToolCall":
+ if tool := strings.TrimSpace(asStringAny(item["tool"])); tool != "" {
+ out["tool"] = tool
+ }
+ if sender := strings.TrimSpace(asStringAny(item["senderThreadId"])); sender != "" {
+ out["sender_thread_id"] = sender
+ }
+ if receivers := asStringSlice(item["receiverThreadIds"]); len(receivers) > 0 {
+ out["receiver_thread_ids"] = receivers
+ } else if receiver := strings.TrimSpace(asStringAny(item["receiverThreadId"])); receiver != "" {
+ out["receiver_thread_ids"] = []string{receiver}
+ }
+ case "webSearch":
+ if query := strings.TrimSpace(asStringAny(item["query"])); query != "" {
+ out["query"] = query
+ }
+ case "imageView":
+ if path := strings.TrimSpace(asStringAny(item["path"])); path != "" {
+ out["path"] = path
+ }
+ }
+ return out
+}
+
+func codexItemIsError(item map[string]any) bool {
+ status := strings.ToLower(strings.TrimSpace(asStringAny(item["status"])))
+ switch status {
+ case "failed", "declined", "denied", "error", "canceled", "cancelled":
+ return true
+ }
+ if errVal, ok := item["error"]; ok && !isZeroValue(errVal) {
+ return true
+ }
+ return false
+}
+
+func codexToolCompletedOutput(item map[string]any) string {
+ // Preserve provider-native aggregated output bytes when available.
+ if raw, ok := codexRawNonEmpty(item["aggregatedOutput"]); ok {
+ return raw
+ }
+ if raw, ok := codexRawNonEmpty(item["aggregated_output"]); ok {
+ return raw
+ }
+
+ parts := make([]string, 0, 4)
+ appendUnique := func(text string) {
+ trimmed := strings.TrimSpace(text)
+ if trimmed == "" {
+ return
+ }
+ for _, existing := range parts {
+ if existing == trimmed {
+ return
+ }
+ }
+ parts = append(parts, trimmed)
+ }
+
+ appendUnique(codexValueAsText(item["stdout"]))
+ appendUnique(codexValueAsText(item["stderr"]))
+
+ // Prefer structured output fields when explicit stdio is absent.
+ if len(parts) == 0 {
+ appendUnique(codexValueAsText(item["output"]))
+ appendUnique(codexValueAsText(item["result"]))
+ appendUnique(codexValueAsText(item["response"]))
+ appendUnique(codexValueAsText(item["value"]))
+ }
+
+ // Preserve deterministic failure details for debugging.
+ appendUnique(codexValueAsText(item["error"]))
+ if len(parts) == 0 {
+ appendUnique(codexValueAsText(item["message"]))
+ }
+
+ return strings.Join(parts, "\n\n")
+}
+
+func codexRawNonEmpty(v any) (string, bool) {
+ switch typed := v.(type) {
+ case string:
+ if strings.TrimSpace(typed) == "" {
+ return "", false
+ }
+ return typed, true
+ case []byte:
+ s := string(typed)
+ if strings.TrimSpace(s) == "" {
+ return "", false
+ }
+ return s, true
+ default:
+ return "", false
+ }
+}
+
+func codexValueAsText(v any) string {
+ switch typed := v.(type) {
+ case nil:
+ return ""
+ case string:
+ return strings.TrimSpace(typed)
+ case []byte:
+ return strings.TrimSpace(string(typed))
+ }
+
+ b, err := json.Marshal(v)
+ if err != nil {
+ return strings.TrimSpace(fmt.Sprint(v))
+ }
+ trimmed := strings.TrimSpace(string(b))
+ switch trimmed {
+ case "", "null":
+ return ""
+ }
+ return trimmed
+}
+
+func asMapAny(v any) map[string]any {
+ m, _ := v.(map[string]any)
+ return m
+}
+
+func asStringAny(v any) string {
+ s, _ := v.(string)
+ return s
+}
+
+func isZeroValue(v any) bool {
+ switch typed := v.(type) {
+ case nil:
+ return true
+ case string:
+ return strings.TrimSpace(typed) == ""
+ case map[string]any:
+ return len(typed) == 0
+ default:
+ return false
+ }
+}
+
+func asStringSlice(v any) []string {
+ seq, ok := v.([]any)
+ if !ok || len(seq) == 0 {
+ return nil
+ }
+ out := make([]string, 0, len(seq))
+ for _, item := range seq {
+ s := strings.TrimSpace(asStringAny(item))
+ if s != "" {
+ out = append(out, s)
+ }
+ }
+ return out
+}
+
+func firstNonEmptyString(a, b string) string {
+ if strings.TrimSpace(a) != "" {
+ return a
+ }
+ return b
+}
diff --git a/internal/llm/provider_event_codex_test.go b/internal/llm/provider_event_codex_test.go
new file mode 100644
index 00000000..aa4050ac
--- /dev/null
+++ b/internal/llm/provider_event_codex_test.go
@@ -0,0 +1,277 @@
+package llm
+
+import "testing"
+
+func TestParseCodexAppServerToolLifecycle_CommandExecutionStarted(t *testing.T) {
+ ev := StreamEvent{
+ Type: StreamEventProviderEvent,
+ EventType: "item/started",
+ Raw: map[string]any{
+ "item": map[string]any{
+ "id": "cmd_1",
+ "type": "commandExecution",
+ "command": "pwd",
+ "cwd": "/tmp/worktree",
+ "status": "inProgress",
+ },
+ },
+ }
+
+ lifecycle, ok := ParseCodexAppServerToolLifecycle(ev)
+ if !ok {
+ t.Fatalf("expected lifecycle match")
+ }
+ if lifecycle.Completed {
+ t.Fatalf("expected start event, got completed")
+ }
+ if lifecycle.CallID != "cmd_1" {
+ t.Fatalf("call id: got %q want %q", lifecycle.CallID, "cmd_1")
+ }
+ if lifecycle.ToolName != "exec_command" {
+ t.Fatalf("tool name: got %q want %q", lifecycle.ToolName, "exec_command")
+ }
+ if lifecycle.ArgumentsJSON == "" {
+ t.Fatalf("expected non-empty arguments json")
+ }
+}
+
+func TestParseCodexAppServerToolLifecycle_CompletedFailedIsError(t *testing.T) {
+ ev := StreamEvent{
+ Type: StreamEventProviderEvent,
+ EventType: "item/completed",
+ Raw: map[string]any{
+ "item": map[string]any{
+ "id": "mcp_1",
+ "type": "mcpToolCall",
+ "tool": "search",
+ "status": "failed",
+ "error": map[string]any{"message": "upstream timeout"},
+ },
+ },
+ }
+
+ lifecycle, ok := ParseCodexAppServerToolLifecycle(ev)
+ if !ok {
+ t.Fatalf("expected lifecycle match")
+ }
+ if !lifecycle.Completed {
+ t.Fatalf("expected completed event")
+ }
+ if !lifecycle.IsError {
+ t.Fatalf("expected failed completion to be marked is_error")
+ }
+ if lifecycle.ToolName != "search" {
+ t.Fatalf("tool name: got %q want %q", lifecycle.ToolName, "search")
+ }
+}
+
+func TestParseCodexAppServerToolLifecycle_CompletedCanceledIsError(t *testing.T) {
+ ev := StreamEvent{
+ Type: StreamEventProviderEvent,
+ EventType: "item/completed",
+ Raw: map[string]any{
+ "item": map[string]any{
+ "id": "cmd_cancel",
+ "type": "commandExecution",
+ "status": "cancelled",
+ },
+ },
+ }
+
+ lifecycle, ok := ParseCodexAppServerToolLifecycle(ev)
+ if !ok {
+ t.Fatalf("expected lifecycle match")
+ }
+ if !lifecycle.Completed {
+ t.Fatalf("expected completed event")
+ }
+ if !lifecycle.IsError {
+ t.Fatalf("expected cancelled completion to be marked is_error")
+ }
+}
+
+func TestParseCodexAppServerToolLifecycle_StartedSparsePayloadDefaultsArgumentsJSON(t *testing.T) {
+ ev := StreamEvent{
+ Type: StreamEventProviderEvent,
+ EventType: "item/started",
+ Raw: map[string]any{
+ "item": map[string]any{
+ "id": "search_1",
+ "type": "webSearch",
+ },
+ },
+ }
+
+ lifecycle, ok := ParseCodexAppServerToolLifecycle(ev)
+ if !ok {
+ t.Fatalf("expected lifecycle match")
+ }
+ if lifecycle.Completed {
+ t.Fatalf("expected start event")
+ }
+ if lifecycle.ArgumentsJSON != "{}" {
+ t.Fatalf("arguments json: got %q want %q", lifecycle.ArgumentsJSON, "{}")
+ }
+}
+
+func TestParseCodexAppServerToolLifecycle_CompletedIncludesCommandOutput(t *testing.T) {
+ ev := StreamEvent{
+ Type: StreamEventProviderEvent,
+ EventType: "item/completed",
+ Raw: map[string]any{
+ "item": map[string]any{
+ "id": "cmd_2",
+ "type": "commandExecution",
+ "status": "completed",
+ "aggregatedOutput": "alpha\nbeta\n",
+ "stderr": "warning",
+ "message": "done",
+ },
+ },
+ }
+
+ lifecycle, ok := ParseCodexAppServerToolLifecycle(ev)
+ if !ok {
+ t.Fatalf("expected lifecycle match")
+ }
+ if !lifecycle.Completed {
+ t.Fatalf("expected completed event")
+ }
+ if lifecycle.IsError {
+ t.Fatalf("unexpected error classification for successful command")
+ }
+ if lifecycle.FullOutput == "" {
+ t.Fatalf("expected full output from completed command event")
+ }
+ if lifecycle.FullOutput != "alpha\nbeta\n" {
+ t.Fatalf("full output mismatch: got %q", lifecycle.FullOutput)
+ }
+}
+
+func TestParseCodexAppServerToolLifecycle_CompletedUsesSnakeCaseAggregatedOutput(t *testing.T) {
+ ev := StreamEvent{
+ Type: StreamEventProviderEvent,
+ EventType: "item/completed",
+ Raw: map[string]any{
+ "item": map[string]any{
+ "id": "cmd_2b",
+ "type": "commandExecution",
+ "status": "completed",
+ "aggregated_output": "omega\n",
+ },
+ },
+ }
+
+ lifecycle, ok := ParseCodexAppServerToolLifecycle(ev)
+ if !ok {
+ t.Fatalf("expected lifecycle match")
+ }
+ if lifecycle.FullOutput != "omega\n" {
+ t.Fatalf("full output mismatch: got %q", lifecycle.FullOutput)
+ }
+}
+
+func TestParseCodexAppServerToolLifecycle_CompletedIncludesStructuredError(t *testing.T) {
+ ev := StreamEvent{
+ Type: StreamEventProviderEvent,
+ EventType: "item/completed",
+ Raw: map[string]any{
+ "item": map[string]any{
+ "id": "mcp_2",
+ "type": "mcpToolCall",
+ "tool": "search",
+ "status": "failed",
+ "error": map[string]any{
+ "message": "timeout",
+ "code": "ETIMEDOUT",
+ },
+ },
+ },
+ }
+
+ lifecycle, ok := ParseCodexAppServerToolLifecycle(ev)
+ if !ok {
+ t.Fatalf("expected lifecycle match")
+ }
+ if !lifecycle.Completed {
+ t.Fatalf("expected completed event")
+ }
+ if !lifecycle.IsError {
+ t.Fatalf("expected failed status to mark is_error")
+ }
+ if lifecycle.FullOutput == "" {
+ t.Fatalf("expected structured error payload to be preserved in full output")
+ }
+ if lifecycle.FullOutput != `{"code":"ETIMEDOUT","message":"timeout"}` {
+ t.Fatalf("structured error output mismatch: got %q", lifecycle.FullOutput)
+ }
+}
+
+func TestParseCodexAppServerToolOutputDelta_CommandExecution(t *testing.T) {
+ ev := StreamEvent{
+ Type: StreamEventProviderEvent,
+ EventType: "item/commandExecution/outputDelta",
+ Raw: map[string]any{
+ "itemId": "cmd_3",
+ "delta": "line 1\n",
+ },
+ }
+ delta, ok := ParseCodexAppServerToolOutputDelta(ev)
+ if !ok {
+ t.Fatalf("expected command output delta parse")
+ }
+ if delta.ToolName != "exec_command" {
+ t.Fatalf("tool name: got %q", delta.ToolName)
+ }
+ if delta.CallID != "cmd_3" {
+ t.Fatalf("call id: got %q", delta.CallID)
+ }
+ if delta.Delta != "line 1\n" {
+ t.Fatalf("delta mismatch: got %q", delta.Delta)
+ }
+}
+
+func TestParseCodexAppServerToolOutputDelta_CommandExecutionSnakeCaseItemID(t *testing.T) {
+ ev := StreamEvent{
+ Type: StreamEventProviderEvent,
+ EventType: "item/commandExecution/outputDelta",
+ Raw: map[string]any{
+ "item_id": "cmd_3b",
+ "delta": "line 2\n",
+ },
+ }
+ delta, ok := ParseCodexAppServerToolOutputDelta(ev)
+ if !ok {
+ t.Fatalf("expected command output delta parse")
+ }
+ if delta.CallID != "cmd_3b" {
+ t.Fatalf("call id: got %q", delta.CallID)
+ }
+ if delta.Delta != "line 2\n" {
+ t.Fatalf("delta mismatch: got %q", delta.Delta)
+ }
+}
+
+func TestParseCodexAppServerToolOutputDelta_FileChange(t *testing.T) {
+ ev := StreamEvent{
+ Type: StreamEventProviderEvent,
+ EventType: "item/fileChange/outputDelta",
+ Raw: map[string]any{
+ "itemId": "patch_7",
+ "delta": "updated README.md",
+ },
+ }
+ delta, ok := ParseCodexAppServerToolOutputDelta(ev)
+ if !ok {
+ t.Fatalf("expected file change output delta parse")
+ }
+ if delta.ToolName != "apply_patch" {
+ t.Fatalf("tool name: got %q", delta.ToolName)
+ }
+ if delta.CallID != "patch_7" {
+ t.Fatalf("call id: got %q", delta.CallID)
+ }
+ if delta.Delta != "updated README.md" {
+ t.Fatalf("delta mismatch: got %q", delta.Delta)
+ }
+}
diff --git a/internal/llm/providers/anthropic/adapter.go b/internal/llm/providers/anthropic/adapter.go
index fc6784d3..f3519373 100644
--- a/internal/llm/providers/anthropic/adapter.go
+++ b/internal/llm/providers/anthropic/adapter.go
@@ -208,7 +208,9 @@ func (a *Adapter) Complete(ctx context.Context, req llm.Request) (llm.Response,
return llm.Response{}, llm.ErrorFromHTTPStatus(a.Name(), resp.StatusCode, msg, raw, ra)
}
- return fromAnthropicResponse(a.Name(), raw, req.Model), nil
+ out := fromAnthropicResponse(a.Name(), raw, req.Model)
+ out.RateLimit = llm.ParseRateLimitInfo(resp.Header, time.Now())
+ return out, nil
}
func (a *Adapter) completeViaStream(ctx context.Context, req llm.Request) (llm.Response, error) {
@@ -388,6 +390,7 @@ func (a *Adapter) Stream(ctx context.Context, req llm.Request) (llm.Stream, erro
s := llm.NewChanStream(cancel)
s.Send(llm.StreamEvent{Type: llm.StreamEventStreamStart})
+ rateLimit := llm.ParseRateLimitInfo(resp.Header, time.Now())
go func() {
defer func() {
@@ -684,11 +687,12 @@ func (a *Adapter) Stream(ctx context.Context, req llm.Request) (llm.Stream, erro
msg := llm.Message{Role: llm.RoleAssistant, Content: parts}
r := llm.Response{
- Provider: a.Name(),
- Model: req.Model,
- Message: msg,
- Finish: finish,
- Usage: usage,
+ Provider: a.Name(),
+ Model: req.Model,
+ Message: msg,
+ Finish: finish,
+ Usage: usage,
+ RateLimit: rateLimit,
}
if len(r.ToolCalls()) > 0 {
r.Finish = llm.FinishReason{Reason: "tool_calls", Raw: "tool_use"}
diff --git a/internal/llm/providers/anthropic/adapter_test.go b/internal/llm/providers/anthropic/adapter_test.go
index 6eb47ec9..910bc460 100644
--- a/internal/llm/providers/anthropic/adapter_test.go
+++ b/internal/llm/providers/anthropic/adapter_test.go
@@ -18,6 +18,28 @@ import (
"github.com/danshapiro/kilroy/internal/llm"
)
+func assertRateLimitInfo(t *testing.T, rl *llm.RateLimitInfo) {
+ t.Helper()
+ if rl == nil {
+ t.Fatalf("expected rate limit info, got nil")
+ }
+ if rl.RequestsRemaining == nil || *rl.RequestsRemaining != 9 {
+ t.Fatalf("requests_remaining: %#v", rl.RequestsRemaining)
+ }
+ if rl.RequestsLimit == nil || *rl.RequestsLimit != 10 {
+ t.Fatalf("requests_limit: %#v", rl.RequestsLimit)
+ }
+ if rl.TokensRemaining == nil || *rl.TokensRemaining != 90 {
+ t.Fatalf("tokens_remaining: %#v", rl.TokensRemaining)
+ }
+ if rl.TokensLimit == nil || *rl.TokensLimit != 100 {
+ t.Fatalf("tokens_limit: %#v", rl.TokensLimit)
+ }
+ if rl.ResetAt != "2025-01-01T00:00:10Z" {
+ t.Fatalf("reset_at: %q", rl.ResetAt)
+ }
+}
+
func TestAdapter_Complete_MapsToMessagesAPI_AndSetsBetaHeaders(t *testing.T) {
var gotBody map[string]any
gotBeta := ""
@@ -33,6 +55,11 @@ func TestAdapter_Complete_MapsToMessagesAPI_AndSetsBetaHeaders(t *testing.T) {
_ = json.Unmarshal(b, &gotBody)
w.Header().Set("Content-Type", "application/json")
+ w.Header().Set("x-ratelimit-remaining-requests", "9")
+ w.Header().Set("x-ratelimit-limit-requests", "10")
+ w.Header().Set("x-ratelimit-remaining-tokens", "90")
+ w.Header().Set("x-ratelimit-limit-tokens", "100")
+ w.Header().Set("x-ratelimit-reset-requests", "Wed, 01 Jan 2025 00:00:10 GMT")
_, _ = w.Write([]byte(`{
"id": "msg_1",
"model": "claude-test",
@@ -68,6 +95,7 @@ func TestAdapter_Complete_MapsToMessagesAPI_AndSetsBetaHeaders(t *testing.T) {
if strings.TrimSpace(resp.Text()) != "Hello" {
t.Fatalf("resp text: %q", resp.Text())
}
+ assertRateLimitInfo(t, resp.RateLimit)
if gotBeta != "prompt-caching-2024-07-31" {
t.Fatalf("anthropic-beta header: %q", gotBeta)
}
@@ -176,6 +204,11 @@ func TestAdapter_Stream_NormalizesDotsTodashesinModelID(t *testing.T) {
gotModel, _ = body["model"].(string)
w.Header().Set("Content-Type", "text/event-stream")
+ w.Header().Set("x-ratelimit-remaining-requests", "9")
+ w.Header().Set("x-ratelimit-limit-requests", "10")
+ w.Header().Set("x-ratelimit-remaining-tokens", "90")
+ w.Header().Set("x-ratelimit-limit-tokens", "100")
+ w.Header().Set("x-ratelimit-reset-requests", "Wed, 01 Jan 2025 00:00:10 GMT")
f, _ := w.(http.Flusher)
write := func(event, data string) {
_, _ = io.WriteString(w, "event: "+event+"\ndata: "+data+"\n\n")
@@ -251,6 +284,11 @@ func TestAdapter_Stream_YieldsTextDeltasAndFinish(t *testing.T) {
_ = json.Unmarshal(b, &gotBody)
w.Header().Set("Content-Type", "text/event-stream")
+ w.Header().Set("x-ratelimit-remaining-requests", "9")
+ w.Header().Set("x-ratelimit-limit-requests", "10")
+ w.Header().Set("x-ratelimit-remaining-tokens", "90")
+ w.Header().Set("x-ratelimit-limit-tokens", "100")
+ w.Header().Set("x-ratelimit-reset-requests", "Wed, 01 Jan 2025 00:00:10 GMT")
f, _ := w.(http.Flusher)
write := func(event string, data string) {
_, _ = io.WriteString(w, "event: "+event+"\n")
@@ -297,6 +335,7 @@ func TestAdapter_Stream_YieldsTextDeltasAndFinish(t *testing.T) {
if finish == nil || strings.TrimSpace(finish.Text()) != "Hello" {
t.Fatalf("finish response: %+v", finish)
}
+ assertRateLimitInfo(t, finish.RateLimit)
if gotBody == nil {
t.Fatalf("server did not capture request body")
}
diff --git a/internal/llm/providers/codexappserver/adapter.go b/internal/llm/providers/codexappserver/adapter.go
new file mode 100644
index 00000000..d5968272
--- /dev/null
+++ b/internal/llm/providers/codexappserver/adapter.go
@@ -0,0 +1,595 @@
+package codexappserver
+
+import (
+ "context"
+ "encoding/json"
+ "errors"
+ "os"
+ "strconv"
+ "strings"
+ "sync"
+ "time"
+
+ "github.com/danshapiro/kilroy/internal/llm"
+)
+
+type codexTransport interface {
+ Initialize(ctx context.Context) error
+ Close() error
+ Complete(ctx context.Context, payload map[string]any) (map[string]any, error)
+ Stream(ctx context.Context, payload map[string]any) (*NotificationStream, error)
+ ListModels(ctx context.Context, params map[string]any) (modelListResponse, error)
+}
+
+type AdapterOptions struct {
+ Provider string
+ Transport codexTransport
+ TransportOptions TransportOptions
+ TranslateRequest func(request llm.Request, streaming bool) (translateRequestResult, error)
+ TranslateResponse func(body map[string]any) (llm.Response, error)
+ TranslateStream func(events <-chan map[string]any) <-chan llm.StreamEvent
+}
+
+type Adapter struct {
+ provider string
+ transportProvided codexTransport
+ transportOptions TransportOptions
+
+ translateRequestFn func(request llm.Request, streaming bool) (translateRequestResult, error)
+ translateResponseFn func(body map[string]any) (llm.Response, error)
+ translateStreamFn func(events <-chan map[string]any) <-chan llm.StreamEvent
+
+ transportMu sync.Mutex
+ transport codexTransport
+
+ modelListMu sync.Mutex
+ modelList *modelListResponse
+}
+
+func init() {
+ llm.RegisterEnvAdapterFactory(func() (llm.ProviderAdapter, bool, error) {
+ opts, ok := transportOptionsFromEnv()
+ if !ok {
+ return nil, false, nil
+ }
+ return NewAdapter(AdapterOptions{TransportOptions: opts}), true, nil
+ })
+}
+
+func NewAdapter(options AdapterOptions) *Adapter {
+ provider := strings.TrimSpace(options.Provider)
+ if provider == "" {
+ provider = providerName
+ }
+ return &Adapter{
+ provider: provider,
+ transportProvided: options.Transport,
+ transportOptions: options.TransportOptions,
+ translateRequestFn: func(request llm.Request, streaming bool) (translateRequestResult, error) {
+ if options.TranslateRequest != nil {
+ return options.TranslateRequest(request, streaming)
+ }
+ return translateRequest(request, streaming)
+ },
+ translateResponseFn: func(body map[string]any) (llm.Response, error) {
+ if options.TranslateResponse != nil {
+ return options.TranslateResponse(body)
+ }
+ return translateResponse(body)
+ },
+ translateStreamFn: func(events <-chan map[string]any) <-chan llm.StreamEvent {
+ if options.TranslateStream != nil {
+ return options.TranslateStream(events)
+ }
+ return translateStream(events)
+ },
+ }
+}
+
+func NewFromEnv() (*Adapter, error) {
+ opts, _ := transportOptionsFromEnv()
+ return NewAdapter(AdapterOptions{TransportOptions: opts}), nil
+}
+
+func (a *Adapter) Name() string { return a.provider }
+
+func (a *Adapter) Complete(ctx context.Context, req llm.Request) (llm.Response, error) {
+ resolved, err := resolveFileImages(req)
+ if err != nil {
+ return llm.Response{}, mapCodexError(err, a.provider, "complete")
+ }
+
+ translated, err := a.translateRequestFn(resolved, false)
+ if err != nil {
+ return llm.Response{}, mapCodexError(err, a.provider, "complete")
+ }
+
+ transport, err := a.getTransport()
+ if err != nil {
+ return llm.Response{}, mapCodexError(err, a.provider, "complete")
+ }
+
+ result, err := transport.Complete(ctx, translated.Payload)
+ if err != nil {
+ return llm.Response{}, mapCodexError(err, a.provider, "complete")
+ }
+ if embedded := extractTurnError(result); embedded != nil {
+ return llm.Response{}, mapCodexError(embedded, a.provider, "complete")
+ }
+
+ response, err := a.translateResponseFn(result)
+ if err != nil {
+ return llm.Response{}, mapCodexError(err, a.provider, "complete")
+ }
+ response.Provider = a.provider
+ if len(translated.Warnings) > 0 {
+ response.Warnings = append(response.Warnings, translated.Warnings...)
+ }
+ return response, nil
+}
+
+func (a *Adapter) Stream(ctx context.Context, req llm.Request) (llm.Stream, error) {
+ resolved, err := resolveFileImages(req)
+ if err != nil {
+ return nil, mapCodexError(err, a.provider, "stream")
+ }
+ translated, err := a.translateRequestFn(resolved, true)
+ if err != nil {
+ return nil, mapCodexError(err, a.provider, "stream")
+ }
+
+ transport, err := a.getTransport()
+ if err != nil {
+ return nil, mapCodexError(err, a.provider, "stream")
+ }
+
+ sctx, cancel := context.WithCancel(ctx)
+ stream, err := transport.Stream(sctx, translated.Payload)
+ if err != nil {
+ cancel()
+ return nil, mapCodexError(err, a.provider, "stream")
+ }
+
+ out := llm.NewChanStream(cancel)
+ go func() {
+ defer cancel()
+ defer stream.Close()
+ defer out.CloseSend()
+
+ translatedEvents := a.translateStreamFn(stream.Notifications)
+ warningsAttached := false
+ for event := range translatedEvents {
+ if !warningsAttached && event.Type == llm.StreamEventStreamStart {
+ warningsAttached = true
+ if len(translated.Warnings) > 0 {
+ event.Warnings = append(event.Warnings, translated.Warnings...)
+ }
+ out.Send(event)
+ continue
+ }
+ out.Send(event)
+ }
+ if !warningsAttached && len(translated.Warnings) > 0 {
+ out.Send(llm.StreamEvent{Type: llm.StreamEventStreamStart, Warnings: translated.Warnings})
+ }
+ if stream.Err != nil {
+ if streamErr, ok := <-stream.Err; ok && streamErr != nil {
+ out.Send(llm.StreamEvent{Type: llm.StreamEventError, Err: mapCodexError(streamErr, a.provider, "stream")})
+ }
+ }
+ }()
+
+ return out, nil
+}
+
+func (a *Adapter) Initialize(ctx context.Context) error {
+ transport, err := a.getTransport()
+ if err != nil {
+ return mapCodexError(err, a.provider, "complete")
+ }
+ if err := transport.Initialize(ctx); err != nil {
+ return mapCodexError(err, a.provider, "complete")
+ }
+ return nil
+}
+
+func (a *Adapter) ListModels(ctx context.Context, params map[string]any) (modelListResponse, error) {
+ a.modelListMu.Lock()
+ if a.modelList != nil {
+ cached := *a.modelList
+ a.modelListMu.Unlock()
+ return cached, nil
+ }
+ a.modelListMu.Unlock()
+
+ transport, err := a.getTransport()
+ if err != nil {
+ return modelListResponse{}, mapCodexError(err, a.provider, "complete")
+ }
+ resp, err := transport.ListModels(ctx, params)
+ if err != nil {
+ return modelListResponse{}, mapCodexError(err, a.provider, "complete")
+ }
+
+ a.modelListMu.Lock()
+ cp := resp
+ a.modelList = &cp
+ a.modelListMu.Unlock()
+ return resp, nil
+}
+
+func (a *Adapter) GetDefaultModel(ctx context.Context) (*modelEntry, error) {
+ resp, err := a.ListModels(ctx, nil)
+ if err != nil {
+ return nil, err
+ }
+ for idx := range resp.Data {
+ if resp.Data[idx].IsDefault {
+ entry := resp.Data[idx]
+ return &entry, nil
+ }
+ }
+ return nil, nil
+}
+
+func (a *Adapter) Close() error {
+ a.transportMu.Lock()
+ transport := a.transport
+ a.transportMu.Unlock()
+ if transport == nil {
+ return nil
+ }
+ if err := transport.Close(); err != nil {
+ return mapCodexError(err, a.provider, "complete")
+ }
+ return nil
+}
+
+func (a *Adapter) getTransport() (codexTransport, error) {
+ if a.transportProvided != nil {
+ return a.transportProvided, nil
+ }
+ a.transportMu.Lock()
+ defer a.transportMu.Unlock()
+ if a.transport != nil {
+ return a.transport, nil
+ }
+ opts := a.transportOptions
+ if envOpts, ok := transportOptionsFromEnv(); ok {
+ if strings.TrimSpace(opts.Command) == "" {
+ opts.Command = envOpts.Command
+ }
+ if len(opts.Args) == 0 {
+ opts.Args = append([]string{}, envOpts.Args...)
+ }
+ }
+ a.transport = NewTransport(opts)
+ return a.transport, nil
+}
+
+func resolveFileImages(req llm.Request) (llm.Request, error) {
+ resolved := req
+ resolved.Messages = make([]llm.Message, len(req.Messages))
+ for mi, message := range req.Messages {
+ copyMessage := message
+ copyMessage.Content = make([]llm.ContentPart, len(message.Content))
+ for pi, part := range message.Content {
+ copyPart := part
+ if part.Kind == llm.ContentImage && part.Image != nil && len(part.Image.Data) == 0 {
+ url := strings.TrimSpace(part.Image.URL)
+ if isResolvableImagePath(url) {
+ path := resolveImagePath(url)
+ bytes, err := os.ReadFile(path)
+ if err != nil {
+ return llm.Request{}, err
+ }
+ mediaType := strings.TrimSpace(part.Image.MediaType)
+ if mediaType == "" {
+ mediaType = llm.InferMimeTypeFromPath(path)
+ }
+ if mediaType == "" {
+ mediaType = "image/png"
+ }
+ copyPart.Image = &llm.ImageData{
+ Data: bytes,
+ MediaType: mediaType,
+ Detail: part.Image.Detail,
+ }
+ }
+ }
+ copyMessage.Content[pi] = copyPart
+ }
+ resolved.Messages[mi] = copyMessage
+ }
+ return resolved, nil
+}
+
+func isResolvableImagePath(url string) bool {
+ if strings.TrimSpace(url) == "" {
+ return false
+ }
+ if strings.HasPrefix(strings.TrimSpace(url), "file://") {
+ return true
+ }
+ return llm.IsLocalPath(url)
+}
+
+func resolveImagePath(url string) string {
+ path := strings.TrimSpace(url)
+ if strings.HasPrefix(path, "file://") {
+ path = strings.TrimPrefix(path, "file://")
+ }
+ return llm.ExpandTilde(path)
+}
+
+type normalizedErrorInfo struct {
+ Message string
+ Status int
+ HasStatus bool
+ Code string
+ RetryAfter *time.Duration
+ Raw any
+}
+
+func extractTurnError(value map[string]any) any {
+ if value == nil {
+ return nil
+ }
+ if turnError, ok := value["turnError"]; ok && turnError != nil {
+ return turnError
+ }
+ if rootErr, ok := value["error"]; ok && rootErr != nil {
+ return rootErr
+ }
+ turn := asMap(value["turn"])
+ if turn == nil {
+ return nil
+ }
+ if turnErr, ok := turn["error"]; ok && turnErr != nil {
+ return turnErr
+ }
+ status := strings.ToLower(strings.TrimSpace(asString(turn["status"])))
+ if status == "failed" || status == "error" {
+ return turn
+ }
+ return nil
+}
+
+func mapCodexError(raw any, provider string, contextKind string) error {
+ if raw == nil {
+ return nil
+ }
+ if rawMap := asMap(raw); rawMap != nil {
+ if embedded := extractTurnError(rawMap); embedded != nil {
+ raw = embedded
+ }
+ }
+ if err, ok := raw.(error); ok {
+ if mapped := llm.WrapContextError(provider, err); mapped != err {
+ return mapped
+ }
+ var llmErr llm.Error
+ if errorsAs(err, &llmErr) {
+ return err
+ }
+ }
+
+ info := normalizeErrorInfo(raw)
+ code := normalizeCode(info.Code)
+
+ if isTransportFailure(code, info.Message) {
+ if contextKind == "stream" {
+ return llm.NewStreamError(provider, info.Message)
+ }
+ return llm.NewNetworkError(provider, info.Message)
+ }
+
+ if info.HasStatus {
+ return llm.ErrorFromHTTPStatus(provider, info.Status, info.Message, info.Raw, info.RetryAfter)
+ }
+
+ if class := classifyByCode(code); class != "" {
+ switch class {
+ case "invalid_request":
+ return llm.ErrorFromHTTPStatus(provider, 400, info.Message, info.Raw, nil)
+ case "auth":
+ return llm.ErrorFromHTTPStatus(provider, 401, info.Message, info.Raw, nil)
+ case "rate_limit":
+ return llm.ErrorFromHTTPStatus(provider, 429, info.Message, info.Raw, info.RetryAfter)
+ case "server":
+ return llm.ErrorFromHTTPStatus(provider, 500, info.Message, info.Raw, nil)
+ }
+ }
+
+ msg := strings.ToLower(info.Message)
+ switch {
+ case strings.Contains(msg, "context length"), strings.Contains(msg, "too many tokens"):
+ return llm.ErrorFromHTTPStatus(provider, 413, info.Message, info.Raw, nil)
+ case strings.Contains(msg, "content filter"), strings.Contains(msg, "safety"):
+ return llm.ErrorFromHTTPStatus(provider, 400, info.Message, info.Raw, nil)
+ case strings.Contains(msg, "quota"), strings.Contains(msg, "billing"):
+ return llm.ErrorFromHTTPStatus(provider, 429, info.Message, info.Raw, info.RetryAfter)
+ case strings.Contains(msg, "not found"), strings.Contains(msg, "does not exist"):
+ return llm.ErrorFromHTTPStatus(provider, 404, info.Message, info.Raw, nil)
+ case strings.Contains(msg, "unauthorized"), strings.Contains(msg, "invalid key"):
+ return llm.ErrorFromHTTPStatus(provider, 401, info.Message, info.Raw, nil)
+ case strings.Contains(msg, "model") && (strings.Contains(msg, "not supported") || strings.Contains(msg, "unsupported") || strings.Contains(msg, "unknown model")):
+ return llm.ErrorFromHTTPStatus(provider, 400, info.Message, info.Raw, nil)
+ }
+
+ if contextKind == "stream" {
+ return llm.NewStreamError(provider, info.Message)
+ }
+ return llm.NewNetworkError(provider, info.Message)
+}
+
+func normalizeErrorInfo(raw any) normalizedErrorInfo {
+ info := normalizedErrorInfo{
+ Message: "codex-app-server request failed",
+ Raw: raw,
+ }
+ root := asMap(raw)
+ nested := asMap(root["error"])
+ source := root
+ if nested != nil {
+ source = nested
+ }
+ if source == nil {
+ source = map[string]any{}
+ }
+
+ if err, ok := raw.(error); ok {
+ info.Message = err.Error()
+ }
+ if message := firstNonEmpty(asString(source["message"]), asString(root["message"])); message != "" {
+ info.Message = unwrapJSONMessage(message)
+ }
+
+ if statusVal, ok := source["status"]; ok {
+ if status, hasStatus := parseHTTPStatus(statusVal); hasStatus {
+ info.Status = status
+ info.HasStatus = true
+ }
+ } else if statusVal, ok := root["status"]; ok {
+ if status, hasStatus := parseHTTPStatus(statusVal); hasStatus {
+ info.Status = status
+ info.HasStatus = true
+ }
+ }
+
+ info.Code = firstNonEmpty(
+ asString(source["code"]),
+ asString(source["type"]),
+ asString(root["code"]),
+ asString(root["type"]),
+ )
+
+ retry := source["retryAfter"]
+ if retry == nil {
+ retry = source["retry_after"]
+ }
+ if retry == nil {
+ retry = root["retryAfter"]
+ }
+ if retry == nil {
+ retry = root["retry_after"]
+ }
+ if retry != nil {
+ seconds := asInt(retry, -1)
+ if seconds >= 0 {
+ d := time.Duration(seconds) * time.Second
+ info.RetryAfter = &d
+ }
+ }
+ return info
+}
+
+func unwrapJSONMessage(message string) string {
+ trimmed := strings.TrimSpace(message)
+ if !strings.HasPrefix(trimmed, "{") {
+ return message
+ }
+ dec := json.NewDecoder(strings.NewReader(trimmed))
+ dec.UseNumber()
+ var payload map[string]any
+ if err := dec.Decode(&payload); err != nil {
+ return message
+ }
+ if detail := firstNonEmpty(asString(payload["detail"]), asString(payload["message"])); detail != "" {
+ return detail
+ }
+ return message
+}
+
+func parseHTTPStatus(raw any) (int, bool) {
+ switch value := raw.(type) {
+ case int:
+ return normalizeHTTPStatus(value)
+ case int8:
+ return normalizeHTTPStatus(int(value))
+ case int16:
+ return normalizeHTTPStatus(int(value))
+ case int32:
+ return normalizeHTTPStatus(int(value))
+ case int64:
+ return normalizeHTTPStatus(int(value))
+ case uint:
+ return normalizeHTTPStatus(int(value))
+ case uint8:
+ return normalizeHTTPStatus(int(value))
+ case uint16:
+ return normalizeHTTPStatus(int(value))
+ case uint32:
+ return normalizeHTTPStatus(int(value))
+ case uint64:
+ return normalizeHTTPStatus(int(value))
+ case float32:
+ return normalizeHTTPStatus(int(value))
+ case float64:
+ return normalizeHTTPStatus(int(value))
+ case json.Number:
+ if parsed, err := value.Int64(); err == nil {
+ return normalizeHTTPStatus(int(parsed))
+ }
+ return 0, false
+ case string:
+ trimmed := strings.TrimSpace(value)
+ if trimmed == "" {
+ return 0, false
+ }
+ parsed, err := strconv.Atoi(trimmed)
+ if err != nil {
+ return 0, false
+ }
+ return normalizeHTTPStatus(parsed)
+ default:
+ return 0, false
+ }
+}
+
+func normalizeHTTPStatus(status int) (int, bool) {
+ if status < 100 || status > 599 {
+ return 0, false
+ }
+ return status, true
+}
+
+func isTransportFailure(code, message string) bool {
+ if code != "" {
+ switch code {
+ case "ECONNREFUSED", "ECONNRESET", "EPIPE", "ENOTFOUND", "EAI_AGAIN", "ETIMEDOUT", "EHOSTUNREACH", "ENETUNREACH", "ECONNABORTED":
+ return true
+ }
+ }
+ lower := strings.ToLower(message)
+ return strings.Contains(lower, "broken pipe") ||
+ strings.Contains(lower, "econnrefused") ||
+ strings.Contains(lower, "econnreset") ||
+ strings.Contains(lower, "epipe") ||
+ strings.Contains(lower, "spawn")
+}
+
+func classifyByCode(code string) string {
+ if code == "" {
+ return ""
+ }
+ switch {
+ case strings.Contains(code, "INVALID_REQUEST"), strings.Contains(code, "BAD_REQUEST"), strings.Contains(code, "UNSUPPORTED"), strings.Contains(code, "INVALID_ARGUMENT"), strings.Contains(code, "INVALID_INPUT"):
+ return "invalid_request"
+ case strings.Contains(code, "UNAUTHENTICATED"), strings.Contains(code, "INVALID_API_KEY"), strings.Contains(code, "AUTHENTICATION"):
+ return "auth"
+ case strings.Contains(code, "RATE_LIMIT"), strings.Contains(code, "TOO_MANY_REQUESTS"), strings.Contains(code, "RESOURCE_EXHAUSTED"):
+ return "rate_limit"
+ case strings.Contains(code, "INTERNAL"), strings.Contains(code, "SERVER_ERROR"), strings.Contains(code, "UNAVAILABLE"):
+ return "server"
+ default:
+ return ""
+ }
+}
+
+func errorsAs(err error, target any) bool {
+ if err == nil {
+ return false
+ }
+ return errors.As(err, target)
+}
diff --git a/internal/llm/providers/codexappserver/adapter_helpers_test.go b/internal/llm/providers/codexappserver/adapter_helpers_test.go
new file mode 100644
index 00000000..52019b8a
--- /dev/null
+++ b/internal/llm/providers/codexappserver/adapter_helpers_test.go
@@ -0,0 +1,280 @@
+package codexappserver
+
+import (
+ "context"
+ "errors"
+ "os"
+ "strings"
+ "testing"
+ "time"
+
+ "github.com/danshapiro/kilroy/internal/llm"
+)
+
+func TestAdapterHelpers_PathAndTurnExtraction(t *testing.T) {
+ if isResolvableImagePath("") {
+ t.Fatalf("empty image path should not be resolvable")
+ }
+ if !isResolvableImagePath("file:///tmp/image.png") {
+ t.Fatalf("file:// image path should be resolvable")
+ }
+ if !strings.HasSuffix(resolveImagePath("file:///tmp/image.png"), "/tmp/image.png") {
+ t.Fatalf("resolved file path mismatch: %q", resolveImagePath("file:///tmp/image.png"))
+ }
+
+ if got := extractTurnError(nil); got != nil {
+ t.Fatalf("nil map should not produce turn error, got %#v", got)
+ }
+ turnErr := map[string]any{"message": "bad turn"}
+ if got := asMap(extractTurnError(map[string]any{"turnError": turnErr})); got["message"] != "bad turn" {
+ t.Fatalf("turnError precedence mismatch: %#v", got)
+ }
+ rootErr := map[string]any{"message": "bad root"}
+ if got := asMap(extractTurnError(map[string]any{"error": rootErr})); got["message"] != "bad root" {
+ t.Fatalf("root error extraction mismatch: %#v", got)
+ }
+ turn := map[string]any{"id": "turn_1", "status": "failed"}
+ if got := asMap(extractTurnError(map[string]any{"turn": turn})); got["id"] != "turn_1" {
+ t.Fatalf("failed turn should be treated as error payload: %#v", got)
+ }
+}
+
+func TestAdapterHelpers_NormalizeAndClassifyErrorInfo(t *testing.T) {
+ info := normalizeErrorInfo(map[string]any{
+ "error": map[string]any{
+ "message": `{"detail":"wrapped detail"}`,
+ "status": 429,
+ "code": "RATE_LIMIT",
+ "retryAfter": 7,
+ },
+ })
+ if info.Message != "wrapped detail" {
+ t.Fatalf("message: got %q want %q", info.Message, "wrapped detail")
+ }
+ if !info.HasStatus || info.Status != 429 {
+ t.Fatalf("status: got has=%v status=%d", info.HasStatus, info.Status)
+ }
+ if info.Code != "RATE_LIMIT" {
+ t.Fatalf("code: got %q", info.Code)
+ }
+ if info.RetryAfter == nil || *info.RetryAfter != 7*time.Second {
+ t.Fatalf("retryAfter: got %#v", info.RetryAfter)
+ }
+
+ if got := unwrapJSONMessage("plain text"); got != "plain text" {
+ t.Fatalf("unwrap plain text: got %q", got)
+ }
+ if got := unwrapJSONMessage(`{"message":"msg fallback"}`); got != "msg fallback" {
+ t.Fatalf("unwrap json message fallback: got %q", got)
+ }
+ if !isTransportFailure("ECONNREFUSED", "ignored") {
+ t.Fatalf("expected transport failure by code")
+ }
+ if !isTransportFailure("", "broken pipe from child process") {
+ t.Fatalf("expected transport failure by message")
+ }
+ if isTransportFailure("INVALID_REQUEST", "input is malformed") {
+ t.Fatalf("did not expect invalid request to be transport failure")
+ }
+
+ if got := classifyByCode("INVALID_REQUEST"); got != "invalid_request" {
+ t.Fatalf("classify invalid_request: got %q", got)
+ }
+ if got := classifyByCode("UNAUTHENTICATED"); got != "auth" {
+ t.Fatalf("classify auth: got %q", got)
+ }
+ if got := classifyByCode("RESOURCE_EXHAUSTED"); got != "rate_limit" {
+ t.Fatalf("classify rate_limit: got %q", got)
+ }
+ if got := classifyByCode("SERVER_ERROR"); got != "server" {
+ t.Fatalf("classify server: got %q", got)
+ }
+ if got := classifyByCode("SOMETHING_ELSE"); got != "" {
+ t.Fatalf("unexpected classification: %q", got)
+ }
+
+ var target *llm.AuthenticationError
+ if errorsAs(nil, &target) {
+ t.Fatalf("errorsAs should return false for nil error")
+ }
+ authErr := llm.ErrorFromHTTPStatus("codex-app-server", 401, "bad key", nil, nil)
+ if !errorsAs(authErr, &target) {
+ t.Fatalf("expected errorsAs to match authentication error")
+ }
+}
+
+func TestAdapterHelpers_MapCodexError_Branches(t *testing.T) {
+ if err := mapCodexError(nil, providerName, "complete"); err != nil {
+ t.Fatalf("nil error should stay nil, got %v", err)
+ }
+
+ timeoutErr := mapCodexError(context.DeadlineExceeded, providerName, "complete")
+ var requestTimeout *llm.RequestTimeoutError
+ if !errors.As(timeoutErr, &requestTimeout) {
+ t.Fatalf("expected RequestTimeoutError from wrapped context deadline, got %T (%v)", timeoutErr, timeoutErr)
+ }
+
+ transportErr := mapCodexError(map[string]any{
+ "error": map[string]any{
+ "code": "EPIPE",
+ "message": "broken pipe",
+ },
+ }, providerName, "stream")
+ var streamErr *llm.StreamError
+ if !errors.As(transportErr, &streamErr) {
+ t.Fatalf("expected StreamError for stream transport failures, got %T (%v)", transportErr, transportErr)
+ }
+
+ statusErr := mapCodexError(map[string]any{
+ "error": map[string]any{
+ "status": 404,
+ "message": "not found",
+ },
+ }, providerName, "complete")
+ var notFound *llm.NotFoundError
+ if !errors.As(statusErr, ¬Found) {
+ t.Fatalf("expected NotFoundError from explicit status, got %T (%v)", statusErr, statusErr)
+ }
+
+ classifiedErr := mapCodexError(map[string]any{
+ "error": map[string]any{
+ "code": "RATE_LIMIT",
+ "message": "too many requests",
+ },
+ }, providerName, "complete")
+ var rateLimit *llm.RateLimitError
+ if !errors.As(classifiedErr, &rateLimit) {
+ t.Fatalf("expected RateLimitError from classified code, got %T (%v)", classifiedErr, classifiedErr)
+ }
+
+ messageHintErr := mapCodexError(map[string]any{
+ "message": "model not supported for this endpoint",
+ }, providerName, "complete")
+ var invalidRequest *llm.InvalidRequestError
+ if !errors.As(messageHintErr, &invalidRequest) {
+ t.Fatalf("expected InvalidRequestError from message hint, got %T (%v)", messageHintErr, messageHintErr)
+ }
+
+ fallbackErr := mapCodexError(map[string]any{
+ "message": "completely unknown failure",
+ }, providerName, "complete")
+ var networkErr *llm.NetworkError
+ if !errors.As(fallbackErr, &networkErr) {
+ t.Fatalf("expected NetworkError fallback for unknown complete errors, got %T (%v)", fallbackErr, fallbackErr)
+ }
+}
+
+func TestAdapterHelpers_BasicLifecycleAndModelSelection(t *testing.T) {
+ t.Setenv(envCommand, "codex-test")
+ adapterFromEnv, err := NewFromEnv()
+ if err != nil {
+ t.Fatalf("NewFromEnv: %v", err)
+ }
+ if adapterFromEnv == nil {
+ t.Fatalf("expected adapter from env")
+ }
+ if adapterFromEnv.Name() != providerName {
+ t.Fatalf("Name: got %q want %q", adapterFromEnv.Name(), providerName)
+ }
+ if adapterFromEnv.transportOptions.Command != "codex-test" {
+ t.Fatalf("transport command from env: got %q", adapterFromEnv.transportOptions.Command)
+ }
+
+ initCalls := 0
+ closeCalls := 0
+ adapter := NewAdapter(AdapterOptions{
+ Transport: &fakeTransport{
+ initializeFn: func(ctx context.Context) error {
+ initCalls++
+ return nil
+ },
+ closeFn: func() error {
+ closeCalls++
+ return nil
+ },
+ listFn: func(ctx context.Context, params map[string]any) (modelListResponse, error) {
+ return modelListResponse{
+ Data: []modelEntry{
+ {ID: "model_a", Model: "codex-mini"},
+ {ID: "model_b", Model: "codex-pro", IsDefault: true},
+ },
+ }, nil
+ },
+ },
+ })
+
+ if err := adapter.Initialize(context.Background()); err != nil {
+ t.Fatalf("Initialize: %v", err)
+ }
+ if initCalls != 1 {
+ t.Fatalf("initialize calls: got %d want 1", initCalls)
+ }
+
+ def, err := adapter.GetDefaultModel(context.Background())
+ if err != nil {
+ t.Fatalf("GetDefaultModel: %v", err)
+ }
+ if def == nil || def.ID != "model_b" {
+ t.Fatalf("default model mismatch: %#v", def)
+ }
+
+ adapter.transport = &fakeTransport{
+ closeFn: func() error {
+ closeCalls++
+ return nil
+ },
+ }
+ if err := adapter.Close(); err != nil {
+ t.Fatalf("Close: %v", err)
+ }
+ if closeCalls != 1 {
+ t.Fatalf("close calls: got %d want 1", closeCalls)
+ }
+
+ adapterNoTransport := NewAdapter(AdapterOptions{})
+ if err := adapterNoTransport.Close(); err != nil {
+ t.Fatalf("Close without transport should succeed, got %v", err)
+ }
+}
+
+func TestAdapterHelpers_ProviderOverride(t *testing.T) {
+ adapter := NewAdapter(AdapterOptions{Provider: "custom-codex-provider"})
+ if got := adapter.Name(); got != "custom-codex-provider" {
+ t.Fatalf("Name: got %q want %q", got, "custom-codex-provider")
+ }
+}
+
+func TestAdapterHelpers_GetTransport_CachesAndRespectsProvidedTransport(t *testing.T) {
+ provided := &fakeTransport{}
+ adapterWithProvided := NewAdapter(AdapterOptions{Transport: provided})
+ got, err := adapterWithProvided.getTransport()
+ if err != nil {
+ t.Fatalf("getTransport provided: %v", err)
+ }
+ if got != provided {
+ t.Fatalf("expected provided transport to be returned")
+ }
+
+ t.Setenv(envCommand, "")
+ t.Setenv(envArgs, "")
+ t.Setenv(envCommandArgs, "")
+ _ = os.Unsetenv(envCommand)
+
+ adapter := NewAdapter(AdapterOptions{
+ TransportOptions: TransportOptions{
+ Command: "codex-custom",
+ Args: []string{"app-server", "--listen", "stdio://"},
+ },
+ })
+ first, err := adapter.getTransport()
+ if err != nil {
+ t.Fatalf("getTransport first: %v", err)
+ }
+ second, err := adapter.getTransport()
+ if err != nil {
+ t.Fatalf("getTransport second: %v", err)
+ }
+ if first != second {
+ t.Fatalf("expected getTransport to cache created transport instance")
+ }
+}
diff --git a/internal/llm/providers/codexappserver/adapter_test.go b/internal/llm/providers/codexappserver/adapter_test.go
new file mode 100644
index 00000000..3c544431
--- /dev/null
+++ b/internal/llm/providers/codexappserver/adapter_test.go
@@ -0,0 +1,358 @@
+package codexappserver
+
+import (
+ "context"
+ "encoding/json"
+ "errors"
+ "os"
+ "strings"
+ "testing"
+ "time"
+
+ "github.com/danshapiro/kilroy/internal/llm"
+)
+
+type fakeTransport struct {
+ initializeFn func(ctx context.Context) error
+ closeFn func() error
+ completeFn func(ctx context.Context, payload map[string]any) (map[string]any, error)
+ streamFn func(ctx context.Context, payload map[string]any) (*NotificationStream, error)
+ listFn func(ctx context.Context, params map[string]any) (modelListResponse, error)
+}
+
+func (f *fakeTransport) Initialize(ctx context.Context) error {
+ if f.initializeFn != nil {
+ return f.initializeFn(ctx)
+ }
+ return nil
+}
+
+func (f *fakeTransport) Close() error {
+ if f.closeFn != nil {
+ return f.closeFn()
+ }
+ return nil
+}
+
+func (f *fakeTransport) Complete(ctx context.Context, payload map[string]any) (map[string]any, error) {
+ if f.completeFn != nil {
+ return f.completeFn(ctx, payload)
+ }
+ return map[string]any{}, nil
+}
+
+func (f *fakeTransport) Stream(ctx context.Context, payload map[string]any) (*NotificationStream, error) {
+ if f.streamFn != nil {
+ return f.streamFn(ctx, payload)
+ }
+ events := make(chan map[string]any)
+ errs := make(chan error)
+ close(events)
+ close(errs)
+ return &NotificationStream{Notifications: events, Err: errs, closeFn: func() {}}, nil
+}
+
+func (f *fakeTransport) ListModels(ctx context.Context, params map[string]any) (modelListResponse, error) {
+ if f.listFn != nil {
+ return f.listFn(ctx, params)
+ }
+ return modelListResponse{Data: []modelEntry{}, NextCursor: nil}, nil
+}
+
+func TestAdapterComplete_UsesTransportAndMergesWarnings(t *testing.T) {
+ var seenPayload map[string]any
+ transport := &fakeTransport{
+ completeFn: func(ctx context.Context, payload map[string]any) (map[string]any, error) {
+ seenPayload = payload
+ return map[string]any{"turn": map[string]any{"id": "turn_1", "status": "completed", "items": []any{}}}, nil
+ },
+ }
+ adapter := NewAdapter(AdapterOptions{
+ Transport: transport,
+ TranslateRequest: func(request llm.Request, streaming bool) (translateRequestResult, error) {
+ return translateRequestResult{
+ Payload: map[string]any{"input": []any{}, "threadId": defaultThreadID},
+ Warnings: []llm.Warning{{Message: "Dropped unsupported audio", Code: "unsupported_part"}},
+ }, nil
+ },
+ TranslateResponse: func(body map[string]any) (llm.Response, error) {
+ return llm.Response{
+ ID: "resp_1",
+ Model: "codex-mini",
+ Provider: providerName,
+ Message: llm.Assistant("done"),
+ Finish: llm.FinishReason{Reason: llm.FinishReasonStop},
+ Usage: llm.Usage{InputTokens: 1, OutputTokens: 2, TotalTokens: 3},
+ Warnings: []llm.Warning{{Message: "Deprecated field"}},
+ }, nil
+ },
+ })
+
+ resp, err := adapter.Complete(context.Background(), llm.Request{Model: "codex-mini", Messages: []llm.Message{llm.User("hello")}})
+ if err != nil {
+ t.Fatalf("Complete: %v", err)
+ }
+ if seenPayload == nil {
+ t.Fatalf("transport payload not captured")
+ }
+ if len(resp.Warnings) != 2 {
+ t.Fatalf("warnings len: got %d want 2", len(resp.Warnings))
+ }
+ if resp.Warnings[0].Message != "Deprecated field" || resp.Warnings[1].Code != "unsupported_part" {
+ t.Fatalf("warnings mismatch: %+v", resp.Warnings)
+ }
+}
+
+func TestAdapterComplete_MapsTurnErrors(t *testing.T) {
+ transport := &fakeTransport{
+ completeFn: func(ctx context.Context, payload map[string]any) (map[string]any, error) {
+ return map[string]any{
+ "turn": map[string]any{
+ "id": "turn_bad",
+ "status": "failed",
+ "error": map[string]any{
+ "status": 429,
+ "code": "RATE_LIMITED",
+ "message": "too many requests",
+ },
+ },
+ }, nil
+ },
+ }
+ adapter := NewAdapter(AdapterOptions{
+ Transport: transport,
+ TranslateRequest: func(request llm.Request, streaming bool) (translateRequestResult, error) {
+ return translateRequestResult{Payload: map[string]any{"input": []any{}, "threadId": defaultThreadID}}, nil
+ },
+ TranslateResponse: func(body map[string]any) (llm.Response, error) {
+ return llm.Response{}, nil
+ },
+ })
+
+ _, err := adapter.Complete(context.Background(), llm.Request{Model: "codex-mini", Messages: []llm.Message{llm.User("hello")}})
+ if err == nil {
+ t.Fatalf("expected complete error")
+ }
+ var rateLimit *llm.RateLimitError
+ if !errors.As(err, &rateLimit) {
+ t.Fatalf("expected RateLimitError, got %T (%v)", err, err)
+ }
+}
+
+func TestAdapterStream_AttachesWarningsToStreamStart(t *testing.T) {
+ transport := &fakeTransport{
+ streamFn: func(ctx context.Context, payload map[string]any) (*NotificationStream, error) {
+ events := make(chan map[string]any, 2)
+ errs := make(chan error, 1)
+ events <- map[string]any{"method": "turn/started", "params": map[string]any{"turn": map[string]any{"id": "turn_1", "status": "inProgress", "items": []any{}}}}
+ events <- map[string]any{"method": "turn/completed", "params": map[string]any{"turn": map[string]any{"id": "turn_1", "status": "completed", "items": []any{}}}}
+ close(events)
+ close(errs)
+ return &NotificationStream{Notifications: events, Err: errs, closeFn: func() {}}, nil
+ },
+ }
+ adapter := NewAdapter(AdapterOptions{
+ Transport: transport,
+ TranslateRequest: func(request llm.Request, streaming bool) (translateRequestResult, error) {
+ return translateRequestResult{
+ Payload: map[string]any{"input": []any{}, "threadId": defaultThreadID},
+ Warnings: []llm.Warning{{Message: "Tool output truncated", Code: "truncated"}},
+ }, nil
+ },
+ })
+
+ stream, err := adapter.Stream(context.Background(), llm.Request{Model: "codex-mini", Messages: []llm.Message{llm.User("hello")}})
+ if err != nil {
+ t.Fatalf("Stream: %v", err)
+ }
+ defer stream.Close()
+
+ var start *llm.StreamEvent
+ for event := range stream.Events() {
+ if event.Type == llm.StreamEventStreamStart {
+ copyEvent := event
+ start = ©Event
+ }
+ }
+ if start == nil {
+ t.Fatalf("expected stream start event")
+ }
+ if len(start.Warnings) != 1 || start.Warnings[0].Code != "truncated" {
+ t.Fatalf("stream start warnings mismatch: %+v", start.Warnings)
+ }
+}
+
+func TestAdapter_ListModelsCachesFirstResponse(t *testing.T) {
+ calls := 0
+ transport := &fakeTransport{
+ listFn: func(ctx context.Context, params map[string]any) (modelListResponse, error) {
+ calls++
+ return modelListResponse{Data: []modelEntry{{ID: "1", Model: "codex-mini", IsDefault: true}}}, nil
+ },
+ }
+ adapter := NewAdapter(AdapterOptions{Transport: transport})
+
+ first, err := adapter.ListModels(context.Background(), map[string]any{"limit": 1})
+ if err != nil {
+ t.Fatalf("ListModels first: %v", err)
+ }
+ second, err := adapter.ListModels(context.Background(), map[string]any{"limit": 99})
+ if err != nil {
+ t.Fatalf("ListModels second: %v", err)
+ }
+ if calls != 1 {
+ t.Fatalf("list call count: got %d want 1", calls)
+ }
+ if len(first.Data) != 1 || len(second.Data) != 1 {
+ t.Fatalf("cached model list mismatch: first=%+v second=%+v", first, second)
+ }
+}
+
+func TestResolveFileImages_LoadsLocalPathData(t *testing.T) {
+ dir := t.TempDir()
+ path := dir + "/image.png"
+ if err := os.WriteFile(path, []byte{0x89, 0x50, 0x4e, 0x47}, 0o644); err != nil {
+ t.Fatalf("write image: %v", err)
+ }
+
+ resolved, err := resolveFileImages(llm.Request{
+ Model: "codex-mini",
+ Messages: []llm.Message{{
+ Role: llm.RoleUser,
+ Content: []llm.ContentPart{{
+ Kind: llm.ContentImage,
+ Image: &llm.ImageData{URL: path},
+ }},
+ }},
+ })
+ if err != nil {
+ t.Fatalf("resolveFileImages: %v", err)
+ }
+ part := resolved.Messages[0].Content[0]
+ if part.Image == nil || len(part.Image.Data) == 0 || part.Image.URL != "" {
+ t.Fatalf("resolved image mismatch: %+v", part.Image)
+ }
+ if part.Image.MediaType != "image/png" {
+ t.Fatalf("media type mismatch: got %q want %q", part.Image.MediaType, "image/png")
+ }
+}
+
+func TestMapCodexError_UnsupportedModelMessageBecomesInvalidRequest(t *testing.T) {
+ err := mapCodexError(map[string]any{
+ "turn": map[string]any{
+ "id": "turn_unsupported",
+ "status": "failed",
+ "error": map[string]any{
+ "message": "{\"detail\":\"The 'nonexistent-model-xyz' model is not supported when using Codex with a ChatGPT account.\"}",
+ },
+ },
+ }, providerName, "complete")
+
+ var invalid *llm.InvalidRequestError
+ if !errors.As(err, &invalid) {
+ t.Fatalf("expected InvalidRequestError, got %T (%v)", err, err)
+ }
+}
+
+func TestAdapterStream_PropagatesTransportErrors(t *testing.T) {
+ transport := &fakeTransport{
+ streamFn: func(ctx context.Context, payload map[string]any) (*NotificationStream, error) {
+ events := make(chan map[string]any)
+ errs := make(chan error, 1)
+ close(events)
+ errs <- context.DeadlineExceeded
+ close(errs)
+ return &NotificationStream{Notifications: events, Err: errs, closeFn: func() {}}, nil
+ },
+ }
+ adapter := NewAdapter(AdapterOptions{
+ Transport: transport,
+ TranslateRequest: func(request llm.Request, streaming bool) (translateRequestResult, error) {
+ return translateRequestResult{Payload: map[string]any{"input": []any{}, "threadId": defaultThreadID}}, nil
+ },
+ TranslateStream: func(events <-chan map[string]any) <-chan llm.StreamEvent {
+ out := make(chan llm.StreamEvent)
+ close(out)
+ return out
+ },
+ })
+
+ stream, err := adapter.Stream(context.Background(), llm.Request{Model: "codex-mini", Messages: []llm.Message{llm.User("hello")}})
+ if err != nil {
+ t.Fatalf("Stream: %v", err)
+ }
+ defer stream.Close()
+
+ var gotErr error
+ timeout := time.After(2 * time.Second)
+loop:
+ for {
+ select {
+ case event, ok := <-stream.Events():
+ if !ok {
+ break loop
+ }
+ if event.Type == llm.StreamEventError {
+ gotErr = event.Err
+ }
+ case <-timeout:
+ t.Fatalf("timed out waiting for stream events")
+ }
+ }
+ if gotErr == nil {
+ t.Fatalf("expected stream error event")
+ }
+ var timeoutErr *llm.RequestTimeoutError
+ if !errors.As(gotErr, &timeoutErr) {
+ t.Fatalf("expected RequestTimeoutError, got %T (%v)", gotErr, gotErr)
+ }
+}
+
+var _ codexTransport = (*fakeTransport)(nil)
+
+func TestNormalizeErrorInfo_UnwrapsJSONMessage(t *testing.T) {
+ info := normalizeErrorInfo(map[string]any{"message": `{"detail":"wrapped detail"}`})
+ if info.Message != "wrapped detail" {
+ t.Fatalf("message: got %q want %q", info.Message, "wrapped detail")
+ }
+}
+
+func TestNormalizeErrorInfo_IgnoresSymbolicStatus(t *testing.T) {
+ info := normalizeErrorInfo(map[string]any{
+ "error": map[string]any{
+ "status": "RESOURCE_EXHAUSTED",
+ "message": "rate limited",
+ },
+ })
+ if info.HasStatus {
+ t.Fatalf("expected symbolic status to be ignored, got status=%d", info.Status)
+ }
+}
+
+func TestNormalizeErrorInfo_ParsesNumericStatusString(t *testing.T) {
+ info := normalizeErrorInfo(map[string]any{
+ "error": map[string]any{
+ "status": "429",
+ "message": "rate limited",
+ },
+ })
+ if !info.HasStatus || info.Status != 429 {
+ t.Fatalf("expected HTTP status 429, got hasStatus=%v status=%d", info.HasStatus, info.Status)
+ }
+}
+
+func TestParseToolCall_NormalizesArguments(t *testing.T) {
+ tool := parseToolCall(`{"id":"call_1","name":"search","arguments":{"q":"foo"}}`)
+ if tool == nil {
+ t.Fatalf("expected tool call")
+ }
+ if tool.ID != "call_1" || tool.Name != "search" {
+ t.Fatalf("tool metadata mismatch: %+v", tool)
+ }
+ if strings.TrimSpace(string(tool.Arguments)) != `{"q":"foo"}` {
+ t.Fatalf("tool args mismatch: %q", string(tool.Arguments))
+ }
+ if !json.Valid(tool.Arguments) {
+ t.Fatalf("tool args should be valid json: %q", string(tool.Arguments))
+ }
+}
diff --git a/internal/llm/providers/codexappserver/env_config.go b/internal/llm/providers/codexappserver/env_config.go
new file mode 100644
index 00000000..28481aaf
--- /dev/null
+++ b/internal/llm/providers/codexappserver/env_config.go
@@ -0,0 +1,103 @@
+package codexappserver
+
+import (
+ "encoding/json"
+ "os"
+ "os/exec"
+ "regexp"
+ "strings"
+)
+
+const (
+ envCommand = "CODEX_APP_SERVER_COMMAND"
+ envArgs = "CODEX_APP_SERVER_ARGS"
+ envCommandArgs = "CODEX_APP_SERVER_COMMAND_ARGS"
+ envAutoDiscover = "CODEX_APP_SERVER_AUTO_DISCOVER"
+)
+
+var (
+ getenv = os.Getenv
+ lookPath = exec.LookPath
+ shellArgSplitRE = regexp.MustCompile(`(?:[^\s"']+|"[^"]*"|'[^']*')+`)
+)
+
+func parseArgs(raw string) []string {
+ trimmed := strings.TrimSpace(raw)
+ if trimmed == "" {
+ return nil
+ }
+ if strings.HasPrefix(trimmed, "[") {
+ var parsed []string
+ if err := json.Unmarshal([]byte(trimmed), &parsed); err == nil {
+ out := make([]string, 0, len(parsed))
+ for _, arg := range parsed {
+ if strings.TrimSpace(arg) != "" {
+ out = append(out, arg)
+ }
+ }
+ if len(out) > 0 {
+ return out
+ }
+ }
+ }
+ parts := shellArgSplitRE.FindAllString(trimmed, -1)
+ if len(parts) == 0 {
+ return nil
+ }
+ out := make([]string, 0, len(parts))
+ for _, part := range parts {
+ if len(part) >= 2 {
+ if (strings.HasPrefix(part, "\"") && strings.HasSuffix(part, "\"")) ||
+ (strings.HasPrefix(part, "'") && strings.HasSuffix(part, "'")) {
+ part = part[1 : len(part)-1]
+ }
+ }
+ if strings.TrimSpace(part) != "" {
+ out = append(out, part)
+ }
+ }
+ if len(out) == 0 {
+ return nil
+ }
+ return out
+}
+
+func transportOptionsFromEnv() (TransportOptions, bool) {
+ opts := TransportOptions{}
+ hasExplicitOverride := false
+ if cmd := strings.TrimSpace(getenv(envCommand)); cmd != "" {
+ opts.Command = cmd
+ hasExplicitOverride = true
+ }
+ argsRaw := getenv(envArgs)
+ if strings.TrimSpace(argsRaw) == "" {
+ argsRaw = getenv(envCommandArgs)
+ }
+ if args := parseArgs(argsRaw); len(args) > 0 {
+ opts.Args = args
+ hasExplicitOverride = true
+ }
+ if hasExplicitOverride {
+ return opts, true
+ }
+
+ // If no explicit overrides are provided, only enable env registration when
+ // explicit auto-discovery is enabled and the default codex command is
+ // available on PATH.
+ if !isTruthyEnvValue(getenv(envAutoDiscover)) {
+ return TransportOptions{}, false
+ }
+ if _, err := lookPath(defaultCommand); err == nil {
+ return opts, true
+ }
+ return TransportOptions{}, false
+}
+
+func isTruthyEnvValue(raw string) bool {
+ switch strings.ToLower(strings.TrimSpace(raw)) {
+ case "1", "true", "yes", "y", "on":
+ return true
+ default:
+ return false
+ }
+}
diff --git a/internal/llm/providers/codexappserver/env_config_test.go b/internal/llm/providers/codexappserver/env_config_test.go
new file mode 100644
index 00000000..dfba059d
--- /dev/null
+++ b/internal/llm/providers/codexappserver/env_config_test.go
@@ -0,0 +1,109 @@
+package codexappserver
+
+import (
+ "errors"
+ "testing"
+)
+
+func TestParseArgs_JSONAndShellFormats(t *testing.T) {
+ if got := parseArgs(`["app-server","--listen","stdio://"]`); len(got) != 3 || got[0] != "app-server" {
+ t.Fatalf("json parse args mismatch: %#v", got)
+ }
+ if got := parseArgs(`app-server --listen "stdio://"`); len(got) != 3 || got[2] != "stdio://" {
+ t.Fatalf("shell parse args mismatch: %#v", got)
+ }
+}
+
+func TestTransportOptionsFromEnv(t *testing.T) {
+ origGetenv := getenv
+ origLookPath := lookPath
+ t.Cleanup(func() {
+ getenv = origGetenv
+ lookPath = origLookPath
+ })
+ values := map[string]string{
+ envCommand: "codex-bin",
+ envArgs: `app-server --listen stdio://`,
+ }
+ getenv = func(key string) string { return values[key] }
+
+ opts, ok := transportOptionsFromEnv()
+ if !ok {
+ t.Fatalf("expected enabled transport options")
+ }
+ if opts.Command != "codex-bin" {
+ t.Fatalf("command: got %q", opts.Command)
+ }
+ if len(opts.Args) != 3 {
+ t.Fatalf("args: %#v", opts.Args)
+ }
+}
+
+func TestTransportOptionsFromEnv_DisabledWithoutExplicitOverridesOrOptIn(t *testing.T) {
+ origGetenv := getenv
+ origLookPath := lookPath
+ t.Cleanup(func() {
+ getenv = origGetenv
+ lookPath = origLookPath
+ })
+ getenv = func(string) string { return "" }
+ lookPath = func(string) (string, error) { return "/usr/bin/codex", nil }
+
+ opts, ok := transportOptionsFromEnv()
+ if ok {
+ t.Fatalf("expected transport options to remain disabled without explicit opt-in")
+ }
+ if opts.Command != "" {
+ t.Fatalf("command: got %q want empty", opts.Command)
+ }
+ if len(opts.Args) != 0 {
+ t.Fatalf("args: %#v", opts.Args)
+ }
+}
+
+func TestTransportOptionsFromEnv_EnabledWhenAutoDiscoverOptInAndCodexPresent(t *testing.T) {
+ origGetenv := getenv
+ origLookPath := lookPath
+ t.Cleanup(func() {
+ getenv = origGetenv
+ lookPath = origLookPath
+ })
+ values := map[string]string{
+ envAutoDiscover: "1",
+ }
+ getenv = func(key string) string { return values[key] }
+ lookPath = func(string) (string, error) { return "/usr/bin/codex", nil }
+
+ opts, ok := transportOptionsFromEnv()
+ if !ok {
+ t.Fatalf("expected transport options enabled with explicit auto-discover opt-in")
+ }
+ if opts.Command != "" {
+ t.Fatalf("command: got %q want empty", opts.Command)
+ }
+ if len(opts.Args) != 0 {
+ t.Fatalf("args: %#v", opts.Args)
+ }
+}
+
+func TestTransportOptionsFromEnv_DisabledWhenAutoDiscoverOptInButCodexMissing(t *testing.T) {
+ origGetenv := getenv
+ origLookPath := lookPath
+ t.Cleanup(func() {
+ getenv = origGetenv
+ lookPath = origLookPath
+ })
+ values := map[string]string{
+ envAutoDiscover: "true",
+ }
+ getenv = func(key string) string { return values[key] }
+ lookPath = func(string) (string, error) { return "", errors.New("not found") }
+
+ opts, ok := transportOptionsFromEnv()
+ if ok {
+ t.Fatalf("expected disabled transport options when codex is unavailable")
+ }
+ if opts.Command != "" || len(opts.Args) != 0 {
+ t.Fatalf("expected empty opts when disabled, got: %+v", opts)
+ }
+}
diff --git a/internal/llm/providers/codexappserver/protocol_types.go b/internal/llm/providers/codexappserver/protocol_types.go
new file mode 100644
index 00000000..50f4d2d2
--- /dev/null
+++ b/internal/llm/providers/codexappserver/protocol_types.go
@@ -0,0 +1,39 @@
+package codexappserver
+
+type jsonRPCError struct {
+ Code int `json:"code"`
+ Message string `json:"message"`
+ Data any `json:"data,omitempty"`
+}
+
+type jsonRPCMessage struct {
+ JSONRPC string `json:"jsonrpc,omitempty"`
+ ID any `json:"id,omitempty"`
+ Method string `json:"method,omitempty"`
+ Params any `json:"params,omitempty"`
+ Result any `json:"result,omitempty"`
+ Error *jsonRPCError `json:"error,omitempty"`
+}
+
+type modelReasoningEffort struct {
+ Effort string `json:"effort"`
+ Description string `json:"description"`
+}
+
+type modelEntry struct {
+ ID string `json:"id"`
+ Model string `json:"model"`
+ DisplayName string `json:"displayName"`
+ Hidden bool `json:"hidden"`
+ IsDefault bool `json:"isDefault"`
+ DefaultReasoningEffort string `json:"defaultReasoningEffort"`
+ ReasoningEffort []modelReasoningEffort `json:"reasoningEffort"`
+ InputModalities []string `json:"inputModalities"`
+ SupportsPersonality bool `json:"supportsPersonality"`
+ Upgrade any `json:"upgrade,omitempty"`
+}
+
+type modelListResponse struct {
+ Data []modelEntry `json:"data"`
+ NextCursor any `json:"nextCursor"`
+}
diff --git a/internal/llm/providers/codexappserver/request_translator.go b/internal/llm/providers/codexappserver/request_translator.go
new file mode 100644
index 00000000..afba28c7
--- /dev/null
+++ b/internal/llm/providers/codexappserver/request_translator.go
@@ -0,0 +1,715 @@
+package codexappserver
+
+import (
+ "bytes"
+ "encoding/json"
+ "fmt"
+ "regexp"
+ "strings"
+
+ "github.com/danshapiro/kilroy/internal/llm"
+)
+
+const (
+ transcriptBeginMarker = "[[[UNIFIED_TRANSCRIPT_V1_BEGIN]]]"
+ transcriptEndMarker = "[[[UNIFIED_TRANSCRIPT_V1_END]]]"
+ transcriptPayloadBeginMarker = "[[[UNIFIED_TRANSCRIPT_PAYLOAD_BEGIN]]]"
+ transcriptPayloadEndMarker = "[[[UNIFIED_TRANSCRIPT_PAYLOAD_END]]]"
+ toolCallBeginMarker = "[[TOOL_CALL]]"
+ toolCallEndMarker = "[[/TOOL_CALL]]"
+ defaultThreadID = "thread_stateless"
+ transcriptVersion = "unified.codex-app-server.request.v1"
+ defaultReasoningEffort = "high"
+)
+
+var (
+ jsonObjectOutputSchema = map[string]any{
+ "type": "object",
+ "properties": map[string]any{},
+ "additionalProperties": true,
+ }
+ supportedReasoningEfforts = map[string]struct{}{
+ "none": {},
+ "minimal": {},
+ "low": {},
+ "medium": {},
+ "high": {},
+ "xhigh": {},
+ }
+ turnOptionKeyMap = map[string]string{
+ "cwd": "cwd",
+ "approvalPolicy": "approvalPolicy",
+ "approval_policy": "approvalPolicy",
+ "sandbox": "sandbox",
+ "sandbox_mode": "sandbox",
+ "sandboxPolicy": "sandboxPolicy",
+ "sandbox_policy": "sandboxPolicy",
+ "model": "model",
+ "effort": "effort",
+ "summary": "summary",
+ "personality": "personality",
+ "collaborationMode": "collaborationMode",
+ "collaboration_mode": "collaborationMode",
+ "outputSchema": "outputSchema",
+ "output_schema": "outputSchema",
+ }
+ controlOptionKeyMap = map[string]string{
+ "temperature": "temperature",
+ "topP": "topP",
+ "top_p": "topP",
+ "maxTokens": "maxTokens",
+ "max_tokens": "maxTokens",
+ "stopSequences": "stopSequences",
+ "stop_sequences": "stopSequences",
+ "metadata": "metadata",
+ "reasoningEffort": "reasoningEffort",
+ "reasoning_effort": "reasoningEffort",
+ }
+ uriSchemeRE = regexp.MustCompile(`^[a-zA-Z][a-zA-Z\d+\-.]*:`)
+)
+
+type resolvedToolChoice struct {
+ Mode string `json:"mode"`
+ ToolName string `json:"toolName,omitempty"`
+}
+
+type transcriptControls struct {
+ Model string `json:"model"`
+ ToolChoice resolvedToolChoice `json:"toolChoice"`
+ ResponseFormat map[string]any `json:"responseFormat"`
+ Temperature *float64 `json:"temperature,omitempty"`
+ TopP *float64 `json:"topP,omitempty"`
+ MaxTokens *int `json:"maxTokens,omitempty"`
+ StopSequences []string `json:"stopSequences,omitempty"`
+ ReasoningEff string `json:"reasoningEffort,omitempty"`
+ Metadata map[string]interface{} `json:"metadata,omitempty"`
+}
+
+type transcriptPayload struct {
+ Version string `json:"version"`
+ ToolCallProtocol map[string]string `json:"toolCallProtocol"`
+ Controls transcriptControls `json:"controls"`
+ Tools []map[string]any `json:"tools"`
+ History []map[string]any `json:"history"`
+}
+
+type translateRequestResult struct {
+ Payload map[string]any
+ Warnings []llm.Warning
+}
+
+func translateRequest(request llm.Request, _ bool) (translateRequestResult, error) {
+ warnings := make([]llm.Warning, 0, 4)
+ toolChoice := normalizeToolChoice(request)
+ if err := validateToolChoice(toolChoice, request); err != nil {
+ return translateRequestResult{}, err
+ }
+
+ var reasoningInput string
+ if request.ReasoningEffort != nil {
+ reasoningInput = *request.ReasoningEffort
+ }
+ reasoningEffort := normalizeReasoningEffort(reasoningInput, &warnings, "request.reasoningEffort")
+ if reasoningEffort == "" {
+ reasoningEffort = defaultReasoningEffort
+ }
+
+ controls := transcriptControls{
+ Model: request.Model,
+ ToolChoice: toolChoice,
+ ResponseFormat: responseFormatForTranscript(request.ResponseFormat),
+ Temperature: request.Temperature,
+ TopP: request.TopP,
+ MaxTokens: request.MaxTokens,
+ StopSequences: append([]string{}, request.StopSequences...),
+ ReasoningEff: reasoningEffort,
+ Metadata: metadataForTranscript(request.Metadata),
+ }
+
+ history, imageInputs := translateMessages(request.Messages, &warnings)
+
+ params := map[string]any{
+ "threadId": defaultThreadID,
+ "model": request.Model,
+ "effort": controls.ReasoningEff,
+ }
+ if outputSchema := resolveOutputSchema(request.ResponseFormat); outputSchema != nil {
+ params["outputSchema"] = outputSchema
+ }
+
+ applyProviderOptions(request, params, &controls, &warnings)
+ if model := strings.TrimSpace(asString(params["model"])); model != "" {
+ controls.Model = model
+ }
+ paramEffort := normalizeReasoningEffort(asString(params["effort"]), &warnings, "codex_app_server.effort")
+ if paramEffort == "" {
+ paramEffort = controls.ReasoningEff
+ }
+ if paramEffort == "" {
+ paramEffort = defaultReasoningEffort
+ }
+ params["effort"] = paramEffort
+ controls.ReasoningEff = paramEffort
+
+ payload := transcriptPayload{
+ Version: transcriptVersion,
+ ToolCallProtocol: map[string]string{
+ "beginMarker": toolCallBeginMarker,
+ "endMarker": toolCallEndMarker,
+ },
+ Controls: controls,
+ Tools: buildToolsSection(request),
+ History: history,
+ }
+
+ transcript, err := buildTranscript(payload, toolChoice)
+ if err != nil {
+ return translateRequestResult{}, err
+ }
+
+ input := make([]any, 0, 1+len(imageInputs))
+ input = append(input, map[string]any{
+ "type": "text",
+ "text": transcript,
+ "text_elements": []any{},
+ })
+ for _, in := range imageInputs {
+ input = append(input, in)
+ }
+ params["input"] = input
+
+ return translateRequestResult{Payload: params, Warnings: warnings}, nil
+}
+
+func normalizeToolChoice(request llm.Request) resolvedToolChoice {
+ if request.ToolChoice != nil {
+ mode := strings.TrimSpace(strings.ToLower(request.ToolChoice.Mode))
+ if mode == "named" {
+ return resolvedToolChoice{Mode: "named", ToolName: strings.TrimSpace(request.ToolChoice.Name)}
+ }
+ if mode == "" {
+ mode = "auto"
+ }
+ return resolvedToolChoice{Mode: mode}
+ }
+ if len(request.Tools) > 0 {
+ return resolvedToolChoice{Mode: "auto"}
+ }
+ return resolvedToolChoice{Mode: "none"}
+}
+
+func validateToolChoice(choice resolvedToolChoice, request llm.Request) error {
+ toolNames := make(map[string]struct{}, len(request.Tools))
+ for _, tool := range request.Tools {
+ toolNames[strings.TrimSpace(tool.Name)] = struct{}{}
+ }
+ switch choice.Mode {
+ case "required":
+ if len(toolNames) == 0 {
+ return fmt.Errorf("toolChoice.mode=\"required\" requires at least one tool definition")
+ }
+ case "named":
+ if len(toolNames) == 0 {
+ return fmt.Errorf("toolChoice.mode=\"named\" requires tools, but no tools were provided")
+ }
+ if strings.TrimSpace(choice.ToolName) == "" {
+ return fmt.Errorf("toolChoice.mode=\"named\" requires a non-empty toolName")
+ }
+ if _, ok := toolNames[choice.ToolName]; !ok {
+ return fmt.Errorf("toolChoice.mode=\"named\" references unknown tool %q", choice.ToolName)
+ }
+ }
+ return nil
+}
+
+func responseFormatForTranscript(format *llm.ResponseFormat) map[string]any {
+ if format == nil {
+ return map[string]any{"type": "text"}
+ }
+ out := map[string]any{"type": format.Type}
+ if format.JSONSchema != nil {
+ out["jsonSchema"] = format.JSONSchema
+ }
+ if format.Strict {
+ out["strict"] = true
+ }
+ return out
+}
+
+func metadataForTranscript(metadata map[string]string) map[string]interface{} {
+ if len(metadata) == 0 {
+ return nil
+ }
+ out := make(map[string]interface{}, len(metadata))
+ for key, value := range metadata {
+ out[key] = value
+ }
+ return out
+}
+
+func resolveOutputSchema(responseFormat *llm.ResponseFormat) map[string]any {
+ if responseFormat == nil || strings.EqualFold(responseFormat.Type, "text") || strings.TrimSpace(responseFormat.Type) == "" {
+ return nil
+ }
+ if strings.EqualFold(responseFormat.Type, "json") {
+ return deepCopyMap(jsonObjectOutputSchema)
+ }
+ if responseFormat.JSONSchema == nil {
+ return nil
+ }
+ return deepCopyMap(responseFormat.JSONSchema)
+}
+
+func buildToolsSection(request llm.Request) []map[string]any {
+ if len(request.Tools) == 0 {
+ return nil
+ }
+ out := make([]map[string]any, 0, len(request.Tools))
+ for _, tool := range request.Tools {
+ out = append(out, map[string]any{
+ "name": tool.Name,
+ "description": tool.Description,
+ "parameters": tool.Parameters,
+ })
+ }
+ return out
+}
+
+func translateMessages(messages []llm.Message, warnings *[]llm.Warning) ([]map[string]any, []map[string]any) {
+ history := make([]map[string]any, 0, len(messages))
+ imageInputs := make([]map[string]any, 0, 2)
+ imageIndex := 0
+ nextImageID := func() string {
+ imageIndex++
+ return fmt.Sprintf("img_%04d", imageIndex)
+ }
+
+ for messageIndex, message := range messages {
+ parts := make([]map[string]any, 0, len(message.Content))
+ for partIndex, part := range message.Content {
+ parts = append(parts, translatePart(part, messageIndex, partIndex, warnings, &imageInputs, nextImageID))
+ }
+ history = append(history, map[string]any{
+ "index": messageIndex,
+ "role": string(message.Role),
+ "name": message.Name,
+ "toolCallId": message.ToolCallID,
+ "parts": parts,
+ })
+ }
+
+ return history, imageInputs
+}
+
+func translatePart(
+ part llm.ContentPart,
+ messageIndex int,
+ partIndex int,
+ warnings *[]llm.Warning,
+ imageInputs *[]map[string]any,
+ nextImageID func() string,
+) map[string]any {
+ switch part.Kind {
+ case llm.ContentText:
+ return map[string]any{
+ "index": partIndex,
+ "kind": "text",
+ "text": part.Text,
+ }
+ case llm.ContentImage:
+ imageID := nextImageID()
+ if part.Image != nil {
+ if len(part.Image.Data) > 0 {
+ mediaType := strings.TrimSpace(part.Image.MediaType)
+ if mediaType == "" {
+ mediaType = "image/png"
+ }
+ *imageInputs = append(*imageInputs, map[string]any{
+ "type": "image",
+ "url": llm.DataURI(mediaType, part.Image.Data),
+ })
+ return map[string]any{
+ "index": partIndex,
+ "kind": "image",
+ "assetId": imageID,
+ "inputType": "image",
+ "source": "inline_data",
+ "mediaType": mediaType,
+ "detail": part.Image.Detail,
+ }
+ }
+ if url := strings.TrimSpace(part.Image.URL); url != "" {
+ if isLikelyLocalPath(url) {
+ *imageInputs = append(*imageInputs, map[string]any{"type": "localImage", "path": url})
+ return map[string]any{
+ "index": partIndex,
+ "kind": "image",
+ "assetId": imageID,
+ "inputType": "localImage",
+ "source": "local_path",
+ "path": url,
+ "detail": part.Image.Detail,
+ }
+ }
+ *imageInputs = append(*imageInputs, map[string]any{"type": "image", "url": url})
+ return map[string]any{
+ "index": partIndex,
+ "kind": "image",
+ "assetId": imageID,
+ "inputType": "image",
+ "source": "remote_url",
+ "url": url,
+ "detail": part.Image.Detail,
+ }
+ }
+ }
+ *warnings = append(*warnings, llm.Warning{
+ Code: "unsupported_part",
+ Message: "Image content parts without data or url cannot be attached and were translated to fallback text",
+ })
+ return map[string]any{
+ "index": partIndex,
+ "kind": "image",
+ "assetId": imageID,
+ "fallback": "missing_image_data_or_url",
+ }
+ case llm.ContentAudio:
+ *warnings = append(*warnings, warningForFallback("Audio"))
+ byteLength := 0
+ url := ""
+ mediaType := ""
+ if part.Audio != nil {
+ byteLength = len(part.Audio.Data)
+ url = part.Audio.URL
+ mediaType = part.Audio.MediaType
+ }
+ return map[string]any{
+ "index": partIndex,
+ "kind": "audio",
+ "fallback": map[string]any{
+ "url": url,
+ "mediaType": mediaType,
+ "byteLength": byteLength,
+ },
+ }
+ case llm.ContentDocument:
+ *warnings = append(*warnings, warningForFallback("Document"))
+ byteLength := 0
+ url := ""
+ mediaType := ""
+ filename := ""
+ if part.Document != nil {
+ byteLength = len(part.Document.Data)
+ url = part.Document.URL
+ mediaType = part.Document.MediaType
+ filename = part.Document.FileName
+ }
+ return map[string]any{
+ "index": partIndex,
+ "kind": "document",
+ "fallback": map[string]any{
+ "url": url,
+ "mediaType": mediaType,
+ "fileName": filename,
+ "byteLength": byteLength,
+ },
+ }
+ case llm.ContentToolCall:
+ if part.ToolCall == nil {
+ break
+ }
+ value, raw := normalizeToolArguments(part.ToolCall.Arguments)
+ protocolPayload := map[string]any{
+ "id": part.ToolCall.ID,
+ "name": part.ToolCall.Name,
+ "arguments": value,
+ }
+ protocolJSON, _ := json.Marshal(protocolPayload)
+ return map[string]any{
+ "index": partIndex,
+ "kind": "tool_call",
+ "id": part.ToolCall.ID,
+ "name": part.ToolCall.Name,
+ "arguments": value,
+ "rawArguments": raw,
+ "protocolBlock": strings.Join([]string{
+ toolCallBeginMarker,
+ string(protocolJSON),
+ toolCallEndMarker,
+ }, "\n"),
+ }
+ case llm.ContentToolResult:
+ if part.ToolResult == nil {
+ break
+ }
+ item := map[string]any{
+ "index": partIndex,
+ "kind": "tool_result",
+ "toolCallId": part.ToolResult.ToolCallID,
+ "content": part.ToolResult.Content,
+ "isError": part.ToolResult.IsError,
+ }
+ if len(part.ToolResult.ImageData) > 0 {
+ mediaType := strings.TrimSpace(part.ToolResult.ImageMediaType)
+ if mediaType == "" {
+ mediaType = "image/png"
+ }
+ item["imageDataUri"] = llm.DataURI(mediaType, part.ToolResult.ImageData)
+ item["imageMediaType"] = mediaType
+ }
+ return item
+ case llm.ContentThinking:
+ if part.Thinking == nil {
+ break
+ }
+ return map[string]any{
+ "index": partIndex,
+ "kind": "thinking",
+ "text": part.Thinking.Text,
+ "signature": part.Thinking.Signature,
+ "redacted": false,
+ }
+ case llm.ContentRedThinking:
+ if part.Thinking == nil {
+ break
+ }
+ return map[string]any{
+ "index": partIndex,
+ "kind": "redacted_thinking",
+ "text": part.Thinking.Text,
+ "signature": part.Thinking.Signature,
+ "redacted": true,
+ }
+ default:
+ if kind := strings.TrimSpace(string(part.Kind)); kind != "" {
+ *warnings = append(*warnings, warningForFallback(fmt.Sprintf("Custom (%s)", kind)))
+ fallback := map[string]any{
+ "index": partIndex,
+ "kind": kind,
+ "fallbackKind": "custom",
+ }
+ if part.Data != nil {
+ fallback["data"] = part.Data
+ }
+ return fallback
+ }
+ *warnings = append(*warnings, llm.Warning{
+ Code: "unsupported_part",
+ Message: fmt.Sprintf("Unknown content part kind at message index %d was translated to fallback text", messageIndex),
+ })
+ return map[string]any{
+ "index": partIndex,
+ "kind": "unknown",
+ "fallback": true,
+ }
+ }
+
+ *warnings = append(*warnings, llm.Warning{
+ Code: "unsupported_part",
+ Message: fmt.Sprintf("Content part kind %q at message index %d was empty and translated to fallback text", part.Kind, messageIndex),
+ })
+ return map[string]any{
+ "index": partIndex,
+ "kind": string(part.Kind),
+ "fallback": true,
+ }
+}
+
+func normalizeToolArguments(arguments json.RawMessage) (any, string) {
+ trimmed := strings.TrimSpace(string(arguments))
+ if trimmed == "" {
+ return map[string]any{}, "{}"
+ }
+ dec := json.NewDecoder(bytes.NewReader([]byte(trimmed)))
+ dec.UseNumber()
+ var parsed any
+ if err := dec.Decode(&parsed); err != nil {
+ return trimmed, trimmed
+ }
+ return parsed, trimmed
+}
+
+func warningForFallback(kind string) llm.Warning {
+ return llm.Warning{
+ Code: "unsupported_part",
+ Message: fmt.Sprintf("%s content parts are not natively supported by codex-app-server and were translated to deterministic transcript fallback text", kind),
+ }
+}
+
+func warningForUnsupportedProviderOption(key string) llm.Warning {
+ return llm.Warning{
+ Code: "unsupported_option",
+ Message: fmt.Sprintf("Provider option codex_app_server.%s is not supported and was ignored", key),
+ }
+}
+
+func normalizeReasoningEffort(value string, warnings *[]llm.Warning, source string) string {
+ normalized := strings.ToLower(strings.TrimSpace(value))
+ if normalized == "" {
+ return ""
+ }
+ if _, ok := supportedReasoningEfforts[normalized]; ok {
+ return normalized
+ }
+ if warnings != nil {
+ *warnings = append(*warnings, llm.Warning{
+ Code: "unsupported_option",
+ Message: fmt.Sprintf(
+ "%s value %q is unsupported and was ignored (expected none, minimal, low, medium, high, or xhigh)",
+ source,
+ value,
+ ),
+ })
+ }
+ return ""
+}
+
+func applyProviderOptions(
+ request llm.Request,
+ params map[string]any,
+ controls *transcriptControls,
+ warnings *[]llm.Warning,
+) {
+ options := codexProviderOptions(request.ProviderOptions)
+ if len(options) == 0 {
+ return
+ }
+
+ for key, value := range options {
+ if turnKey, ok := turnOptionKeyMap[key]; ok {
+ params[turnKey] = value
+ continue
+ }
+ if controlKey, ok := controlOptionKeyMap[key]; ok {
+ applyControlOverride(controlKey, value, controls, warnings, key)
+ continue
+ }
+ *warnings = append(*warnings, warningForUnsupportedProviderOption(key))
+ }
+}
+
+func codexProviderOptions(options map[string]any) map[string]any {
+ if len(options) == 0 {
+ return nil
+ }
+ for _, key := range []string{"codex_app_server", "codex-app-server", "codexappserver"} {
+ if raw, ok := options[key]; ok {
+ if m := asMap(raw); m != nil {
+ return m
+ }
+ }
+ }
+ return nil
+}
+
+func applyControlOverride(
+ key string,
+ value any,
+ controls *transcriptControls,
+ warnings *[]llm.Warning,
+ rawKey string,
+) {
+ switch key {
+ case "temperature":
+ if f, ok := value.(float64); ok {
+ controls.Temperature = &f
+ return
+ }
+ case "topP":
+ if f, ok := value.(float64); ok {
+ controls.TopP = &f
+ return
+ }
+ case "maxTokens":
+ if n, ok := value.(float64); ok {
+ i := int(n)
+ controls.MaxTokens = &i
+ return
+ }
+ if n, ok := value.(int); ok {
+ controls.MaxTokens = &n
+ return
+ }
+ case "stopSequences":
+ if arr, ok := value.([]any); ok {
+ out := make([]string, 0, len(arr))
+ for _, item := range arr {
+ s := asString(item)
+ if s == "" {
+ *outWarning(warnings) = append(*outWarning(warnings), warningForUnsupportedProviderOption(rawKey))
+ return
+ }
+ out = append(out, s)
+ }
+ controls.StopSequences = out
+ return
+ }
+ if arr, ok := value.([]string); ok {
+ controls.StopSequences = append([]string{}, arr...)
+ return
+ }
+ case "metadata":
+ if rec := asMap(value); rec != nil {
+ if controls.Metadata == nil {
+ controls.Metadata = map[string]interface{}{}
+ }
+ for mk, mv := range rec {
+ controls.Metadata[mk] = mv
+ }
+ return
+ }
+ case "reasoningEffort":
+ normalized := normalizeReasoningEffort(asString(value), warnings, fmt.Sprintf("codex_app_server.%s", rawKey))
+ if normalized != "" {
+ controls.ReasoningEff = normalized
+ }
+ return
+ }
+ *warnings = append(*warnings, warningForUnsupportedProviderOption(rawKey))
+}
+
+func outWarning(w *[]llm.Warning) *[]llm.Warning { return w }
+
+func buildTranscript(payload transcriptPayload, choice resolvedToolChoice) (string, error) {
+ toolChoiceLine := "Tool choice policy: " + choice.Mode
+ if choice.Mode == "named" {
+ toolChoiceLine = fmt.Sprintf("Tool choice policy: named (%s)", choice.ToolName)
+ }
+ payloadJSON, err := json.Marshal(payload)
+ if err != nil {
+ return "", err
+ }
+ lines := []string{
+ transcriptBeginMarker,
+ "Stateless transcript payload for unified-llm codex-app-server translation.",
+ "Treat the payload as the authoritative full conversation history.",
+ "When emitting tool calls, use deterministic protocol blocks exactly:",
+ toolCallBeginMarker,
+ `{"id":"call_","name":"","arguments":{}}`,
+ toolCallEndMarker,
+ "Do not wrap tool-call protocol blocks in markdown fences.",
+ toolChoiceLine,
+ transcriptPayloadBeginMarker,
+ string(payloadJSON),
+ transcriptPayloadEndMarker,
+ transcriptEndMarker,
+ }
+ return strings.Join(lines, "\n"), nil
+}
+
+func isLikelyLocalPath(url string) bool {
+ url = strings.TrimSpace(url)
+ if strings.HasPrefix(url, "http://") || strings.HasPrefix(url, "https://") {
+ return false
+ }
+ if strings.HasPrefix(url, "data:") {
+ return false
+ }
+ if strings.HasPrefix(url, "file://") {
+ return true
+ }
+ if uriSchemeRE.MatchString(url) {
+ return false
+ }
+ return true
+}
diff --git a/internal/llm/providers/codexappserver/request_translator_controls_test.go b/internal/llm/providers/codexappserver/request_translator_controls_test.go
new file mode 100644
index 00000000..f44c17de
--- /dev/null
+++ b/internal/llm/providers/codexappserver/request_translator_controls_test.go
@@ -0,0 +1,125 @@
+package codexappserver
+
+import (
+ "testing"
+
+ "github.com/danshapiro/kilroy/internal/llm"
+)
+
+func TestRequestTranslator_ApplyControlOverride_SupportedValues(t *testing.T) {
+ controls := &transcriptControls{}
+ warnings := []llm.Warning{}
+
+ applyControlOverride("temperature", 0.7, controls, &warnings, "temperature")
+ applyControlOverride("topP", 0.9, controls, &warnings, "topP")
+ applyControlOverride("maxTokens", 42.0, controls, &warnings, "maxTokens")
+ applyControlOverride("maxTokens", 64, controls, &warnings, "maxTokens")
+ applyControlOverride("stopSequences", []any{"END", "STOP"}, controls, &warnings, "stopSequences")
+ applyControlOverride("metadata", map[string]any{"team": "qa"}, controls, &warnings, "metadata")
+ applyControlOverride("reasoningEffort", "medium", controls, &warnings, "reasoningEffort")
+
+ if controls.Temperature == nil || *controls.Temperature != 0.7 {
+ t.Fatalf("temperature: %#v", controls.Temperature)
+ }
+ if controls.TopP == nil || *controls.TopP != 0.9 {
+ t.Fatalf("topP: %#v", controls.TopP)
+ }
+ if controls.MaxTokens == nil || *controls.MaxTokens != 64 {
+ t.Fatalf("maxTokens: %#v", controls.MaxTokens)
+ }
+ if len(controls.StopSequences) != 2 || controls.StopSequences[0] != "END" || controls.StopSequences[1] != "STOP" {
+ t.Fatalf("stopSequences: %#v", controls.StopSequences)
+ }
+ if controls.Metadata["team"] != "qa" {
+ t.Fatalf("metadata: %#v", controls.Metadata)
+ }
+ if controls.ReasoningEff != "medium" {
+ t.Fatalf("reasoningEffort: %q", controls.ReasoningEff)
+ }
+ if len(warnings) != 0 {
+ t.Fatalf("did not expect warnings for supported values: %+v", warnings)
+ }
+}
+
+func TestRequestTranslator_ApplyControlOverride_InvalidValuesEmitWarnings(t *testing.T) {
+ controls := &transcriptControls{}
+ warnings := []llm.Warning{}
+
+ applyControlOverride("stopSequences", []any{"END", 2}, controls, &warnings, "stopSequences")
+ applyControlOverride("reasoningEffort", "invalid-effort", controls, &warnings, "reasoningEffort")
+ applyControlOverride("temperature", "not-a-number", controls, &warnings, "temperature")
+ applyControlOverride("unknownKey", true, controls, &warnings, "unknownKey")
+
+ if len(controls.StopSequences) != 0 {
+ t.Fatalf("stopSequences should remain unset on invalid input: %#v", controls.StopSequences)
+ }
+ if controls.ReasoningEff != "" {
+ t.Fatalf("reasoningEffort should remain empty for invalid value: %q", controls.ReasoningEff)
+ }
+ if controls.Temperature != nil {
+ t.Fatalf("temperature should remain nil on invalid input: %#v", controls.Temperature)
+ }
+ if len(warnings) < 4 {
+ t.Fatalf("expected warnings for invalid values, got %d (%+v)", len(warnings), warnings)
+ }
+}
+
+func TestRequestTranslator_ApplyProviderOptions_MapsKnownKeysAndWarnsUnknown(t *testing.T) {
+ params := map[string]any{}
+ controls := &transcriptControls{}
+ warnings := []llm.Warning{}
+
+ applyProviderOptions(llm.Request{
+ ProviderOptions: map[string]any{
+ "codex_app_server": map[string]any{
+ "cwd": "/tmp/project",
+ "approval_policy": "never",
+ "sandbox_mode": "danger-full-access",
+ "sandboxPolicy": map[string]any{"type": "dangerFullAccess"},
+ "temperature": 0.2,
+ "reasoning_effort": "high",
+ "unsupportedX": true,
+ },
+ },
+ }, params, controls, &warnings)
+
+ if params["cwd"] != "/tmp/project" {
+ t.Fatalf("cwd mapping: %#v", params["cwd"])
+ }
+ if params["approvalPolicy"] != "never" {
+ t.Fatalf("approvalPolicy mapping: %#v", params["approvalPolicy"])
+ }
+ if params["sandbox"] != "danger-full-access" {
+ t.Fatalf("sandbox mapping: %#v", params["sandbox"])
+ }
+ rawSandboxPolicy, ok := params["sandboxPolicy"]
+ if !ok {
+ t.Fatalf("missing sandboxPolicy mapping: %#v", params)
+ }
+ sandboxPolicy, ok := rawSandboxPolicy.(map[string]any)
+ if !ok {
+ t.Fatalf("sandboxPolicy type=%T want map[string]any", rawSandboxPolicy)
+ }
+ if sandboxPolicy["type"] != "dangerFullAccess" {
+ t.Fatalf("sandboxPolicy.type=%#v want %q", sandboxPolicy["type"], "dangerFullAccess")
+ }
+ if controls.Temperature == nil || *controls.Temperature != 0.2 {
+ t.Fatalf("temperature override: %#v", controls.Temperature)
+ }
+ if controls.ReasoningEff != "high" {
+ t.Fatalf("reasoningEffort override: %q", controls.ReasoningEff)
+ }
+ if len(warnings) == 0 {
+ t.Fatalf("expected warning for unsupported provider option")
+ }
+}
+
+func TestRequestTranslator_WarningHelpers(t *testing.T) {
+ w := warningForUnsupportedProviderOption("x_opt")
+ if w.Code != "unsupported_option" {
+ t.Fatalf("warning code: %q", w.Code)
+ }
+ if out := outWarning(&[]llm.Warning{}); out == nil {
+ t.Fatalf("outWarning should return same pointer")
+ }
+}
diff --git a/internal/llm/providers/codexappserver/request_translator_test.go b/internal/llm/providers/codexappserver/request_translator_test.go
new file mode 100644
index 00000000..c8956a8e
--- /dev/null
+++ b/internal/llm/providers/codexappserver/request_translator_test.go
@@ -0,0 +1,237 @@
+package codexappserver
+
+import (
+ "encoding/json"
+ "strings"
+ "testing"
+
+ "github.com/danshapiro/kilroy/internal/llm"
+)
+
+func mustTranscriptPayload(t *testing.T, params map[string]any) map[string]any {
+ t.Helper()
+ input := asSlice(params["input"])
+ if len(input) == 0 {
+ t.Fatalf("missing input")
+ }
+ textItem := asMap(input[0])
+ if asString(textItem["type"]) != "text" {
+ t.Fatalf("first input item is not text: %#v", textItem)
+ }
+ transcript := asString(textItem["text"])
+ if !strings.Contains(transcript, transcriptPayloadBeginMarker) || !strings.Contains(transcript, transcriptPayloadEndMarker) {
+ t.Fatalf("missing transcript payload markers")
+ }
+ start := strings.Index(transcript, transcriptPayloadBeginMarker+"\n")
+ if start < 0 {
+ t.Fatalf("missing payload start marker")
+ }
+ start += len(transcriptPayloadBeginMarker) + 1
+ end := strings.Index(transcript[start:], "\n"+transcriptPayloadEndMarker)
+ if end < 0 {
+ t.Fatalf("missing payload end marker")
+ }
+ payloadJSON := transcript[start : start+end]
+ var payload map[string]any
+ if err := json.Unmarshal([]byte(payloadJSON), &payload); err != nil {
+ t.Fatalf("payload json unmarshal: %v", err)
+ }
+ return payload
+}
+
+func TestTranslateRequest_FullSurface(t *testing.T) {
+ temperature := 0.3
+ topP := 0.8
+ maxTokens := 300
+ reasoning := "high"
+
+ request := llm.Request{
+ Model: "gpt-5.2-codex",
+ Messages: []llm.Message{
+ llm.System("System guardrails"),
+ llm.Developer("Developer instruction"),
+ {
+ Role: llm.RoleUser,
+ Content: []llm.ContentPart{
+ {Kind: llm.ContentText, Text: "What is in this image?"},
+ {Kind: llm.ContentImage, Image: &llm.ImageData{URL: "https://example.com/cat.png", Detail: "high"}},
+ },
+ },
+ {
+ Role: llm.RoleAssistant,
+ Content: []llm.ContentPart{
+ {Kind: llm.ContentText, Text: "Let me inspect it."},
+ {Kind: llm.ContentToolCall, ToolCall: &llm.ToolCallData{ID: "call_weather", Name: "get_weather", Arguments: json.RawMessage(`{"city":"SF"}`)}},
+ },
+ },
+ llm.ToolResultNamed("call_weather", "get_weather", map[string]any{"temperature": "72F"}, false),
+ },
+ Tools: []llm.ToolDefinition{{
+ Name: "get_weather",
+ Description: "Get weather for a city",
+ Parameters: map[string]any{
+ "type": "object",
+ "properties": map[string]any{"city": map[string]any{"type": "string"}},
+ "required": []any{"city"},
+ },
+ }},
+ ToolChoice: &llm.ToolChoice{Mode: "named", Name: "get_weather"},
+ ResponseFormat: &llm.ResponseFormat{
+ Type: "json_schema",
+ JSONSchema: map[string]any{
+ "type": "object",
+ "properties": map[string]any{"answer": map[string]any{"type": "string"}},
+ "required": []any{"answer"},
+ },
+ },
+ Temperature: &temperature,
+ TopP: &topP,
+ MaxTokens: &maxTokens,
+ StopSequences: []string{""},
+ ReasoningEffort: &reasoning,
+ Metadata: map[string]string{
+ "traceId": "trace-123",
+ "tenant": "acme",
+ },
+ ProviderOptions: map[string]any{
+ "codex_app_server": map[string]any{
+ "cwd": "/tmp/project",
+ "summary": "concise",
+ "personality": "pragmatic",
+ },
+ },
+ }
+
+ translated, err := translateRequest(request, false)
+ if err != nil {
+ t.Fatalf("translateRequest: %v", err)
+ }
+ if len(translated.Warnings) != 0 {
+ t.Fatalf("unexpected warnings: %+v", translated.Warnings)
+ }
+ params := translated.Payload
+ if got := asString(params["threadId"]); got != defaultThreadID {
+ t.Fatalf("threadId: got %q want %q", got, defaultThreadID)
+ }
+ if got := asString(params["model"]); got != "gpt-5.2-codex" {
+ t.Fatalf("model: got %q", got)
+ }
+ if got := asString(params["cwd"]); got != "/tmp/project" {
+ t.Fatalf("cwd: got %q", got)
+ }
+ if got := asString(params["summary"]); got != "concise" {
+ t.Fatalf("summary: got %q", got)
+ }
+ if got := asString(params["personality"]); got != "pragmatic" {
+ t.Fatalf("personality: got %q", got)
+ }
+ if params["outputSchema"] == nil {
+ t.Fatalf("expected outputSchema to be set")
+ }
+
+ input := asSlice(params["input"])
+ if len(input) != 2 {
+ t.Fatalf("input len: got %d want 2", len(input))
+ }
+ imageInput := asMap(input[1])
+ if asString(imageInput["type"]) != "image" || asString(imageInput["url"]) != "https://example.com/cat.png" {
+ t.Fatalf("image input mismatch: %#v", imageInput)
+ }
+
+ payload := mustTranscriptPayload(t, params)
+ if got := asString(payload["version"]); got != transcriptVersion {
+ t.Fatalf("payload version: got %q want %q", got, transcriptVersion)
+ }
+ controls := asMap(payload["controls"])
+ if got := asString(controls["model"]); got != "gpt-5.2-codex" {
+ t.Fatalf("controls.model: got %q", got)
+ }
+ if got := asString(asMap(controls["toolChoice"])["mode"]); got != "named" {
+ t.Fatalf("tool choice mode: got %q", got)
+ }
+ if got := asString(asMap(controls["toolChoice"])["toolName"]); got != "get_weather" {
+ t.Fatalf("tool choice name: got %q", got)
+ }
+}
+
+func TestTranslateRequest_FallbackWarnings(t *testing.T) {
+ request := llm.Request{
+ Model: "codex-mini",
+ Messages: []llm.Message{{
+ Role: llm.RoleUser,
+ Content: []llm.ContentPart{
+ {Kind: llm.ContentAudio, Audio: &llm.AudioData{URL: "https://example.com/a.wav", MediaType: "audio/wav"}},
+ {Kind: llm.ContentDocument, Document: &llm.DocumentData{URL: "https://example.com/r.pdf", MediaType: "application/pdf", FileName: "r.pdf"}},
+ {Kind: llm.ContentKind("custom_note"), Data: map[string]any{"topic": "ops", "priority": "high"}},
+ },
+ }},
+ }
+
+ translated, err := translateRequest(request, false)
+ if err != nil {
+ t.Fatalf("translateRequest: %v", err)
+ }
+ if len(translated.Warnings) != 3 {
+ t.Fatalf("warning len: got %d want 3", len(translated.Warnings))
+ }
+ for _, w := range translated.Warnings {
+ if w.Code != "unsupported_part" {
+ t.Fatalf("warning code: got %q want unsupported_part", w.Code)
+ }
+ }
+ if translated.Warnings[2].Message != "Custom (custom_note) content parts are not natively supported by codex-app-server and were translated to deterministic transcript fallback text" {
+ t.Fatalf("custom warning mismatch: %q", translated.Warnings[2].Message)
+ }
+
+ payload := mustTranscriptPayload(t, translated.Payload)
+ history := asSlice(payload["history"])
+ if len(history) != 1 {
+ t.Fatalf("history len: got %d want 1", len(history))
+ }
+ parts := asSlice(asMap(history[0])["parts"])
+ if len(parts) != 3 {
+ t.Fatalf("parts len: got %d want 3", len(parts))
+ }
+ customPart := asMap(parts[2])
+ if got := asString(customPart["fallbackKind"]); got != "custom" {
+ t.Fatalf("fallbackKind: got %q want custom", got)
+ }
+ customData := asMap(customPart["data"])
+ if asString(customData["topic"]) != "ops" || asString(customData["priority"]) != "high" {
+ t.Fatalf("custom data mismatch: %#v", customData)
+ }
+}
+
+func TestTranslateRequest_ValidatesToolChoice(t *testing.T) {
+ req := llm.Request{
+ Model: "codex-mini",
+ Messages: []llm.Message{llm.User("Need tools")},
+ ToolChoice: &llm.ToolChoice{Mode: "required"},
+ }
+ if _, err := translateRequest(req, false); err == nil {
+ t.Fatalf("expected error for required tool choice without tools")
+ }
+
+ req = llm.Request{
+ Model: "codex-mini",
+ Messages: []llm.Message{llm.User("Need weather")},
+ Tools: []llm.ToolDefinition{{
+ Name: "lookup_weather",
+ Parameters: map[string]any{"type": "object"},
+ }},
+ ToolChoice: &llm.ToolChoice{Mode: "named", Name: "missing_tool"},
+ }
+ if _, err := translateRequest(req, false); err == nil {
+ t.Fatalf("expected error for named tool choice without matching tool")
+ }
+}
+
+func TestTranslateRequest_DefaultReasoningEffort(t *testing.T) {
+ translated, err := translateRequest(llm.Request{Model: "codex-mini", Messages: []llm.Message{llm.User("Hello")}}, false)
+ if err != nil {
+ t.Fatalf("translateRequest: %v", err)
+ }
+ if got := asString(translated.Payload["effort"]); got != "high" {
+ t.Fatalf("effort: got %q want high", got)
+ }
+}
diff --git a/internal/llm/providers/codexappserver/response_translator.go b/internal/llm/providers/codexappserver/response_translator.go
new file mode 100644
index 00000000..e5d1f434
--- /dev/null
+++ b/internal/llm/providers/codexappserver/response_translator.go
@@ -0,0 +1,507 @@
+package codexappserver
+
+import (
+ "bytes"
+ "encoding/json"
+ "fmt"
+ "regexp"
+ "sort"
+ "strings"
+ "time"
+
+ "github.com/danshapiro/kilroy/internal/llm"
+)
+
+type normalizedNotification struct {
+ Method string
+ Params map[string]any
+}
+
+type itemDeltas struct {
+ agentByID map[string]string
+ reasoningSummaryByID map[string]map[int]string
+ reasoningContentByID map[string]map[int]string
+}
+
+var toolProtocolRE = regexp.MustCompile(`(?is)\[\[TOOL_CALL\]\]([\s\S]*?)\[\[/TOOL_CALL\]\]`)
+
+func translateResponse(body map[string]any) (llm.Response, error) {
+ notifications := extractNotifications(body)
+ turn := asMap(body["turn"])
+ items := collectItems(turn, notifications)
+
+ content := make([]llm.ContentPart, 0, 8)
+ for _, item := range items {
+ switch asString(item["type"]) {
+ case "reasoning":
+ content = append(content, translateReasoning(item)...)
+ case "agentMessage":
+ content = append(content, translateAgentMessage(item)...)
+ }
+ }
+ if len(content) == 0 {
+ if fallback := asString(body["text"]); fallback != "" {
+ content = append(content, llm.ContentPart{Kind: llm.ContentText, Text: fallback})
+ }
+ }
+
+ rawStatus := firstNonEmpty(asString(turn["status"]), asString(body["status"]))
+ hasToolCalls := false
+ for _, part := range content {
+ if part.Kind == llm.ContentToolCall {
+ hasToolCalls = true
+ break
+ }
+ }
+
+ usage := translateUsage(body, notifications)
+ warnings := extractWarnings(body, notifications)
+ response := llm.Response{
+ ID: firstNonEmpty(asString(turn["id"]), asString(body["id"])),
+ Model: extractModel(body, notifications),
+ Provider: "codex-app-server",
+ Message: llm.Message{
+ Role: llm.RoleAssistant,
+ Content: content,
+ },
+ Finish: llm.FinishReason{
+ Reason: mapFinishReason(rawStatus, hasToolCalls),
+ Raw: rawStatus,
+ },
+ Usage: usage,
+ Raw: body,
+ Warnings: warnings,
+ }
+ return response, nil
+}
+
+func collectItems(turn map[string]any, notifications []normalizedNotification) []map[string]any {
+ deltas := collectDeltas(notifications)
+ orderedIDs := make([]string, 0, 16)
+ byID := make(map[string]map[string]any)
+
+ upsert := func(item map[string]any) {
+ id := strings.TrimSpace(asString(item["id"]))
+ if id == "" {
+ return
+ }
+ if _, exists := byID[id]; !exists {
+ orderedIDs = append(orderedIDs, id)
+ }
+ byID[id] = item
+ }
+
+ for _, notification := range notifications {
+ if notification.Method != "item/completed" {
+ continue
+ }
+ item := asMap(notification.Params["item"])
+ if item != nil {
+ upsert(item)
+ }
+ }
+ for _, itemRaw := range asSlice(turn["items"]) {
+ item := asMap(itemRaw)
+ if item != nil {
+ upsert(item)
+ }
+ }
+ for itemID, text := range deltas.agentByID {
+ if _, exists := byID[itemID]; exists {
+ continue
+ }
+ upsert(map[string]any{"id": itemID, "type": "agentMessage", "text": text})
+ }
+ for itemID, summaryMap := range deltas.reasoningSummaryByID {
+ if _, exists := byID[itemID]; exists {
+ continue
+ }
+ upsert(map[string]any{
+ "id": itemID,
+ "type": "reasoning",
+ "summary": mapByIndex(summaryMap),
+ "content": mapByIndex(deltas.reasoningContentByID[itemID]),
+ })
+ }
+ for itemID, contentMap := range deltas.reasoningContentByID {
+ if _, exists := byID[itemID]; exists {
+ continue
+ }
+ upsert(map[string]any{
+ "id": itemID,
+ "type": "reasoning",
+ "summary": mapByIndex(deltas.reasoningSummaryByID[itemID]),
+ "content": mapByIndex(contentMap),
+ })
+ }
+
+ out := make([]map[string]any, 0, len(orderedIDs))
+ for _, id := range orderedIDs {
+ if item := byID[id]; item != nil {
+ out = append(out, item)
+ }
+ }
+ return out
+}
+
+func collectDeltas(notifications []normalizedNotification) itemDeltas {
+ agentByID := map[string]string{}
+ reasoningSummaryByID := map[string]map[int]string{}
+ reasoningContentByID := map[string]map[int]string{}
+
+ appendByIndex := func(target map[string]map[int]string, itemID string, idx int, delta string) {
+ if _, ok := target[itemID]; !ok {
+ target[itemID] = map[int]string{}
+ }
+ target[itemID][idx] = target[itemID][idx] + delta
+ }
+
+ for _, notification := range notifications {
+ switch notification.Method {
+ case "item/agentMessage/delta":
+ itemID := asString(notification.Params["itemId"])
+ if itemID == "" {
+ continue
+ }
+ agentByID[itemID] = agentByID[itemID] + asString(notification.Params["delta"])
+ case "item/reasoning/summaryTextDelta":
+ itemID := asString(notification.Params["itemId"])
+ if itemID == "" {
+ continue
+ }
+ appendByIndex(reasoningSummaryByID, itemID, asInt(notification.Params["summaryIndex"], 0), asString(notification.Params["delta"]))
+ case "item/reasoning/textDelta":
+ itemID := asString(notification.Params["itemId"])
+ if itemID == "" {
+ continue
+ }
+ appendByIndex(reasoningContentByID, itemID, asInt(notification.Params["contentIndex"], 0), asString(notification.Params["delta"]))
+ }
+ }
+
+ return itemDeltas{
+ agentByID: agentByID,
+ reasoningSummaryByID: reasoningSummaryByID,
+ reasoningContentByID: reasoningContentByID,
+ }
+}
+
+func mapByIndex(in map[int]string) []any {
+ if len(in) == 0 {
+ return nil
+ }
+ keys := make([]int, 0, len(in))
+ for idx := range in {
+ keys = append(keys, idx)
+ }
+ sort.Ints(keys)
+ out := make([]any, 0, len(keys))
+ for _, idx := range keys {
+ out = append(out, in[idx])
+ }
+ return out
+}
+
+func translateReasoning(item map[string]any) []llm.ContentPart {
+ parts := make([]llm.ContentPart, 0, 4)
+ for _, source := range []any{item["summary"], item["content"]} {
+ for _, chunk := range asSlice(source) {
+ text := asString(chunk)
+ if strings.TrimSpace(text) == "" {
+ continue
+ }
+ parts = append(parts, splitReasoningChunk(text)...)
+ }
+ }
+ return parts
+}
+
+func splitReasoningChunk(text string) []llm.ContentPart {
+ segments := splitReasoningSegments(text)
+ out := make([]llm.ContentPart, 0, len(segments))
+ for _, segment := range segments {
+ trimmed := strings.TrimSpace(segment.Text)
+ if trimmed == "" {
+ continue
+ }
+ thinking := &llm.ThinkingData{Text: trimmed, Redacted: segment.Redacted}
+ if segment.Redacted {
+ out = append(out, llm.ContentPart{Kind: llm.ContentRedThinking, Thinking: thinking})
+ continue
+ }
+ out = append(out, llm.ContentPart{Kind: llm.ContentThinking, Thinking: thinking})
+ }
+ return out
+}
+
+func translateAgentMessage(item map[string]any) []llm.ContentPart {
+ text := asString(item["text"])
+ if text == "" {
+ return nil
+ }
+ matches := toolProtocolRE.FindAllStringSubmatchIndex(text, -1)
+ if len(matches) == 0 {
+ return []llm.ContentPart{{Kind: llm.ContentText, Text: text}}
+ }
+
+ parts := make([]llm.ContentPart, 0, len(matches)*2+1)
+ cursor := 0
+ for _, m := range matches {
+ if len(m) < 4 {
+ continue
+ }
+ start := m[0]
+ end := m[1]
+ payloadStart := m[2]
+ payloadEnd := m[3]
+ if start > cursor {
+ prefix := text[cursor:start]
+ if prefix != "" {
+ parts = append(parts, llm.ContentPart{Kind: llm.ContentText, Text: prefix})
+ }
+ }
+ payload := strings.TrimSpace(text[payloadStart:payloadEnd])
+ if toolCall := parseToolCall(payload); toolCall != nil {
+ parts = append(parts, llm.ContentPart{Kind: llm.ContentToolCall, ToolCall: toolCall})
+ } else {
+ block := text[start:end]
+ if block != "" {
+ parts = append(parts, llm.ContentPart{Kind: llm.ContentText, Text: block})
+ }
+ }
+ cursor = end
+ }
+ if cursor < len(text) {
+ suffix := text[cursor:]
+ if suffix != "" {
+ parts = append(parts, llm.ContentPart{Kind: llm.ContentText, Text: suffix})
+ }
+ }
+ return parts
+}
+
+func parseToolCall(payload string) *llm.ToolCallData {
+ if strings.TrimSpace(payload) == "" {
+ return nil
+ }
+ m, ok := parseJSONRecord(payload)
+ if !ok {
+ return nil
+ }
+ name := strings.TrimSpace(asString(m["name"]))
+ if name == "" {
+ return nil
+ }
+ id := strings.TrimSpace(asString(m["id"]))
+ if id == "" {
+ id = fmt.Sprintf("call_%d", time.Now().UnixNano())
+ }
+ typ := strings.TrimSpace(asString(m["type"]))
+
+ argsRaw := m["arguments"]
+ arguments, rawStr := normalizeParsedArguments(argsRaw)
+ _ = rawStr
+
+ toolCall := &llm.ToolCallData{
+ ID: id,
+ Name: name,
+ Arguments: arguments,
+ }
+ if typ != "" {
+ toolCall.Type = typ
+ }
+ return toolCall
+}
+
+func normalizeParsedArguments(value any) (json.RawMessage, string) {
+ if s, ok := value.(string); ok {
+ trimmed := strings.TrimSpace(s)
+ if trimmed == "" {
+ return json.RawMessage("{}"), "{}"
+ }
+ if json.Valid([]byte(trimmed)) {
+ return json.RawMessage(trimmed), trimmed
+ }
+ encoded, _ := json.Marshal(trimmed)
+ return json.RawMessage(encoded), trimmed
+ }
+ if value == nil {
+ return json.RawMessage("{}"), "{}"
+ }
+ b, err := json.Marshal(value)
+ if err != nil || len(b) == 0 {
+ return json.RawMessage("{}"), "{}"
+ }
+ return json.RawMessage(b), string(b)
+}
+
+func extractModel(body map[string]any, notifications []normalizedNotification) string {
+ if model := firstNonEmpty(asString(body["model"]), asString(body["modelId"]), asString(body["model_name"])); model != "" {
+ return model
+ }
+ for idx := len(notifications) - 1; idx >= 0; idx-- {
+ notification := notifications[idx]
+ if notification.Method != "model/rerouted" {
+ continue
+ }
+ if model := asString(notification.Params["toModel"]); model != "" {
+ return model
+ }
+ }
+ return ""
+}
+
+func translateUsage(body map[string]any, notifications []normalizedNotification) llm.Usage {
+ var usageSource map[string]any
+ var rawUsage map[string]any
+
+ for idx := len(notifications) - 1; idx >= 0; idx-- {
+ notification := notifications[idx]
+ if notification.Method != "thread/tokenUsage/updated" {
+ continue
+ }
+ tokenUsage := asMap(notification.Params["tokenUsage"])
+ if tokenUsage == nil {
+ continue
+ }
+ rawUsage = tokenUsage
+ usageSource = asMap(tokenUsage["last"])
+ if usageSource == nil {
+ usageSource = tokenUsage
+ }
+ break
+ }
+ if usageSource == nil {
+ tokenUsage := asMap(body["tokenUsage"])
+ if tokenUsage != nil {
+ rawUsage = tokenUsage
+ usageSource = asMap(tokenUsage["last"])
+ if usageSource == nil {
+ usageSource = tokenUsage
+ }
+ }
+ }
+ if usageSource == nil {
+ usage := asMap(body["usage"])
+ if usage != nil {
+ rawUsage = usage
+ usageSource = usage
+ }
+ }
+
+ usage := llm.Usage{
+ InputTokens: asInt(usageSource["inputTokens"], asInt(usageSource["input_tokens"], 0)),
+ OutputTokens: asInt(usageSource["outputTokens"], asInt(usageSource["output_tokens"], 0)),
+ TotalTokens: asInt(usageSource["totalTokens"], asInt(usageSource["total_tokens"], 0)),
+ }
+ reasoningTokens := asInt(usageSource["reasoningOutputTokens"], asInt(usageSource["reasoning_tokens"], -1))
+ cacheReadTokens := asInt(usageSource["cachedInputTokens"], asInt(usageSource["cache_read_input_tokens"], -1))
+ cacheWriteTokens := asInt(usageSource["cacheWriteTokens"], asInt(usageSource["cache_write_input_tokens"], -1))
+ if usage.TotalTokens <= 0 {
+ usage.TotalTokens = usage.InputTokens + usage.OutputTokens
+ }
+ if reasoningTokens >= 0 {
+ usage.ReasoningTokens = intPtr(reasoningTokens)
+ }
+ if cacheReadTokens >= 0 {
+ usage.CacheReadTokens = intPtr(cacheReadTokens)
+ }
+ if cacheWriteTokens >= 0 {
+ usage.CacheWriteTokens = intPtr(cacheWriteTokens)
+ }
+ if rawUsage != nil {
+ usage.Raw = rawUsage
+ }
+ return usage
+}
+
+func extractWarnings(body map[string]any, notifications []normalizedNotification) []llm.Warning {
+ warnings := make([]llm.Warning, 0, 4)
+ for _, warningValue := range asSlice(body["warnings"]) {
+ warning := asMap(warningValue)
+ if warning == nil {
+ continue
+ }
+ message := strings.TrimSpace(asString(warning["message"]))
+ if message == "" {
+ continue
+ }
+ code := strings.TrimSpace(asString(warning["code"]))
+ warnings = append(warnings, llm.Warning{Message: message, Code: code})
+ }
+ for _, notification := range notifications {
+ if notification.Method != "deprecationNotice" && notification.Method != "configWarning" {
+ continue
+ }
+ message := firstNonEmpty(
+ asString(notification.Params["message"]),
+ asString(notification.Params["notice"]),
+ asString(notification.Params["warning"]),
+ )
+ if message == "" {
+ continue
+ }
+ warnings = append(warnings, llm.Warning{Message: message, Code: notification.Method})
+ }
+ return warnings
+}
+
+func extractNotifications(body map[string]any) []normalizedNotification {
+ notifications := make([]normalizedNotification, 0, 16)
+ sources := make([]any, 0)
+ sources = append(sources, asSlice(body["notifications"])...)
+ sources = append(sources, asSlice(body["events"])...)
+ sources = append(sources, asSlice(body["rawNotifications"])...)
+
+ for _, raw := range sources {
+ entry := asMap(raw)
+ if entry == nil {
+ continue
+ }
+ method := firstNonEmpty(asString(entry["method"]), asString(entry["event"]), asString(entry["type"]))
+ if method == "" {
+ continue
+ }
+ params := asMap(entry["params"])
+ if params == nil {
+ if dataString, ok := entry["data"].(string); ok {
+ if parsed, ok := parseJSONRecord(dataString); ok {
+ params = parsed
+ }
+ } else {
+ params = asMap(entry["data"])
+ }
+ }
+ if params == nil {
+ params = map[string]any{}
+ }
+ notifications = append(notifications, normalizedNotification{Method: method, Params: params})
+ }
+
+ return notifications
+}
+
+func parseJSONRecord(in string) (map[string]any, bool) {
+ dec := json.NewDecoder(strings.NewReader(strings.TrimSpace(in)))
+ dec.UseNumber()
+ var parsed map[string]any
+ if err := dec.Decode(&parsed); err != nil {
+ return nil, false
+ }
+ if parsed == nil {
+ return nil, false
+ }
+ return parsed, true
+}
+
+func intPtr(v int) *int { return &v }
+
+func parseJSONAny(in string) any {
+ dec := json.NewDecoder(bytes.NewReader([]byte(in)))
+ dec.UseNumber()
+ var parsed any
+ if err := dec.Decode(&parsed); err != nil {
+ return nil
+ }
+ return parsed
+}
diff --git a/internal/llm/providers/codexappserver/response_translator_test.go b/internal/llm/providers/codexappserver/response_translator_test.go
new file mode 100644
index 00000000..e1aa015b
--- /dev/null
+++ b/internal/llm/providers/codexappserver/response_translator_test.go
@@ -0,0 +1,146 @@
+package codexappserver
+
+import (
+ "strings"
+ "testing"
+
+ "github.com/danshapiro/kilroy/internal/llm"
+)
+
+func testNotification(method string, params map[string]any) map[string]any {
+ return map[string]any{"method": method, "params": params}
+}
+
+func TestTranslateResponse_ToolProtocolReasoningAndUsage(t *testing.T) {
+ body := map[string]any{
+ "id": "resp_codex_1",
+ "model": "codex-mini",
+ "turn": map[string]any{
+ "id": "turn_1",
+ "status": "completed",
+ },
+ "notifications": []any{
+ testNotification("item/completed", map[string]any{
+ "item": map[string]any{
+ "id": "reasoning_1",
+ "type": "reasoning",
+ "summary": []any{"Plan steps"},
+ "content": []any{"Visible [[REDACTED_REASONING]]secret[[/REDACTED_REASONING]] done"},
+ },
+ }),
+ testNotification("item/completed", map[string]any{
+ "item": map[string]any{
+ "id": "agent_1",
+ "type": "agentMessage",
+ "text": "Before [[TOOL_CALL]]{\"id\":\"call_1\",\"name\":\"search\",\"arguments\":{\"q\":\"foo\"}}[[/TOOL_CALL]] After",
+ },
+ }),
+ testNotification("thread/tokenUsage/updated", map[string]any{
+ "tokenUsage": map[string]any{
+ "last": map[string]any{
+ "inputTokens": 11,
+ "outputTokens": 7,
+ "totalTokens": 18,
+ "reasoningOutputTokens": 3,
+ "cachedInputTokens": 2,
+ },
+ },
+ }),
+ },
+ }
+
+ response, err := translateResponse(body)
+ if err != nil {
+ t.Fatalf("translateResponse: %v", err)
+ }
+ if response.ID != "turn_1" {
+ t.Fatalf("response id: got %q want turn_1", response.ID)
+ }
+ if response.Model != "codex-mini" {
+ t.Fatalf("response model: got %q", response.Model)
+ }
+ if response.Provider != providerName {
+ t.Fatalf("response provider: got %q", response.Provider)
+ }
+ if response.Finish.Reason != llm.FinishReasonToolCalls {
+ t.Fatalf("finish reason: got %q want %q", response.Finish.Reason, llm.FinishReasonToolCalls)
+ }
+ if response.Usage.InputTokens != 11 || response.Usage.OutputTokens != 7 || response.Usage.TotalTokens != 18 {
+ t.Fatalf("usage mismatch: %+v", response.Usage)
+ }
+ if response.Usage.ReasoningTokens == nil || *response.Usage.ReasoningTokens != 3 {
+ t.Fatalf("reasoning tokens mismatch: %+v", response.Usage)
+ }
+ if response.Usage.CacheReadTokens == nil || *response.Usage.CacheReadTokens != 2 {
+ t.Fatalf("cache read tokens mismatch: %+v", response.Usage)
+ }
+
+ if len(response.Message.Content) < 4 {
+ t.Fatalf("expected content parts, got %+v", response.Message.Content)
+ }
+ foundToolCall := false
+ for _, part := range response.Message.Content {
+ if part.Kind == llm.ContentToolCall && part.ToolCall != nil {
+ foundToolCall = true
+ if part.ToolCall.ID != "call_1" || part.ToolCall.Name != "search" {
+ t.Fatalf("tool call mismatch: %+v", part.ToolCall)
+ }
+ if strings.TrimSpace(string(part.ToolCall.Arguments)) != `{"q":"foo"}` {
+ t.Fatalf("tool arguments mismatch: %q", string(part.ToolCall.Arguments))
+ }
+ }
+ }
+ if !foundToolCall {
+ t.Fatalf("expected tool call part in response content")
+ }
+}
+
+func TestTranslateResponse_FinishReasonMapping(t *testing.T) {
+ interrupted, err := translateResponse(map[string]any{
+ "model": "codex-mini",
+ "turn": map[string]any{"id": "turn_2", "status": "interrupted"},
+ })
+ if err != nil {
+ t.Fatalf("translateResponse interrupted: %v", err)
+ }
+ if interrupted.Finish.Reason != llm.FinishReasonLength {
+ t.Fatalf("interrupted finish reason: got %q", interrupted.Finish.Reason)
+ }
+
+ failed, err := translateResponse(map[string]any{
+ "model": "codex-mini",
+ "turn": map[string]any{"id": "turn_3", "status": "failed"},
+ })
+ if err != nil {
+ t.Fatalf("translateResponse failed: %v", err)
+ }
+ if failed.Finish.Reason != llm.FinishReasonError {
+ t.Fatalf("failed finish reason: got %q", failed.Finish.Reason)
+ }
+}
+
+func TestTranslateResponse_ReconstructsFromDeltas(t *testing.T) {
+ body := map[string]any{
+ "model": "codex-mini",
+ "turn": map[string]any{"id": "turn_4", "status": "completed"},
+ "notifications": []any{
+ testNotification("item/agentMessage/delta", map[string]any{"itemId": "agent_delta", "delta": "Hello "}),
+ testNotification("item/agentMessage/delta", map[string]any{"itemId": "agent_delta", "delta": "world"}),
+ testNotification("item/reasoning/summaryTextDelta", map[string]any{"itemId": "reason_delta", "summaryIndex": 0, "delta": "Need to inspect state"}),
+ },
+ }
+
+ response, err := translateResponse(body)
+ if err != nil {
+ t.Fatalf("translateResponse: %v", err)
+ }
+ if len(response.Message.Content) != 2 {
+ t.Fatalf("content len: got %d want 2 (%+v)", len(response.Message.Content), response.Message.Content)
+ }
+ if response.Message.Content[0].Kind != llm.ContentText || response.Message.Content[0].Text != "Hello world" {
+ t.Fatalf("agent text reconstruction mismatch: %+v", response.Message.Content[0])
+ }
+ if response.Message.Content[1].Kind != llm.ContentThinking || response.Message.Content[1].Thinking == nil || response.Message.Content[1].Thinking.Text != "Need to inspect state" {
+ t.Fatalf("reasoning reconstruction mismatch: %+v", response.Message.Content[1])
+ }
+}
diff --git a/internal/llm/providers/codexappserver/stream_translator.go b/internal/llm/providers/codexappserver/stream_translator.go
new file mode 100644
index 00000000..4c43fd17
--- /dev/null
+++ b/internal/llm/providers/codexappserver/stream_translator.go
@@ -0,0 +1,501 @@
+package codexappserver
+
+import (
+ "strconv"
+ "strings"
+
+ "github.com/danshapiro/kilroy/internal/llm"
+)
+
+const (
+ toolProtocolStartToken = "[[TOOL_CALL]]"
+ toolProtocolStartTokenLower = "[[tool_call]]"
+ toolProtocolEndToken = "[[/TOOL_CALL]]"
+ toolProtocolEndTokenLower = "[[/tool_call]]"
+)
+
+const maxStartReserve = len(toolProtocolStartToken) - 1
+
+type parsedSegment struct {
+ Kind string
+ Text string
+ ToolCall *llm.ToolCallData
+}
+
+type toolProtocolStreamParser struct {
+ buffer string
+ insideBlock bool
+ opening string
+}
+
+func (p *toolProtocolStreamParser) feed(delta string) []parsedSegment {
+ p.buffer += delta
+ return p.drain(false)
+}
+
+func (p *toolProtocolStreamParser) flush() []parsedSegment {
+ return p.drain(true)
+}
+
+func (p *toolProtocolStreamParser) drain(finalize bool) []parsedSegment {
+ segments := make([]parsedSegment, 0, 4)
+
+ for {
+ if p.insideBlock {
+ endIdx := strings.Index(strings.ToLower(p.buffer), toolProtocolEndTokenLower)
+ if endIdx < 0 {
+ if !finalize {
+ break
+ }
+ if p.buffer != "" || p.opening != "" {
+ segments = append(segments, parsedSegment{Kind: "text", Text: p.opening + p.buffer})
+ }
+ p.buffer = ""
+ p.insideBlock = false
+ p.opening = ""
+ continue
+ }
+
+ payload := p.buffer[:endIdx]
+ closing := p.buffer[endIdx : endIdx+len(toolProtocolEndToken)]
+ p.buffer = p.buffer[endIdx+len(toolProtocolEndToken):]
+ p.insideBlock = false
+
+ if toolCall := parseToolCall(payload); toolCall != nil {
+ segments = append(segments, parsedSegment{Kind: "tool_call", ToolCall: toolCall})
+ } else {
+ segments = append(segments, parsedSegment{Kind: "text", Text: p.opening + payload + closing})
+ }
+ p.opening = ""
+ continue
+ }
+
+ lower := strings.ToLower(p.buffer)
+ startIdx := strings.Index(lower, toolProtocolStartTokenLower)
+ if startIdx < 0 {
+ if p.buffer == "" {
+ break
+ }
+ if finalize {
+ segments = append(segments, parsedSegment{Kind: "text", Text: p.buffer})
+ p.buffer = ""
+ break
+ }
+ if len(p.buffer) <= maxStartReserve {
+ break
+ }
+ safeText := p.buffer[:len(p.buffer)-maxStartReserve]
+ p.buffer = p.buffer[len(p.buffer)-maxStartReserve:]
+ if safeText != "" {
+ segments = append(segments, parsedSegment{Kind: "text", Text: safeText})
+ }
+ break
+ }
+
+ if startIdx > 0 {
+ segments = append(segments, parsedSegment{Kind: "text", Text: p.buffer[:startIdx]})
+ }
+
+ p.opening = p.buffer[startIdx : startIdx+len(toolProtocolStartToken)]
+ p.buffer = p.buffer[startIdx+len(toolProtocolStartToken):]
+ p.insideBlock = true
+ }
+
+ return segments
+}
+
+type textStreamState struct {
+ TextStarted bool
+ Parser *toolProtocolStreamParser
+}
+
+func translateStream(events <-chan map[string]any) <-chan llm.StreamEvent {
+ out := make(chan llm.StreamEvent, 64)
+ go func() {
+ defer close(out)
+
+ streamStarted := false
+ streamID := ""
+ model := ""
+ emittedToolCalls := false
+ var latestUsage *llm.Usage
+
+ textStates := make(map[string]*textStreamState)
+ reasoningByItem := make(map[string]map[string]struct{})
+ activeReasoningIDs := make(map[string]struct{})
+
+ closeReasoningForItem := func(itemID string) []llm.StreamEvent {
+ itemSet := reasoningByItem[itemID]
+ if len(itemSet) == 0 {
+ return nil
+ }
+ outEvents := make([]llm.StreamEvent, 0, len(itemSet))
+ for reasoningID := range itemSet {
+ if _, ok := activeReasoningIDs[reasoningID]; !ok {
+ continue
+ }
+ delete(activeReasoningIDs, reasoningID)
+ outEvents = append(outEvents, llm.StreamEvent{
+ Type: llm.StreamEventReasoningEnd,
+ ReasoningID: reasoningID,
+ })
+ }
+ delete(reasoningByItem, itemID)
+ return outEvents
+ }
+
+ closeAllReasoning := func() []llm.StreamEvent {
+ if len(activeReasoningIDs) == 0 {
+ return nil
+ }
+ keys := make([]string, 0, len(activeReasoningIDs))
+ for reasoningID := range activeReasoningIDs {
+ keys = append(keys, reasoningID)
+ }
+ outEvents := make([]llm.StreamEvent, 0, len(keys))
+ for _, reasoningID := range keys {
+ outEvents = append(outEvents, llm.StreamEvent{
+ Type: llm.StreamEventReasoningEnd,
+ ReasoningID: reasoningID,
+ })
+ delete(activeReasoningIDs, reasoningID)
+ }
+ reasoningByItem = map[string]map[string]struct{}{}
+ return outEvents
+ }
+
+ ensureReasoningStarted := func(itemID, reasoningID string) []llm.StreamEvent {
+ if _, ok := reasoningByItem[itemID]; !ok {
+ reasoningByItem[itemID] = map[string]struct{}{}
+ }
+ reasoningByItem[itemID][reasoningID] = struct{}{}
+ if _, ok := activeReasoningIDs[reasoningID]; ok {
+ return nil
+ }
+ activeReasoningIDs[reasoningID] = struct{}{}
+ return []llm.StreamEvent{{
+ Type: llm.StreamEventReasoningStart,
+ ReasoningID: reasoningID,
+ }}
+ }
+
+ emitAgentSegments := func(itemID string, segments []parsedSegment) []llm.StreamEvent {
+ state := textStates[itemID]
+ if state == nil {
+ return nil
+ }
+ outEvents := make([]llm.StreamEvent, 0, len(segments)*3)
+ for _, segment := range segments {
+ switch segment.Kind {
+ case "text":
+ if segment.Text == "" {
+ continue
+ }
+ if !state.TextStarted {
+ state.TextStarted = true
+ outEvents = append(outEvents, llm.StreamEvent{Type: llm.StreamEventTextStart, TextID: itemID})
+ }
+ outEvents = append(outEvents, llm.StreamEvent{Type: llm.StreamEventTextDelta, TextID: itemID, Delta: segment.Text})
+ case "tool_call":
+ if segment.ToolCall == nil {
+ continue
+ }
+ emittedToolCalls = true
+ if state.TextStarted {
+ state.TextStarted = false
+ outEvents = append(outEvents, llm.StreamEvent{Type: llm.StreamEventTextEnd, TextID: itemID})
+ }
+ call := *segment.ToolCall
+ outEvents = append(outEvents,
+ llm.StreamEvent{Type: llm.StreamEventToolCallStart, ToolCall: &llm.ToolCallData{ID: call.ID, Name: call.Name, Type: firstNonEmpty(call.Type, "function")}},
+ llm.StreamEvent{Type: llm.StreamEventToolCallDelta, ToolCall: &llm.ToolCallData{ID: call.ID, Name: call.Name, Type: firstNonEmpty(call.Type, "function"), Arguments: call.Arguments}},
+ llm.StreamEvent{Type: llm.StreamEventToolCallEnd, ToolCall: &llm.ToolCallData{ID: call.ID, Name: call.Name, Type: firstNonEmpty(call.Type, "function"), Arguments: call.Arguments}},
+ )
+ }
+ }
+ return outEvents
+ }
+
+ flushAgentState := func(itemID string) []llm.StreamEvent {
+ state := textStates[itemID]
+ if state == nil {
+ return nil
+ }
+ outEvents := emitAgentSegments(itemID, state.Parser.flush())
+ if state.TextStarted {
+ state.TextStarted = false
+ outEvents = append(outEvents, llm.StreamEvent{Type: llm.StreamEventTextEnd, TextID: itemID})
+ }
+ delete(textStates, itemID)
+ return outEvents
+ }
+
+ for rawEvent := range events {
+ notification, ok := normalizeNotification(rawEvent)
+ if !ok {
+ continue
+ }
+ outEvents := make([]llm.StreamEvent, 0, 6)
+
+ switch notification.Method {
+ case "turn/started":
+ turn := asMap(notification.Params["turn"])
+ if turnID := firstNonEmpty(asString(turn["id"]), asString(notification.Params["turnId"])); turnID != "" {
+ streamID = turnID
+ }
+ emittedToolCalls = false
+ if reroutedModel := asString(notification.Params["model"]); reroutedModel != "" {
+ model = reroutedModel
+ }
+ if !streamStarted {
+ streamStarted = true
+ outEvents = append(outEvents, llm.StreamEvent{Type: llm.StreamEventStreamStart, ID: streamID, Model: model})
+ }
+
+ case "item/agentMessage/delta":
+ itemID := asString(notification.Params["itemId"])
+ if itemID == "" {
+ break
+ }
+ if _, ok := textStates[itemID]; !ok {
+ textStates[itemID] = &textStreamState{Parser: &toolProtocolStreamParser{}}
+ }
+ delta := asString(notification.Params["delta"])
+ outEvents = append(outEvents, emitAgentSegments(itemID, textStates[itemID].Parser.feed(delta))...)
+
+ case "item/reasoning/summaryPartAdded":
+ itemID := asString(notification.Params["itemId"])
+ summaryIndex := asInt(notification.Params["summaryIndex"], -1)
+ if itemID == "" || summaryIndex < 0 {
+ break
+ }
+ nextReasoningID := fmtReasoningID(itemID, "summary", summaryIndex)
+ if existing := reasoningByItem[itemID]; len(existing) > 0 {
+ for reasoningID := range existing {
+ if !strings.HasPrefix(reasoningID, itemID+":summary:") || reasoningID == nextReasoningID {
+ continue
+ }
+ if _, ok := activeReasoningIDs[reasoningID]; ok {
+ delete(activeReasoningIDs, reasoningID)
+ outEvents = append(outEvents, llm.StreamEvent{Type: llm.StreamEventReasoningEnd, ReasoningID: reasoningID})
+ }
+ }
+ }
+ outEvents = append(outEvents, ensureReasoningStarted(itemID, nextReasoningID)...)
+
+ case "item/reasoning/summaryTextDelta":
+ itemID := asString(notification.Params["itemId"])
+ if itemID == "" {
+ break
+ }
+ reasoningID := fmtReasoningID(itemID, "summary", asInt(notification.Params["summaryIndex"], 0))
+ outEvents = append(outEvents, ensureReasoningStarted(itemID, reasoningID)...)
+ for _, segment := range splitReasoningSegments(asString(notification.Params["delta"])) {
+ if segment.Text == "" {
+ continue
+ }
+ var redacted *bool
+ if segment.Redacted {
+ v := true
+ redacted = &v
+ }
+ outEvents = append(outEvents, llm.StreamEvent{
+ Type: llm.StreamEventReasoningDelta,
+ ReasoningDelta: segment.Text,
+ ReasoningID: reasoningID,
+ Redacted: redacted,
+ })
+ }
+
+ case "item/reasoning/textDelta":
+ itemID := asString(notification.Params["itemId"])
+ if itemID == "" {
+ break
+ }
+ reasoningID := fmtReasoningID(itemID, "content", asInt(notification.Params["contentIndex"], 0))
+ outEvents = append(outEvents, ensureReasoningStarted(itemID, reasoningID)...)
+ for _, segment := range splitReasoningSegments(asString(notification.Params["delta"])) {
+ if segment.Text == "" {
+ continue
+ }
+ var redacted *bool
+ if segment.Redacted {
+ v := true
+ redacted = &v
+ }
+ outEvents = append(outEvents, llm.StreamEvent{
+ Type: llm.StreamEventReasoningDelta,
+ ReasoningDelta: segment.Text,
+ ReasoningID: reasoningID,
+ Redacted: redacted,
+ })
+ }
+
+ case "item/completed":
+ item := asMap(notification.Params["item"])
+ if item == nil {
+ break
+ }
+ itemID := asString(item["id"])
+ itemType := asString(item["type"])
+ if itemID == "" {
+ break
+ }
+ if itemType == "agentMessage" {
+ outEvents = append(outEvents, flushAgentState(itemID)...)
+ break
+ }
+ if itemType == "reasoning" {
+ outEvents = append(outEvents, closeReasoningForItem(itemID)...)
+ break
+ }
+ outEvents = append(outEvents, llm.StreamEvent{
+ Type: llm.StreamEventProviderEvent,
+ EventType: notification.Method,
+ Raw: notification.Params,
+ })
+
+ case "thread/tokenUsage/updated":
+ latestUsage = usageFromTokenUsage(asMap(notification.Params["tokenUsage"]))
+
+ case "error":
+ errorData := asMap(notification.Params["error"])
+ message := firstNonEmpty(asString(errorData["message"]), "Unknown stream error")
+ outEvents = append(outEvents, llm.StreamEvent{
+ Type: llm.StreamEventError,
+ Err: llm.NewStreamError("codex-app-server", message),
+ Raw: notification.Params,
+ })
+
+ case "turn/completed":
+ turn := asMap(notification.Params["turn"])
+ if turnID := asString(turn["id"]); turnID != "" {
+ streamID = turnID
+ }
+ for itemID := range textStates {
+ outEvents = append(outEvents, flushAgentState(itemID)...)
+ }
+ outEvents = append(outEvents, closeAllReasoning()...)
+ status := asString(turn["status"])
+ if status == "failed" {
+ turnError := asMap(turn["error"])
+ message := firstNonEmpty(asString(turnError["message"]), "Turn failed")
+ outEvents = append(outEvents, llm.StreamEvent{
+ Type: llm.StreamEventError,
+ Err: llm.NewStreamError("codex-app-server", message),
+ Raw: notification.Params,
+ })
+ }
+ if turnUsage := usageFromTokenUsage(asMap(turn["tokenUsage"])); turnUsage != nil {
+ latestUsage = turnUsage
+ } else if turnUsage := usageFromTokenUsage(asMap(turn["token_usage"])); turnUsage != nil {
+ latestUsage = turnUsage
+ }
+ finish := llm.FinishReason{Reason: mapFinishReason(status, emittedToolCalls), Raw: status}
+ outEvents = append(outEvents, llm.StreamEvent{
+ Type: llm.StreamEventFinish,
+ FinishReason: &finish,
+ Usage: latestUsage,
+ Raw: notification.Params,
+ })
+
+ default:
+ outEvents = append(outEvents, llm.StreamEvent{
+ Type: llm.StreamEventProviderEvent,
+ EventType: notification.Method,
+ Raw: notification.Params,
+ })
+ }
+
+ if !streamStarted && notification.Method != "turn/started" {
+ hasTranslated := false
+ for _, event := range outEvents {
+ if event.Type != llm.StreamEventProviderEvent {
+ hasTranslated = true
+ break
+ }
+ }
+ if hasTranslated {
+ streamStarted = true
+ start := llm.StreamEvent{Type: llm.StreamEventStreamStart, ID: streamID, Model: model}
+ outEvents = append([]llm.StreamEvent{start}, outEvents...)
+ }
+ }
+
+ for _, event := range outEvents {
+ out <- event
+ }
+ }
+ }()
+ return out
+}
+
+func usageFromTokenUsage(tokenUsage map[string]any) *llm.Usage {
+ if tokenUsage == nil {
+ return nil
+ }
+ last := asMap(tokenUsage["last"])
+ if last == nil {
+ last = tokenUsage
+ }
+ usage := llm.Usage{
+ InputTokens: asInt(last["inputTokens"], asInt(last["input_tokens"], 0)),
+ OutputTokens: asInt(last["outputTokens"], asInt(last["output_tokens"], 0)),
+ TotalTokens: asInt(last["totalTokens"], asInt(last["total_tokens"], 0)),
+ Raw: tokenUsage,
+ }
+ if usage.TotalTokens <= 0 {
+ usage.TotalTokens = usage.InputTokens + usage.OutputTokens
+ }
+ reasoningTokens := asInt(last["reasoningOutputTokens"], asInt(last["reasoning_tokens"], -1))
+ if reasoningTokens >= 0 {
+ usage.ReasoningTokens = intPtr(reasoningTokens)
+ }
+ cacheReadTokens := asInt(last["cachedInputTokens"], asInt(last["cache_read_input_tokens"], -1))
+ if cacheReadTokens >= 0 {
+ usage.CacheReadTokens = intPtr(cacheReadTokens)
+ }
+ cacheWriteTokens := asInt(last["cacheWriteTokens"], asInt(last["cache_write_input_tokens"], -1))
+ if cacheWriteTokens >= 0 {
+ usage.CacheWriteTokens = intPtr(cacheWriteTokens)
+ }
+ return &usage
+}
+
+func normalizeNotification(rawEvent map[string]any) (normalizedNotification, bool) {
+ if method := strings.TrimSpace(asString(rawEvent["method"])); method != "" {
+ params := asMap(rawEvent["params"])
+ if params == nil {
+ params = map[string]any{}
+ }
+ return normalizedNotification{Method: method, Params: params}, true
+ }
+
+ if event := strings.TrimSpace(asString(rawEvent["event"])); event != "" {
+ params := map[string]any{}
+ switch data := rawEvent["data"].(type) {
+ case string:
+ if parsed, ok := parseJSONRecord(data); ok {
+ params = parsed
+ }
+ default:
+ if rec := asMap(data); rec != nil {
+ params = rec
+ }
+ }
+ return normalizedNotification{Method: event, Params: params}, true
+ }
+
+ typ := strings.TrimSpace(asString(rawEvent["type"]))
+ if strings.Contains(typ, "/") {
+ params := deepCopyMap(rawEvent)
+ delete(params, "type")
+ return normalizedNotification{Method: typ, Params: params}, true
+ }
+
+ return normalizedNotification{}, false
+}
+
+func fmtReasoningID(itemID, segment string, idx int) string {
+ return itemID + ":" + segment + ":" + strconv.Itoa(idx)
+}
diff --git a/internal/llm/providers/codexappserver/stream_translator_test.go b/internal/llm/providers/codexappserver/stream_translator_test.go
new file mode 100644
index 00000000..4b055c2b
--- /dev/null
+++ b/internal/llm/providers/codexappserver/stream_translator_test.go
@@ -0,0 +1,172 @@
+package codexappserver
+
+import (
+ "testing"
+
+ "github.com/danshapiro/kilroy/internal/llm"
+)
+
+func streamNotification(method string, params map[string]any) map[string]any {
+ return map[string]any{"method": method, "params": params}
+}
+
+func collectStreamEvents(events []map[string]any) []llm.StreamEvent {
+ in := make(chan map[string]any, len(events))
+ for _, event := range events {
+ in <- event
+ }
+ close(in)
+
+ out := make([]llm.StreamEvent, 0, 16)
+ for event := range translateStream(in) {
+ out = append(out, event)
+ }
+ return out
+}
+
+func TestTranslateStream_TextAndFinishUsage(t *testing.T) {
+ events := collectStreamEvents([]map[string]any{
+ streamNotification("turn/started", map[string]any{"turn": map[string]any{"id": "turn_1", "status": "inProgress", "items": []any{}}}),
+ streamNotification("item/agentMessage/delta", map[string]any{"itemId": "agent_1", "delta": "Hello"}),
+ streamNotification("item/agentMessage/delta", map[string]any{"itemId": "agent_1", "delta": " world"}),
+ streamNotification("item/completed", map[string]any{"item": map[string]any{"id": "agent_1", "type": "agentMessage", "text": "Hello world"}}),
+ streamNotification("thread/tokenUsage/updated", map[string]any{"tokenUsage": map[string]any{"last": map[string]any{"inputTokens": 12, "outputTokens": 8, "totalTokens": 20, "reasoningOutputTokens": 2, "cachedInputTokens": 3}}}),
+ streamNotification("turn/completed", map[string]any{"turn": map[string]any{"id": "turn_1", "status": "completed", "items": []any{}}}),
+ })
+
+ if len(events) == 0 {
+ t.Fatalf("expected events")
+ }
+ if events[0].Type != llm.StreamEventStreamStart {
+ t.Fatalf("first event type: got %q want %q", events[0].Type, llm.StreamEventStreamStart)
+ }
+
+ text := ""
+ var finish *llm.StreamEvent
+ for idx := range events {
+ event := events[idx]
+ if event.Type == llm.StreamEventTextDelta {
+ text += event.Delta
+ }
+ if event.Type == llm.StreamEventFinish {
+ finish = &events[idx]
+ }
+ }
+ if text != "Hello world" {
+ t.Fatalf("text delta mismatch: got %q want %q", text, "Hello world")
+ }
+ if finish == nil {
+ t.Fatalf("expected finish event")
+ }
+ if finish.FinishReason == nil || finish.FinishReason.Reason != llm.FinishReasonStop {
+ t.Fatalf("finish reason mismatch: %+v", finish.FinishReason)
+ }
+ if finish.Usage == nil || finish.Usage.TotalTokens != 20 {
+ t.Fatalf("finish usage mismatch: %+v", finish.Usage)
+ }
+}
+
+func TestTranslateStream_ParsesToolCallProtocol(t *testing.T) {
+ events := collectStreamEvents([]map[string]any{
+ streamNotification("turn/started", map[string]any{"turn": map[string]any{"id": "turn_2", "status": "inProgress", "items": []any{}}}),
+ streamNotification("item/agentMessage/delta", map[string]any{"itemId": "agent_2", "delta": "Lead [[TOOL_CALL]]{\"id\":\"call_abc\",\"name\":\"lookup\",\"arguments\":{\"x\":1}}[[/TOOL_CALL]] tail"}),
+ streamNotification("item/completed", map[string]any{"item": map[string]any{"id": "agent_2", "type": "agentMessage", "text": ""}}),
+ streamNotification("turn/completed", map[string]any{"turn": map[string]any{"id": "turn_2", "status": "completed", "items": []any{}}}),
+ })
+
+ seenStart := false
+ seenDelta := false
+ seenEnd := false
+ finish := llm.FinishReason{}
+ for _, event := range events {
+ switch event.Type {
+ case llm.StreamEventToolCallStart:
+ seenStart = true
+ if event.ToolCall == nil || event.ToolCall.ID != "call_abc" || event.ToolCall.Name != "lookup" {
+ t.Fatalf("tool call start mismatch: %+v", event.ToolCall)
+ }
+ case llm.StreamEventToolCallDelta:
+ seenDelta = true
+ if event.ToolCall == nil || string(event.ToolCall.Arguments) != `{"x":1}` {
+ t.Fatalf("tool call delta mismatch: %+v", event.ToolCall)
+ }
+ case llm.StreamEventToolCallEnd:
+ seenEnd = true
+ case llm.StreamEventFinish:
+ if event.FinishReason != nil {
+ finish = *event.FinishReason
+ }
+ }
+ }
+ if !seenStart || !seenDelta || !seenEnd {
+ t.Fatalf("tool call events missing: start=%t delta=%t end=%t", seenStart, seenDelta, seenEnd)
+ }
+ if finish.Reason != llm.FinishReasonToolCalls {
+ t.Fatalf("finish reason mismatch: got %q want %q", finish.Reason, llm.FinishReasonToolCalls)
+ }
+}
+
+func TestTranslateStream_FailedTurnEmitsErrorAndFinish(t *testing.T) {
+ events := collectStreamEvents([]map[string]any{
+ streamNotification("turn/started", map[string]any{"turn": map[string]any{"id": "turn_3", "status": "inProgress", "items": []any{}}}),
+ streamNotification("error", map[string]any{"error": map[string]any{"message": "upstream overloaded"}}),
+ streamNotification("turn/completed", map[string]any{"turn": map[string]any{"id": "turn_3", "status": "failed", "error": map[string]any{"message": "turn failed hard"}, "items": []any{}}}),
+ })
+
+ errorCount := 0
+ finishCount := 0
+ for _, event := range events {
+ if event.Type == llm.StreamEventError {
+ errorCount++
+ }
+ if event.Type == llm.StreamEventFinish {
+ finishCount++
+ if event.FinishReason == nil || event.FinishReason.Reason != llm.FinishReasonError {
+ t.Fatalf("finish reason mismatch: %+v", event.FinishReason)
+ }
+ }
+ }
+ if errorCount != 2 {
+ t.Fatalf("error count: got %d want 2", errorCount)
+ }
+ if finishCount != 1 {
+ t.Fatalf("finish count: got %d want 1", finishCount)
+ }
+}
+
+func TestTranslateStream_ProviderEventPassthrough(t *testing.T) {
+ events := collectStreamEvents([]map[string]any{
+ streamNotification("model/rerouted", map[string]any{"fromModel": "codex-mini", "toModel": "codex-pro"}),
+ })
+ if len(events) != 1 {
+ t.Fatalf("event count: got %d want 1", len(events))
+ }
+ if events[0].Type != llm.StreamEventProviderEvent {
+ t.Fatalf("event type: got %q want %q", events[0].Type, llm.StreamEventProviderEvent)
+ }
+ if events[0].EventType != "model/rerouted" {
+ t.Fatalf("event_type: got %q want %q", events[0].EventType, "model/rerouted")
+ }
+}
+
+func TestTranslateStream_ItemCompletedToolEventPassthrough(t *testing.T) {
+ events := collectStreamEvents([]map[string]any{
+ streamNotification("item/completed", map[string]any{
+ "item": map[string]any{
+ "id": "cmd_1",
+ "type": "commandExecution",
+ "status": "completed",
+ },
+ }),
+ })
+
+ if len(events) != 1 {
+ t.Fatalf("event count: got %d want 1", len(events))
+ }
+ if events[0].Type != llm.StreamEventProviderEvent {
+ t.Fatalf("event type: got %q want %q", events[0].Type, llm.StreamEventProviderEvent)
+ }
+ if events[0].EventType != "item/completed" {
+ t.Fatalf("event_type: got %q want %q", events[0].EventType, "item/completed")
+ }
+}
diff --git a/internal/llm/providers/codexappserver/translator_utils.go b/internal/llm/providers/codexappserver/translator_utils.go
new file mode 100644
index 00000000..b2023e2d
--- /dev/null
+++ b/internal/llm/providers/codexappserver/translator_utils.go
@@ -0,0 +1,78 @@
+package codexappserver
+
+import "regexp"
+
+type reasoningSegment struct {
+ Text string
+ Redacted bool
+}
+
+var redactedReasoningRE = regexp.MustCompile(`(?is)([\s\S]*?)|\[\[REDACTED_REASONING\]\]([\s\S]*?)\[\[/REDACTED_REASONING\]\]`)
+
+func splitReasoningSegments(text string) []reasoningSegment {
+ if text == "" {
+ return nil
+ }
+ segments := make([]reasoningSegment, 0, 4)
+ cursor := 0
+ matches := redactedReasoningRE.FindAllStringSubmatchIndex(text, -1)
+ for _, m := range matches {
+ if len(m) < 6 {
+ continue
+ }
+ start := m[0]
+ end := m[1]
+ if start > cursor {
+ visible := text[cursor:start]
+ if visible != "" {
+ segments = append(segments, reasoningSegment{Text: visible})
+ }
+ }
+ redacted := ""
+ if m[2] >= 0 && m[3] >= 0 {
+ redacted = text[m[2]:m[3]]
+ } else if m[4] >= 0 && m[5] >= 0 {
+ redacted = text[m[4]:m[5]]
+ }
+ if redacted != "" {
+ segments = append(segments, reasoningSegment{Text: redacted, Redacted: true})
+ }
+ cursor = end
+ }
+ if cursor < len(text) {
+ tail := text[cursor:]
+ if tail != "" {
+ prefixRe := regexp.MustCompile(`(?is)^(?:\[REDACTED\]\s*|REDACTED:\s*)([\s\S]+)$`)
+ if sm := prefixRe.FindStringSubmatch(tail); len(sm) == 2 && sm[1] != "" {
+ segments = append(segments, reasoningSegment{Text: sm[1], Redacted: true})
+ } else {
+ segments = append(segments, reasoningSegment{Text: tail})
+ }
+ }
+ }
+ return segments
+}
+
+func mapFinishReason(rawStatus string, hasToolCalls bool) string {
+ if hasToolCalls {
+ return llmFinishReasonToolCalls
+ }
+ switch rawStatus {
+ case "completed":
+ return llmFinishReasonStop
+ case "interrupted":
+ return llmFinishReasonLength
+ case "failed":
+ return llmFinishReasonError
+ default:
+ return llmFinishReasonOther
+ }
+}
+
+const (
+ llmFinishReasonStop = "stop"
+ llmFinishReasonLength = "length"
+ llmFinishReasonToolCalls = "tool_calls"
+ llmFinishReasonError = "error"
+ llmFinishReasonOther = "other"
+)
diff --git a/internal/llm/providers/codexappserver/transport.go b/internal/llm/providers/codexappserver/transport.go
new file mode 100644
index 00000000..161dbf29
--- /dev/null
+++ b/internal/llm/providers/codexappserver/transport.go
@@ -0,0 +1,1174 @@
+package codexappserver
+
+import (
+ "bufio"
+ "bytes"
+ "context"
+ "encoding/json"
+ "fmt"
+ "io"
+ "os"
+ "os/exec"
+ "strings"
+ "sync"
+ "sync/atomic"
+ "time"
+
+ "github.com/danshapiro/kilroy/internal/llm"
+)
+
+const (
+ providerName = "codex-app-server"
+ defaultCommand = "codex"
+ defaultConnectTimeout = 15 * time.Second
+ // No provider-imposed request cap by default; execution deadlines should come
+ // from caller context (for example stage/runtime policy timeouts).
+ defaultRequestTimeout = 0
+ defaultShutdownTimeout = 5 * time.Second
+ defaultInterruptTimeout = 2 * time.Second
+ defaultStderrTailLimit = 16 * 1024
+ maxJSONRPCLineSize = 16 * 1024 * 1024
+)
+
+var defaultCommandArgs = []string{"app-server", "--listen", "stdio://"}
+
+type TransportOptions struct {
+ Command string
+ Args []string
+ CWD string
+ Env map[string]string
+ InitializeParams map[string]any
+ ConnectTimeout time.Duration
+ RequestTimeout time.Duration
+ ShutdownTimeout time.Duration
+ StderrTailLimit int
+}
+
+type NotificationStream struct {
+ Notifications <-chan map[string]any
+ Err <-chan error
+ closeFn func()
+}
+
+func (s *NotificationStream) Close() {
+ if s == nil || s.closeFn == nil {
+ return
+ }
+ s.closeFn()
+}
+
+type pendingRequest struct {
+ method string
+ respCh chan pendingResult
+}
+
+type pendingResult struct {
+ result any
+ err error
+}
+
+type processLifecycle struct {
+ done chan struct{}
+ once sync.Once
+ mu sync.Mutex
+ err error
+}
+
+type turnWaitOutcome int
+
+const (
+ turnWaitCompleted turnWaitOutcome = iota
+ turnWaitContextDone
+ turnWaitProcessTerminated
+)
+
+func newProcessLifecycle() *processLifecycle {
+ return &processLifecycle{done: make(chan struct{})}
+}
+
+func (l *processLifecycle) finish(err error) {
+ if l == nil {
+ return
+ }
+ l.mu.Lock()
+ if l.err == nil {
+ l.err = err
+ }
+ l.mu.Unlock()
+ l.once.Do(func() {
+ close(l.done)
+ })
+}
+
+func (l *processLifecycle) doneCh() <-chan struct{} {
+ if l == nil {
+ return nil
+ }
+ return l.done
+}
+
+func (l *processLifecycle) processError() error {
+ if l == nil {
+ return nil
+ }
+ l.mu.Lock()
+ defer l.mu.Unlock()
+ return l.err
+}
+
+type stdioTransport struct {
+ opts TransportOptions
+
+ mu sync.Mutex
+ writeMu sync.Mutex
+ cmd *exec.Cmd
+ stdin io.WriteCloser
+ stdout io.ReadCloser
+ stderr io.ReadCloser
+ procDone chan struct{}
+ life *processLifecycle
+
+ closed bool
+ shuttingDown bool
+ initialized bool
+ initWait chan struct{}
+ initErr error
+
+ nextID int64
+
+ pending map[string]*pendingRequest
+ listeners map[int]func(map[string]any)
+ nextLID int
+
+ stderrTail string
+}
+
+func NewTransport(opts TransportOptions) *stdioTransport {
+ opts.Command = strings.TrimSpace(opts.Command)
+ if opts.Command == "" {
+ opts.Command = defaultCommand
+ }
+ if len(opts.Args) == 0 {
+ opts.Args = append([]string{}, defaultCommandArgs...)
+ }
+ if opts.ConnectTimeout <= 0 {
+ opts.ConnectTimeout = defaultConnectTimeout
+ }
+ if opts.RequestTimeout <= 0 {
+ opts.RequestTimeout = defaultRequestTimeout
+ }
+ if opts.ShutdownTimeout <= 0 {
+ opts.ShutdownTimeout = defaultShutdownTimeout
+ }
+ if opts.StderrTailLimit <= 0 {
+ opts.StderrTailLimit = defaultStderrTailLimit
+ }
+ if opts.InitializeParams == nil {
+ opts.InitializeParams = map[string]any{
+ "clientInfo": map[string]any{
+ "name": "unified_llm",
+ "title": "Unified LLM",
+ "version": "0.1.0",
+ },
+ }
+ }
+ return &stdioTransport{
+ opts: opts,
+ pending: map[string]*pendingRequest{},
+ listeners: map[int]func(map[string]any){},
+ }
+}
+
+func (t *stdioTransport) Initialize(ctx context.Context) error {
+ return t.ensureInitialized(ctx)
+}
+
+func (t *stdioTransport) Complete(ctx context.Context, payload map[string]any) (map[string]any, error) {
+ return t.runTurn(ctx, payload)
+}
+
+func (t *stdioTransport) Stream(ctx context.Context, payload map[string]any) (*NotificationStream, error) {
+ events := make(chan map[string]any, 128)
+ errs := make(chan error, 1)
+ sctx, cancel := context.WithCancel(ctx)
+
+ go func() {
+ defer close(events)
+ defer close(errs)
+
+ if err := t.ensureInitialized(sctx); err != nil {
+ errs <- err
+ return
+ }
+ life := t.currentProcessLifecycle()
+ if life == nil {
+ errs <- llm.NewNetworkError(providerName, "Codex app-server process is unavailable")
+ return
+ }
+
+ turnTemplate, err := parseTurnStartPayload(payload)
+ if err != nil {
+ errs <- err
+ return
+ }
+
+ requestCtx, requestCancel := contextWithRequestTimeout(sctx, t.opts.RequestTimeout)
+ defer requestCancel()
+
+ threadResp, err := t.startThread(requestCtx, toThreadStartParams(turnTemplate))
+ if err != nil {
+ errs <- err
+ return
+ }
+ thread := asMap(threadResp["thread"])
+ threadID := asString(thread["id"])
+ if threadID == "" {
+ errs <- llm.ErrorFromHTTPStatus(providerName, 400, "thread/start response missing thread.id", threadResp, nil)
+ return
+ }
+
+ turnParams := deepCopyMap(turnTemplate)
+ turnParams["threadId"] = threadID
+
+ var (
+ stateMu sync.Mutex
+ turnID string
+ completed = make(chan struct{}, 1)
+ )
+
+ sendNotification := func(notification map[string]any) {
+ select {
+ case events <- deepCopyMap(notification):
+ case <-requestCtx.Done():
+ }
+ }
+
+ unsubscribe := t.subscribe(func(notification map[string]any) {
+ stateMu.Lock()
+ currentTurnID := turnID
+ stateMu.Unlock()
+ if !notificationBelongsToTurn(notification, threadID, currentTurnID) {
+ return
+ }
+ sendNotification(notification)
+ notificationTurnID := extractTurnID(notification)
+ if notificationTurnID != "" {
+ stateMu.Lock()
+ if turnID == "" {
+ turnID = notificationTurnID
+ }
+ currentTurnID = turnID
+ stateMu.Unlock()
+ }
+ if asString(notification["method"]) == "turn/completed" {
+ if currentTurnID == "" || notificationTurnID == "" || notificationTurnID == currentTurnID {
+ select {
+ case completed <- struct{}{}:
+ default:
+ }
+ }
+ }
+ })
+ defer unsubscribe()
+
+ turnResp, err := t.startTurn(requestCtx, turnParams)
+ if err != nil {
+ errs <- err
+ return
+ }
+ turn := asMap(turnResp["turn"])
+ if tid := asString(turn["id"]); tid != "" {
+ stateMu.Lock()
+ turnID = tid
+ stateMu.Unlock()
+ }
+ if isTerminalTurnStatus(asString(turn["status"])) {
+ sendNotification(map[string]any{
+ "method": "turn/completed",
+ "params": map[string]any{
+ "threadId": threadID,
+ "turn": turn,
+ },
+ })
+ select {
+ case completed <- struct{}{}:
+ default:
+ }
+ }
+
+ outcome, waitErr := t.waitForTurnCompletion(requestCtx, completed, life)
+ if waitErr == nil {
+ return
+ }
+ if outcome == turnWaitContextDone {
+ stateMu.Lock()
+ currentTurnID := turnID
+ stateMu.Unlock()
+ if currentTurnID != "" {
+ go t.interruptTurnBestEffort(threadID, currentTurnID)
+ }
+ }
+ errs <- waitErr
+ return
+ }()
+
+ return &NotificationStream{Notifications: events, Err: errs, closeFn: cancel}, nil
+}
+
+func (t *stdioTransport) ListModels(ctx context.Context, params map[string]any) (modelListResponse, error) {
+ if err := t.ensureInitialized(ctx); err != nil {
+ return modelListResponse{}, err
+ }
+ if params == nil {
+ params = map[string]any{}
+ }
+ result, err := t.sendRequest(ctx, "model/list", params, t.opts.RequestTimeout)
+ if err != nil {
+ return modelListResponse{}, err
+ }
+ b, err := json.Marshal(result)
+ if err != nil {
+ return modelListResponse{}, err
+ }
+ dec := json.NewDecoder(bytes.NewReader(b))
+ dec.UseNumber()
+ var out modelListResponse
+ if err := dec.Decode(&out); err != nil {
+ return modelListResponse{}, err
+ }
+ if out.Data == nil {
+ out.Data = []modelEntry{}
+ }
+ return out, nil
+}
+
+func (t *stdioTransport) Close() error {
+ t.mu.Lock()
+ if t.closed {
+ t.mu.Unlock()
+ return nil
+ }
+ t.closed = true
+ t.mu.Unlock()
+
+ t.rejectAllPending(llm.NewNetworkError(providerName, "Codex transport closed"))
+ return t.shutdownProcess()
+}
+
+func (t *stdioTransport) runTurn(ctx context.Context, payload map[string]any) (map[string]any, error) {
+ if err := t.ensureInitialized(ctx); err != nil {
+ return nil, err
+ }
+ life := t.currentProcessLifecycle()
+ if life == nil {
+ return nil, llm.NewNetworkError(providerName, "Codex app-server process is unavailable")
+ }
+
+ turnTemplate, err := parseTurnStartPayload(payload)
+ if err != nil {
+ return nil, err
+ }
+
+ requestCtx, requestCancel := contextWithRequestTimeout(ctx, t.opts.RequestTimeout)
+ defer requestCancel()
+
+ threadResp, err := t.startThread(requestCtx, toThreadStartParams(turnTemplate))
+ if err != nil {
+ return nil, err
+ }
+ thread := asMap(threadResp["thread"])
+ threadID := asString(thread["id"])
+ if threadID == "" {
+ return nil, llm.ErrorFromHTTPStatus(providerName, 400, "thread/start response missing thread.id", threadResp, nil)
+ }
+
+ turnParams := deepCopyMap(turnTemplate)
+ turnParams["threadId"] = threadID
+
+ var (
+ stateMu sync.Mutex
+ notifications []map[string]any
+ turnID string
+ completed = make(chan struct{}, 1)
+ )
+
+ unsubscribe := t.subscribe(func(notification map[string]any) {
+ stateMu.Lock()
+ currentTurnID := turnID
+ stateMu.Unlock()
+ if !notificationBelongsToTurn(notification, threadID, currentTurnID) {
+ return
+ }
+ stateMu.Lock()
+ notifications = append(notifications, deepCopyMap(notification))
+ notificationTurnID := extractTurnID(notification)
+ if turnID == "" && notificationTurnID != "" {
+ turnID = notificationTurnID
+ }
+ currentTurnID = turnID
+ stateMu.Unlock()
+ if asString(notification["method"]) == "turn/completed" {
+ if currentTurnID == "" || notificationTurnID == "" || notificationTurnID == currentTurnID {
+ select {
+ case completed <- struct{}{}:
+ default:
+ }
+ }
+ }
+ })
+ defer unsubscribe()
+
+ turnResp, err := t.startTurn(requestCtx, turnParams)
+ if err != nil {
+ return nil, err
+ }
+ turn := asMap(turnResp["turn"])
+ if tid := asString(turn["id"]); tid != "" {
+ stateMu.Lock()
+ turnID = tid
+ stateMu.Unlock()
+ }
+ if isTerminalTurnStatus(asString(turn["status"])) {
+ select {
+ case completed <- struct{}{}:
+ default:
+ }
+ }
+
+ outcome, waitErr := t.waitForTurnCompletion(requestCtx, completed, life)
+ if waitErr != nil {
+ if outcome == turnWaitContextDone {
+ stateMu.Lock()
+ currentTurnID := turnID
+ stateMu.Unlock()
+ if currentTurnID != "" {
+ go t.interruptTurnBestEffort(threadID, currentTurnID)
+ }
+ }
+ return nil, waitErr
+ }
+
+ stateMu.Lock()
+ capturedNotifications := append([]map[string]any{}, notifications...)
+ capturedTurnID := turnID
+ stateMu.Unlock()
+
+ completedTurn := findCompletedTurn(capturedNotifications, capturedTurnID)
+ if completedTurn == nil {
+ completedTurn = turn
+ }
+ result := map[string]any{
+ "thread": thread,
+ "turn": completedTurn,
+ "threadId": threadID,
+ "turnId": firstNonEmpty(capturedTurnID, asString(completedTurn["id"])),
+ "notifications": capturedNotifications,
+ "threadResponse": threadResp,
+ "turnResponse": turnResp,
+ }
+ return result, nil
+}
+
+func (t *stdioTransport) startThread(ctx context.Context, params map[string]any) (map[string]any, error) {
+ return t.sendRequest(ctx, "thread/start", params, t.opts.RequestTimeout)
+}
+
+func (t *stdioTransport) startTurn(ctx context.Context, params map[string]any) (map[string]any, error) {
+ return t.sendRequest(ctx, "turn/start", params, t.opts.RequestTimeout)
+}
+
+func (t *stdioTransport) interruptTurn(ctx context.Context, params map[string]any) error {
+ _, err := t.sendRequest(ctx, "turn/interrupt", params, t.opts.RequestTimeout)
+ return err
+}
+
+func (t *stdioTransport) interruptTurnBestEffort(threadID, turnID string) {
+ if strings.TrimSpace(threadID) == "" || strings.TrimSpace(turnID) == "" {
+ return
+ }
+ timeout := t.interruptTimeout()
+ ctx, cancel := context.WithTimeout(context.Background(), timeout)
+ defer cancel()
+ _ = t.interruptTurn(ctx, map[string]any{"threadId": threadID, "turnId": turnID})
+}
+
+func (t *stdioTransport) interruptTimeout() time.Duration {
+ timeout := t.opts.RequestTimeout
+ if timeout <= 0 {
+ timeout = defaultInterruptTimeout
+ }
+ if t.opts.ShutdownTimeout > 0 && t.opts.ShutdownTimeout < timeout {
+ timeout = t.opts.ShutdownTimeout
+ }
+ return timeout
+}
+
+func (t *stdioTransport) currentProcessLifecycle() *processLifecycle {
+ t.mu.Lock()
+ defer t.mu.Unlock()
+ return t.life
+}
+
+func (t *stdioTransport) processTerminationError(life *processLifecycle) error {
+ if life != nil {
+ if err := life.processError(); err != nil {
+ return err
+ }
+ }
+ return llm.NewNetworkError(providerName, "Codex app-server process exited")
+}
+
+func (t *stdioTransport) waitForTurnCompletion(ctx context.Context, completed <-chan struct{}, life *processLifecycle) (turnWaitOutcome, error) {
+ if life == nil {
+ return turnWaitProcessTerminated, llm.NewNetworkError(providerName, "Codex app-server process is unavailable")
+ }
+ select {
+ case <-completed:
+ return turnWaitCompleted, nil
+ case <-ctx.Done():
+ return turnWaitContextDone, llm.WrapContextError(providerName, ctx.Err())
+ case <-life.doneCh():
+ return turnWaitProcessTerminated, t.processTerminationError(life)
+ }
+}
+
+func (t *stdioTransport) ensureInitialized(ctx context.Context) error {
+ t.mu.Lock()
+ if t.closed {
+ t.mu.Unlock()
+ return llm.NewNetworkError(providerName, "Codex transport is closed")
+ }
+ if t.initialized {
+ t.mu.Unlock()
+ return nil
+ }
+ if t.initWait != nil {
+ wait := t.initWait
+ t.mu.Unlock()
+ select {
+ case <-wait:
+ t.mu.Lock()
+ err := t.initErr
+ t.mu.Unlock()
+ return err
+ case <-ctx.Done():
+ return llm.WrapContextError(providerName, ctx.Err())
+ }
+ }
+
+ wait := make(chan struct{})
+ t.initWait = wait
+ t.mu.Unlock()
+
+ err := t.startAndInitialize(ctx)
+
+ t.mu.Lock()
+ if err == nil {
+ t.initialized = true
+ }
+ t.initErr = err
+ close(wait)
+ t.initWait = nil
+ t.mu.Unlock()
+ return err
+}
+
+func (t *stdioTransport) startAndInitialize(ctx context.Context) error {
+ if err := t.spawnProcess(); err != nil {
+ return err
+ }
+ connCtx, cancel := contextWithRequestTimeout(ctx, t.opts.ConnectTimeout)
+ defer cancel()
+ if _, err := t.sendRequest(connCtx, "initialize", t.opts.InitializeParams, t.opts.ConnectTimeout); err != nil {
+ _ = t.shutdownProcess()
+ return err
+ }
+ if err := t.sendNotification(connCtx, "initialized", nil); err != nil {
+ _ = t.shutdownProcess()
+ return err
+ }
+ return nil
+}
+
+func (t *stdioTransport) spawnProcess() error {
+ t.mu.Lock()
+ if t.cmd != nil && processAlive(t.cmd) {
+ t.mu.Unlock()
+ return nil
+ }
+ if t.closed {
+ t.mu.Unlock()
+ return llm.NewNetworkError(providerName, "Codex transport is closed")
+ }
+ t.mu.Unlock()
+
+ cmd := exec.Command(t.opts.Command, t.opts.Args...)
+ if strings.TrimSpace(t.opts.CWD) != "" {
+ cmd.Dir = t.opts.CWD
+ }
+ if len(t.opts.Env) > 0 {
+ env := os.Environ()
+ for key, value := range t.opts.Env {
+ env = append(env, key+"="+value)
+ }
+ cmd.Env = env
+ }
+
+ stdin, err := cmd.StdinPipe()
+ if err != nil {
+ return llm.NewNetworkError(providerName, fmt.Sprintf("failed to open stdin pipe: %v", err))
+ }
+ stdout, err := cmd.StdoutPipe()
+ if err != nil {
+ _ = stdin.Close()
+ return llm.NewNetworkError(providerName, fmt.Sprintf("failed to open stdout pipe: %v", err))
+ }
+ stderr, err := cmd.StderrPipe()
+ if err != nil {
+ _ = stdin.Close()
+ _ = stdout.Close()
+ return llm.NewNetworkError(providerName, fmt.Sprintf("failed to open stderr pipe: %v", err))
+ }
+
+ if err := cmd.Start(); err != nil {
+ _ = stdin.Close()
+ _ = stdout.Close()
+ _ = stderr.Close()
+ return llm.NewNetworkError(providerName, fmt.Sprintf("failed to spawn codex app-server: %v", err))
+ }
+
+ procDone := make(chan struct{})
+ life := newProcessLifecycle()
+ t.mu.Lock()
+ t.cmd = cmd
+ t.stdin = stdin
+ t.stdout = stdout
+ t.stderr = stderr
+ t.procDone = procDone
+ t.life = life
+ t.stderrTail = ""
+ t.initialized = false
+ t.mu.Unlock()
+
+ go t.readStdout(cmd, stdout, life)
+ go t.readStderr(stderr)
+ go t.waitForExit(cmd, procDone, life)
+
+ return nil
+}
+
+func (t *stdioTransport) readStdout(cmd *exec.Cmd, stdout io.Reader, life *processLifecycle) {
+ scanner := bufio.NewScanner(stdout)
+ buf := make([]byte, 0, 64*1024)
+ scanner.Buffer(buf, maxJSONRPCLineSize)
+ for scanner.Scan() {
+ line := strings.TrimSpace(scanner.Text())
+ if line == "" {
+ continue
+ }
+ dec := json.NewDecoder(strings.NewReader(line))
+ dec.UseNumber()
+ var message map[string]any
+ if err := dec.Decode(&message); err != nil {
+ continue
+ }
+ t.handleIncomingMessage(message)
+ }
+ if err := scanner.Err(); err != nil {
+ t.handleUnexpectedProcessTermination(life, llm.NewNetworkError(providerName, fmt.Sprintf("Codex stdout read error: %v", err)))
+ }
+ _ = cmd
+}
+
+func (t *stdioTransport) readStderr(stderr io.Reader) {
+ buf := make([]byte, 4096)
+ for {
+ n, err := stderr.Read(buf)
+ if n > 0 {
+ t.appendStderrTail(string(buf[:n]))
+ }
+ if err != nil {
+ return
+ }
+ }
+}
+
+func (t *stdioTransport) appendStderrTail(chunk string) {
+ if chunk == "" {
+ return
+ }
+ t.mu.Lock()
+ t.stderrTail += chunk
+ if len(t.stderrTail) > t.opts.StderrTailLimit {
+ t.stderrTail = t.stderrTail[len(t.stderrTail)-t.opts.StderrTailLimit:]
+ }
+ t.mu.Unlock()
+}
+
+func (t *stdioTransport) waitForExit(cmd *exec.Cmd, done chan struct{}, life *processLifecycle) {
+ err := cmd.Wait()
+ t.mu.Lock()
+ shuttingDown := t.shuttingDown
+ closed := t.closed
+ stderrTail := strings.TrimSpace(t.stderrTail)
+ if t.cmd == cmd {
+ t.cmd = nil
+ t.stdin = nil
+ t.stdout = nil
+ t.stderr = nil
+ t.procDone = nil
+ t.life = nil
+ t.initialized = false
+ }
+ t.shuttingDown = false
+ t.mu.Unlock()
+ close(done)
+
+ exitMessage := "Codex app-server process exited"
+ if err != nil {
+ exitMessage = fmt.Sprintf("Codex app-server process exited: %v", err)
+ }
+ if stderrTail != "" {
+ exitMessage = exitMessage + ". stderr: " + stderrTail
+ }
+ exitErr := llm.NewNetworkError(providerName, exitMessage)
+ life.finish(exitErr)
+
+ if shuttingDown || closed {
+ return
+ }
+ message := "Codex app-server exited unexpectedly"
+ if err != nil {
+ message = fmt.Sprintf("Codex app-server exited unexpectedly: %v", err)
+ }
+ if stderrTail != "" {
+ message = message + ". stderr: " + stderrTail
+ }
+ t.handleUnexpectedProcessTermination(life, llm.NewNetworkError(providerName, message))
+}
+
+func (t *stdioTransport) handleUnexpectedProcessTermination(life *processLifecycle, err error) {
+ life.finish(err)
+ t.rejectAllPending(err)
+}
+
+func (t *stdioTransport) handleIncomingMessage(message map[string]any) {
+ id, hasID := message["id"]
+ _, hasResult := message["result"]
+ errorObj := asMap(message["error"])
+
+ if hasID && hasResult {
+ t.resolvePendingRequest(id, pendingResult{result: message["result"]})
+ return
+ }
+ if hasID && errorObj != nil {
+ t.resolvePendingRequest(id, pendingResult{err: t.toRPCError(asString(message["method"]), errorObj)})
+ return
+ }
+
+ method := strings.TrimSpace(asString(message["method"]))
+ if method == "" {
+ return
+ }
+ if hasID {
+ go t.handleServerRequest(id, method, message["params"])
+ return
+ }
+ notification := map[string]any{"method": method}
+ if params := asMap(message["params"]); params != nil {
+ notification["params"] = params
+ }
+ t.emitNotification(notification)
+}
+
+func (t *stdioTransport) emitNotification(notification map[string]any) {
+ t.mu.Lock()
+ listeners := make([]func(map[string]any), 0, len(t.listeners))
+ for _, listener := range t.listeners {
+ listeners = append(listeners, listener)
+ }
+ t.mu.Unlock()
+ for _, listener := range listeners {
+ func(l func(map[string]any)) {
+ defer func() { _ = recover() }()
+ l(notification)
+ }(listener)
+ }
+}
+
+func (t *stdioTransport) subscribe(listener func(map[string]any)) func() {
+ t.mu.Lock()
+ id := t.nextLID
+ t.nextLID++
+ t.listeners[id] = listener
+ t.mu.Unlock()
+ return func() {
+ t.mu.Lock()
+ delete(t.listeners, id)
+ t.mu.Unlock()
+ }
+}
+
+func (t *stdioTransport) resolvePendingRequest(id any, result pendingResult) {
+ key := rpcIDKey(id)
+ t.mu.Lock()
+ pending := t.pending[key]
+ if pending != nil {
+ delete(t.pending, key)
+ }
+ t.mu.Unlock()
+ if pending == nil {
+ return
+ }
+ select {
+ case pending.respCh <- result:
+ default:
+ }
+}
+
+func (t *stdioTransport) rejectAllPending(err error) {
+ t.mu.Lock()
+ pending := t.pending
+ t.pending = map[string]*pendingRequest{}
+ t.mu.Unlock()
+ for _, req := range pending {
+ if req == nil {
+ continue
+ }
+ select {
+ case req.respCh <- pendingResult{err: err}:
+ default:
+ }
+ }
+}
+
+func (t *stdioTransport) sendRequest(ctx context.Context, method string, params any, timeout time.Duration) (map[string]any, error) {
+ requestCtx, cancel := contextWithRequestTimeout(ctx, timeout)
+ defer cancel()
+ if err := requestCtx.Err(); err != nil {
+ return nil, llm.WrapContextError(providerName, err)
+ }
+
+ id := atomic.AddInt64(&t.nextID, 1)
+ idKey := rpcIDKey(id)
+ respCh := make(chan pendingResult, 1)
+
+ t.mu.Lock()
+ if t.closed {
+ t.mu.Unlock()
+ return nil, llm.NewNetworkError(providerName, "Codex transport is closed")
+ }
+ t.pending[idKey] = &pendingRequest{method: method, respCh: respCh}
+ t.mu.Unlock()
+
+ request := map[string]any{"id": id, "method": method}
+ if params != nil {
+ request["params"] = params
+ }
+ if err := t.writeJSONLine(request); err != nil {
+ t.mu.Lock()
+ delete(t.pending, idKey)
+ t.mu.Unlock()
+ return nil, err
+ }
+
+ select {
+ case result := <-respCh:
+ if result.err != nil {
+ return nil, result.err
+ }
+ if m := asMap(result.result); m != nil {
+ return m, nil
+ }
+ if result.result == nil {
+ return map[string]any{}, nil
+ }
+ b, err := json.Marshal(result.result)
+ if err != nil {
+ return nil, llm.NewNetworkError(providerName, fmt.Sprintf("invalid RPC result for %s: %v", method, err))
+ }
+ return decodeJSONToMap(b), nil
+ case <-requestCtx.Done():
+ t.mu.Lock()
+ delete(t.pending, idKey)
+ t.mu.Unlock()
+ return nil, llm.WrapContextError(providerName, requestCtx.Err())
+ }
+}
+
+func (t *stdioTransport) sendNotification(ctx context.Context, method string, params any) error {
+ if err := ctx.Err(); err != nil {
+ return llm.WrapContextError(providerName, err)
+ }
+ message := map[string]any{"method": method}
+ if params != nil {
+ message["params"] = params
+ }
+ return t.writeJSONLine(message)
+}
+
+func (t *stdioTransport) writeJSONLine(message map[string]any) error {
+ b, err := json.Marshal(message)
+ if err != nil {
+ return llm.NewNetworkError(providerName, fmt.Sprintf("failed to marshal RPC message: %v", err))
+ }
+ line := append(b, '\n')
+
+ t.mu.Lock()
+ stdin := t.stdin
+ cmd := t.cmd
+ t.mu.Unlock()
+ if stdin == nil || cmd == nil || !processAlive(cmd) {
+ return llm.NewNetworkError(providerName, "Codex app-server stdin is not writable")
+ }
+
+ t.writeMu.Lock()
+ defer t.writeMu.Unlock()
+ if _, err := stdin.Write(line); err != nil {
+ return llm.NewNetworkError(providerName, fmt.Sprintf("failed to write to codex app-server: %v", err))
+ }
+ return nil
+}
+
+func (t *stdioTransport) toRPCError(method string, errObj map[string]any) error {
+ code := asInt(errObj["code"], 0)
+ message := firstNonEmpty(asString(errObj["message"]), "RPC error")
+ wrapped := fmt.Sprintf("Codex RPC %s failed (%d): %s", method, code, message)
+ switch code {
+ case -32700, -32600, -32601, -32602:
+ return llm.ErrorFromHTTPStatus(providerName, 400, wrapped, errObj["data"], nil)
+ default:
+ return llm.ErrorFromHTTPStatus(providerName, 500, wrapped, errObj["data"], nil)
+ }
+}
+
+func (t *stdioTransport) handleServerRequest(id any, method string, params any) {
+ sendSuccess := func(result any) {
+ _ = t.writeJSONLine(map[string]any{"id": id, "result": result})
+ }
+ sendError := func(code int, message string, data any) {
+ errObj := map[string]any{"code": code, "message": message}
+ if data != nil {
+ errObj["data"] = data
+ }
+ _ = t.writeJSONLine(map[string]any{"id": id, "error": errObj})
+ }
+
+ switch method {
+ case "item/tool/call":
+ sendSuccess(map[string]any{"contentItems": []any{}, "success": false})
+ case "item/tool/requestUserInput":
+ sendSuccess(buildDefaultUserInputResponse(params))
+ case "item/commandExecution/requestApproval":
+ sendSuccess(map[string]any{"decision": "decline"})
+ case "item/fileChange/requestApproval":
+ sendSuccess(map[string]any{"decision": "decline"})
+ case "applyPatchApproval":
+ sendSuccess(map[string]any{"decision": "denied"})
+ case "execCommandApproval":
+ sendSuccess(map[string]any{"decision": "denied"})
+ case "account/chatgptAuthTokens/refresh":
+ sendError(-32001, "External ChatGPT auth token refresh is not configured", nil)
+ default:
+ sendError(-32601, "Method not found: "+method, nil)
+ }
+}
+
+func buildDefaultUserInputResponse(params any) map[string]any {
+ answers := map[string]any{}
+ p := asMap(params)
+ if p == nil {
+ return map[string]any{"answers": answers}
+ }
+ for _, questionRaw := range asSlice(p["questions"]) {
+ question := asMap(questionRaw)
+ if question == nil {
+ continue
+ }
+ id := asString(question["id"])
+ if id == "" {
+ continue
+ }
+ answers[id] = map[string]any{"answers": []any{}}
+ }
+ return map[string]any{"answers": answers}
+}
+
+func (t *stdioTransport) shutdownProcess() error {
+ t.mu.Lock()
+ cmd := t.cmd
+ stdin := t.stdin
+ done := t.procDone
+ t.shuttingDown = true
+ t.mu.Unlock()
+
+ if cmd == nil {
+ return nil
+ }
+ if stdin != nil {
+ _ = stdin.Close()
+ }
+ if cmd.Process != nil {
+ _ = cmd.Process.Signal(os.Interrupt)
+ }
+
+ if done != nil {
+ select {
+ case <-done:
+ return nil
+ case <-time.After(t.opts.ShutdownTimeout):
+ }
+ }
+
+ if cmd.Process != nil {
+ _ = cmd.Process.Kill()
+ }
+ if done != nil {
+ select {
+ case <-done:
+ case <-time.After(time.Second):
+ }
+ }
+ return nil
+}
+
+func parseTurnStartPayload(payload map[string]any) (map[string]any, error) {
+ if payload == nil {
+ return nil, llm.ErrorFromHTTPStatus(providerName, 400, "codex-app-server turn payload must be an object", nil, nil)
+ }
+ input := asSlice(payload["input"])
+ if input == nil {
+ return nil, llm.ErrorFromHTTPStatus(providerName, 400, "codex-app-server turn payload is missing input array", payload, nil)
+ }
+ out := deepCopyMap(payload)
+ if strings.TrimSpace(asString(out["threadId"])) == "" {
+ out["threadId"] = defaultThreadID
+ }
+ out["input"] = input
+ return out, nil
+}
+
+func toThreadStartParams(turn map[string]any) map[string]any {
+ thread := map[string]any{}
+ for _, key := range []string{"model", "cwd", "approvalPolicy", "personality"} {
+ if v, ok := turn[key]; ok && v != nil {
+ thread[key] = v
+ }
+ }
+ if sandbox := asString(turn["sandbox"]); sandbox == "read-only" || sandbox == "workspace-write" || sandbox == "danger-full-access" {
+ thread["sandbox"] = sandbox
+ }
+ return thread
+}
+
+func isTerminalTurnStatus(status string) bool {
+ switch strings.TrimSpace(status) {
+ case "completed", "failed", "interrupted":
+ return true
+ default:
+ return false
+ }
+}
+
+func notificationBelongsToTurn(notification map[string]any, threadID, turnID string) bool {
+ notificationThreadID := extractThreadID(notification)
+ if notificationThreadID != "" && notificationThreadID != threadID {
+ return false
+ }
+ if turnID == "" {
+ return true
+ }
+ notificationTurnID := extractTurnID(notification)
+ if notificationTurnID != "" && notificationTurnID != turnID {
+ return false
+ }
+ return true
+}
+
+func extractThreadID(notification map[string]any) string {
+ params := asMap(notification["params"])
+ if params == nil {
+ return ""
+ }
+ if threadID := asString(params["threadId"]); threadID != "" {
+ return threadID
+ }
+ if threadID := asString(params["thread_id"]); threadID != "" {
+ return threadID
+ }
+ return ""
+}
+
+func extractTurnID(notification map[string]any) string {
+ params := asMap(notification["params"])
+ if params == nil {
+ return ""
+ }
+ if turnID := asString(params["turnId"]); turnID != "" {
+ return turnID
+ }
+ if turnID := asString(params["turn_id"]); turnID != "" {
+ return turnID
+ }
+ turn := asMap(params["turn"])
+ if turn != nil {
+ if turnID := asString(turn["id"]); turnID != "" {
+ return turnID
+ }
+ }
+ return ""
+}
+
+func findCompletedTurn(notifications []map[string]any, turnID string) map[string]any {
+ for idx := len(notifications) - 1; idx >= 0; idx-- {
+ notification := notifications[idx]
+ if asString(notification["method"]) != "turn/completed" {
+ continue
+ }
+ notificationTurnID := extractTurnID(notification)
+ if turnID != "" && notificationTurnID != "" && notificationTurnID != turnID {
+ continue
+ }
+ turn := asMap(asMap(notification["params"])["turn"])
+ if turn == nil {
+ continue
+ }
+ if asString(turn["id"]) == "" {
+ continue
+ }
+ if asSlice(turn["items"]) == nil {
+ continue
+ }
+ return turn
+ }
+ return nil
+}
+
+func contextWithRequestTimeout(ctx context.Context, timeout time.Duration) (context.Context, context.CancelFunc) {
+ if timeout <= 0 {
+ return context.WithCancel(ctx)
+ }
+ if deadline, ok := ctx.Deadline(); ok {
+ if time.Until(deadline) <= timeout {
+ return context.WithCancel(ctx)
+ }
+ }
+ return context.WithTimeout(ctx, timeout)
+}
+
+func processAlive(cmd *exec.Cmd) bool {
+ if cmd == nil {
+ return false
+ }
+ if cmd.ProcessState != nil && cmd.ProcessState.Exited() {
+ return false
+ }
+ if cmd.Process == nil {
+ return false
+ }
+ return true
+}
+
+func rpcIDKey(id any) string {
+ return strings.TrimSpace(fmt.Sprintf("%v", id))
+}
diff --git a/internal/llm/providers/codexappserver/transport_helpers_test.go b/internal/llm/providers/codexappserver/transport_helpers_test.go
new file mode 100644
index 00000000..594d8d2d
--- /dev/null
+++ b/internal/llm/providers/codexappserver/transport_helpers_test.go
@@ -0,0 +1,630 @@
+package codexappserver
+
+import (
+ "bytes"
+ "context"
+ "encoding/json"
+ "errors"
+ "fmt"
+ "io"
+ "os"
+ "os/exec"
+ "strings"
+ "sync"
+ "testing"
+ "time"
+
+ "github.com/danshapiro/kilroy/internal/llm"
+)
+
+type recordingWriteCloser struct {
+ mu sync.Mutex
+ buf bytes.Buffer
+ err error
+ close bool
+}
+
+func (w *recordingWriteCloser) Write(p []byte) (int, error) {
+ w.mu.Lock()
+ defer w.mu.Unlock()
+ if w.err != nil {
+ return 0, w.err
+ }
+ return w.buf.Write(p)
+}
+
+func (w *recordingWriteCloser) Close() error {
+ w.mu.Lock()
+ defer w.mu.Unlock()
+ w.close = true
+ return nil
+}
+
+func (w *recordingWriteCloser) lines() []string {
+ w.mu.Lock()
+ defer w.mu.Unlock()
+ raw := strings.TrimSpace(w.buf.String())
+ if raw == "" {
+ return nil
+ }
+ return strings.Split(raw, "\n")
+}
+
+func aliveCmd(t *testing.T) *exec.Cmd {
+ t.Helper()
+ proc, err := os.FindProcess(os.Getpid())
+ if err != nil {
+ t.Fatalf("FindProcess: %v", err)
+ }
+ return &exec.Cmd{Process: proc}
+}
+
+func TestTransport_ParseTurnStartPayload_ValidatesAndDefaults(t *testing.T) {
+ if _, err := parseTurnStartPayload(nil); err == nil {
+ t.Fatalf("expected error for nil payload")
+ }
+
+ if _, err := parseTurnStartPayload(map[string]any{"threadId": "thread_1"}); err == nil {
+ t.Fatalf("expected error for missing input array")
+ }
+
+ in := map[string]any{
+ "input": []any{
+ map[string]any{"type": "message", "role": "user"},
+ },
+ }
+ out, err := parseTurnStartPayload(in)
+ if err != nil {
+ t.Fatalf("parseTurnStartPayload: %v", err)
+ }
+ if got := asString(out["threadId"]); got != defaultThreadID {
+ t.Fatalf("threadId default: got %q want %q", got, defaultThreadID)
+ }
+ if asSlice(out["input"]) == nil {
+ t.Fatalf("input array was not preserved: %#v", out["input"])
+ }
+ if _, ok := in["threadId"]; ok {
+ t.Fatalf("expected input map to remain unmodified; got %#v", in)
+ }
+}
+
+func TestTransport_ToThreadStartParams_FiltersFields(t *testing.T) {
+ turn := map[string]any{
+ "model": "codex-mini",
+ "cwd": "/tmp/repo",
+ "approvalPolicy": "never",
+ "personality": "strict",
+ "sandbox": "danger-full-access",
+ "ignored": true,
+ }
+ got := toThreadStartParams(turn)
+ if got["model"] != "codex-mini" || got["cwd"] != "/tmp/repo" || got["approvalPolicy"] != "never" || got["personality"] != "strict" {
+ t.Fatalf("unexpected mapped thread params: %#v", got)
+ }
+ if got["sandbox"] != "danger-full-access" {
+ t.Fatalf("sandbox: got %#v", got["sandbox"])
+ }
+ if _, ok := got["ignored"]; ok {
+ t.Fatalf("did not expect ignored key in thread params: %#v", got)
+ }
+
+ turn["sandbox"] = "unsupported"
+ got = toThreadStartParams(turn)
+ if _, ok := got["sandbox"]; ok {
+ t.Fatalf("unexpected sandbox for unsupported mode: %#v", got["sandbox"])
+ }
+}
+
+func TestTransport_TurnStatusAndNotificationMatching(t *testing.T) {
+ if !isTerminalTurnStatus("completed") || !isTerminalTurnStatus("failed") || !isTerminalTurnStatus("interrupted") {
+ t.Fatalf("expected terminal statuses to match")
+ }
+ if isTerminalTurnStatus("running") {
+ t.Fatalf("did not expect running to be terminal")
+ }
+
+ n1 := map[string]any{"params": map[string]any{"threadId": "thread_1", "turnId": "turn_1"}}
+ if got := extractThreadID(n1); got != "thread_1" {
+ t.Fatalf("extractThreadID: got %q", got)
+ }
+ if got := extractTurnID(n1); got != "turn_1" {
+ t.Fatalf("extractTurnID: got %q", got)
+ }
+ if !notificationBelongsToTurn(n1, "thread_1", "turn_1") {
+ t.Fatalf("expected notification to belong to matching thread/turn")
+ }
+ if notificationBelongsToTurn(n1, "thread_2", "turn_1") {
+ t.Fatalf("expected thread mismatch to reject notification")
+ }
+ if notificationBelongsToTurn(n1, "thread_1", "turn_2") {
+ t.Fatalf("expected turn mismatch to reject notification")
+ }
+
+ n2 := map[string]any{"params": map[string]any{"thread_id": "thread_1", "turn": map[string]any{"id": "turn_1"}}}
+ if got := extractThreadID(n2); got != "thread_1" {
+ t.Fatalf("extractThreadID snake_case: got %q", got)
+ }
+ if got := extractTurnID(n2); got != "turn_1" {
+ t.Fatalf("extractTurnID nested turn.id: got %q", got)
+ }
+ if !notificationBelongsToTurn(n2, "thread_1", "") {
+ t.Fatalf("expected turn-less matching to pass when thread matches")
+ }
+}
+
+func TestTransport_FindCompletedTurn_ReturnsLatestMatching(t *testing.T) {
+ notifications := []map[string]any{
+ {"method": "turn/progress", "params": map[string]any{"threadId": "thread_1"}},
+ {"method": "turn/completed", "params": map[string]any{"turnId": "turn_old", "turn": map[string]any{"id": "turn_old", "items": []any{"a"}}}},
+ {"method": "turn/completed", "params": map[string]any{"turnId": "turn_new", "turn": map[string]any{"id": "turn_new"}}}, // missing items
+ {"method": "turn/completed", "params": map[string]any{"turnId": "turn_new", "turn": map[string]any{"id": "turn_new", "items": []any{"x"}}}},
+ }
+
+ got := findCompletedTurn(notifications, "turn_new")
+ if got == nil {
+ t.Fatalf("expected completed turn")
+ }
+ if asString(got["id"]) != "turn_new" {
+ t.Fatalf("completed turn id: got %#v", got["id"])
+ }
+ if len(asSlice(got["items"])) != 1 {
+ t.Fatalf("completed turn items: %#v", got["items"])
+ }
+
+ if miss := findCompletedTurn(notifications, "turn_missing"); miss != nil {
+ t.Fatalf("expected nil for missing turn id; got %#v", miss)
+ }
+}
+
+func TestTransport_ProcessAliveAndRPCIDKey(t *testing.T) {
+ if processAlive(nil) {
+ t.Fatalf("nil command should not be alive")
+ }
+
+ cmd := aliveCmd(t)
+ if !processAlive(cmd) {
+ t.Fatalf("expected command with process to be alive")
+ }
+
+ finished := exec.Command(os.Args[0], "-test.run=TestTransport_HelperProcess")
+ finished.Env = append(os.Environ(),
+ "GO_WANT_TRANSPORT_HELPER=1",
+ "GO_TRANSPORT_HELPER_MODE=exit",
+ )
+ if err := finished.Run(); err != nil {
+ t.Fatalf("run finished command: %v", err)
+ }
+ if processAlive(finished) {
+ t.Fatalf("expected exited command to be not alive")
+ }
+
+ if got := rpcIDKey(" x "); got != "x" {
+ t.Fatalf("rpcIDKey string trim: got %q", got)
+ }
+ if got := rpcIDKey(12); got != "12" {
+ t.Fatalf("rpcIDKey int conversion: got %q", got)
+ }
+}
+
+func TestTransport_ResolveAndRejectPendingRequests(t *testing.T) {
+ tp := &stdioTransport{pending: map[string]*pendingRequest{}}
+
+ respCh := make(chan pendingResult, 1)
+ tp.pending["1"] = &pendingRequest{method: "turn/start", respCh: respCh}
+ tp.resolvePendingRequest(1, pendingResult{result: map[string]any{"ok": true}})
+
+ select {
+ case got := <-respCh:
+ if asMap(got.result)["ok"] != true {
+ t.Fatalf("unexpected pending result: %#v", got.result)
+ }
+ default:
+ t.Fatalf("expected resolved pending result")
+ }
+ if _, ok := tp.pending["1"]; ok {
+ t.Fatalf("expected pending request to be removed after resolution")
+ }
+
+ errCh := make(chan pendingResult, 1)
+ tp.pending["2"] = &pendingRequest{method: "turn/start", respCh: errCh}
+ wantErr := errors.New("transport failed")
+ tp.rejectAllPending(wantErr)
+
+ select {
+ case got := <-errCh:
+ if !errors.Is(got.err, wantErr) {
+ t.Fatalf("rejectAllPending err: got %v want %v", got.err, wantErr)
+ }
+ default:
+ t.Fatalf("expected rejected pending error")
+ }
+ if len(tp.pending) != 0 {
+ t.Fatalf("expected pending map to be cleared; got %#v", tp.pending)
+ }
+}
+
+func TestTransport_EmitNotificationAndSubscribeLifecycle(t *testing.T) {
+ tp := &stdioTransport{listeners: map[int]func(map[string]any){}}
+
+ got := make(chan string, 2)
+ unsubscribe := tp.subscribe(func(notification map[string]any) {
+ got <- asString(notification["method"])
+ })
+ _ = tp.subscribe(func(map[string]any) {
+ panic("listener panic should be recovered")
+ })
+
+ tp.emitNotification(map[string]any{"method": "turn/progress"})
+ select {
+ case method := <-got:
+ if method != "turn/progress" {
+ t.Fatalf("unexpected method: %q", method)
+ }
+ default:
+ t.Fatalf("expected listener notification")
+ }
+
+ unsubscribe()
+ tp.emitNotification(map[string]any{"method": "turn/completed"})
+ select {
+ case method := <-got:
+ t.Fatalf("did not expect method after unsubscribe: %q", method)
+ default:
+ }
+}
+
+func TestTransport_ToRPCError_MapsJSONRPCCodes(t *testing.T) {
+ tp := &stdioTransport{}
+ badReq := tp.toRPCError("turn/start", map[string]any{
+ "code": -32601,
+ "message": "method not found",
+ "data": map[string]any{"hint": "check schema"},
+ })
+ var llmErr llm.Error
+ if !errors.As(badReq, &llmErr) {
+ t.Fatalf("expected llm.Error, got %T", badReq)
+ }
+ if llmErr.StatusCode() != 400 {
+ t.Fatalf("status code: got %d want 400", llmErr.StatusCode())
+ }
+
+ serverErr := tp.toRPCError("turn/start", map[string]any{
+ "code": -32000,
+ "message": "internal",
+ })
+ if !errors.As(serverErr, &llmErr) {
+ t.Fatalf("expected llm.Error, got %T", serverErr)
+ }
+ if llmErr.StatusCode() != 500 {
+ t.Fatalf("status code: got %d want 500", llmErr.StatusCode())
+ }
+}
+
+func TestTransport_WriteJSONLine_ValidationAndSuccessPaths(t *testing.T) {
+ tp := &stdioTransport{}
+ if err := tp.writeJSONLine(map[string]any{"bad": func() {}}); err == nil {
+ t.Fatalf("expected marshal error")
+ }
+
+ if err := tp.writeJSONLine(map[string]any{"method": "x"}); err == nil {
+ t.Fatalf("expected non-writable stdin error without process")
+ }
+
+ writer := &recordingWriteCloser{}
+ tp = &stdioTransport{cmd: aliveCmd(t), stdin: writer}
+ if err := tp.writeJSONLine(map[string]any{"method": "turn/start"}); err != nil {
+ t.Fatalf("writeJSONLine success: %v", err)
+ }
+ lines := writer.lines()
+ if len(lines) != 1 || !strings.Contains(lines[0], `"method":"turn/start"`) {
+ t.Fatalf("unexpected written line: %#v", lines)
+ }
+
+ failingWriter := &recordingWriteCloser{err: errors.New("boom")}
+ tp = &stdioTransport{cmd: aliveCmd(t), stdin: failingWriter}
+ if err := tp.writeJSONLine(map[string]any{"method": "turn/start"}); err == nil {
+ t.Fatalf("expected write failure")
+ }
+}
+
+func TestTransport_SendRequest_CoversClosedWriteSuccessAndTimeout(t *testing.T) {
+ tp := &stdioTransport{
+ closed: true,
+ pending: map[string]*pendingRequest{},
+ }
+ if _, err := tp.sendRequest(context.Background(), "turn/start", nil, 50*time.Millisecond); err == nil {
+ t.Fatalf("expected closed transport error")
+ }
+
+ tp = &stdioTransport{
+ pending: map[string]*pendingRequest{},
+ }
+ if _, err := tp.sendRequest(context.Background(), "turn/start", nil, 50*time.Millisecond); err == nil {
+ t.Fatalf("expected write error when process is unavailable")
+ }
+ if len(tp.pending) != 0 {
+ t.Fatalf("expected pending to be cleaned after write failure; got %#v", tp.pending)
+ }
+
+ writer := &recordingWriteCloser{}
+ tp = &stdioTransport{
+ pending: map[string]*pendingRequest{},
+ cmd: aliveCmd(t),
+ stdin: writer,
+ }
+ resultCh := make(chan struct {
+ value map[string]any
+ err error
+ }, 1)
+ go func() {
+ value, err := tp.sendRequest(context.Background(), "turn/start", map[string]any{"input": []any{}}, 250*time.Millisecond)
+ resultCh <- struct {
+ value map[string]any
+ err error
+ }{value: value, err: err}
+ }()
+
+ deadline := time.Now().Add(150 * time.Millisecond)
+ for {
+ tp.mu.Lock()
+ _, ok := tp.pending["1"]
+ tp.mu.Unlock()
+ if ok {
+ break
+ }
+ if time.Now().After(deadline) {
+ t.Fatalf("timed out waiting for pending request registration")
+ }
+ time.Sleep(2 * time.Millisecond)
+ }
+ tp.resolvePendingRequest(1, pendingResult{result: map[string]any{"ok": true}})
+ got := <-resultCh
+ if got.err != nil {
+ t.Fatalf("sendRequest success path returned error: %v", got.err)
+ }
+ if asMap(got.value)["ok"] != true {
+ t.Fatalf("sendRequest success payload: %#v", got.value)
+ }
+
+ timeoutTransport := &stdioTransport{
+ pending: map[string]*pendingRequest{},
+ cmd: aliveCmd(t),
+ stdin: &recordingWriteCloser{},
+ }
+ _, err := timeoutTransport.sendRequest(context.Background(), "turn/start", nil, 25*time.Millisecond)
+ if err == nil {
+ t.Fatalf("expected timeout error")
+ }
+ tpErr := &llm.RequestTimeoutError{}
+ if !errors.As(err, &tpErr) {
+ t.Fatalf("expected RequestTimeoutError, got %T (%v)", err, err)
+ }
+ timeoutTransport.mu.Lock()
+ pendingLen := len(timeoutTransport.pending)
+ timeoutTransport.mu.Unlock()
+ if pendingLen != 0 {
+ t.Fatalf("expected timeout path to clear pending map, got len=%d", pendingLen)
+ }
+}
+
+func TestTransport_HandleIncomingMessage_ResolvesPendingAndForwardsNotifications(t *testing.T) {
+ tp := &stdioTransport{
+ pending: map[string]*pendingRequest{},
+ listeners: map[int]func(map[string]any){},
+ }
+
+ okCh := make(chan pendingResult, 1)
+ tp.pending["1"] = &pendingRequest{method: "turn/start", respCh: okCh}
+ tp.handleIncomingMessage(map[string]any{"id": 1, "result": map[string]any{"threadId": "thread_1"}})
+ select {
+ case got := <-okCh:
+ if asMap(got.result)["threadId"] != "thread_1" {
+ t.Fatalf("unexpected result payload: %#v", got.result)
+ }
+ default:
+ t.Fatalf("expected pending result resolution")
+ }
+
+ errCh := make(chan pendingResult, 1)
+ tp.pending["2"] = &pendingRequest{method: "turn/start", respCh: errCh}
+ tp.handleIncomingMessage(map[string]any{
+ "id": 2,
+ "method": "turn/start",
+ "error": map[string]any{"code": -32601, "message": "method not found"},
+ })
+ select {
+ case got := <-errCh:
+ if got.err == nil {
+ t.Fatalf("expected rpc error result")
+ }
+ default:
+ t.Fatalf("expected pending error resolution")
+ }
+
+ notifications := make(chan string, 1)
+ unsubscribe := tp.subscribe(func(notification map[string]any) {
+ notifications <- asString(notification["method"])
+ })
+ tp.handleIncomingMessage(map[string]any{
+ "method": "turn/progress",
+ "params": map[string]any{"threadId": "thread_1"},
+ })
+ select {
+ case method := <-notifications:
+ if method != "turn/progress" {
+ t.Fatalf("unexpected notification method: %q", method)
+ }
+ default:
+ t.Fatalf("expected notification fan-out")
+ }
+ unsubscribe()
+}
+
+func TestTransport_HandleServerRequest_SupportsKnownMethods(t *testing.T) {
+ writer := &recordingWriteCloser{}
+ tp := &stdioTransport{
+ cmd: aliveCmd(t),
+ stdin: writer,
+ }
+
+ cases := []struct {
+ id int
+ method string
+ params any
+ wantField string
+ wantValue string
+ wantErr int
+ }{
+ {id: 1, method: "item/tool/call", wantField: "success", wantValue: "false"},
+ {id: 2, method: "item/tool/requestUserInput", params: map[string]any{"questions": []any{map[string]any{"id": "q1"}}}, wantField: "answers", wantValue: "map"},
+ {id: 3, method: "item/commandExecution/requestApproval", wantField: "decision", wantValue: "decline"},
+ {id: 4, method: "item/fileChange/requestApproval", wantField: "decision", wantValue: "decline"},
+ {id: 5, method: "applyPatchApproval", wantField: "decision", wantValue: "denied"},
+ {id: 6, method: "execCommandApproval", wantField: "decision", wantValue: "denied"},
+ {id: 7, method: "account/chatgptAuthTokens/refresh", wantErr: -32001},
+ {id: 8, method: "unknown/method", wantErr: -32601},
+ }
+
+ for _, tc := range cases {
+ tp.handleServerRequest(tc.id, tc.method, tc.params)
+ }
+
+ lines := writer.lines()
+ if len(lines) != len(cases) {
+ t.Fatalf("written lines: got %d want %d (%#v)", len(lines), len(cases), lines)
+ }
+ for i, tc := range cases {
+ var msg map[string]any
+ if err := json.Unmarshal([]byte(lines[i]), &msg); err != nil {
+ t.Fatalf("unmarshal response line %d: %v", i, err)
+ }
+ if got := asInt(msg["id"], 0); got != tc.id {
+ t.Fatalf("line %d id: got %d want %d", i, got, tc.id)
+ }
+ if tc.wantErr != 0 {
+ errObj := asMap(msg["error"])
+ if got := asInt(errObj["code"], 0); got != tc.wantErr {
+ t.Fatalf("line %d error code: got %d want %d", i, got, tc.wantErr)
+ }
+ continue
+ }
+ result := asMap(msg["result"])
+ switch tc.wantValue {
+ case "false":
+ if got := fmt.Sprint(result[tc.wantField]); got != "false" {
+ t.Fatalf("line %d result[%q]: got %q want false", i, tc.wantField, got)
+ }
+ case "map":
+ if asMap(result[tc.wantField]) == nil {
+ t.Fatalf("line %d expected map field %q in result: %#v", i, tc.wantField, result)
+ }
+ default:
+ if got := asString(result[tc.wantField]); got != tc.wantValue {
+ t.Fatalf("line %d result[%q]: got %q want %q", i, tc.wantField, got, tc.wantValue)
+ }
+ }
+ }
+}
+
+func TestTransport_ShutdownProcess_WithRunningProcess(t *testing.T) {
+ cmd := exec.Command(os.Args[0], "-test.run=TestTransport_HelperProcess")
+ cmd.Env = append(os.Environ(),
+ "GO_WANT_TRANSPORT_HELPER=1",
+ "GO_TRANSPORT_HELPER_MODE=stdin",
+ )
+ stdin, err := cmd.StdinPipe()
+ if err != nil {
+ t.Fatalf("StdinPipe: %v", err)
+ }
+ if err := cmd.Start(); err != nil {
+ t.Fatalf("Start: %v", err)
+ }
+ done := make(chan struct{})
+ go func() {
+ _ = cmd.Wait()
+ close(done)
+ }()
+
+ tp := &stdioTransport{
+ cmd: cmd,
+ stdin: stdin,
+ procDone: done,
+ opts: TransportOptions{
+ ShutdownTimeout: 250 * time.Millisecond,
+ },
+ }
+ if err := tp.shutdownProcess(); err != nil {
+ t.Fatalf("shutdownProcess: %v", err)
+ }
+
+ select {
+ case <-done:
+ case <-time.After(500 * time.Millisecond):
+ t.Fatalf("helper process did not exit after shutdown")
+ }
+}
+
+func TestTransport_HelperProcess(t *testing.T) {
+ if os.Getenv("GO_WANT_TRANSPORT_HELPER") != "1" {
+ return
+ }
+ switch os.Getenv("GO_TRANSPORT_HELPER_MODE") {
+ case "stdin":
+ _, _ = io.Copy(io.Discard, os.Stdin)
+ case "exit":
+ return
+ default:
+ return
+ }
+}
+
+func TestProcessLifecycle_FinishIsIdempotent(t *testing.T) {
+ life := newProcessLifecycle()
+ firstErr := errors.New("first exit")
+ secondErr := errors.New("second exit")
+
+ life.finish(firstErr)
+ life.finish(secondErr)
+
+ select {
+ case <-life.doneCh():
+ default:
+ t.Fatalf("expected lifecycle done channel to close")
+ }
+ if !errors.Is(life.processError(), firstErr) {
+ t.Fatalf("expected first process error to win, got %v", life.processError())
+ }
+}
+
+func TestTransport_WaitForTurnCompletion_UnblocksOnProcessExit(t *testing.T) {
+ tp := &stdioTransport{}
+ life := newProcessLifecycle()
+ completed := make(chan struct{})
+ resultCh := make(chan struct {
+ outcome turnWaitOutcome
+ err error
+ }, 1)
+
+ go func() {
+ outcome, err := tp.waitForTurnCompletion(context.Background(), completed, life)
+ resultCh <- struct {
+ outcome turnWaitOutcome
+ err error
+ }{outcome: outcome, err: err}
+ }()
+
+ processErr := llm.NewNetworkError(providerName, "Codex app-server exited unexpectedly")
+ life.finish(processErr)
+
+ select {
+ case result := <-resultCh:
+ if result.outcome != turnWaitProcessTerminated {
+ t.Fatalf("wait outcome: got %v want %v", result.outcome, turnWaitProcessTerminated)
+ }
+ if !errors.Is(result.err, processErr) {
+ t.Fatalf("wait error: got %v want %v", result.err, processErr)
+ }
+ case <-time.After(500 * time.Millisecond):
+ t.Fatalf("timed out waiting for process-exit unblock")
+ }
+}
diff --git a/internal/llm/providers/codexappserver/transport_timeout_test.go b/internal/llm/providers/codexappserver/transport_timeout_test.go
new file mode 100644
index 00000000..e51afdec
--- /dev/null
+++ b/internal/llm/providers/codexappserver/transport_timeout_test.go
@@ -0,0 +1,82 @@
+package codexappserver
+
+import (
+ "context"
+ "testing"
+ "time"
+)
+
+func TestNewTransport_DefaultRequestTimeoutIsDisabled(t *testing.T) {
+ transport := NewTransport(TransportOptions{})
+ if transport.opts.RequestTimeout != 0 {
+ t.Fatalf("request timeout: got %v want 0 (disabled)", transport.opts.RequestTimeout)
+ }
+}
+
+func TestContextWithRequestTimeout_DisabledDoesNotInjectDeadline(t *testing.T) {
+ ctx := context.Background()
+
+ derivedCtx, cancel := contextWithRequestTimeout(ctx, 0)
+ defer cancel()
+
+ if _, ok := derivedCtx.Deadline(); ok {
+ t.Fatalf("expected no deadline when timeout is disabled")
+ }
+}
+
+func TestContextWithRequestTimeout_PositiveTimeoutInjectsDeadline(t *testing.T) {
+ ctx := context.Background()
+
+ derivedCtx, cancel := contextWithRequestTimeout(ctx, 500*time.Millisecond)
+ defer cancel()
+
+ deadline, ok := derivedCtx.Deadline()
+ if !ok {
+ t.Fatalf("expected derived context deadline")
+ }
+ remaining := time.Until(deadline)
+ if remaining <= 0 || remaining > time.Second {
+ t.Fatalf("derived deadline remaining=%v, expected within (0, 1s]", remaining)
+ }
+}
+
+func TestContextWithRequestTimeout_ParentDeadlineTakesPrecedence(t *testing.T) {
+ parentCtx, cancelParent := context.WithTimeout(context.Background(), 250*time.Millisecond)
+ defer cancelParent()
+
+ derivedCtx, cancelDerived := contextWithRequestTimeout(parentCtx, 5*time.Second)
+ defer cancelDerived()
+
+ deadline, ok := derivedCtx.Deadline()
+ if !ok {
+ t.Fatalf("expected derived context deadline from parent")
+ }
+ remaining := time.Until(deadline)
+ if remaining <= 0 || remaining > time.Second {
+ t.Fatalf("derived deadline remaining=%v, expected parent-sized deadline", remaining)
+ }
+}
+
+func TestInterruptTimeout_DefaultsWhenRequestTimeoutDisabled(t *testing.T) {
+ transport := NewTransport(TransportOptions{})
+ if got := transport.interruptTimeout(); got != defaultInterruptTimeout {
+ t.Fatalf("interrupt timeout: got %v want %v", got, defaultInterruptTimeout)
+ }
+}
+
+func TestInterruptTimeout_UsesRequestTimeoutWhenSet(t *testing.T) {
+ transport := NewTransport(TransportOptions{RequestTimeout: 3 * time.Second})
+ if got := transport.interruptTimeout(); got != 3*time.Second {
+ t.Fatalf("interrupt timeout: got %v want %v", got, 3*time.Second)
+ }
+}
+
+func TestInterruptTimeout_ClampsToShutdownTimeout(t *testing.T) {
+ transport := NewTransport(TransportOptions{
+ RequestTimeout: 7 * time.Second,
+ ShutdownTimeout: 1 * time.Second,
+ })
+ if got := transport.interruptTimeout(); got != 1*time.Second {
+ t.Fatalf("interrupt timeout: got %v want %v", got, 1*time.Second)
+ }
+}
diff --git a/internal/llm/providers/codexappserver/util.go b/internal/llm/providers/codexappserver/util.go
new file mode 100644
index 00000000..9ff0859a
--- /dev/null
+++ b/internal/llm/providers/codexappserver/util.go
@@ -0,0 +1,124 @@
+package codexappserver
+
+import (
+ "bytes"
+ "encoding/json"
+ "fmt"
+ "strings"
+)
+
+func asMap(v any) map[string]any {
+ m, _ := v.(map[string]any)
+ return m
+}
+
+func asSlice(v any) []any {
+ a, _ := v.([]any)
+ return a
+}
+
+func asString(v any) string {
+ switch x := v.(type) {
+ case string:
+ return x
+ case json.Number:
+ return x.String()
+ default:
+ return ""
+ }
+}
+
+func asInt(v any, def int) int {
+ switch x := v.(type) {
+ case int:
+ return x
+ case int8:
+ return int(x)
+ case int16:
+ return int(x)
+ case int32:
+ return int(x)
+ case int64:
+ return int(x)
+ case float64:
+ return int(x)
+ case float32:
+ return int(x)
+ case json.Number:
+ i, err := x.Int64()
+ if err == nil {
+ return int(i)
+ }
+ case string:
+ n := strings.TrimSpace(x)
+ if n == "" {
+ return def
+ }
+ var i int
+ if _, err := fmt.Sscanf(n, "%d", &i); err == nil {
+ return i
+ }
+ }
+ return def
+}
+
+func asBool(v any, def bool) bool {
+ b, ok := v.(bool)
+ if ok {
+ return b
+ }
+ return def
+}
+
+func deepCopyMap(in map[string]any) map[string]any {
+ if in == nil {
+ return nil
+ }
+ b, err := json.Marshal(in)
+ if err != nil {
+ out := make(map[string]any, len(in))
+ for k, v := range in {
+ out[k] = v
+ }
+ return out
+ }
+ return decodeJSONToMap(b)
+}
+
+func decodeJSONToMap(b []byte) map[string]any {
+ dec := json.NewDecoder(bytes.NewReader(b))
+ dec.UseNumber()
+ var out map[string]any
+ if err := dec.Decode(&out); err != nil {
+ return map[string]any{}
+ }
+ if out == nil {
+ return map[string]any{}
+ }
+ return out
+}
+
+func normalizeCode(value string) string {
+ value = strings.TrimSpace(strings.ToUpper(value))
+ if value == "" {
+ return ""
+ }
+ var b strings.Builder
+ for _, r := range value {
+ if (r >= 'A' && r <= 'Z') || (r >= '0' && r <= '9') {
+ b.WriteRune(r)
+ continue
+ }
+ b.WriteByte('_')
+ }
+ return b.String()
+}
+
+func firstNonEmpty(values ...string) string {
+ for _, v := range values {
+ if s := strings.TrimSpace(v); s != "" {
+ return s
+ }
+ }
+ return ""
+}
diff --git a/internal/llm/providers/codexappserver/util_test.go b/internal/llm/providers/codexappserver/util_test.go
new file mode 100644
index 00000000..adfabad2
--- /dev/null
+++ b/internal/llm/providers/codexappserver/util_test.go
@@ -0,0 +1,121 @@
+package codexappserver
+
+import (
+ "encoding/json"
+ "testing"
+)
+
+func TestUtil_AsHelpersAndNormalization(t *testing.T) {
+ m := map[string]any{"x": 1}
+ if got := asMap(m); got["x"] != 1 {
+ t.Fatalf("asMap mismatch: %#v", got)
+ }
+ if got := asMap("not-a-map"); got != nil {
+ t.Fatalf("expected nil for invalid map cast, got %#v", got)
+ }
+
+ s := []any{"a", 1}
+ if got := asSlice(s); len(got) != 2 {
+ t.Fatalf("asSlice mismatch: %#v", got)
+ }
+ if got := asSlice("not-a-slice"); got != nil {
+ t.Fatalf("expected nil for invalid slice cast, got %#v", got)
+ }
+
+ if got := asString("abc"); got != "abc" {
+ t.Fatalf("asString string mismatch: %q", got)
+ }
+ if got := asString(json.Number("123")); got != "123" {
+ t.Fatalf("asString json.Number mismatch: %q", got)
+ }
+ if got := asString(99); got != "" {
+ t.Fatalf("expected empty string for unsupported type, got %q", got)
+ }
+
+ if got := normalizeCode(" invalid-request "); got != "INVALID_REQUEST" {
+ t.Fatalf("normalizeCode mismatch: %q", got)
+ }
+ if got := firstNonEmpty("", " ", "x", "y"); got != "x" {
+ t.Fatalf("firstNonEmpty mismatch: %q", got)
+ }
+}
+
+func TestUtil_AsIntAndBool(t *testing.T) {
+ if got := asInt(int8(2), 0); got != 2 {
+ t.Fatalf("asInt int8 mismatch: %d", got)
+ }
+ if got := asInt(int16(3), 0); got != 3 {
+ t.Fatalf("asInt int16 mismatch: %d", got)
+ }
+ if got := asInt(int32(4), 0); got != 4 {
+ t.Fatalf("asInt int32 mismatch: %d", got)
+ }
+ if got := asInt(int64(5), 0); got != 5 {
+ t.Fatalf("asInt int64 mismatch: %d", got)
+ }
+ if got := asInt(float32(6.9), 0); got != 6 {
+ t.Fatalf("asInt float32 mismatch: %d", got)
+ }
+ if got := asInt(float64(7.9), 0); got != 7 {
+ t.Fatalf("asInt float64 mismatch: %d", got)
+ }
+ if got := asInt(json.Number("8"), 0); got != 8 {
+ t.Fatalf("asInt json.Number mismatch: %d", got)
+ }
+ if got := asInt(" 9 ", 0); got != 9 {
+ t.Fatalf("asInt string numeric mismatch: %d", got)
+ }
+ if got := asInt(" ", 42); got != 42 {
+ t.Fatalf("asInt empty string should use default: %d", got)
+ }
+ if got := asInt("not-numeric", 42); got != 42 {
+ t.Fatalf("asInt invalid string should use default: %d", got)
+ }
+
+ if got := asBool(true, false); !got {
+ t.Fatalf("asBool true mismatch")
+ }
+ if got := asBool("bad", true); !got {
+ t.Fatalf("asBool fallback mismatch")
+ }
+}
+
+func TestUtil_DeepCopyAndDecodeJSONToMap(t *testing.T) {
+ orig := map[string]any{"nested": map[string]any{"x": 1}}
+ cp := deepCopyMap(orig)
+ if cp == nil {
+ t.Fatalf("deepCopyMap returned nil")
+ }
+ nested := asMap(cp["nested"])
+ nested["x"] = 9
+ if asMap(orig["nested"])["x"] != 1 {
+ t.Fatalf("deepCopyMap should not alias nested map")
+ }
+
+ cp = deepCopyMap(nil)
+ if cp != nil {
+ t.Fatalf("deepCopyMap(nil) should return nil, got %#v", cp)
+ }
+
+ withUnmarshalable := map[string]any{
+ "f": func() {},
+ "x": "ok",
+ }
+ cp = deepCopyMap(withUnmarshalable)
+ if cp["x"] != "ok" {
+ t.Fatalf("deepCopyMap fallback copy mismatch: %#v", cp)
+ }
+ if _, ok := cp["f"]; !ok {
+ t.Fatalf("deepCopyMap fallback should preserve unmarshalable key")
+ }
+
+ if got := decodeJSONToMap([]byte(`{"x":1}`)); asString(got["x"]) != "1" {
+ t.Fatalf("decodeJSONToMap valid json mismatch: %#v", got)
+ }
+ if got := decodeJSONToMap([]byte(`not-json`)); len(got) != 0 {
+ t.Fatalf("decodeJSONToMap invalid json should return empty map, got %#v", got)
+ }
+ if got := decodeJSONToMap([]byte(`null`)); len(got) != 0 {
+ t.Fatalf("decodeJSONToMap null should return empty map, got %#v", got)
+ }
+}
diff --git a/internal/llm/providers/google/adapter.go b/internal/llm/providers/google/adapter.go
index e8c9334d..e377ce4f 100644
--- a/internal/llm/providers/google/adapter.go
+++ b/internal/llm/providers/google/adapter.go
@@ -203,7 +203,9 @@ func (a *Adapter) Complete(ctx context.Context, req llm.Request) (llm.Response,
return llm.Response{}, llm.ErrorFromHTTPStatus(a.Name(), resp.StatusCode, msg, raw, ra)
}
- return fromGeminiResponse(a.Name(), raw, req.Model), nil
+ out := fromGeminiResponse(a.Name(), raw, req.Model)
+ out.RateLimit = llm.ParseRateLimitInfo(resp.Header, time.Now())
+ return out, nil
}
func (a *Adapter) Stream(ctx context.Context, req llm.Request) (llm.Stream, error) {
@@ -338,6 +340,7 @@ func (a *Adapter) Stream(ctx context.Context, req llm.Request) (llm.Stream, erro
cancel()
return nil, llm.ErrorFromHTTPStatus(a.Name(), resp.StatusCode, msg, raw, ra)
}
+ rateLimit := llm.ParseRateLimitInfo(resp.Header, time.Now())
s := llm.NewChanStream(cancel)
s.Send(llm.StreamEvent{Type: llm.StreamEventStreamStart})
@@ -435,12 +438,13 @@ func (a *Adapter) Stream(ctx context.Context, req llm.Request) (llm.Stream, erro
flushTextPart()
msg := llm.Message{Role: llm.RoleAssistant, Content: contentParts}
r := llm.Response{
- Provider: a.Name(),
- Model: req.Model,
- Message: msg,
- Finish: finish,
- Usage: usage,
- Raw: raw,
+ Provider: a.Name(),
+ Model: req.Model,
+ Message: msg,
+ Finish: finish,
+ Usage: usage,
+ RateLimit: rateLimit,
+ Raw: raw,
}
if r.Finish.Reason == "" {
if len(r.ToolCalls()) > 0 {
diff --git a/internal/llm/providers/google/adapter_test.go b/internal/llm/providers/google/adapter_test.go
index e800fecf..3490a109 100644
--- a/internal/llm/providers/google/adapter_test.go
+++ b/internal/llm/providers/google/adapter_test.go
@@ -17,6 +17,28 @@ import (
"github.com/danshapiro/kilroy/internal/llm"
)
+func assertRateLimitInfo(t *testing.T, rl *llm.RateLimitInfo) {
+ t.Helper()
+ if rl == nil {
+ t.Fatalf("expected rate limit info, got nil")
+ }
+ if rl.RequestsRemaining == nil || *rl.RequestsRemaining != 9 {
+ t.Fatalf("requests_remaining: %#v", rl.RequestsRemaining)
+ }
+ if rl.RequestsLimit == nil || *rl.RequestsLimit != 10 {
+ t.Fatalf("requests_limit: %#v", rl.RequestsLimit)
+ }
+ if rl.TokensRemaining == nil || *rl.TokensRemaining != 90 {
+ t.Fatalf("tokens_remaining: %#v", rl.TokensRemaining)
+ }
+ if rl.TokensLimit == nil || *rl.TokensLimit != 100 {
+ t.Fatalf("tokens_limit: %#v", rl.TokensLimit)
+ }
+ if rl.ResetAt != "2025-01-01T00:00:10Z" {
+ t.Fatalf("reset_at: %q", rl.ResetAt)
+ }
+}
+
func TestAdapter_Complete_MapsToGeminiGenerateContent(t *testing.T) {
var gotBody map[string]any
gotKey := ""
@@ -34,6 +56,11 @@ func TestAdapter_Complete_MapsToGeminiGenerateContent(t *testing.T) {
_ = json.Unmarshal(b, &gotBody)
w.Header().Set("Content-Type", "application/json")
+ w.Header().Set("x-ratelimit-remaining-requests", "9")
+ w.Header().Set("x-ratelimit-limit-requests", "10")
+ w.Header().Set("x-ratelimit-remaining-tokens", "90")
+ w.Header().Set("x-ratelimit-limit-tokens", "100")
+ w.Header().Set("x-ratelimit-reset-requests", "Wed, 01 Jan 2025 00:00:10 GMT")
_, _ = w.Write([]byte(`{
"candidates": [{"content": {"parts": [{"text":"Hello"}]}, "finishReason":"STOP"}],
"usageMetadata": {"promptTokenCount": 1, "candidatesTokenCount": 2, "totalTokenCount": 3}
@@ -64,6 +91,7 @@ func TestAdapter_Complete_MapsToGeminiGenerateContent(t *testing.T) {
if strings.TrimSpace(resp.Text()) != "Hello" {
t.Fatalf("resp text: %q", resp.Text())
}
+ assertRateLimitInfo(t, resp.RateLimit)
if gotKey != "k" {
t.Fatalf("key param: %q", gotKey)
}
@@ -529,6 +557,11 @@ func TestAdapter_Stream_YieldsTextDeltasAndFinish(t *testing.T) {
_ = json.Unmarshal(b, &gotBody)
w.Header().Set("Content-Type", "text/event-stream")
+ w.Header().Set("x-ratelimit-remaining-requests", "9")
+ w.Header().Set("x-ratelimit-limit-requests", "10")
+ w.Header().Set("x-ratelimit-remaining-tokens", "90")
+ w.Header().Set("x-ratelimit-limit-tokens", "100")
+ w.Header().Set("x-ratelimit-reset-requests", "Wed, 01 Jan 2025 00:00:10 GMT")
f, _ := w.(http.Flusher)
write := func(data string) {
_, _ = io.WriteString(w, "data: "+data+"\n\n")
@@ -570,6 +603,7 @@ func TestAdapter_Stream_YieldsTextDeltasAndFinish(t *testing.T) {
if finish == nil || strings.TrimSpace(finish.Text()) != "Hello" {
t.Fatalf("finish response: %+v", finish)
}
+ assertRateLimitInfo(t, finish.RateLimit)
if gotKey != "k" {
t.Fatalf("key param: %q", gotKey)
}
diff --git a/internal/llm/providers/openai/adapter.go b/internal/llm/providers/openai/adapter.go
index e5426298..f4428ca6 100644
--- a/internal/llm/providers/openai/adapter.go
+++ b/internal/llm/providers/openai/adapter.go
@@ -157,7 +157,9 @@ func (a *Adapter) Complete(ctx context.Context, req llm.Request) (llm.Response,
return llm.Response{}, llm.ErrorFromHTTPStatus(a.Name(), resp.StatusCode, msg, raw, ra)
}
- return fromResponses(a.Name(), raw, req.Model), nil
+ out := fromResponses(a.Name(), raw, req.Model)
+ out.RateLimit = llm.ParseRateLimitInfo(resp.Header, time.Now())
+ return out, nil
}
func (a *Adapter) Stream(ctx context.Context, req llm.Request) (llm.Stream, error) {
@@ -173,7 +175,7 @@ func (a *Adapter) Stream(ctx context.Context, req llm.Request) (llm.Stream, erro
}
body := map[string]any{
- "model": req.Model,
+ "model": modelmeta.ProviderRelativeModelID("openai", req.Model),
"instructions": instructions,
"input": inputItems,
"parallel_tool_calls": false,
@@ -255,6 +257,7 @@ func (a *Adapter) Stream(ctx context.Context, req llm.Request) (llm.Stream, erro
s := llm.NewChanStream(cancel)
// STREAM_START
s.Send(llm.StreamEvent{Type: llm.StreamEventStreamStart})
+ rateLimit := llm.ParseRateLimitInfo(resp.Header, time.Now())
go func() {
defer func() {
@@ -399,6 +402,7 @@ func (a *Adapter) Stream(ctx context.Context, req llm.Request) (llm.Stream, erro
rawResp = payload
}
r := fromResponses(a.Name(), rawResp, req.Model)
+ r.RateLimit = rateLimit
// Ensure text segment is closed.
if textStarted {
s.Send(llm.StreamEvent{Type: llm.StreamEventTextEnd, TextID: textID})
diff --git a/internal/llm/providers/openai/adapter_test.go b/internal/llm/providers/openai/adapter_test.go
index c2e19ba3..7b6191f8 100644
--- a/internal/llm/providers/openai/adapter_test.go
+++ b/internal/llm/providers/openai/adapter_test.go
@@ -16,6 +16,28 @@ import (
"github.com/danshapiro/kilroy/internal/llm"
)
+func assertRateLimitInfo(t *testing.T, rl *llm.RateLimitInfo) {
+ t.Helper()
+ if rl == nil {
+ t.Fatalf("expected rate limit info, got nil")
+ }
+ if rl.RequestsRemaining == nil || *rl.RequestsRemaining != 9 {
+ t.Fatalf("requests_remaining: %#v", rl.RequestsRemaining)
+ }
+ if rl.RequestsLimit == nil || *rl.RequestsLimit != 10 {
+ t.Fatalf("requests_limit: %#v", rl.RequestsLimit)
+ }
+ if rl.TokensRemaining == nil || *rl.TokensRemaining != 90 {
+ t.Fatalf("tokens_remaining: %#v", rl.TokensRemaining)
+ }
+ if rl.TokensLimit == nil || *rl.TokensLimit != 100 {
+ t.Fatalf("tokens_limit: %#v", rl.TokensLimit)
+ }
+ if rl.ResetAt != "2025-01-01T00:00:10Z" {
+ t.Fatalf("reset_at: %q", rl.ResetAt)
+ }
+}
+
func TestAdapter_Complete_MapsToResponsesAPI(t *testing.T) {
var gotBody map[string]any
@@ -29,6 +51,11 @@ func TestAdapter_Complete_MapsToResponsesAPI(t *testing.T) {
_ = json.Unmarshal(b, &gotBody)
w.Header().Set("Content-Type", "application/json")
+ w.Header().Set("x-ratelimit-remaining-requests", "9")
+ w.Header().Set("x-ratelimit-limit-requests", "10")
+ w.Header().Set("x-ratelimit-remaining-tokens", "90")
+ w.Header().Set("x-ratelimit-limit-tokens", "100")
+ w.Header().Set("x-ratelimit-reset-requests", "Wed, 01 Jan 2025 00:00:10 GMT")
_, _ = w.Write([]byte(`{
"id": "resp_1",
"model": "gpt-5.4",
@@ -67,6 +94,7 @@ func TestAdapter_Complete_MapsToResponsesAPI(t *testing.T) {
if strings.TrimSpace(resp.Text()) != "Hello" {
t.Fatalf("resp text: %q", resp.Text())
}
+ assertRateLimitInfo(t, resp.RateLimit)
// Assert request mapping.
if gotBody == nil {
@@ -89,6 +117,48 @@ func TestAdapter_Complete_MapsToResponsesAPI(t *testing.T) {
}
}
+func TestAdapter_Complete_NormalizesProviderPrefixedModelID(t *testing.T) {
+ var gotBody map[string]any
+
+ srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+ if r.Method != http.MethodPost || r.URL.Path != "/v1/responses" {
+ w.WriteHeader(http.StatusNotFound)
+ return
+ }
+ b, _ := io.ReadAll(r.Body)
+ _ = r.Body.Close()
+ _ = json.Unmarshal(b, &gotBody)
+
+ w.Header().Set("Content-Type", "application/json")
+ _, _ = w.Write([]byte(`{
+ "id": "resp_1",
+ "model": "gpt-5.2",
+ "output": [{"type": "message", "content": [{"type":"output_text", "text":"ok"}]}],
+ "usage": {"input_tokens": 1, "output_tokens": 1, "total_tokens": 2}
+}`))
+ }))
+ t.Cleanup(srv.Close)
+
+ a := &Adapter{APIKey: "k", BaseURL: srv.URL, Client: srv.Client()}
+ ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
+ defer cancel()
+
+ _, err := a.Complete(ctx, llm.Request{Model: "openai/gpt-5.2", Messages: []llm.Message{llm.User("hi")}})
+ if err != nil {
+ t.Fatalf("Complete: %v", err)
+ }
+ if gotBody == nil {
+ t.Fatalf("server did not capture request body")
+ }
+ gotModel, ok := gotBody["model"].(string)
+ if !ok {
+ t.Fatalf("request model field type: %#v", gotBody["model"])
+ }
+ if got := strings.TrimSpace(gotModel); got != "gpt-5.2" {
+ t.Fatalf("request model: got %q want %q", got, "gpt-5.2")
+ }
+}
+
func TestOpenAIAdapter_NewWithProvider_UsesConfiguredName(t *testing.T) {
a := NewWithProvider("kimi", "k", "https://api.example.com")
if got := a.Name(); got != "kimi" {
@@ -333,6 +403,11 @@ func TestAdapter_Stream_YieldsTextDeltasAndFinish(t *testing.T) {
_ = json.Unmarshal(b, &gotBody)
w.Header().Set("Content-Type", "text/event-stream")
+ w.Header().Set("x-ratelimit-remaining-requests", "9")
+ w.Header().Set("x-ratelimit-limit-requests", "10")
+ w.Header().Set("x-ratelimit-remaining-tokens", "90")
+ w.Header().Set("x-ratelimit-limit-tokens", "100")
+ w.Header().Set("x-ratelimit-reset-requests", "Wed, 01 Jan 2025 00:00:10 GMT")
f, _ := w.(http.Flusher)
write := func(event string, data string) {
@@ -377,6 +452,7 @@ func TestAdapter_Stream_YieldsTextDeltasAndFinish(t *testing.T) {
if finish == nil || strings.TrimSpace(finish.Text()) != "Hello" {
t.Fatalf("finish response: %+v", finish)
}
+ assertRateLimitInfo(t, finish.RateLimit)
if gotBody == nil {
t.Fatalf("server did not capture request body")
@@ -411,6 +487,52 @@ func TestAdapter_Stream_YieldsTextDeltasAndFinish(t *testing.T) {
}
}
+func TestAdapter_Stream_NormalizesProviderPrefixedModelID(t *testing.T) {
+ var gotBody map[string]any
+
+ srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+ if r.Method != http.MethodPost || r.URL.Path != "/v1/responses" {
+ w.WriteHeader(http.StatusNotFound)
+ return
+ }
+ b, _ := io.ReadAll(r.Body)
+ _ = r.Body.Close()
+ _ = json.Unmarshal(b, &gotBody)
+
+ w.Header().Set("Content-Type", "text/event-stream")
+ f, _ := w.(http.Flusher)
+ _, _ = io.WriteString(w, "event: response.completed\n")
+ _, _ = io.WriteString(w, `data: {"type":"response.completed","response":{"id":"resp_1","model":"gpt-5.2","output":[{"type":"message","content":[{"type":"output_text","text":"Hello"}]}],"usage":{"input_tokens":1,"output_tokens":2,"total_tokens":3}}}`+"\n\n")
+ if f != nil {
+ f.Flush()
+ }
+ }))
+ t.Cleanup(srv.Close)
+
+ a := &Adapter{APIKey: "k", BaseURL: srv.URL, Client: srv.Client()}
+ ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
+ defer cancel()
+
+ stream, err := a.Stream(ctx, llm.Request{Model: "openai/gpt-5.2", Messages: []llm.Message{llm.User("hi")}})
+ if err != nil {
+ t.Fatalf("Stream: %v", err)
+ }
+ defer stream.Close()
+ for range stream.Events() {
+ }
+
+ if gotBody == nil {
+ t.Fatalf("server did not capture request body")
+ }
+ gotModel, ok := gotBody["model"].(string)
+ if !ok {
+ t.Fatalf("request model field type: %#v", gotBody["model"])
+ }
+ if got := strings.TrimSpace(gotModel); got != "gpt-5.2" {
+ t.Fatalf("request model: got %q want %q", got, "gpt-5.2")
+ }
+}
+
func TestAdapter_Stream_TranslatesToolCalls(t *testing.T) {
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
if r.Method != http.MethodPost || r.URL.Path != "/v1/responses" {
diff --git a/internal/llm/providers/openaicompat/adapter.go b/internal/llm/providers/openaicompat/adapter.go
index 6f6c0d61..db445085 100644
--- a/internal/llm/providers/openaicompat/adapter.go
+++ b/internal/llm/providers/openaicompat/adapter.go
@@ -118,6 +118,7 @@ func (a *Adapter) Stream(ctx context.Context, req llm.Request) (llm.Stream, erro
_, perr := parseChatCompletionsResponse(a.cfg.Provider, req.Model, resp)
return nil, perr
}
+ rateLimit := llm.ParseRateLimitInfo(resp.Header, time.Now())
s := llm.NewChanStream(cancelAll)
go func() {
@@ -127,9 +128,10 @@ func (a *Adapter) Stream(ctx context.Context, req llm.Request) (llm.Stream, erro
s.Send(llm.StreamEvent{Type: llm.StreamEventStreamStart})
state := &chatStreamState{
- Provider: a.cfg.Provider,
- Model: req.Model,
- TextID: "assistant_text",
+ Provider: a.cfg.Provider,
+ Model: req.Model,
+ TextID: "assistant_text",
+ RateLimit: rateLimit,
}
err := llm.ParseSSE(sctx, resp.Body, func(ev llm.SSEEvent) error {
@@ -235,7 +237,12 @@ func parseChatCompletionsResponse(provider, model string, resp *http.Response) (
if err := dec.Decode(&raw); err != nil {
return llm.Response{}, llm.WrapContextError(provider, err)
}
- return fromChatCompletions(provider, model, raw)
+ out, err := fromChatCompletions(provider, model, raw)
+ if err != nil {
+ return llm.Response{}, err
+ }
+ out.RateLimit = llm.ParseRateLimitInfo(resp.Header, time.Now())
+ return out, nil
}
func toChatCompletionsMessages(msgs []llm.Message) []map[string]any {
@@ -441,9 +448,10 @@ func normalizeFinishReason(in string) string {
}
type chatStreamState struct {
- Provider string
- Model string
- TextID string
+ Provider string
+ Model string
+ TextID string
+ RateLimit *llm.RateLimitInfo
Text strings.Builder
TextOpen bool
@@ -490,11 +498,12 @@ func (st *chatStreamState) FinalResponse() llm.Response {
finish = llm.FinishReason{Reason: "stop", Raw: "stop"}
}
return llm.Response{
- Provider: st.Provider,
- Model: st.Model,
- Message: msg,
- Finish: finish,
- Usage: st.Usage,
+ Provider: st.Provider,
+ Model: st.Model,
+ Message: msg,
+ Finish: finish,
+ Usage: st.Usage,
+ RateLimit: st.RateLimit,
}
}
diff --git a/internal/llm/providers/openaicompat/adapter_test.go b/internal/llm/providers/openaicompat/adapter_test.go
index 7c75997b..7df3f6be 100644
--- a/internal/llm/providers/openaicompat/adapter_test.go
+++ b/internal/llm/providers/openaicompat/adapter_test.go
@@ -12,11 +12,38 @@ import (
"github.com/danshapiro/kilroy/internal/llm"
)
+func assertRateLimitInfo(t *testing.T, rl *llm.RateLimitInfo) {
+ t.Helper()
+ if rl == nil {
+ t.Fatalf("expected rate limit info, got nil")
+ }
+ if rl.RequestsRemaining == nil || *rl.RequestsRemaining != 9 {
+ t.Fatalf("requests_remaining: %#v", rl.RequestsRemaining)
+ }
+ if rl.RequestsLimit == nil || *rl.RequestsLimit != 10 {
+ t.Fatalf("requests_limit: %#v", rl.RequestsLimit)
+ }
+ if rl.TokensRemaining == nil || *rl.TokensRemaining != 90 {
+ t.Fatalf("tokens_remaining: %#v", rl.TokensRemaining)
+ }
+ if rl.TokensLimit == nil || *rl.TokensLimit != 100 {
+ t.Fatalf("tokens_limit: %#v", rl.TokensLimit)
+ }
+ if rl.ResetAt != "2025-01-01T00:00:10Z" {
+ t.Fatalf("reset_at: %q", rl.ResetAt)
+ }
+}
+
func TestAdapter_Complete_ChatCompletionsMapsToolCalls(t *testing.T) {
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
if r.URL.Path != "/v1/chat/completions" {
t.Fatalf("path: %s", r.URL.Path)
}
+ w.Header().Set("x-ratelimit-remaining-requests", "9")
+ w.Header().Set("x-ratelimit-limit-requests", "10")
+ w.Header().Set("x-ratelimit-remaining-tokens", "90")
+ w.Header().Set("x-ratelimit-limit-tokens", "100")
+ w.Header().Set("x-ratelimit-reset-requests", "Wed, 01 Jan 2025 00:00:10 GMT")
_, _ = w.Write([]byte(`{"id":"c1","model":"m","choices":[{"finish_reason":"tool_calls","message":{"role":"assistant","content":"","tool_calls":[{"id":"call_1","type":"function","function":{"name":"read_file","arguments":"{\"file_path\":\"README.md\"}"}}]}}],"usage":{"prompt_tokens":10,"completion_tokens":3,"total_tokens":13}}`))
}))
defer srv.Close()
@@ -39,11 +66,17 @@ func TestAdapter_Complete_ChatCompletionsMapsToolCalls(t *testing.T) {
if len(resp.ToolCalls()) != 1 {
t.Fatalf("tool call mapping failed")
}
+ assertRateLimitInfo(t, resp.RateLimit)
}
func TestAdapter_Stream_EmitsFinishEvent(t *testing.T) {
srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
w.Header().Set("Content-Type", "text/event-stream")
+ w.Header().Set("x-ratelimit-remaining-requests", "9")
+ w.Header().Set("x-ratelimit-limit-requests", "10")
+ w.Header().Set("x-ratelimit-remaining-tokens", "90")
+ w.Header().Set("x-ratelimit-limit-tokens", "100")
+ w.Header().Set("x-ratelimit-reset-requests", "Wed, 01 Jan 2025 00:00:10 GMT")
_, _ = w.Write([]byte("data: {\"id\":\"c2\",\"choices\":[{\"delta\":{\"content\":\"ok\"},\"finish_reason\":null}]}\n\n"))
_, _ = w.Write([]byte("data: {\"id\":\"c2\",\"choices\":[{\"delta\":{},\"finish_reason\":\"stop\"}],\"usage\":{\"prompt_tokens\":1,\"completion_tokens\":1,\"total_tokens\":2}}\n\n"))
_, _ = w.Write([]byte("data: [DONE]\n\n"))
@@ -61,16 +94,17 @@ func TestAdapter_Stream_EmitsFinishEvent(t *testing.T) {
}
defer stream.Close()
- sawFinish := false
+ var finish *llm.Response
for ev := range stream.Events() {
if ev.Type == llm.StreamEventFinish {
- sawFinish = true
+ finish = ev.Response
break
}
}
- if !sawFinish {
+ if finish == nil {
t.Fatalf("expected finish event")
}
+ assertRateLimitInfo(t, finish.RateLimit)
}
func TestAdapter_Stream_MapsToolCallDeltasToEventsAndFinalResponse(t *testing.T) {
diff --git a/internal/llm/rate_limit.go b/internal/llm/rate_limit.go
new file mode 100644
index 00000000..247a8ba3
--- /dev/null
+++ b/internal/llm/rate_limit.go
@@ -0,0 +1,164 @@
+package llm
+
+import (
+ "net/http"
+ "regexp"
+ "strconv"
+ "strings"
+ "time"
+)
+
+var firstIntRe = regexp.MustCompile(`[-+]?\d+`)
+
+// ParseRateLimitInfo extracts informational rate limit metadata from response headers.
+// The result is best-effort and intended for observability, not proactive throttling.
+func ParseRateLimitInfo(headers http.Header, now time.Time) *RateLimitInfo {
+ if headers == nil {
+ return nil
+ }
+
+ reqRemaining := parseHeaderInt(headers,
+ "x-ratelimit-remaining-requests",
+ "x-ratelimit-remaining-request",
+ )
+ reqLimit := parseHeaderInt(headers,
+ "x-ratelimit-limit-requests",
+ "x-ratelimit-limit-request",
+ )
+ tokRemaining := parseHeaderInt(headers,
+ "x-ratelimit-remaining-tokens",
+ "x-ratelimit-remaining-token",
+ )
+ tokLimit := parseHeaderInt(headers,
+ "x-ratelimit-limit-tokens",
+ "x-ratelimit-limit-token",
+ )
+
+ // Fallback headers that do not distinguish requests vs tokens.
+ if reqRemaining == nil && tokRemaining == nil {
+ reqRemaining = parseHeaderInt(headers,
+ "x-ratelimit-remaining",
+ "ratelimit-remaining",
+ )
+ }
+ if reqLimit == nil && tokLimit == nil {
+ reqLimit = parseHeaderInt(headers,
+ "x-ratelimit-limit",
+ "ratelimit-limit",
+ )
+ }
+
+ resetRaw := firstHeaderValue(headers,
+ "x-ratelimit-reset-requests",
+ "x-ratelimit-reset-request",
+ "x-ratelimit-reset-tokens",
+ "x-ratelimit-reset-token",
+ "x-ratelimit-reset",
+ "ratelimit-reset",
+ )
+ resetAt := parseRateLimitReset(resetRaw, now)
+
+ if reqRemaining == nil && reqLimit == nil && tokRemaining == nil && tokLimit == nil && resetAt == "" {
+ return nil
+ }
+ return &RateLimitInfo{
+ RequestsRemaining: reqRemaining,
+ RequestsLimit: reqLimit,
+ TokensRemaining: tokRemaining,
+ TokensLimit: tokLimit,
+ ResetAt: resetAt,
+ }
+}
+
+func firstHeaderValue(headers http.Header, keys ...string) string {
+ for _, key := range keys {
+ v := strings.TrimSpace(headers.Get(key))
+ if v != "" {
+ return v
+ }
+ }
+ return ""
+}
+
+func parseHeaderInt(headers http.Header, keys ...string) *int {
+ for _, key := range keys {
+ if n, ok := parseIntLikeHeaderValue(headers.Get(key)); ok {
+ return &n
+ }
+ }
+ return nil
+}
+
+func parseIntLikeHeaderValue(v string) (int, bool) {
+ v = strings.TrimSpace(v)
+ if v == "" {
+ return 0, false
+ }
+ if i, err := strconv.Atoi(v); err == nil {
+ return i, true
+ }
+ token := v
+ if idx := strings.IndexAny(token, ",;"); idx >= 0 {
+ token = strings.TrimSpace(token[:idx])
+ if i, err := strconv.Atoi(token); err == nil {
+ return i, true
+ }
+ }
+ if f, err := strconv.ParseFloat(token, 64); err == nil {
+ return int(f), true
+ }
+ if m := firstIntRe.FindString(v); m != "" {
+ if i, err := strconv.Atoi(m); err == nil {
+ return i, true
+ }
+ }
+ return 0, false
+}
+
+func parseRateLimitReset(v string, now time.Time) string {
+ v = strings.TrimSpace(v)
+ if v == "" {
+ return ""
+ }
+ if t, err := http.ParseTime(v); err == nil {
+ return t.UTC().Format(time.RFC3339)
+ }
+ if d, err := time.ParseDuration(v); err == nil {
+ if d < 0 {
+ d = 0
+ }
+ return now.Add(d).UTC().Format(time.RFC3339)
+ }
+ if f, ok := parseFloatLikeHeaderValue(v); ok {
+ switch {
+ case f >= 1e12:
+ // Unix epoch in milliseconds.
+ return time.UnixMilli(int64(f)).UTC().Format(time.RFC3339)
+ case f >= 1e9:
+ // Unix epoch in seconds.
+ return time.Unix(int64(f), 0).UTC().Format(time.RFC3339)
+ case f >= 0:
+ // Relative seconds.
+ return now.Add(time.Duration(f * float64(time.Second))).UTC().Format(time.RFC3339)
+ }
+ }
+ return ""
+}
+
+func parseFloatLikeHeaderValue(v string) (float64, bool) {
+ v = strings.TrimSpace(v)
+ if v == "" {
+ return 0, false
+ }
+ if f, err := strconv.ParseFloat(v, 64); err == nil {
+ return f, true
+ }
+ token := v
+ if idx := strings.IndexAny(token, ",;"); idx >= 0 {
+ token = strings.TrimSpace(token[:idx])
+ if f, err := strconv.ParseFloat(token, 64); err == nil {
+ return f, true
+ }
+ }
+ return 0, false
+}
diff --git a/internal/llm/rate_limit_test.go b/internal/llm/rate_limit_test.go
new file mode 100644
index 00000000..8c709351
--- /dev/null
+++ b/internal/llm/rate_limit_test.go
@@ -0,0 +1,98 @@
+package llm
+
+import (
+ "net/http"
+ "testing"
+ "time"
+)
+
+func TestParseRateLimitInfo_ProviderHeaders(t *testing.T) {
+ h := http.Header{}
+ h.Set("x-ratelimit-remaining-requests", "9")
+ h.Set("x-ratelimit-limit-requests", "10")
+ h.Set("x-ratelimit-remaining-tokens", "90")
+ h.Set("x-ratelimit-limit-tokens", "100")
+ h.Set("x-ratelimit-reset-requests", "Wed, 01 Jan 2025 00:00:10 GMT")
+
+ got := ParseRateLimitInfo(h, time.Date(2025, 1, 1, 0, 0, 0, 0, time.UTC))
+ if got == nil {
+ t.Fatalf("expected rate limit info")
+ }
+ if got.RequestsRemaining == nil || *got.RequestsRemaining != 9 {
+ t.Fatalf("requests_remaining: %#v", got.RequestsRemaining)
+ }
+ if got.RequestsLimit == nil || *got.RequestsLimit != 10 {
+ t.Fatalf("requests_limit: %#v", got.RequestsLimit)
+ }
+ if got.TokensRemaining == nil || *got.TokensRemaining != 90 {
+ t.Fatalf("tokens_remaining: %#v", got.TokensRemaining)
+ }
+ if got.TokensLimit == nil || *got.TokensLimit != 100 {
+ t.Fatalf("tokens_limit: %#v", got.TokensLimit)
+ }
+ if got.ResetAt != "2025-01-01T00:00:10Z" {
+ t.Fatalf("reset_at: %q", got.ResetAt)
+ }
+}
+
+func TestParseRateLimitInfo_FallbackHeaders(t *testing.T) {
+ now := time.Date(2025, 1, 1, 0, 0, 0, 0, time.UTC)
+ h := http.Header{}
+ h.Set("x-ratelimit-remaining", "5")
+ h.Set("x-ratelimit-limit", "8")
+ h.Set("ratelimit-reset", "5")
+
+ got := ParseRateLimitInfo(h, now)
+ if got == nil {
+ t.Fatalf("expected rate limit info")
+ }
+ if got.RequestsRemaining == nil || *got.RequestsRemaining != 5 {
+ t.Fatalf("requests_remaining: %#v", got.RequestsRemaining)
+ }
+ if got.RequestsLimit == nil || *got.RequestsLimit != 8 {
+ t.Fatalf("requests_limit: %#v", got.RequestsLimit)
+ }
+ if got.TokensRemaining != nil {
+ t.Fatalf("tokens_remaining: %#v", got.TokensRemaining)
+ }
+ if got.TokensLimit != nil {
+ t.Fatalf("tokens_limit: %#v", got.TokensLimit)
+ }
+ if got.ResetAt != "2025-01-01T00:00:05Z" {
+ t.Fatalf("reset_at: %q", got.ResetAt)
+ }
+}
+
+func TestParseRateLimitInfo_ResetFormats(t *testing.T) {
+ now := time.Date(2025, 1, 1, 0, 0, 0, 0, time.UTC)
+ cases := []struct {
+ name string
+ value string
+ expect string
+ }{
+ {name: "duration", value: "2s", expect: "2025-01-01T00:00:02Z"},
+ {name: "epoch_seconds", value: "1735689610", expect: "2025-01-01T00:00:10Z"},
+ {name: "epoch_millis", value: "1735689610000", expect: "2025-01-01T00:00:10Z"},
+ {name: "relative_seconds_float", value: "1.5", expect: "2025-01-01T00:00:01Z"},
+ }
+ for _, tc := range cases {
+ t.Run(tc.name, func(t *testing.T) {
+ h := http.Header{}
+ h.Set("x-ratelimit-reset", tc.value)
+ got := ParseRateLimitInfo(h, now)
+ if got == nil {
+ t.Fatalf("expected rate limit info")
+ }
+ if got.ResetAt != tc.expect {
+ t.Fatalf("reset_at: got %q want %q", got.ResetAt, tc.expect)
+ }
+ })
+ }
+}
+
+func TestParseRateLimitInfo_Empty(t *testing.T) {
+ got := ParseRateLimitInfo(http.Header{}, time.Now())
+ if got != nil {
+ t.Fatalf("expected nil, got %#v", got)
+ }
+}
diff --git a/internal/llm/stream.go b/internal/llm/stream.go
index e8f41d89..42584d15 100644
--- a/internal/llm/stream.go
+++ b/internal/llm/stream.go
@@ -29,12 +29,19 @@ const (
type StreamEvent struct {
Type StreamEventType `json:"type"`
+ // Stream start metadata
+ ID string `json:"id,omitempty"`
+ Model string `json:"model,omitempty"`
+ Warnings []Warning `json:"warnings,omitempty"`
+
// Text events
Delta string `json:"delta,omitempty"`
TextID string `json:"text_id,omitempty"`
// Reasoning events
ReasoningDelta string `json:"reasoning_delta,omitempty"`
+ ReasoningID string `json:"reasoning_id,omitempty"`
+ Redacted *bool `json:"redacted,omitempty"`
// Tool call events
ToolCall *ToolCallData `json:"tool_call,omitempty"`
@@ -48,5 +55,6 @@ type StreamEvent struct {
Err error `json:"-"`
// Passthrough
- Raw map[string]any `json:"raw,omitempty"`
+ EventType string `json:"event_type,omitempty"`
+ Raw map[string]any `json:"raw,omitempty"`
}
diff --git a/internal/llm/stream_accumulator.go b/internal/llm/stream_accumulator.go
index 6dae18b0..efa95bcc 100644
--- a/internal/llm/stream_accumulator.go
+++ b/internal/llm/stream_accumulator.go
@@ -1,22 +1,46 @@
package llm
-import "strings"
+import (
+ "encoding/json"
+ "strconv"
+ "strings"
+)
// StreamAccumulator collects StreamEvent values and produces a complete Response.
// It primarily exists to bridge streaming mode back to code that expects a Response.
type StreamAccumulator struct {
- textByID map[string]*strings.Builder
- textOrder []string
- finish *FinishReason
- usage *Usage
- final *Response
- partial *Response
+ textByID map[string]*strings.Builder
+ toolByID map[string]*toolCallStreamState
+ contentLog []streamContentRef
+ seenParts map[string]struct{}
+
+ finish *FinishReason
+ usage *Usage
+ final *Response
+ partial *Response
+ nextToolID int
+}
+
+type streamContentRef struct {
+ kind string
+ id string
+}
+
+type toolCallStreamState struct {
+ id string
+ name string
+ typ string
+ args strings.Builder
+ sawDelta bool
}
func NewStreamAccumulator() *StreamAccumulator {
return &StreamAccumulator{
- textByID: map[string]*strings.Builder{},
- textOrder: nil,
+ textByID: map[string]*strings.Builder{},
+ toolByID: map[string]*toolCallStreamState{},
+ contentLog: nil,
+ seenParts: map[string]struct{}{},
+ nextToolID: 1,
}
}
@@ -26,29 +50,41 @@ func (a *StreamAccumulator) Process(ev StreamEvent) {
}
switch ev.Type {
case StreamEventTextStart:
- id := strings.TrimSpace(ev.TextID)
- if id == "" {
- id = "text_0"
- }
- if _, ok := a.textByID[id]; !ok {
- a.textByID[id] = &strings.Builder{}
- a.textOrder = append(a.textOrder, id)
- }
+ _ = a.ensureText(strings.TrimSpace(ev.TextID))
case StreamEventTextDelta:
- id := strings.TrimSpace(ev.TextID)
- if id == "" {
- id = "text_0"
- }
- b, ok := a.textByID[id]
- if !ok {
- b = &strings.Builder{}
- a.textByID[id] = b
- a.textOrder = append(a.textOrder, id)
- }
+ b := a.ensureText(strings.TrimSpace(ev.TextID))
if ev.Delta != "" {
b.WriteString(ev.Delta)
a.partial = a.buildResponse()
}
+ case StreamEventToolCallStart:
+ tc := a.ensureToolCall(ev.ToolCall)
+ if tc == nil {
+ return
+ }
+ if !tc.sawDelta && tc.args.Len() == 0 && len(ev.ToolCall.Arguments) > 0 {
+ tc.args.Write(ev.ToolCall.Arguments)
+ }
+ a.partial = a.buildResponse()
+ case StreamEventToolCallDelta:
+ tc := a.ensureToolCall(ev.ToolCall)
+ if tc == nil {
+ return
+ }
+ if len(ev.ToolCall.Arguments) > 0 {
+ tc.sawDelta = true
+ tc.args.Write(ev.ToolCall.Arguments)
+ }
+ a.partial = a.buildResponse()
+ case StreamEventToolCallEnd:
+ tc := a.ensureToolCall(ev.ToolCall)
+ if tc == nil {
+ return
+ }
+ if !tc.sawDelta && tc.args.Len() == 0 && len(ev.ToolCall.Arguments) > 0 {
+ tc.args.Write(ev.ToolCall.Arguments)
+ }
+ a.partial = a.buildResponse()
case StreamEventFinish:
a.finish = ev.FinishReason
a.usage = ev.Usage
@@ -91,13 +127,43 @@ func (a *StreamAccumulator) buildResponse() *Response {
if a == nil {
return nil
}
- var b strings.Builder
- for _, id := range a.textOrder {
- if tb := a.textByID[id]; tb != nil {
- b.WriteString(tb.String())
+ content := make([]ContentPart, 0, len(a.contentLog))
+ for _, ref := range a.contentLog {
+ switch ref.kind {
+ case string(ContentText):
+ if tb := a.textByID[ref.id]; tb != nil {
+ txt := tb.String()
+ if txt != "" {
+ content = append(content, ContentPart{Kind: ContentText, Text: txt})
+ }
+ }
+ case string(ContentToolCall):
+ tc := a.toolByID[ref.id]
+ if tc == nil || strings.TrimSpace(tc.id) == "" || strings.TrimSpace(tc.name) == "" {
+ continue
+ }
+ args := strings.TrimSpace(tc.args.String())
+ var raw json.RawMessage
+ if args != "" {
+ raw = json.RawMessage(args)
+ }
+ typ := strings.TrimSpace(tc.typ)
+ if typ == "" {
+ typ = "function"
+ }
+ call := ToolCallData{
+ ID: tc.id,
+ Name: tc.name,
+ Type: typ,
+ Arguments: raw,
+ }
+ content = append(content, ContentPart{Kind: ContentToolCall, ToolCall: &call})
}
}
- msg := Message{Role: RoleAssistant, Content: []ContentPart{{Kind: ContentText, Text: b.String()}}}
+ if len(content) == 0 {
+ content = []ContentPart{{Kind: ContentText, Text: ""}}
+ }
+ msg := Message{Role: RoleAssistant, Content: content}
r := &Response{Message: msg}
if a.finish != nil {
r.Finish = *a.finish
@@ -107,3 +173,49 @@ func (a *StreamAccumulator) buildResponse() *Response {
}
return r
}
+
+func (a *StreamAccumulator) ensureText(id string) *strings.Builder {
+ if strings.TrimSpace(id) == "" {
+ id = "text_0"
+ }
+ b, ok := a.textByID[id]
+ if !ok {
+ b = &strings.Builder{}
+ a.textByID[id] = b
+ a.recordContent(string(ContentText), id)
+ }
+ return b
+}
+
+func (a *StreamAccumulator) ensureToolCall(call *ToolCallData) *toolCallStreamState {
+ if call == nil {
+ return nil
+ }
+ id := strings.TrimSpace(call.ID)
+ if id == "" {
+ id = "tool_call_" + strconv.Itoa(a.nextToolID)
+ a.nextToolID++
+ }
+ tc, ok := a.toolByID[id]
+ if !ok {
+ tc = &toolCallStreamState{id: id}
+ a.toolByID[id] = tc
+ a.recordContent(string(ContentToolCall), id)
+ }
+ if name := strings.TrimSpace(call.Name); name != "" {
+ tc.name = name
+ }
+ if typ := strings.TrimSpace(call.Type); typ != "" {
+ tc.typ = typ
+ }
+ return tc
+}
+
+func (a *StreamAccumulator) recordContent(kind, id string) {
+ key := kind + ":" + id
+ if _, ok := a.seenParts[key]; ok {
+ return
+ }
+ a.seenParts[key] = struct{}{}
+ a.contentLog = append(a.contentLog, streamContentRef{kind: kind, id: id})
+}
diff --git a/internal/llm/stream_accumulator_test.go b/internal/llm/stream_accumulator_test.go
index 883c8a0b..3aceb60a 100644
--- a/internal/llm/stream_accumulator_test.go
+++ b/internal/llm/stream_accumulator_test.go
@@ -1,6 +1,9 @@
package llm
-import "testing"
+import (
+ "encoding/json"
+ "testing"
+)
func TestStreamAccumulator_FinishWithResponse_UsesIt(t *testing.T) {
acc := NewStreamAccumulator()
@@ -52,3 +55,51 @@ func TestStreamAccumulator_NoFinishResponse_BuildsFromText(t *testing.T) {
t.Fatalf("usage: %+v", got.Usage)
}
}
+
+func TestStreamAccumulator_NoFinishResponse_BuildsToolCalls(t *testing.T) {
+ acc := NewStreamAccumulator()
+ acc.Process(StreamEvent{
+ Type: StreamEventToolCallStart,
+ ToolCall: &ToolCallData{
+ ID: "call_1",
+ Name: "write_file",
+ Type: "function",
+ },
+ })
+ acc.Process(StreamEvent{
+ Type: StreamEventToolCallDelta,
+ ToolCall: &ToolCallData{
+ ID: "call_1",
+ Name: "write_file",
+ Type: "function",
+ Arguments: json.RawMessage(`{"path":"a.txt"}`),
+ },
+ })
+ // Some providers send args again on TOOL_CALL_END; accumulator should avoid duplicating.
+ acc.Process(StreamEvent{
+ Type: StreamEventToolCallEnd,
+ ToolCall: &ToolCallData{
+ ID: "call_1",
+ Name: "write_file",
+ Type: "function",
+ Arguments: json.RawMessage(`{"path":"a.txt"}`),
+ },
+ })
+ finish := FinishReason{Reason: FinishReasonToolCalls}
+ acc.Process(StreamEvent{Type: StreamEventFinish, FinishReason: &finish})
+
+ got := acc.Response()
+ if got == nil {
+ t.Fatalf("expected response")
+ }
+ calls := got.ToolCalls()
+ if len(calls) != 1 {
+ t.Fatalf("tool calls: got %d want 1 (%+v)", len(calls), calls)
+ }
+ if calls[0].ID != "call_1" || calls[0].Name != "write_file" {
+ t.Fatalf("tool call identity mismatch: %+v", calls[0])
+ }
+ if string(calls[0].Arguments) != `{"path":"a.txt"}` {
+ t.Fatalf("tool call args mismatch: %q", string(calls[0].Arguments))
+ }
+}
diff --git a/internal/llm/types.go b/internal/llm/types.go
index 634d8227..cd23be52 100644
--- a/internal/llm/types.go
+++ b/internal/llm/types.go
@@ -94,6 +94,9 @@ type ContentPart struct {
ToolCall *ToolCallData `json:"tool_call,omitempty"`
ToolResult *ToolResultData `json:"tool_result,omitempty"`
Thinking *ThinkingData `json:"thinking,omitempty"`
+
+ // Data carries provider-specific payload for custom content kinds.
+ Data any `json:"data,omitempty"`
}
type ImageData struct {
diff --git a/internal/llmclient/env.go b/internal/llmclient/env.go
index 3df1b0bc..8b7142f1 100644
--- a/internal/llmclient/env.go
+++ b/internal/llmclient/env.go
@@ -3,6 +3,7 @@ package llmclient
import (
"github.com/danshapiro/kilroy/internal/llm"
_ "github.com/danshapiro/kilroy/internal/llm/providers/anthropic"
+ _ "github.com/danshapiro/kilroy/internal/llm/providers/codexappserver"
_ "github.com/danshapiro/kilroy/internal/llm/providers/google"
_ "github.com/danshapiro/kilroy/internal/llm/providers/openai"
)
diff --git a/internal/llmclient/env_test.go b/internal/llmclient/env_test.go
index 65ae3f06..27aabc93 100644
--- a/internal/llmclient/env_test.go
+++ b/internal/llmclient/env_test.go
@@ -7,8 +7,30 @@ func TestNewFromEnv_ErrorsWhenNoProvidersConfigured(t *testing.T) {
t.Setenv("ANTHROPIC_API_KEY", "")
t.Setenv("GEMINI_API_KEY", "")
t.Setenv("GOOGLE_API_KEY", "")
+ t.Setenv("CODEX_APP_SERVER_COMMAND", "")
+ t.Setenv("CODEX_APP_SERVER_ARGS", "")
+ t.Setenv("CODEX_APP_SERVER_COMMAND_ARGS", "")
+ t.Setenv("CODEX_APP_SERVER_AUTO_DISCOVER", "")
_, err := NewFromEnv()
if err == nil {
t.Fatalf("expected error, got nil")
}
}
+func TestNewFromEnv_RegistersCodexAppServerWhenCommandOverrideIsSet(t *testing.T) {
+ t.Setenv("OPENAI_API_KEY", "")
+ t.Setenv("ANTHROPIC_API_KEY", "")
+ t.Setenv("GEMINI_API_KEY", "")
+ t.Setenv("GOOGLE_API_KEY", "")
+ t.Setenv("CODEX_APP_SERVER_COMMAND", "codex")
+ t.Setenv("CODEX_APP_SERVER_ARGS", "")
+ t.Setenv("CODEX_APP_SERVER_COMMAND_ARGS", "")
+ t.Setenv("CODEX_APP_SERVER_AUTO_DISCOVER", "")
+ c, err := NewFromEnv()
+ if err != nil {
+ t.Fatalf("NewFromEnv: %v", err)
+ }
+ names := c.ProviderNames()
+ if len(names) != 1 || names[0] != "codex-app-server" {
+ t.Fatalf("provider names: got %v want [codex-app-server]", names)
+ }
+}
diff --git a/internal/providerspec/builtin.go b/internal/providerspec/builtin.go
index 789a7f1d..450a51fd 100644
--- a/internal/providerspec/builtin.go
+++ b/internal/providerspec/builtin.go
@@ -19,6 +19,16 @@ var builtinSpecs = map[string]Spec{
CapabilityAll: []string{"--json"},
},
},
+ "codex-app-server": {
+ Key: "codex-app-server",
+ Aliases: []string{"codex_app_server"},
+ API: &APISpec{
+ Protocol: ProtocolCodexAppServer,
+ DefaultAPIKeyEnv: "",
+ ProviderOptionsKey: "codex_app_server",
+ ProfileFamily: "codex-app-server",
+ },
+ },
"anthropic": {
Key: "anthropic",
API: &APISpec{
diff --git a/internal/providerspec/spec.go b/internal/providerspec/spec.go
index 63d53161..dd388401 100644
--- a/internal/providerspec/spec.go
+++ b/internal/providerspec/spec.go
@@ -12,6 +12,7 @@ const (
ProtocolOpenAIChatCompletions APIProtocol = "openai_chat_completions"
ProtocolAnthropicMessages APIProtocol = "anthropic_messages"
ProtocolGoogleGenerateContent APIProtocol = "google_generate_content"
+ ProtocolCodexAppServer APIProtocol = "codex_app_server"
)
type APISpec struct {
diff --git a/internal/providerspec/spec_test.go b/internal/providerspec/spec_test.go
index 2788bad8..1affea88 100644
--- a/internal/providerspec/spec_test.go
+++ b/internal/providerspec/spec_test.go
@@ -4,7 +4,7 @@ import "testing"
func TestBuiltinSpecsIncludeCoreAndNewProviders(t *testing.T) {
s := Builtins()
- for _, key := range []string{"openai", "anthropic", "google", "kimi", "zai", "cerebras", "minimax", "inception"} {
+ for _, key := range []string{"openai", "codex-app-server", "anthropic", "google", "kimi", "zai", "cerebras", "minimax", "inception"} {
if _, ok := s[key]; !ok {
t.Fatalf("missing builtin provider %q", key)
}
@@ -33,6 +33,9 @@ func TestCanonicalProviderKey_Aliases(t *testing.T) {
if got := CanonicalProviderKey("minimax-ai"); got != "minimax" {
t.Fatalf("minimax-ai alias: got %q want %q", got, "minimax")
}
+ if got := CanonicalProviderKey("codex_app_server"); got != "codex-app-server" {
+ t.Fatalf("codex_app_server alias: got %q want %q", got, "codex-app-server")
+ }
if got := CanonicalProviderKey("inceptionlabs"); got != "inception" {
t.Fatalf("inceptionlabs alias: got %q want %q", got, "inception")
}
@@ -44,6 +47,28 @@ func TestCanonicalProviderKey_Aliases(t *testing.T) {
}
}
+func TestBuiltinCodexAppServerDefaults(t *testing.T) {
+ spec, ok := Builtin("codex-app-server")
+ if !ok {
+ t.Fatalf("expected codex-app-server builtin")
+ }
+ if spec.API == nil {
+ t.Fatalf("expected codex-app-server api spec")
+ }
+ if got := spec.API.Protocol; got != ProtocolCodexAppServer {
+ t.Fatalf("codex-app-server protocol: got %q want %q", got, ProtocolCodexAppServer)
+ }
+ if got := spec.API.DefaultAPIKeyEnv; got != "" {
+ t.Fatalf("codex-app-server api_key_env: got %q want empty", got)
+ }
+ if got := spec.API.ProviderOptionsKey; got != "codex_app_server" {
+ t.Fatalf("codex-app-server provider_options_key: got %q want %q", got, "codex_app_server")
+ }
+ if got := spec.API.ProfileFamily; got != "codex-app-server" {
+ t.Fatalf("codex-app-server profile_family: got %q want %q", got, "codex-app-server")
+ }
+}
+
func TestBuiltinCerebrasDefaultsToOpenAICompatAPI(t *testing.T) {
spec, ok := Builtin("cerebras")
if !ok {
diff --git a/scripts/refresh-openrouter-provider-catalog.sh b/scripts/refresh-openrouter-provider-catalog.sh
new file mode 100755
index 00000000..07947bdc
--- /dev/null
+++ b/scripts/refresh-openrouter-provider-catalog.sh
@@ -0,0 +1,164 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+ROOT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
+PINNED_PATH="${ROOT_DIR}/internal/attractor/modeldb/pinned/openrouter_models.json"
+PROVIDER_REGEX='^(openai|anthropic|google)/'
+LIVE_URL="https://openrouter.ai/api/v1/models"
+DRY_RUN=0
+
+usage() {
+ cat <<'USAGE'
+Usage:
+ scripts/refresh-openrouter-provider-catalog.sh [--dry-run] [--providers-regex ] [--pinned ]
+
+Description:
+ Refreshes provider entries in the pinned OpenRouter catalog from live data.
+ By default, refreshes:
+ - openai/*
+ - anthropic/*
+ - google/*
+
+Options:
+ --dry-run Show planned changes, do not write file
+ --providers-regex Provider ID regex (default: ^(openai|anthropic|google)/)
+ --pinned Path to pinned catalog JSON file
+USAGE
+}
+
+while [[ $# -gt 0 ]]; do
+ case "$1" in
+ --dry-run)
+ DRY_RUN=1
+ shift
+ ;;
+ --providers-regex)
+ PROVIDER_REGEX="${2:-}"
+ if [[ -z "${PROVIDER_REGEX}" ]]; then
+ echo "error: --providers-regex requires a value" >&2
+ exit 1
+ fi
+ shift 2
+ ;;
+ --pinned)
+ PINNED_PATH="${2:-}"
+ if [[ -z "${PINNED_PATH}" ]]; then
+ echo "error: --pinned requires a value" >&2
+ exit 1
+ fi
+ shift 2
+ ;;
+ -h|--help)
+ usage
+ exit 0
+ ;;
+ *)
+ echo "error: unknown argument: $1" >&2
+ usage >&2
+ exit 1
+ ;;
+ esac
+done
+
+if [[ ! -f "${PINNED_PATH}" ]]; then
+ echo "error: pinned catalog not found: ${PINNED_PATH}" >&2
+ exit 1
+fi
+
+if ! command -v jq >/dev/null 2>&1; then
+ echo "error: jq is required" >&2
+ exit 1
+fi
+
+if ! command -v curl >/dev/null 2>&1; then
+ echo "error: curl is required" >&2
+ exit 1
+fi
+
+tmp_live="$(mktemp)"
+tmp_new="$(mktemp)"
+tmp_old_ids="$(mktemp)"
+tmp_live_ids="$(mktemp)"
+tmp_added="$(mktemp)"
+tmp_removed="$(mktemp)"
+trap 'rm -f "$tmp_live" "$tmp_new" "$tmp_old_ids" "$tmp_live_ids" "$tmp_added" "$tmp_removed"' EXIT
+
+curl -fsSL "${LIVE_URL}" > "${tmp_live}"
+
+jq -e '.data and (.data | type == "array")' "${tmp_live}" >/dev/null
+jq -e '.data and (.data | type == "array")' "${PINNED_PATH}" >/dev/null
+
+jq -r --arg re "${PROVIDER_REGEX}" '.data[] | select(.id | test($re)) | .id' "${PINNED_PATH}" | sort > "${tmp_old_ids}"
+jq -r --arg re "${PROVIDER_REGEX}" '.data[] | select(.id | test($re)) | .id' "${tmp_live}" | sort > "${tmp_live_ids}"
+
+comm -13 "${tmp_old_ids}" "${tmp_live_ids}" > "${tmp_added}"
+comm -23 "${tmp_old_ids}" "${tmp_live_ids}" > "${tmp_removed}"
+
+jq --slurpfile live "${tmp_live}" --arg re "${PROVIDER_REGEX}" '
+ ($live[0].data
+ | map(select(.id | test($re)))
+ | map({key: .id, value: .})
+ | from_entries) as $freshByID
+ | ($freshByID | keys) as $freshIDs
+ | (.data
+ | map(
+ if (.id | test($re)) then
+ ($freshByID[.id] // empty)
+ else
+ .
+ end
+ )) as $replaced
+ | ($replaced
+ | map(.id)
+ | map({key: ., value: true})
+ | from_entries) as $present
+ | ($freshIDs
+ | map(select($present[.] | not) | $freshByID[.])) as $missing
+ | .data = ($replaced + $missing)
+' "${PINNED_PATH}" > "${tmp_new}"
+
+dups="$(jq -r '.data[].id' "${tmp_new}" | sort | uniq -d)"
+if [[ -n "${dups}" ]]; then
+ echo "error: duplicate model IDs introduced:" >&2
+ echo "${dups}" >&2
+ exit 1
+fi
+
+old_total="$(jq '.data | length' "${PINNED_PATH}")"
+new_total="$(jq '.data | length' "${tmp_new}")"
+old_provider_total="$(wc -l < "${tmp_old_ids}" | tr -d ' ')"
+new_provider_total="$(wc -l < "${tmp_live_ids}" | tr -d ' ')"
+
+echo "Pinned total models: ${old_total} -> ${new_total}"
+echo "Target-provider models: ${old_provider_total} -> ${new_provider_total}"
+echo "Providers regex: ${PROVIDER_REGEX}"
+echo
+
+echo "Added IDs:"
+if [[ -s "${tmp_added}" ]]; then
+ cat "${tmp_added}"
+else
+ echo "(none)"
+fi
+echo
+
+echo "Removed IDs:"
+if [[ -s "${tmp_removed}" ]]; then
+ cat "${tmp_removed}"
+else
+ echo "(none)"
+fi
+echo
+
+if cmp -s "${PINNED_PATH}" "${tmp_new}"; then
+ echo "No changes detected."
+ exit 0
+fi
+
+if [[ "${DRY_RUN}" == "1" ]]; then
+ echo "Dry run only, no file was written."
+ exit 0
+fi
+
+mv "${tmp_new}" "${PINNED_PATH}"
+echo "Updated ${PINNED_PATH}"