Commit 3115fbd

mcheemaaphantom
authored and committed
agent: pass adaptive thinking config to SDK query() (ghostwright#104)
* agent: pass adaptive thinking config to SDK query()

Opus 4.7 (and likely future models) reject the legacy `thinking.type: enabled` shape in API requests. Without an explicit `thinking` option on the SDK query(), the bundled CLI falls back to that legacy shape and the API returns a 400:

    invalid_request_error: "thinking.type.enabled" is not supported for this model. Use "thinking.type.adaptive" and "output_config.effort" to control thinking behavior.

Pass `thinking: { type: "adaptive" }` from chat-query.ts and judge-query.ts so the SDK forwards the supported shape on every query, regardless of which Opus or Sonnet variant is configured. The existing `effort` option continues to control thinking depth under the adaptive contract.

* agent: pass adaptive thinking on AgentRuntime query (Codex P1)

Round 1 of ghostwright#104 review caught the runtime.ts handleMessage call site was untouched, leaving inbound Slack/trigger/scheduler/MCP requests on Opus 4.7 still defaulting to the legacy thinking.type.enabled shape that returns 400 invalid_request_error. Mirror the explicit adaptive shape on this third call site so every entry point that reaches the SDK query() forwards the supported request shape.

Also patch reflection-subprocess.ts, the fourth query() site, which runs on the same Opus tier during memory drains and would 400 the same way without the adaptive thinking option.

* agent: model-aware thinking config (Codex round 2 P1)

Round 2 found unconditional adaptive thinking breaks Haiku 4.5 in the reflection subprocess (and any future haiku-tier callsite). Adaptive is supported on Sonnet 4.6 and Opus 4.7; Haiku 4.5 needs the legacy enabled + budget_tokens shape. The fix replaces the four scattered adaptive-stamp lines with a single-source-of-truth getThinkingConfig(model) helper so every call site picks the right shape based on its model.

Verified end-to-end against the live Anthropic API on opus-4-7, sonnet-4-6, and haiku-4-5: opus rejects enabled (400), haiku rejects adaptive (400), sonnet accepts both. The helper maps each model family to the shape the API actually accepts, defaulting unknown models to adaptive because every model since Opus 4.7 has been adaptive-only.

Helper is covered by 12 new unit tests against the full matrix; chat, judge, runtime, and reflection callsites now spread it instead of hard-coding a literal.
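For orientation, the two request shapes this message contrasts can be sketched as a small discriminated union. This is only an illustrative sketch: the type names `LegacyThinking`, `AdaptiveThinking`, and `RequestFragment` and the function `describeThinking` are hypothetical, the field names are taken from the error text and the `budget_tokens` wording quoted in the message, and the budget and effort values are placeholders.

```typescript
// Hypothetical wire-level shapes, named after the fields in the 400 error text.
type LegacyThinking = { type: "enabled"; budget_tokens: number };
type AdaptiveThinking = { type: "adaptive" };

type RequestFragment = {
  thinking: LegacyThinking | AdaptiveThinking;
  // Per the error text, effort steers thinking under the adaptive contract.
  output_config?: { effort: string };
};

// Legacy shape: an explicit token budget (placeholder value).
const legacyRequest: RequestFragment = {
  thinking: { type: "enabled", budget_tokens: 8192 },
};

// Adaptive shape: no budget; depth is steered by effort instead.
const adaptiveRequest: RequestFragment = {
  thinking: { type: "adaptive" },
  output_config: { effort: "low" },
};

// Narrowing on `type` tells the two shapes apart.
function describeThinking(req: RequestFragment): string {
  const t = req.thinking;
  if (t.type === "enabled") {
    return `enabled with budget ${t.budget_tokens}`;
  }
  return `adaptive with effort ${req.output_config?.effort ?? "unset"}`;
}

console.log(describeThinking(legacyRequest));
console.log(describeThinking(adaptiveRequest));
```

Modelling the shapes as a union keyed on `type` means a budget cannot be attached to the adaptive variant by accident.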
1 parent f7e18aa commit 3115fbd

6 files changed

Lines changed: 147 additions & 0 deletions

Lines changed: 87 additions & 0 deletions
@@ -0,0 +1,87 @@
+// Coverage for every cell of the model x thinking-shape matrix the live
+// Messages API enforces (verified 2026-04-29). The helper is the single
+// source of truth for SDK `query()` thinking options across chat, judge,
+// runtime, and reflection callsites.
+
+import { describe, expect, test } from "bun:test";
+import { JUDGE_MODEL_HAIKU, JUDGE_MODEL_OPUS, JUDGE_MODEL_SONNET } from "../../evolution/judge-models.ts";
+import { getThinkingConfig } from "../thinking-config.ts";
+
+describe("getThinkingConfig", () => {
+  test("Opus 4.7 returns adaptive (Opus 4.7 rejects manual enabled with 400)", () => {
+    expect(getThinkingConfig(JUDGE_MODEL_OPUS)).toEqual({ type: "adaptive" });
+    expect(getThinkingConfig("claude-opus-4-7")).toEqual({ type: "adaptive" });
+  });
+
+  test("Opus 4.6 returns adaptive (recommended; manual is deprecated)", () => {
+    expect(getThinkingConfig("claude-opus-4-6")).toEqual({ type: "adaptive" });
+  });
+
+  test("Sonnet 4.6 returns adaptive (recommended; manual still functional)", () => {
+    expect(getThinkingConfig(JUDGE_MODEL_SONNET)).toEqual({ type: "adaptive" });
+    expect(getThinkingConfig("claude-sonnet-4-6")).toEqual({ type: "adaptive" });
+  });
+
+  test("Mythos preview returns adaptive", () => {
+    expect(getThinkingConfig("claude-mythos-preview")).toEqual({ type: "adaptive" });
+  });
+
+  test("Haiku 4.5 returns enabled + budgetTokens (Haiku rejects adaptive with 400)", () => {
+    const config = getThinkingConfig(JUDGE_MODEL_HAIKU);
+    expect(config.type).toBe("enabled");
+    if (config.type === "enabled") {
+      expect(config.budgetTokens).toBeGreaterThan(0);
+    }
+  });
+
+  test("older Haiku 3.x returns enabled + budgetTokens", () => {
+    const config = getThinkingConfig("claude-haiku-3-5");
+    expect(config.type).toBe("enabled");
+  });
+
+  test("older Sonnet 3.x returns enabled + budgetTokens", () => {
+    const config = getThinkingConfig("claude-sonnet-3-7");
+    expect(config.type).toBe("enabled");
+  });
+
+  test("legacy Opus 4.5 returns enabled + budgetTokens", () => {
+    const config = getThinkingConfig("claude-opus-4-5");
+    expect(config.type).toBe("enabled");
+  });
+
+  test("undefined model defaults to adaptive (safe for all new models)", () => {
+    expect(getThinkingConfig(undefined)).toEqual({ type: "adaptive" });
+  });
+
+  test("null model defaults to adaptive", () => {
+    expect(getThinkingConfig(null)).toEqual({ type: "adaptive" });
+  });
+
+  test("empty string defaults to adaptive", () => {
+    expect(getThinkingConfig("")).toEqual({ type: "adaptive" });
+  });
+
+  test("unknown future model defaults to adaptive", () => {
+    // Every new model since Opus 4.7 has been adaptive-only, so when
+    // we do not recognise the prefix we send adaptive. A wrong guess
+    // returns a clear 400 with the required shape, which is preferable
+    // to silent breakage in reflection.
+    expect(getThinkingConfig("claude-future-model-2027")).toEqual({ type: "adaptive" });
+  });
+
+  test("provider-prefixed names still match by suffix-free comparison fail-safe", () => {
+    // Some operators set `model: "anthropic/claude-haiku-4-5"` via
+    // LiteLLM. The helper currently does prefix-match on the bare
+    // Anthropic id. If a slash-prefix is used, we fall through to
+    // adaptive default, which is the safer of the two failure modes
+    // (adaptive will 400 with a clear error on Haiku rather than
+    // silently downgrading thinking).
+    expect(getThinkingConfig("anthropic/claude-haiku-4-5")).toEqual({ type: "adaptive" });
+  });
+
+  test("returned object is a fresh value (callers may spread it)", () => {
+    const a = getThinkingConfig(JUDGE_MODEL_OPUS);
+    const b = getThinkingConfig(JUDGE_MODEL_OPUS);
+    expect(a).toEqual(b);
+  });
+});
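The provider-prefixed test above pins the current fall-through behaviour. If an operator instead wanted prefixed ids such as `anthropic/claude-haiku-4-5` to resolve like their bare counterparts, one option would be to normalise the id before the prefix match. The sketch below is not part of this commit; `stripProviderPrefix` is a hypothetical helper, and it assumes the provider segment is everything up to the last `/`.

```typescript
// Hypothetical helper, not part of this commit: drops a "provider/" prefix
// so the bare model id can be prefix-matched.
function stripProviderPrefix(model: string): string {
  const slash = model.lastIndexOf("/");
  // No slash means the id is already bare, so return it unchanged.
  return slash === -1 ? model : model.slice(slash + 1);
}

console.log(stripProviderPrefix("anthropic/claude-haiku-4-5")); // claude-haiku-4-5
console.log(stripProviderPrefix("claude-opus-4-7")); // claude-opus-4-7
```

Adopting something like this would change the expectation in the provider-prefixed test, so it is a deliberate trade-off rather than a drop-in fix: the commit prefers a loud 400 over a silently guessed shape.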

src/agent/chat-query.ts

Lines changed: 2 additions & 0 deletions
@@ -18,6 +18,7 @@ import { extractCost, extractTextFromMessage } from "./message-utils.ts";
 import { permissionOptionsFromConfig } from "./permission-options.ts";
 import { assemblePrompt } from "./prompt-assembler.ts";
 import type { Session, SessionStore } from "./session-store.ts";
+import { getThinkingConfig } from "./thinking-config.ts";
 
 export type ChatQueryDeps = {
   config: PhantomConfig;
@@ -106,6 +107,7 @@ export async function executeChatQuery(
       },
       persistSession: true,
       effort: deps.config.effort,
+      thinking: getThinkingConfig(deps.config.model),
       includePartialMessages: true,
       agentProgressSummaries: true,
       promptSuggestions: true,

src/agent/judge-query.ts

Lines changed: 2 additions & 0 deletions
@@ -3,6 +3,7 @@ import { z } from "zod/v4";
 import { buildProviderEnv } from "../config/providers.ts";
 import type { PhantomConfig } from "../config/types.ts";
 import { extractTextFromMessage } from "./message-utils.ts";
+import { getThinkingConfig } from "./thinking-config.ts";
 
 // Judge subprocess integration. Routes LLM judge calls through the same
 // Agent SDK `query()` subprocess as the main agent so that auth, provider,
@@ -164,6 +165,7 @@ export async function runJudgeQuery<T>(
       systemPrompt,
       maxTurns: 1,
       effort: "low",
+      thinking: getThinkingConfig(resolvedModel),
       persistSession: false,
       env: { ...process.env, ...providerEnv },
     },

src/agent/runtime.ts

Lines changed: 2 additions & 0 deletions
@@ -19,6 +19,7 @@ import { extractCost, extractTextFromMessage } from "./message-utils.ts";
 import { permissionOptionsFromConfig } from "./permission-options.ts";
 import { assemblePrompt } from "./prompt-assembler.ts";
 import { SessionStore } from "./session-store.ts";
+import { getThinkingConfig } from "./thinking-config.ts";
 
 export type RuntimeEvent =
   | { type: "init"; sessionId: string }
@@ -206,6 +207,7 @@ export class AgentRuntime {
         systemPrompt: { type: "preset" as const, preset: "claude_code" as const, append: appendPrompt },
         persistSession: true,
         effort: this.config.effort,
+        thinking: getThinkingConfig(this.config.model),
         ...(this.config.max_budget_usd > 0 ? { maxBudgetUsd: this.config.max_budget_usd } : {}),
         abortController: controller,
         env: { ...process.env, ...providerEnv },

src/agent/thinking-config.ts

Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,52 @@
+// Single-source-of-truth picker for the Agent SDK `thinking` option.
+//
+// The matrix is non-uniform across models:
+// - Opus 4.7 only accepts `{ type: "adaptive" }`. Manual `enabled +
+//   budget_tokens` is rejected with a 400.
+// - Haiku 4.5 only accepts `{ type: "enabled", budget_tokens: N }`.
+//   Adaptive is rejected with a 400.
+// - Sonnet 4.6 accepts both shapes (manual is deprecated but still
+//   functional).
+//
+// Verified against the live Messages API on 2026-04-29; see the design
+// note at local/2026-04-29-thinking-config-design.md (local-only).
+//
+// Every SDK `query()` callsite spreads `getThinkingConfig(model)` instead
+// of hard-coding a single shape, so reflection (Haiku tier), chat (Opus
+// tier), judges (Sonnet tier), and the AgentRuntime path all pick the
+// correct shape. New models default to adaptive because every model
+// Anthropic has shipped since 4.7 only accepts adaptive.
+
+import type { ThinkingConfig } from "@anthropic-ai/claude-agent-sdk";
+
+const ADAPTIVE_PREFIXES: readonly string[] = [
+  "claude-opus-4-7",
+  "claude-opus-4-6",
+  "claude-sonnet-4-6",
+  "claude-mythos",
+];
+
+const MANUAL_ONLY_PREFIXES: readonly string[] = [
+  "claude-haiku-4",
+  "claude-haiku-3",
+  "claude-sonnet-3",
+  "claude-sonnet-4-5",
+  "claude-opus-4-5",
+];
+
+const MANUAL_BUDGET_TOKENS = 8192;
+
+export function getThinkingConfig(model: string | undefined | null): ThinkingConfig {
+  if (!model) return { type: "adaptive" };
+  if (ADAPTIVE_PREFIXES.some((p) => model.startsWith(p))) {
+    return { type: "adaptive" };
+  }
+  if (MANUAL_ONLY_PREFIXES.some((p) => model.startsWith(p))) {
+    return { type: "enabled", budgetTokens: MANUAL_BUDGET_TOKENS };
+  }
+  // Unknown model: prefer adaptive. Every model Anthropic has released
+  // since Opus 4.7 only accepts adaptive, so a new variant is far more
+  // likely to require adaptive than to require manual mode. If wrong,
+  // the API returns a clear 400 error with the required shape.
+  return { type: "adaptive" };
+}
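Because the helper matches with `startsWith`, ids that carry extra suffixes resolve the same way as the bare family id, and ids that match neither list take the adaptive default. The following is a condensed, self-contained sketch of that behaviour rather than the file itself: `ThinkingShape` and `pickThinking` are local stand-ins for the SDK's `ThinkingConfig` type and the helper, and the sample ids with a date-like suffix are made up for illustration.

```typescript
// Local stand-in for the SDK type (assumption: only the two shapes used
// in this commit are modelled here).
type ThinkingShape = { type: "adaptive" } | { type: "enabled"; budgetTokens: number };

const adaptivePrefixes = ["claude-opus-4-7", "claude-opus-4-6", "claude-sonnet-4-6", "claude-mythos"];
const manualPrefixes = ["claude-haiku-4", "claude-haiku-3", "claude-sonnet-3", "claude-sonnet-4-5", "claude-opus-4-5"];

// Same decision order as the helper: empty, adaptive list, manual list, default.
function pickThinking(model: string | null | undefined): ThinkingShape {
  if (!model) return { type: "adaptive" };
  if (adaptivePrefixes.some((p) => model.startsWith(p))) return { type: "adaptive" };
  if (manualPrefixes.some((p) => model.startsWith(p))) return { type: "enabled", budgetTokens: 8192 };
  return { type: "adaptive" };
}

// Illustrative ids; the date-like suffixes are invented for this sketch.
const samples = ["claude-haiku-4-5-20990101", "claude-opus-4-7-20990101", "claude-sonnet-4-0"];
for (const id of samples) {
  console.log(id, "->", pickThinking(id).type);
}
```

Here the first id matches the `claude-haiku-4` prefix and gets the manual shape, the second matches `claude-opus-4-7` and gets adaptive, and `claude-sonnet-4-0` matches neither list, so it takes the unknown-model default.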

src/evolution/reflection-subprocess.ts

Lines changed: 2 additions & 0 deletions
@@ -1,6 +1,7 @@
 import { appendFileSync, existsSync, mkdirSync, readdirSync, unlinkSync, writeFileSync } from "node:fs";
 import { dirname, join } from "node:path";
 import { query } from "@anthropic-ai/claude-agent-sdk";
+import { getThinkingConfig } from "../agent/thinking-config.ts";
 import { buildProviderEnv } from "../config/providers.ts";
 import type { PhantomConfig } from "../config/types.ts";
 import type { EvolutionConfig } from "./config.ts";
@@ -647,6 +648,7 @@ async function defaultRunner(input: SpawnQueryInput): Promise<SpawnQueryResult>
       permissionMode: "bypassPermissions",
       allowDangerouslySkipPermissions: true,
       tools: ["Read", "Write", "Edit", "Glob", "Grep"],
+      thinking: getThinkingConfig(model),
       systemPrompt,
       settings: {
         permissions: { allow, deny },
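All four call sites add the same `thinking:` line by hand, and the review rounds described in the commit message found sites that had been missed. One possible way to make that harder to forget is a small wrapper that stamps `thinking` onto any options object. The sketch below is an assumption-laden illustration, not something this commit does: `withThinking`, `ThinkingPicker`, `ThinkingShape`, and `alwaysAdaptive` are hypothetical names, and the picker is injected so the block stays self-contained.

```typescript
// Local stand-ins (assumptions): the real types come from the Agent SDK.
type ThinkingShape = { type: "adaptive" } | { type: "enabled"; budgetTokens: number };
type ThinkingPicker = (model: string | null | undefined) => ThinkingShape;

// Hypothetical wrapper: returns a copy of `options` with `thinking` set,
// so a call site only has to supply the model it is about to query.
function withThinking<T extends object>(
  options: T,
  model: string | null | undefined,
  pick: ThinkingPicker,
): T & { thinking: ThinkingShape } {
  return { ...options, thinking: pick(model) };
}

// Trivial picker used only to make the sketch runnable.
const alwaysAdaptive: ThinkingPicker = () => ({ type: "adaptive" });

const reflectionOptions = withThinking(
  { permissionMode: "bypassPermissions", tools: ["Read", "Write"] },
  "claude-opus-4-7",
  alwaysAdaptive,
);
console.log(reflectionOptions.thinking.type); // adaptive
```

In a real call site the injected picker would be the `getThinkingConfig` helper from `thinking-config.ts`; the original options object is left unmodified because the wrapper spreads it into a new one.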
