GH-2432: feat(executor): "Opus plans, Sonnet executes" — cut Pilot token spend ~70%#2437
Merged
alekspetrov merged 1 commit into main on Apr 29, 2026
…H-2432)
Routes Opus to where reasoning matters (epic planning) and Sonnet to verbose execution to address the 4× token spike in Pilot subprocess sessions.
- ClaudeCodeConfig gains AllowedTools + MCPConfigPath; ExecuteOptions plumbs them through and the Claude Code args builder emits --allowedTools and --mcp-config when set. Default execution toolbox excludes MCPs (Read, Write, Edit, Bash, Grep, Glob, Task) to drop per-turn context bloat.
- New PlanningConfig (default model claude-opus-4-7). PlanEpic now passes --model + --allowedTools=Read,Grep,Glob and exports ANTHROPIC_MODEL so Pilot's global Sonnet env doesn't override it on Node's last-write lookup.
- model_routing.complex default flipped from claude-opus-4-6 to claude-sonnet-4-6; Opus is reserved for planning only.
- Retry counter persisted via labels (pilot-retry-1, pilot-retry-2, pilot-retry-exhausted). Survives `pilot start` restarts; the previous in-memory retryReadyCount silently reset on restart and let pathological issues consume Opus indefinitely. Auto-merger strips retry labels on successful merge so a future regression starts fresh.
- HooksConfig.RunTestsOnStop default flipped to false; Stop-hook tests forced long unproductive turns into the Claude session.
Summary
Automated PR created by Pilot for task GH-2432.
Closes #2432
Changes
GitHub Issue #2432: feat(executor): "Opus plans, Sonnet executes" — cut Pilot token spend ~70%
Context
Pilot is consuming 4× baseline tokens on spike days ($80–110/day vs $20–25 baseline). Root cause analysis confirmed via ccusage + pilot.db audit: session names match GH issue numbers — Pilot subprocess executions are the spend, not interactive work. Cache-read tokens dominate (~93%), confirming massive system context replay across thousands of turns and dozens of subprocess restarts.
GLM/Z.AI is already disabled; Anthropic-only. Goal: cut Pilot's per-task cost ~70% by routing Opus to where reasoning matters (epic planning) and Sonnet to verbose execution.
Acceptance Criteria
- `backend.Execute` runner subprocesses default to `claude-sonnet-4-6`
- `PlanEpic` (`epic.go:194`) passes `--model claude-opus-4-7` explicitly + sets `ANTHROPIC_MODEL` env per-invocation
- `executor.planning.model` (default `claude-opus-4-7`)
- `model_routing.complex` defaults to `claude-sonnet-4-6` (was opus)
- `ExecuteOptions` gains `AllowedTools []string` and `MCPConfigPath string` fields
- Claude Code args builder emits `--allowedTools` and `--mcp-config` when set
- `backend.Execute` call sites + `PlanEpic` pass these fields
- `AllowedTools` for execution: `[Read, Write, Edit, Bash, Grep, Glob, Task]`
- `AllowedTools` for planning: `[Read, Grep, Glob]` (planning shouldn't write code)
- `MCPConfigPath` empty (no MCP servers loaded by subprocess)
- Retry counter persisted via labels (`pilot-retry-1/2/exhausted`); survives `pilot start` restart
- Retry labels stripped on successful merge in `auto_merger.go`
- `hooks.RunTestsOnStop` default flipped to `false` in `hooks.go:54`
- `go build ./... && go test ./...` pass
Implementation
Phase 1: Config additions
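As a rough illustration of the config shape this phase describes, the sketch below declares the two new `ClaudeCodeConfig` fields, a `PlanningConfig`, and the stated defaults. The structs are reduced to the members named in this PR; the YAML tag names and the `defaultExecutorConfig` constructor are assumptions made for the example, not the repository's actual code.

```go
package main

import "fmt"

// ClaudeCodeConfig shows only the two fields this phase adds.
// The YAML tag names are assumptions for illustration.
type ClaudeCodeConfig struct {
	AllowedTools  []string `yaml:"allowed_tools"`
	MCPConfigPath string   `yaml:"mcp_config_path"`
}

// PlanningConfig holds the model used for epic planning.
type PlanningConfig struct {
	Model string `yaml:"model"`
}

// ExecutorConfig is reduced to the two members relevant here.
type ExecutorConfig struct {
	ClaudeCode *ClaudeCodeConfig `yaml:"claude_code"`
	Planning   *PlanningConfig   `yaml:"planning"`
}

// defaultExecutorConfig is a hypothetical constructor that returns the
// defaults listed in this phase.
func defaultExecutorConfig() *ExecutorConfig {
	return &ExecutorConfig{
		ClaudeCode: &ClaudeCodeConfig{
			AllowedTools:  []string{"Read", "Write", "Edit", "Bash", "Grep", "Glob", "Task"},
			MCPConfigPath: "", // empty: no MCP servers loaded by the subprocess
		},
		Planning: &PlanningConfig{Model: "claude-opus-4-7"},
	}
}

func main() {
	cfg := defaultExecutorConfig()
	fmt.Println(cfg.ClaudeCode.AllowedTools, cfg.Planning.Model)
}
```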
`internal/executor/config.go`: extend `ClaudeCodeConfig` with `AllowedTools` and `MCPConfigPath`, add a new top-level `PlanningConfig`, and add `Planning *PlanningConfig` to `ExecutorConfig`.
Defaults:
- `AllowedTools` → `["Read", "Write", "Edit", "Bash", "Grep", "Glob", "Task"]`
- `MCPConfigPath` → `""` (no MCPs)
- `Planning.Model` → `"claude-opus-4-7"`
Phase 2: Backend wiring
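`internal/executor/backend.go:25–72`: extend `ExecuteOptions` with `AllowedTools []string` and `MCPConfigPath string`.
`internal/executor/backend_claudecode.go:336–358`: append `--allowedTools` and `--mcp-config` flags when set.
Phase 3: Runner call sites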
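A single helper for the runner call sites could be sketched as follows. The types are minimal stand-ins, and the `prompt` parameter and `Prompt` field are hypothetical, since the PR leaves the helper's arguments elided as `r.buildExecutorOptions(...)`.

```go
package main

import "fmt"

// Minimal stand-ins for the real types; only the fields needed here.
type ClaudeCodeConfig struct {
	AllowedTools  []string
	MCPConfigPath string
}

type Config struct {
	ClaudeCode *ClaudeCodeConfig
}

type ExecuteOptions struct {
	Prompt        string // hypothetical field, for illustration only
	AllowedTools  []string
	MCPConfigPath string
}

type Runner struct {
	config *Config
}

// buildExecutorOptions populates the tool-related options from the runner's
// Claude Code config so every call site shares one code path.
func (r *Runner) buildExecutorOptions(prompt string) ExecuteOptions {
	opts := ExecuteOptions{Prompt: prompt}
	if r.config != nil && r.config.ClaudeCode != nil {
		// Copy the slice so a call site cannot mutate the shared config.
		opts.AllowedTools = append([]string(nil), r.config.ClaudeCode.AllowedTools...)
		opts.MCPConfigPath = r.config.ClaudeCode.MCPConfigPath
	}
	return opts
}

func main() {
	r := &Runner{config: &Config{ClaudeCode: &ClaudeCodeConfig{
		AllowedTools: []string{"Read", "Write", "Edit", "Bash", "Grep", "Glob", "Task"},
	}}}
	fmt.Printf("%+v\n", r.buildExecutorOptions("implement the task"))
}
```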
`internal/executor/runner.go`: at all 6 `backend.Execute` sites (lines 1606, 1854, 2148, 2419, 2695, 3308), populate `AllowedTools` and `MCPConfigPath` from `r.config.ClaudeCode`. A single helper would be cleaner: extract `r.buildExecutorOptions(...)`.
Phase 4: Planning gets Opus
`internal/executor/epic.go:194–238`: modify `PlanEpic` to pass `--model` and `--allowedTools=Read,Grep,Glob`, and to export `ANTHROPIC_MODEL` for the invocation.
The env override is critical: Pilot's global `ANTHROPIC_MODEL=sonnet` would otherwise win on Node's last-write lookup inside Claude Code (per the `backend_claudecode.go:367–370` comment).
Phase 5: Retry counter via GitHub labels
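The label transitions in this phase form a small state machine, sketched below as a pure function over an issue's current labels. The constant names, the `retryDecision` type, and both function names are hypothetical. Two behaviors are assumptions: an issue without `pilot-retry-ready` gets no dispatch, and `stripRetryLabels` removes only the three counter labels, leaving `pilot-retry-ready` alone.

```go
package main

import "fmt"

// Label names come from the PR; the Go constant names are hypothetical.
const (
	LabelRetryReady     = "pilot-retry-ready"
	LabelRetry1         = "pilot-retry-1"
	LabelRetry2         = "pilot-retry-2"
	LabelRetryExhausted = "pilot-retry-exhausted"
)

// retryDecision describes the label change and whether to dispatch.
type retryDecision struct {
	Add      string // label to add, "" for none
	Remove   string // label to remove, "" for none
	Dispatch bool
}

func hasLabel(labels []string, want string) bool {
	for _, l := range labels {
		if l == want {
			return true
		}
	}
	return false
}

// nextRetryStep checks the most advanced state first, so an issue carrying
// both pilot-retry-ready and pilot-retry-2 is treated as being at retry 2.
func nextRetryStep(labels []string) retryDecision {
	switch {
	case hasLabel(labels, LabelRetryExhausted):
		return retryDecision{} // terminal: skip dispatch
	case hasLabel(labels, LabelRetry2):
		return retryDecision{Add: LabelRetryExhausted, Remove: LabelRetry2}
	case hasLabel(labels, LabelRetry1):
		return retryDecision{Add: LabelRetry2, Remove: LabelRetry1, Dispatch: true}
	case hasLabel(labels, LabelRetryReady):
		return retryDecision{Add: LabelRetry1, Dispatch: true}
	default:
		return retryDecision{} // assumption: not a retry candidate
	}
}

// stripRetryLabels returns labels without the retry-counter labels, as the
// auto-merger would apply after a successful merge.
func stripRetryLabels(labels []string) []string {
	kept := make([]string, 0, len(labels))
	for _, l := range labels {
		if l != LabelRetry1 && l != LabelRetry2 && l != LabelRetryExhausted {
			kept = append(kept, l)
		}
	}
	return kept
}

func main() {
	fmt.Printf("%+v\n", nextRetryStep([]string{LabelRetryReady, LabelRetry1}))
	fmt.Println(stripRetryLabels([]string{"bug", LabelRetry2}))
}
```

Because the counter is held in the labels and not in process memory, the same inputs yield the same decision after a `pilot start` restart.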
`internal/github/labels.go`: add constants for the retry labels (`pilot-retry-1`, `pilot-retry-2`, `pilot-retry-exhausted`).
`internal/poller/poller.go:117,270`: replace the `retryReadyCount` map lookup with a `gh issue view --json labels` check. Logic:
- `pilot-retry-ready`, no retry-N → add `pilot-retry-1`, dispatch
- `pilot-retry-1` → replace with `pilot-retry-2`, dispatch
- `pilot-retry-2` → replace with `pilot-retry-exhausted`, skip dispatch
- `pilot-retry-exhausted` → skip dispatch (terminal)
`internal/autopilot/auto_merger.go`: on successful merge, strip all retry labels from the issue.
Phase 6: Stop hook default
`internal/executor/hooks.go:54`: flip `RunTestsOnStop` default from `true` to `false`.
Out of Scope
- …`complex: high` stays; Sonnet's `high` already cheap)
- …`~/.pilot/config.yaml`) and `MEMORY.md` trim: manual
Verification
After merge + `pilot start` restart:
- `sqlite3 ~/.pilot/data/pilot.db "SELECT model_name FROM executions ORDER BY created_at DESC LIMIT 1"` → expect `claude-sonnet-4-6`
- `ps aux | grep claude` during execution → expect `--allowedTools` and `--mcp-config` flags present
- `ps aux | grep claude` during planning → expect `--model claude-opus-4-7`
- …`pilot start` restart in between → expect `pilot-retry-exhausted` label, 4th retry blocked
Refs
- `internal/executor/epic.go:194–238` (PlanEpic)
- `internal/executor/backend_claudecode.go:307–360` (args builder)
- `internal/executor/runner.go` (6 backend.Execute sites)
- `internal/poller/poller.go:117,270` (retryReadyCount)
- `/Users/aleks.petrov/.claude/plans/unified-splashing-kite.md`