Context
Pilot is consuming 4× baseline tokens on spike days ($80–110/day vs. $20–25 baseline). Root cause analysis, confirmed via a ccusage + pilot.db audit: session names match GH issue numbers, so Pilot subprocess executions are the spend, not interactive work. Cache-read tokens dominate (~93%), confirming massive system-context replay across thousands of turns and dozens of subprocess restarts.
GLM/Z.AI is already disabled; Pilot is Anthropic-only. Goal: cut Pilot's per-task cost by ~70% by routing Opus to where reasoning matters (epic planning) and Sonnet to verbose execution.
Acceptance Criteria
- All `backend.Execute` runner subprocesses default to `claude-sonnet-4-6`
- `PlanEpic` (`epic.go:194`) passes `--model claude-opus-4-7` explicitly and sets the `ANTHROPIC_MODEL` env per invocation
- `executor.planning.model` (default `claude-opus-4-7`)
- `model_routing.complex` defaults to `claude-sonnet-4-6` (was opus)
- `ExecuteOptions` gains `AllowedTools []string` and `MCPConfigPath string` fields
- `--allowedTools` and `--mcp-config` when set
- `backend.Execute` call sites + `PlanEpic` pass these fields
- `AllowedTools` for execution: `[Read, Write, Edit, Bash, Grep, Glob, Task]`
- `AllowedTools` for planning: `[Read, Grep, Glob]` (planning shouldn't write code)
- `MCPConfigPath` empty (no MCP servers loaded by subprocess)
- Retry counter via GitHub labels (`pilot-retry-1/2/exhausted`); survives `pilot start` restart
- Retry labels stripped on successful merge (`auto_merger.go`)
- `hooks.RunTestsOnStop` default flipped to `false` in `hooks.go:54`
- `go build ./... && go test ./...` pass
Implementation
Phase 1: Config additions
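The config additions in this phase might look roughly like the sketch below. Only the names `ClaudeCodeConfig`, `PlanningConfig`, `ExecutorConfig`, `AllowedTools`, `MCPConfigPath`, `Planning`, and `Model` come from this issue; the yaml keys, the pointer-typed `ClaudeCode` field, and the `defaultExecutorConfig` constructor are assumptions made for illustration, and every other field of the real structs is omitted.

```go
package main

import "fmt"

// ClaudeCodeConfig shows only the two fields this phase adds.
// The yaml keys are assumptions, not taken from the real config.go.
type ClaudeCodeConfig struct {
	AllowedTools  []string `yaml:"allowed_tools"`
	MCPConfigPath string   `yaml:"mcp_config_path"`
}

// PlanningConfig is the new top-level planning section
// (executor.planning.model in the config file).
type PlanningConfig struct {
	Model string `yaml:"model"`
}

// ExecutorConfig is trimmed to the fields relevant to this phase.
type ExecutorConfig struct {
	ClaudeCode *ClaudeCodeConfig `yaml:"claude_code"` // key and pointer type assumed
	Planning   *PlanningConfig   `yaml:"planning"`
}

// defaultExecutorConfig is a hypothetical constructor that applies the
// defaults listed in this phase.
func defaultExecutorConfig() *ExecutorConfig {
	return &ExecutorConfig{
		ClaudeCode: &ClaudeCodeConfig{
			AllowedTools:  []string{"Read", "Write", "Edit", "Bash", "Grep", "Glob", "Task"},
			MCPConfigPath: "", // no MCP servers loaded by the subprocess
		},
		Planning: &PlanningConfig{Model: "claude-opus-4-7"},
	}
}

func main() {
	cfg := defaultExecutorConfig()
	fmt.Println(cfg.ClaudeCode.AllowedTools, cfg.Planning.Model)
}
```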
`internal/executor/config.go`: extend `ClaudeCodeConfig` with `AllowedTools` and `MCPConfigPath`. Add a new top-level `PlanningConfig` with a `Model` field. Add `Planning *PlanningConfig` to `ExecutorConfig`.
Defaults:
- `AllowedTools` → `["Read", "Write", "Edit", "Bash", "Grep", "Glob", "Task"]`
- `MCPConfigPath` → `""` (no MCPs)
- `Planning.Model` → `"claude-opus-4-7"`
Phase 2: Backend wiring
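The backend wiring in this phase could be sketched as follows. `ExecuteOptions`, `AllowedTools`, `MCPConfigPath`, and the two flag names come from this issue; the helper name `appendToolFlags` is hypothetical, and passing the tools as a single comma-separated value is an assumption about how the real args builder would format them.

```go
package main

import (
	"fmt"
	"strings"
)

// ExecuteOptions is trimmed to the two fields this phase adds.
type ExecuteOptions struct {
	AllowedTools  []string
	MCPConfigPath string
}

// appendToolFlags is a hypothetical helper: it appends each flag only when
// the matching option is set, so an empty option leaves the args untouched.
func appendToolFlags(args []string, opts ExecuteOptions) []string {
	if len(opts.AllowedTools) > 0 {
		// Assumption: the tool list is passed as one comma-separated value.
		args = append(args, "--allowedTools", strings.Join(opts.AllowedTools, ","))
	}
	if opts.MCPConfigPath != "" {
		args = append(args, "--mcp-config", opts.MCPConfigPath)
	}
	return args
}

func main() {
	opts := ExecuteOptions{AllowedTools: []string{"Read", "Grep", "Glob"}}
	fmt.Println(appendToolFlags([]string{"-p"}, opts))
}
```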
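`internal/executor/backend.go:25–72`: extend `ExecuteOptions` with `AllowedTools []string` and `MCPConfigPath string`.
`internal/executor/backend_claudecode.go:336–358`: append the `--allowedTools` and `--mcp-config` flags when the corresponding option is set.
Phase 3: Runner call sites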
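A single helper for the call sites in this phase might look like the sketch below. The issue names only `r.buildExecutorOptions(...)` and `r.config.ClaudeCode`; the `Runner`, `Config`, and `Prompt` shapes here, as well as the helper's parameter list, are assumptions for illustration.

```go
package main

import "fmt"

// Minimal stand-ins for the real types; only the fields needed here.
type ClaudeCodeConfig struct {
	AllowedTools  []string
	MCPConfigPath string
}

type Config struct {
	ClaudeCode *ClaudeCodeConfig
}

type ExecuteOptions struct {
	Prompt        string // hypothetical field, stands in for per-call options
	AllowedTools  []string
	MCPConfigPath string
}

type Runner struct {
	config *Config
}

// buildExecutorOptions fills the shared fields from config so that every
// backend.Execute call site gets the same tool and MCP settings.
func (r *Runner) buildExecutorOptions(prompt string) ExecuteOptions {
	opts := ExecuteOptions{Prompt: prompt}
	if r.config != nil && r.config.ClaudeCode != nil {
		// Copy the slice so a call site cannot mutate the shared config.
		opts.AllowedTools = append([]string(nil), r.config.ClaudeCode.AllowedTools...)
		opts.MCPConfigPath = r.config.ClaudeCode.MCPConfigPath
	}
	return opts
}

func main() {
	r := &Runner{config: &Config{ClaudeCode: &ClaudeCodeConfig{
		AllowedTools: []string{"Read", "Write", "Edit", "Bash", "Grep", "Glob", "Task"},
	}}}
	fmt.Printf("%+v\n", r.buildExecutorOptions("fix the failing test"))
}
```

Each of the six call sites would then start from `r.buildExecutorOptions(...)` and set only its site-specific fields.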
`internal/executor/runner.go`: at all 6 `backend.Execute` sites (lines 1606, 1854, 2148, 2419, 2695, 3308), populate `AllowedTools` and `MCPConfigPath` from `r.config.ClaudeCode`. A single helper would be cleaner: extract `r.buildExecutorOptions(...)`.
Phase 4: Planning gets Opus
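The per-invocation env override in this phase can be sketched as below. `withEnvOverride` is a hypothetical helper, not a function from the codebase; in real code its result would typically be assigned to the `Env` field of the `exec.Cmd` that launches Claude Code, starting from `os.Environ()`. Removing the inherited entry rather than just appending keeps the override independent of how the child process resolves duplicate keys.

```go
package main

import (
	"fmt"
	"strings"
)

// withEnvOverride returns a copy of env in which key appears exactly once,
// set to value. Any inherited entries for key are dropped first.
func withEnvOverride(env []string, key, value string) []string {
	prefix := key + "="
	out := make([]string, 0, len(env)+1)
	for _, kv := range env {
		if !strings.HasPrefix(kv, prefix) {
			out = append(out, kv)
		}
	}
	return append(out, prefix+value)
}

func main() {
	// Stand-in for os.Environ() with Pilot's global model setting present.
	inherited := []string{"PATH=/usr/bin", "ANTHROPIC_MODEL=sonnet"}

	planningModel := "claude-opus-4-7"
	args := []string{"--model", planningModel}
	env := withEnvOverride(inherited, "ANTHROPIC_MODEL", planningModel)

	fmt.Println(args)
	fmt.Println(env)
}
```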
`internal/executor/epic.go:194–238`: modify `PlanEpic` to pass `--model claude-opus-4-7` explicitly and set `ANTHROPIC_MODEL` in the environment per invocation.
The env override is critical: Pilot's global `ANTHROPIC_MODEL=sonnet` would otherwise win on Node's last-write lookup inside Claude Code (per the comment at `backend_claudecode.go:367–370`).
Phase 5: Retry-counter via GitHub labels
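The label progression in this phase amounts to a small state machine, sketched below as a pure function over an issue's current labels. The label strings come from this issue; the constant names, the `retryDecision` type, and `nextRetryDecision` are hypothetical, and the behaviour for an issue without `pilot-retry-ready` is an assumption. Fetching labels (for example via `gh issue view --json labels`) and applying the changes are left out.

```go
package main

import "fmt"

// Constant names are hypothetical; the label values come from the issue.
const (
	LabelRetryReady     = "pilot-retry-ready"
	LabelRetry1         = "pilot-retry-1"
	LabelRetry2         = "pilot-retry-2"
	LabelRetryExhausted = "pilot-retry-exhausted"
)

// retryDecision describes the label change to apply and whether to dispatch.
type retryDecision struct {
	Remove   string
	Add      string
	Dispatch bool
}

func hasLabel(labels []string, want string) bool {
	for _, l := range labels {
		if l == want {
			return true
		}
	}
	return false
}

// nextRetryDecision checks the most advanced state first, so a stale lower
// label can never reset the counter.
func nextRetryDecision(labels []string) retryDecision {
	switch {
	case hasLabel(labels, LabelRetryExhausted):
		return retryDecision{} // terminal: skip dispatch
	case hasLabel(labels, LabelRetry2):
		return retryDecision{Remove: LabelRetry2, Add: LabelRetryExhausted}
	case hasLabel(labels, LabelRetry1):
		return retryDecision{Remove: LabelRetry1, Add: LabelRetry2, Dispatch: true}
	case hasLabel(labels, LabelRetryReady):
		return retryDecision{Add: LabelRetry1, Dispatch: true}
	default:
		// Assumption: issues without pilot-retry-ready are not retried here.
		return retryDecision{}
	}
}

func main() {
	fmt.Printf("%+v\n", nextRetryDecision([]string{LabelRetryReady}))
	fmt.Printf("%+v\n", nextRetryDecision([]string{LabelRetryReady, LabelRetry2}))
}
```

Because the counter lives on the issue rather than in an in-memory map, the same decision is reached after a `pilot start` restart.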
`internal/github/labels.go`: add constants for the retry labels.
`internal/poller/poller.go:117,270`: replace the `retryReadyCount` map lookup with a `gh issue view --json labels` check. Logic:
- `pilot-retry-ready`, no retry-N → add `pilot-retry-1`, dispatch
- `pilot-retry-1` → replace with `pilot-retry-2`, dispatch
- `pilot-retry-2` → replace with `pilot-retry-exhausted`, skip dispatch
- `pilot-retry-exhausted` → skip dispatch (terminal)
`internal/autopilot/auto_merger.go`: on successful merge, strip all retry labels from the issue.
Phase 6: Stop hook default
`internal/executor/hooks.go:54`: flip the `RunTestsOnStop` default from `true` to `false`.
Out of Scope
- `complex: high` stays (Sonnet's `high` is already cheap)
- User-side config edits (`~/.pilot/config.yaml`) and `MEMORY.md` trim: manual
Verification
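After merge + `pilot start` restart:
- Trigger small fix issue → `sqlite3 ~/.pilot/data/pilot.db "SELECT model_name FROM executions ORDER BY created_at DESC LIMIT 1"` → expect `claude-sonnet-4-6`
- `ps aux | grep claude` during execution → expect `--allowedTools` and `--mcp-config` flags present
- Trigger epic issue (5+ numbered steps) → `ps aux | grep claude` during planning → expect `--model claude-opus-4-7`
- Fail a PR 3× with `pilot start` restart in between → expect `pilot-retry-exhausted` label, 4th retry blocked
- `ccusage` 1 week post-deploy → daily spend $20–30, no $80+ days; Opus only on planning sessions
Refs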
- `internal/executor/epic.go:194–238` (`PlanEpic`)
- `internal/executor/backend_claudecode.go:307–360` (args builder)
- `internal/executor/runner.go` (6 `backend.Execute` sites)
- `internal/poller/poller.go:117,270` (`retryReadyCount`)
- `/Users/aleks.petrov/.claude/plans/unified-splashing-kite.md`