GH-2432: feat(executor): "Opus plans, Sonnet executes" — cut Pilot token spend ~70% #2437

Merged
alekspetrov merged 1 commit into main from pilot/GH-2432
Apr 29, 2026

@alekspetrov
Collaborator

Summary

Automated PR created by Pilot for task GH-2432.

Closes #2432

Changes

GitHub Issue #2432: feat(executor): "Opus plans, Sonnet executes" — cut Pilot token spend ~70%

Context

Pilot is consuming 4× baseline tokens on spike days ($80–110/day vs $20–25 baseline). Root cause analysis confirmed via ccusage + pilot.db audit: session names match GH issue numbers — Pilot subprocess executions are the spend, not interactive work. Cache-read tokens dominate (~93%), confirming massive system context replay across thousands of turns and dozens of subprocess restarts.

GLM/Z.AI is already disabled; Anthropic-only. Goal: cut Pilot's per-task cost ~70% by routing Opus to where reasoning matters (epic planning) and Sonnet to verbose execution.

Acceptance Criteria

  1. All backend.Execute runner subprocesses default to claude-sonnet-4-6
  2. PlanEpic (epic.go:194) passes --model claude-opus-4-7 explicitly + sets ANTHROPIC_MODEL env per-invocation
  3. New config field executor.planning.model (default claude-opus-4-7)
  4. model_routing.complex defaults to claude-sonnet-4-6 (was opus)
  5. ExecuteOptions gains AllowedTools []string and MCPConfigPath string fields
  6. Subprocess args builder emits --allowedTools and --mcp-config when set
  7. All 6 backend.Execute call sites + PlanEpic pass these fields
  8. Default AllowedTools for execution: [Read, Write, Edit, Bash, Grep, Glob, Task]
  9. Default AllowedTools for planning: [Read, Grep, Glob] (planning shouldn't write code)
  10. Default MCPConfigPath empty (no MCP servers loaded by subprocess)
  11. Retry counter persisted via GitHub labels (pilot-retry-1/2/exhausted) — survives pilot start restart
  12. Retry labels cleaned up on PR merge in auto_merger.go
  13. hooks.RunTestsOnStop default flipped to false in hooks.go:54
  14. Tests added for: planning model routing, AllowedTools/MCPConfigPath wiring, retry-label state
  15. go build ./... && go test ./... pass

Implementation

Phase 1 — Config additions

internal/executor/config.go — extend ClaudeCodeConfig:

type ClaudeCodeConfig struct {
    // ... existing fields ...
    AllowedTools  []string `yaml:"allowed_tools"`
    MCPConfigPath string   `yaml:"mcp_config_path"`
}

Add new top-level PlanningConfig:

type PlanningConfig struct {
    Model string `yaml:"model"`
}

Add Planning *PlanningConfig to ExecutorConfig.

Defaults:

  • AllowedTools: ["Read", "Write", "Edit", "Bash", "Grep", "Glob", "Task"]
  • MCPConfigPath: "" (no MCPs)
  • Planning.Model: "claude-opus-4-7"
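
A minimal sketch of how these defaults might be applied after the config is loaded. The `applyDefaults` helper and the `claude_code` YAML key are assumptions made for illustration; only the field names, the `planning` key, and the default values come from the description above.

```go
package main

import "fmt"

// ClaudeCodeConfig shows only the two fields added in Phase 1.
type ClaudeCodeConfig struct {
	AllowedTools  []string `yaml:"allowed_tools"`
	MCPConfigPath string   `yaml:"mcp_config_path"`
}

// PlanningConfig holds the model used for epic planning.
type PlanningConfig struct {
	Model string `yaml:"model"`
}

// ExecutorConfig is trimmed to the relevant fields.
// The "claude_code" key is an assumption, not taken from the PR.
type ExecutorConfig struct {
	ClaudeCode *ClaudeCodeConfig `yaml:"claude_code"`
	Planning   *PlanningConfig   `yaml:"planning"`
}

// applyDefaults is a hypothetical helper that fills unset fields
// with the defaults listed in Phase 1.
func applyDefaults(cfg *ExecutorConfig) {
	if cfg.ClaudeCode == nil {
		cfg.ClaudeCode = &ClaudeCodeConfig{}
	}
	if cfg.ClaudeCode.AllowedTools == nil {
		cfg.ClaudeCode.AllowedTools = []string{
			"Read", "Write", "Edit", "Bash", "Grep", "Glob", "Task",
		}
	}
	// MCPConfigPath stays "" by default, so no MCP servers are loaded.
	if cfg.Planning == nil {
		cfg.Planning = &PlanningConfig{}
	}
	if cfg.Planning.Model == "" {
		cfg.Planning.Model = "claude-opus-4-7"
	}
}

func main() {
	cfg := &ExecutorConfig{}
	applyDefaults(cfg)
	fmt.Println(cfg.ClaudeCode.AllowedTools, cfg.Planning.Model)
}
```

A value set explicitly in the config file survives, because each default is applied only when the field is unset.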

Phase 2 — Backend wiring

internal/executor/backend.go:25–72 — extend ExecuteOptions:

type ExecuteOptions struct {
    // ... existing fields ...
    AllowedTools  []string
    MCPConfigPath string
}

internal/executor/backend_claudecode.go:336–358 — append flags:

if len(opts.AllowedTools) > 0 {
    args = append(args, "--allowedTools", strings.Join(opts.AllowedTools, ","))
}
if opts.MCPConfigPath != "" {
    args = append(args, "--mcp-config", opts.MCPConfigPath)
}
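
To make the flag logic easy to test in isolation, the two conditionals can be wrapped in a small function. This is a sketch only: `appendToolFlags` is a hypothetical name, and `ExecuteOptions` is reduced to the two new fields.

```go
package main

import (
	"fmt"
	"strings"
)

// ExecuteOptions is trimmed to the two fields added in Phase 2.
type ExecuteOptions struct {
	AllowedTools  []string
	MCPConfigPath string
}

// appendToolFlags (hypothetical name) appends --allowedTools and
// --mcp-config to args only when the corresponding option is set.
func appendToolFlags(args []string, opts ExecuteOptions) []string {
	if len(opts.AllowedTools) > 0 {
		args = append(args, "--allowedTools", strings.Join(opts.AllowedTools, ","))
	}
	if opts.MCPConfigPath != "" {
		args = append(args, "--mcp-config", opts.MCPConfigPath)
	}
	return args
}

func main() {
	args := appendToolFlags(
		[]string{"--print"},
		ExecuteOptions{AllowedTools: []string{"Read", "Grep"}},
	)
	fmt.Println(args) // prints [--print --allowedTools Read,Grep]
}
```

In this sketch an empty `MCPConfigPath` produces no `--mcp-config` flag at all, so the flag appears on the subprocess command line only when a path is configured.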

Phase 3 — Runner call sites

internal/executor/runner.go — at all 6 backend.Execute sites (lines 1606, 1854, 2148, 2419, 2695, 3308), populate AllowedTools and MCPConfigPath from r.config.ClaudeCode. A single helper would be cleaner than repeating the assignment six times: extract r.buildExecutorOptions(...).
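
One possible shape for that helper, as a sketch. The `Runner`, `Config`, and `ExecuteOptions` types here are stand-ins trimmed to what the example needs, and the `Prompt` field and the helper's signature are assumptions; the real structs carry more fields.

```go
package main

import "fmt"

// Stand-in types, reduced to the fields this sketch touches.
type ClaudeCodeConfig struct {
	AllowedTools  []string
	MCPConfigPath string
}

type Config struct {
	ClaudeCode *ClaudeCodeConfig
}

type ExecuteOptions struct {
	Prompt        string // assumed field, for illustration only
	AllowedTools  []string
	MCPConfigPath string
}

type Runner struct {
	config *Config
}

// buildExecutorOptions copies the tool restrictions from the runner
// config into a fresh ExecuteOptions so every call site gets the
// same wiring. It tolerates a nil config.
func (r *Runner) buildExecutorOptions(prompt string) ExecuteOptions {
	opts := ExecuteOptions{Prompt: prompt}
	if r.config == nil || r.config.ClaudeCode == nil {
		return opts
	}
	// Copy the slice so a call site cannot mutate the shared config.
	opts.AllowedTools = append([]string(nil), r.config.ClaudeCode.AllowedTools...)
	opts.MCPConfigPath = r.config.ClaudeCode.MCPConfigPath
	return opts
}

func main() {
	r := &Runner{config: &Config{ClaudeCode: &ClaudeCodeConfig{
		AllowedTools: []string{"Read", "Write", "Edit"},
	}}}
	fmt.Printf("%+v\n", r.buildExecutorOptions("fix the bug"))
}
```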

Phase 4 — Planning gets Opus

internal/executor/epic.go:194–238 — modify PlanEpic:

planningModel := "claude-opus-4-7"
if r.config != nil && r.config.Planning != nil && r.config.Planning.Model != "" {
    planningModel = r.config.Planning.Model
}
args := []string{
    "--print", "-p", prompt,
    "--model", planningModel,
    "--allowedTools", "Read,Grep,Glob",
}
cmd := exec.CommandContext(ctx, claudeCmd, args...)
cmd.Env = append(os.Environ(), "ANTHROPIC_MODEL="+planningModel)

The env override is critical: Pilot's global ANTHROPIC_MODEL=sonnet would otherwise win on Node's last-write lookup inside Claude Code (per backend_claudecode.go:367–370 comment).
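
The model resolution and the env override can be sketched as two small functions. Note that `withModelEnv` is a slightly more defensive variant than the plain `append` shown above: it drops any inherited `ANTHROPIC_MODEL` entry before adding the new one, so the result does not depend on how duplicate keys are resolved downstream. Both function names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

type PlanningConfig struct{ Model string }

type Config struct{ Planning *PlanningConfig }

const defaultPlanningModel = "claude-opus-4-7"

// resolvePlanningModel returns the configured planning model,
// falling back to the default when nothing is configured.
func resolvePlanningModel(cfg *Config) string {
	if cfg != nil && cfg.Planning != nil && cfg.Planning.Model != "" {
		return cfg.Planning.Model
	}
	return defaultPlanningModel
}

// withModelEnv returns a copy of base with every existing
// ANTHROPIC_MODEL entry removed and a single new entry appended.
func withModelEnv(base []string, model string) []string {
	const key = "ANTHROPIC_MODEL="
	out := make([]string, 0, len(base)+1)
	for _, kv := range base {
		if !strings.HasPrefix(kv, key) {
			out = append(out, kv)
		}
	}
	return append(out, key+model)
}

func main() {
	inherited := []string{"PATH=/usr/bin", "ANTHROPIC_MODEL=sonnet"}
	env := withModelEnv(inherited, resolvePlanningModel(nil))
	fmt.Println(env) // prints [PATH=/usr/bin ANTHROPIC_MODEL=claude-opus-4-7]
}
```

In PlanEpic the result would be assigned to `cmd.Env`, with `os.Environ()` as the base.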

Phase 5 — Retry-counter via GitHub labels

internal/github/labels.go — add constants:

LabelRetry1         = "pilot-retry-1"
LabelRetry2         = "pilot-retry-2"
LabelRetryExhausted = "pilot-retry-exhausted"

internal/poller/poller.go:117,270 — replace retryReadyCount map lookup with gh issue view --json labels check. Logic:

  • See pilot-retry-ready, no retry-N → add pilot-retry-1, dispatch
  • See pilot-retry-1 → replace with pilot-retry-2, dispatch
  • See pilot-retry-2 → replace with pilot-retry-exhausted, skip dispatch
  • See pilot-retry-exhausted → skip dispatch (terminal)
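
The four rules above amount to a small state machine over the issue's labels. The sketch below expresses it as a pure function so it can be unit-tested without calling GitHub. The `retryDecision` type, the `nextRetryStep` name, and the `LabelRetryReady` constant name are hypothetical, and two behaviours are assumptions rather than something the rules specify: the most advanced label wins when several are present, and an issue without `pilot-retry-ready` or any retry label is not dispatched from this path.

```go
package main

import "fmt"

const (
	LabelRetryReady     = "pilot-retry-ready" // constant name assumed
	LabelRetry1         = "pilot-retry-1"
	LabelRetry2         = "pilot-retry-2"
	LabelRetryExhausted = "pilot-retry-exhausted"
)

// retryDecision describes the label change to apply and whether
// the issue should be dispatched again.
type retryDecision struct {
	Add      string // label to add, "" for none
	Remove   string // label to remove, "" for none
	Dispatch bool
}

// nextRetryStep derives the next step purely from the labels
// currently on the issue, so the count survives a restart.
func nextRetryStep(labels []string) retryDecision {
	has := make(map[string]bool, len(labels))
	for _, l := range labels {
		has[l] = true
	}
	switch {
	case has[LabelRetryExhausted]:
		return retryDecision{} // terminal: skip dispatch
	case has[LabelRetry2]:
		return retryDecision{Add: LabelRetryExhausted, Remove: LabelRetry2}
	case has[LabelRetry1]:
		return retryDecision{Add: LabelRetry2, Remove: LabelRetry1, Dispatch: true}
	case has[LabelRetryReady]:
		return retryDecision{Add: LabelRetry1, Dispatch: true}
	default:
		return retryDecision{} // assumption: not a retry candidate
	}
}

func main() {
	for _, labels := range [][]string{
		{LabelRetryReady},
		{LabelRetryReady, LabelRetry1},
		{LabelRetryReady, LabelRetry2},
		{LabelRetryReady, LabelRetryExhausted},
	} {
		fmt.Printf("%v -> %+v\n", labels, nextRetryStep(labels))
	}
}
```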

internal/autopilot/auto_merger.go — on successful merge, strip all retry labels from the issue.
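
A sketch of the cleanup step, again as a pure function: given the labels on the merged issue, it returns the retry labels that should be removed. The function name is hypothetical, and whether `pilot-retry-ready` is also stripped is not stated in the description, so this sketch covers only the three counter labels.

```go
package main

import "fmt"

// retryCounterLabels is the set of labels that encode retry state.
var retryCounterLabels = map[string]bool{
	"pilot-retry-1":         true,
	"pilot-retry-2":         true,
	"pilot-retry-exhausted": true,
}

// retryLabelsToStrip (hypothetical name) returns the retry labels
// present on the issue, preserving their order. The caller would
// remove each one after a successful merge.
func retryLabelsToStrip(issueLabels []string) []string {
	var out []string
	for _, l := range issueLabels {
		if retryCounterLabels[l] {
			out = append(out, l)
		}
	}
	return out
}

func main() {
	labels := []string{"bug", "pilot-retry-2", "pilot"}
	fmt.Println(retryLabelsToStrip(labels)) // prints [pilot-retry-2]
}
```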

Phase 6 — Stop hook default

internal/executor/hooks.go:54 — flip RunTestsOnStop default from true to false.

Out of Scope

Verification

After merge + pilot start restart:

  1. Trigger small fix issue → sqlite3 ~/.pilot/data/pilot.db "SELECT model_name FROM executions ORDER BY created_at DESC LIMIT 1" → expect claude-sonnet-4-6
  2. ps aux | grep claude during execution → expect --allowedTools and --mcp-config flags present
  3. Trigger epic issue (5+ numbered steps) → ps aux | grep claude during planning → expect --model claude-opus-4-7
  4. Fail a PR 3× with pilot start restart in between → expect pilot-retry-exhausted label, 4th retry blocked
  5. ccusage 1 week post-deploy → daily spend $20–30, no $80+ days; Opus only on planning sessions

Refs

  • internal/executor/epic.go:194–238 (PlanEpic)
  • internal/executor/backend_claudecode.go:307–360 (args builder)
  • internal/executor/runner.go (6 backend.Execute sites)
  • internal/poller/poller.go:117,270 (retryReadyCount)
  • Plan: /Users/aleks.petrov/.claude/plans/unified-splashing-kite.md

…H-2432)

Routes Opus to where reasoning matters (epic planning) and Sonnet to verbose
execution to address the 4× token spike in Pilot subprocess sessions.

- ClaudeCodeConfig gains AllowedTools + MCPConfigPath; ExecuteOptions plumbs
  them through and the Claude Code args builder emits --allowedTools and
  --mcp-config when set. Default execution toolbox excludes MCPs (Read,
  Write, Edit, Bash, Grep, Glob, Task) to drop per-turn context bloat.
- New PlanningConfig (default model claude-opus-4-7). PlanEpic now passes
  --model + --allowedTools=Read,Grep,Glob and exports ANTHROPIC_MODEL so
  Pilot's global Sonnet env doesn't override it on Node's last-write lookup.
- model_routing.complex default flipped from claude-opus-4-6 to
  claude-sonnet-4-6 — Opus is reserved for planning only.
- Retry counter persisted via labels (pilot-retry-1, pilot-retry-2,
  pilot-retry-exhausted). Survives `pilot start` restarts; previous
  in-memory retryReadyCount silently reset on restart and let pathological
  issues consume Opus indefinitely. Auto-merger strips retry labels on
  successful merge so a future regression starts fresh.
- HooksConfig.RunTestsOnStop default flipped to false — Stop-hook tests
  forced long unproductive turns into the Claude session.
@codecov-commenter

Codecov Report

❌ Patch coverage is 44.23077% with 58 lines in your changes missing coverage. Please review.

Files with missing lines | Patch % | Lines
internal/adapters/github/poller.go | 38.63% | 23 Missing and 4 partials ⚠️
internal/executor/runner.go | 31.25% | 22 Missing ⚠️
internal/executor/backend_claudecode.go | 0.00% | 2 Missing and 2 partials ⚠️
internal/autopilot/auto_merger.go | 40.00% | 2 Missing and 1 partial ⚠️
internal/executor/epic.go | 75.00% | 1 Missing and 1 partial ⚠️

@alekspetrov alekspetrov merged commit 5a965bc into main Apr 29, 2026
4 checks passed
@alekspetrov alekspetrov deleted the pilot/GH-2432 branch April 29, 2026 20:34