feat(executor): "Opus plans, Sonnet executes" — cut Pilot token spend ~70% #2432

@alekspetrov

Description

Context

Pilot consumes roughly 4× its baseline token spend on spike days ($80–110/day vs. a $20–25 baseline). A ccusage + pilot.db audit confirmed the root cause: session names match GH issue numbers, so the spend comes from Pilot subprocess executions, not interactive work. Cache-read tokens dominate (~93%), which points to the system context being replayed across thousands of turns and dozens of subprocess restarts.

GLM/Z.AI is already disabled; Anthropic-only. Goal: cut Pilot's per-task cost ~70% by routing Opus to where reasoning matters (epic planning) and Sonnet to verbose execution.

Acceptance Criteria

  1. All backend.Execute runner subprocesses default to claude-sonnet-4-6
  2. PlanEpic (epic.go:194) passes --model claude-opus-4-7 explicitly + sets ANTHROPIC_MODEL env per-invocation
  3. New config field executor.planning.model (default claude-opus-4-7)
  4. model_routing.complex defaults to claude-sonnet-4-6 (was opus)
  5. ExecuteOptions gains AllowedTools []string and MCPConfigPath string fields
  6. Subprocess args builder emits --allowedTools and --mcp-config when set
  7. All 6 backend.Execute call sites + PlanEpic pass these fields
  8. Default AllowedTools for execution: [Read, Write, Edit, Bash, Grep, Glob, Task]
  9. Default AllowedTools for planning: [Read, Grep, Glob] (planning shouldn't write code)
  10. Default MCPConfigPath empty (no MCP servers loaded by subprocess)
  11. Retry counter persisted via GitHub labels (pilot-retry-1/2/exhausted) — survives pilot start restart
  12. Retry labels cleaned up on PR merge in auto_merger.go
  13. hooks.RunTestsOnStop default flipped to false in hooks.go:54
  14. Tests added for: planning model routing, AllowedTools/MCPConfigPath wiring, retry-label state
  15. go build ./... && go test ./... pass

Implementation

Phase 1 — Config additions

internal/executor/config.go — extend ClaudeCodeConfig:

type ClaudeCodeConfig struct {
    // ... existing fields ...
    AllowedTools  []string `yaml:"allowed_tools"`
    MCPConfigPath string   `yaml:"mcp_config_path"`
}

Add new top-level PlanningConfig:

type PlanningConfig struct {
    Model string `yaml:"model"`
}

Add Planning *PlanningConfig to ExecutorConfig.

Defaults:

  • AllowedTools: ["Read", "Write", "Edit", "Bash", "Grep", "Glob", "Task"]
  • MCPConfigPath: "" (no MCPs)
  • Planning.Model: "claude-opus-4-7"
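
The defaults above could be applied in one place after the YAML is loaded. The following is a minimal sketch, not the actual Pilot code: the function name applyDefaults, the trimmed ExecutorConfig shape, and the claude_code / planning YAML keys are assumptions made for illustration.

```go
package main

import "fmt"

// ClaudeCodeConfig and PlanningConfig carry only the fields named in this issue.
type ClaudeCodeConfig struct {
	AllowedTools  []string `yaml:"allowed_tools"`
	MCPConfigPath string   `yaml:"mcp_config_path"`
}

type PlanningConfig struct {
	Model string `yaml:"model"`
}

// ExecutorConfig is trimmed to the two sub-configs relevant here.
// The YAML keys are assumed, not taken from the real config.go.
type ExecutorConfig struct {
	ClaudeCode *ClaudeCodeConfig `yaml:"claude_code"`
	Planning   *PlanningConfig   `yaml:"planning"`
}

// applyDefaults (hypothetical name) fills unset fields with the defaults
// listed in this issue and leaves explicitly configured values alone.
func applyDefaults(c *ExecutorConfig) {
	if c.ClaudeCode == nil {
		c.ClaudeCode = &ClaudeCodeConfig{}
	}
	if c.ClaudeCode.AllowedTools == nil {
		c.ClaudeCode.AllowedTools = []string{
			"Read", "Write", "Edit", "Bash", "Grep", "Glob", "Task",
		}
	}
	// MCPConfigPath intentionally stays "" so no MCP servers are loaded.
	if c.Planning == nil {
		c.Planning = &PlanningConfig{}
	}
	if c.Planning.Model == "" {
		c.Planning.Model = "claude-opus-4-7"
	}
}

func main() {
	cfg := &ExecutorConfig{}
	applyDefaults(cfg)
	fmt.Println(cfg.ClaudeCode.AllowedTools, cfg.Planning.Model)
}
```

Checking AllowedTools against nil rather than length zero lets an operator configure an explicit empty list; whether that distinction is wanted is a design choice this sketch does not settle.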

Phase 2 — Backend wiring

internal/executor/backend.go:25–72 — extend ExecuteOptions:

type ExecuteOptions struct {
    // ... existing fields ...
    AllowedTools  []string
    MCPConfigPath string
}

internal/executor/backend_claudecode.go:336–358 — append flags:

if len(opts.AllowedTools) > 0 {
    args = append(args, "--allowedTools", strings.Join(opts.AllowedTools, ","))
}
if opts.MCPConfigPath != "" {
    args = append(args, "--mcp-config", opts.MCPConfigPath)
}

Phase 3 — Runner call sites

internal/executor/runner.go: at all 6 backend.Execute sites (lines 1606, 1854, 2148, 2419, 2695, 3308), populate AllowedTools and MCPConfigPath from r.config.ClaudeCode. A single helper would be cleaner: extract r.buildExecutorOptions(...) and call it from each site.
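
One possible shape for that helper is sketched below. Only the name buildExecutorOptions and the r.config.ClaudeCode source come from this issue; the Runner and Config types, the Prompt field, and the single-argument signature are assumptions for illustration.

```go
package main

import "fmt"

type ClaudeCodeConfig struct {
	AllowedTools  []string
	MCPConfigPath string
}

// Config and Runner are stand-ins for the real runner types.
type Config struct {
	ClaudeCode *ClaudeCodeConfig
}

type Runner struct {
	config *Config
}

// ExecuteOptions shows the two new fields plus a hypothetical Prompt field.
type ExecuteOptions struct {
	Prompt        string
	AllowedTools  []string
	MCPConfigPath string
}

// buildExecutorOptions centralises the config-to-options wiring so the six
// backend.Execute call sites cannot drift apart.
func (r *Runner) buildExecutorOptions(prompt string) ExecuteOptions {
	opts := ExecuteOptions{Prompt: prompt}
	if r.config != nil && r.config.ClaudeCode != nil {
		// Copy the slice so a call site cannot mutate shared config.
		opts.AllowedTools = append([]string(nil), r.config.ClaudeCode.AllowedTools...)
		opts.MCPConfigPath = r.config.ClaudeCode.MCPConfigPath
	}
	return opts
}

func main() {
	r := &Runner{config: &Config{ClaudeCode: &ClaudeCodeConfig{
		AllowedTools: []string{"Read", "Edit"},
	}}}
	fmt.Printf("%+v\n", r.buildExecutorOptions("fix the bug"))
}
```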

Phase 4 — Planning gets Opus

internal/executor/epic.go:194–238 — modify PlanEpic:

planningModel := "claude-opus-4-7"
if r.config != nil && r.config.Planning != nil && r.config.Planning.Model != "" {
    planningModel = r.config.Planning.Model
}
args := []string{
    "--print", "-p", prompt,
    "--model", planningModel,
    "--allowedTools", "Read,Grep,Glob",
}
cmd := exec.CommandContext(ctx, claudeCmd, args...)
cmd.Env = append(os.Environ(), "ANTHROPIC_MODEL="+planningModel)

The env override is critical: Pilot's global ANTHROPIC_MODEL=sonnet would otherwise win on Node's last-write lookup inside Claude Code (per backend_claudecode.go:367–370 comment).

Phase 5 — Retry-counter via GitHub labels

internal/github/labels.go — add constants:

LabelRetry1         = "pilot-retry-1"
LabelRetry2         = "pilot-retry-2"
LabelRetryExhausted = "pilot-retry-exhausted"

internal/poller/poller.go:117,270 — replace retryReadyCount map lookup with gh issue view --json labels check. Logic:

  • See pilot-retry-ready, no retry-N → add pilot-retry-1, dispatch
  • See pilot-retry-1 → replace with pilot-retry-2, dispatch
  • See pilot-retry-2 → replace with pilot-retry-exhausted, skip dispatch
  • See pilot-retry-exhausted → skip dispatch (terminal)
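
The four rules above form a small state machine over the issue's labels. A minimal sketch as a pure function follows. The names nextRetry, retryDecision, and LabelRetryReady are hypothetical, and the fall-through case (no retry labels at all) is an assumption, since the list above does not cover it.

```go
package main

import "fmt"

const (
	LabelRetryReady     = "pilot-retry-ready" // constant name assumed; label string is from this issue
	LabelRetry1         = "pilot-retry-1"
	LabelRetry2         = "pilot-retry-2"
	LabelRetryExhausted = "pilot-retry-exhausted"
)

// retryDecision describes the label change to apply and whether to dispatch.
type retryDecision struct {
	Add      string // label to add, "" for none
	Remove   string // label to remove, "" for none
	Dispatch bool
}

// nextRetry maps the labels currently on an issue to the next retry step.
// The most advanced state is checked first so stale labels cannot rewind it.
func nextRetry(labels []string) retryDecision {
	has := make(map[string]bool, len(labels))
	for _, l := range labels {
		has[l] = true
	}
	switch {
	case has[LabelRetryExhausted]:
		return retryDecision{} // terminal: skip dispatch
	case has[LabelRetry2]:
		return retryDecision{Add: LabelRetryExhausted, Remove: LabelRetry2}
	case has[LabelRetry1]:
		return retryDecision{Add: LabelRetry2, Remove: LabelRetry1, Dispatch: true}
	case has[LabelRetryReady]:
		return retryDecision{Add: LabelRetry1, Dispatch: true}
	}
	// Assumption: without pilot-retry-ready this path does nothing.
	return retryDecision{}
}

func main() {
	fmt.Printf("%+v\n", nextRetry([]string{"pilot", LabelRetryReady}))
	fmt.Printf("%+v\n", nextRetry([]string{"pilot", LabelRetryReady, LabelRetry2}))
}
```

Because the state lives entirely in the labels passed in, the function gives the same answer before and after a pilot start restart.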

internal/autopilot/auto_merger.go — on successful merge, strip all retry labels from the issue.
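
The cleanup step can be reduced to selecting which of the issue's current labels are retry labels, then removing each one through whatever GitHub client auto_merger.go already uses. The sketch below shows only the selection. The function name retryLabelsToStrip is hypothetical, and whether pilot-retry-ready should also be stripped is not stated in this issue, so it is left out here.

```go
package main

import "fmt"

const (
	LabelRetry1         = "pilot-retry-1"
	LabelRetry2         = "pilot-retry-2"
	LabelRetryExhausted = "pilot-retry-exhausted"
)

// retryLabelsToStrip returns, in their original order, the labels on an
// issue that belong to the retry counter and should be removed after merge.
func retryLabelsToStrip(labels []string) []string {
	retry := map[string]bool{
		LabelRetry1:         true,
		LabelRetry2:         true,
		LabelRetryExhausted: true,
	}
	var out []string
	for _, l := range labels {
		if retry[l] {
			out = append(out, l)
		}
	}
	return out
}

func main() {
	fmt.Println(retryLabelsToStrip([]string{"pilot", "pilot-retry-2", "pilot-done"}))
}
```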

Phase 6 — Stop hook default

internal/executor/hooks.go:54 — flip RunTestsOnStop default from true to false.

Out of Scope

Verification

After merge + pilot start restart:

  1. Trigger small fix issue → sqlite3 ~/.pilot/data/pilot.db "SELECT model_name FROM executions ORDER BY created_at DESC LIMIT 1" → expect claude-sonnet-4-6
  2. ps aux | grep claude during execution → expect --allowedTools present; --mcp-config appears only when mcp_config_path is set (it is empty by default, so the flag is omitted)
  3. Trigger epic issue (5+ numbered steps) → ps aux | grep claude during planning → expect --model claude-opus-4-7
  4. Fail a PR 3× with pilot start restart in between → expect pilot-retry-exhausted label and no further dispatch (the 4th attempt is blocked)
  5. ccusage 1 week post-deploy → daily spend $20–30, no $80+ days; Opus only on planning sessions

Refs

  • internal/executor/epic.go:194–238 (PlanEpic)
  • internal/executor/backend_claudecode.go:307–360 (args builder)
  • internal/executor/runner.go (6 backend.Execute sites)
  • internal/poller/poller.go:117,270 (retryReadyCount)
  • Plan: /Users/aleks.petrov/.claude/plans/unified-splashing-kite.md

Metadata

    Labels

      • no-decompose: Skip epic detection and decomposition
      • pilot: Pilot AI will work on this
      • pilot-done: Pilot completed this
