GH-2432: feat(executor): "Opus plans, Sonnet executes" — cut Pilot token spend ~70% by alekspetrov · Pull Request #2437 · qf-studio/pilot

alekspetrov · 2026-04-29T20:28:32Z

Summary

Automated PR created by Pilot for task GH-2432.

Changes

GitHub Issue #2432: feat(executor): "Opus plans, Sonnet executes" — cut Pilot token spend ~70%

Context

Pilot is consuming 4× baseline tokens on spike days ($80–110/day vs $20–25 baseline). Root cause analysis confirmed via ccusage + pilot.db audit: session names match GH issue numbers — Pilot subprocess executions are the spend, not interactive work. Cache-read tokens dominate (~93%), confirming massive system context replay across thousands of turns and dozens of subprocess restarts.

GLM/Z.AI is already disabled; Anthropic-only. Goal: cut Pilot's per-task cost ~70% by routing Opus to where reasoning matters (epic planning) and Sonnet to verbose execution.

Acceptance Criteria

Implementation

Phase 1 — Config additions

internal/executor/config.go — extend ClaudeCodeConfig:

type ClaudeCodeConfig struct {
    // ... existing fields ...
    AllowedTools  []string `yaml:"allowed_tools"`
    MCPConfigPath string   `yaml:"mcp_config_path"`
}

Add new top-level PlanningConfig:

type PlanningConfig struct {
    Model string `yaml:"model"`
}

Add Planning *PlanningConfig to ExecutorConfig.

Defaults:

AllowedTools → ["Read", "Write", "Edit", "Bash", "Grep", "Glob", "Task"]
MCPConfigPath → "" (no MCPs)
Planning.Model → "claude-opus-4-7"
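The defaults above could be applied along these lines. This is a minimal sketch, not the code merged in the PR: `applyDefaults` is a hypothetical helper, and the shape of `ExecutorConfig` (pointer fields, the `claude_code` and `planning` YAML keys) is assumed for illustration.

```go
package main

import "fmt"

// ClaudeCodeConfig shows only the two fields added in Phase 1.
type ClaudeCodeConfig struct {
	AllowedTools  []string `yaml:"allowed_tools"`
	MCPConfigPath string   `yaml:"mcp_config_path"`
}

// PlanningConfig selects the model used for epic planning.
type PlanningConfig struct {
	Model string `yaml:"model"`
}

// ExecutorConfig is trimmed to the relevant fields; the YAML keys are assumed.
type ExecutorConfig struct {
	ClaudeCode *ClaudeCodeConfig `yaml:"claude_code"`
	Planning   *PlanningConfig   `yaml:"planning"`
}

// applyDefaults (hypothetical) fills unset fields with the Phase 1 defaults.
// MCPConfigPath needs no handling: its zero value "" already means "no MCPs".
func applyDefaults(cfg *ExecutorConfig) {
	if cfg.ClaudeCode == nil {
		cfg.ClaudeCode = &ClaudeCodeConfig{}
	}
	if len(cfg.ClaudeCode.AllowedTools) == 0 {
		cfg.ClaudeCode.AllowedTools = []string{
			"Read", "Write", "Edit", "Bash", "Grep", "Glob", "Task",
		}
	}
	if cfg.Planning == nil {
		cfg.Planning = &PlanningConfig{}
	}
	if cfg.Planning.Model == "" {
		cfg.Planning.Model = "claude-opus-4-7"
	}
}

func main() {
	cfg := &ExecutorConfig{}
	applyDefaults(cfg)
	fmt.Println(cfg.ClaudeCode.AllowedTools, cfg.Planning.Model)
}
```

A user-supplied value in either field is left untouched, so an explicit `planning.model` in the config file still wins over the default.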

Phase 2 — Backend wiring

internal/executor/backend.go:25–72 — extend ExecuteOptions:

type ExecuteOptions struct {
    // ... existing fields ...
    AllowedTools  []string
    MCPConfigPath string
}

internal/executor/backend_claudecode.go:336–358 — append flags:

if len(opts.AllowedTools) > 0 {
    args = append(args, "--allowedTools", strings.Join(opts.AllowedTools, ","))
}
if opts.MCPConfigPath != "" {
    args = append(args, "--mcp-config", opts.MCPConfigPath)
}

Phase 3 — Runner call sites

internal/executor/runner.go — at all 6 backend.Execute sites (lines 1606, 1854, 2148, 2419, 2695, 3308), populate AllowedTools and MCPConfigPath from r.config.ClaudeCode. Single helper would be cleaner — extract r.buildExecutorOptions(...).
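One possible shape for the suggested helper, combined with the Phase 2 flag logic so the result is visible end to end. This is a sketch under assumptions: the `Runner` and `Config` types are reduced to what the example needs, the `Prompt` field stands in for the existing `ExecuteOptions` fields, and `toolFlags` is a hypothetical name for the flag-appending step.

```go
package main

import (
	"fmt"
	"strings"
)

// ClaudeCodeConfig shows only the fields added in Phase 1.
type ClaudeCodeConfig struct {
	AllowedTools  []string
	MCPConfigPath string
}

// Config is a trimmed stand-in for the runner's configuration.
type Config struct {
	ClaudeCode *ClaudeCodeConfig
}

// ExecuteOptions shows the Phase 2 fields; Prompt is a hypothetical
// placeholder for the existing fields.
type ExecuteOptions struct {
	Prompt        string
	AllowedTools  []string
	MCPConfigPath string
}

// Runner is reduced to the one field this sketch needs.
type Runner struct {
	config *Config
}

// buildExecutorOptions centralises what each backend.Execute call site
// would otherwise repeat: copying the tool settings from the config.
func (r *Runner) buildExecutorOptions(prompt string) ExecuteOptions {
	opts := ExecuteOptions{Prompt: prompt}
	if r.config == nil || r.config.ClaudeCode == nil {
		return opts
	}
	// Copy the slice so call sites cannot mutate the shared config.
	opts.AllowedTools = append([]string(nil), r.config.ClaudeCode.AllowedTools...)
	opts.MCPConfigPath = r.config.ClaudeCode.MCPConfigPath
	return opts
}

// toolFlags mirrors the Phase 2 snippet: flags are emitted only when set.
func toolFlags(opts ExecuteOptions) []string {
	var args []string
	if len(opts.AllowedTools) > 0 {
		args = append(args, "--allowedTools", strings.Join(opts.AllowedTools, ","))
	}
	if opts.MCPConfigPath != "" {
		args = append(args, "--mcp-config", opts.MCPConfigPath)
	}
	return args
}

func main() {
	r := &Runner{config: &Config{ClaudeCode: &ClaudeCodeConfig{
		AllowedTools: []string{"Read", "Edit", "Bash"},
	}}}
	opts := r.buildExecutorOptions("fix the failing test")
	fmt.Println(toolFlags(opts))
}
```

With a helper like this, adding a future option means touching one function instead of six call sites.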

Phase 4 — Planning gets Opus

internal/executor/epic.go:194–238 — modify PlanEpic:

planningModel := "claude-opus-4-7"
if r.config != nil && r.config.Planning != nil && r.config.Planning.Model != "" {
    planningModel = r.config.Planning.Model
}
args := []string{
    "--print", "-p", prompt,
    "--model", planningModel,
    "--allowedTools", "Read,Grep,Glob",
}
cmd := exec.CommandContext(ctx, claudeCmd, args...)
cmd.Env = append(os.Environ(), "ANTHROPIC_MODEL="+planningModel)

The env override is critical: Pilot's global ANTHROPIC_MODEL=sonnet would otherwise win on Node's last-write lookup inside Claude Code (per backend_claudecode.go:367–370 comment).

Phase 5 — Retry-counter via GitHub labels

internal/github/labels.go — add constants:

LabelRetry1         = "pilot-retry-1"
LabelRetry2         = "pilot-retry-2"
LabelRetryExhausted = "pilot-retry-exhausted"

internal/poller/poller.go:117,270 — replace retryReadyCount map lookup with gh issue view --json labels check. Logic:

See pilot-retry-ready, no retry-N → add pilot-retry-1, dispatch
See pilot-retry-1 → replace with pilot-retry-2, dispatch
See pilot-retry-2 → replace with pilot-retry-exhausted, skip dispatch
See pilot-retry-exhausted → skip dispatch (terminal)
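The four rules above form a small state machine over the issue's labels. A sketch of that transition as a pure function follows. `nextRetryStep`, `retryDecision`, and the `LabelRetryReady` constant name are hypothetical, and the behaviour for an issue with no `pilot-retry-ready` label (no action) is an assumption the rules do not cover.

```go
package main

import "fmt"

const (
	LabelRetryReady     = "pilot-retry-ready" // constant name assumed
	LabelRetry1         = "pilot-retry-1"
	LabelRetry2         = "pilot-retry-2"
	LabelRetryExhausted = "pilot-retry-exhausted"
)

// retryDecision describes the label edits to make and whether to dispatch.
type retryDecision struct {
	Add      string // label to add, "" for none
	Remove   string // label to remove, "" for none
	Dispatch bool
}

func hasLabel(labels []string, want string) bool {
	for _, l := range labels {
		if l == want {
			return true
		}
	}
	return false
}

// nextRetryStep checks the most advanced state first so that a stale
// lower label can never re-enable dispatch.
func nextRetryStep(labels []string) retryDecision {
	switch {
	case hasLabel(labels, LabelRetryExhausted):
		return retryDecision{} // terminal: skip dispatch
	case hasLabel(labels, LabelRetry2):
		return retryDecision{Add: LabelRetryExhausted, Remove: LabelRetry2}
	case hasLabel(labels, LabelRetry1):
		return retryDecision{Add: LabelRetry2, Remove: LabelRetry1, Dispatch: true}
	case hasLabel(labels, LabelRetryReady):
		return retryDecision{Add: LabelRetry1, Dispatch: true}
	default:
		return retryDecision{} // assumed: nothing to do without retry-ready
	}
}

func main() {
	labels := []string{"bug", LabelRetryReady, LabelRetry1}
	fmt.Printf("%+v\n", nextRetryStep(labels))
}
```

Because the state lives on the issue rather than in process memory, the same function gives the same answer before and after a `pilot start` restart.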

internal/autopilot/auto_merger.go — on successful merge, strip all retry labels from the issue.
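The merge-time cleanup amounts to filtering the retry counter labels out of the issue's label set. A minimal sketch, assuming a hypothetical `stripRetryLabels` helper that works on label names only; whether `pilot-retry-ready` is also removed is not stated here, so the sketch leaves it in place.

```go
package main

import "fmt"

// retryLabels lists the counter labels introduced in Phase 5.
var retryLabels = map[string]bool{
	"pilot-retry-1":         true,
	"pilot-retry-2":         true,
	"pilot-retry-exhausted": true,
}

// stripRetryLabels returns the labels that should remain on the issue
// after a successful merge, preserving their original order.
func stripRetryLabels(labels []string) []string {
	kept := make([]string, 0, len(labels))
	for _, l := range labels {
		if !retryLabels[l] {
			kept = append(kept, l)
		}
	}
	return kept
}

func main() {
	fmt.Println(stripRetryLabels([]string{"bug", "pilot-retry-2", "pilot-done"}))
}
```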

Phase 6 — Stop hook default

internal/executor/hooks.go:54 — flip RunTestsOnStop default from true to false.

Out of Scope

Self-review consolidation (kept as-is)
Effort routing change (complex: high stays; Sonnet at high effort is already cheap)
Pre-existing telemetry gaps, tracked in separate issues: #2428 (fix(executor): token/cost telemetry gap — 65% of executions report 0 tokens) and #2429 (feat(memory): wire usage_events table — billing pipeline scaffolded but never called)
User-side config edits (~/.pilot/config.yaml) and MEMORY.md trim — manual

Verification

After merge + pilot start restart:

Trigger small fix issue → sqlite3 ~/.pilot/data/pilot.db "SELECT model_name FROM executions ORDER BY created_at DESC LIMIT 1" → expect claude-sonnet-4-6
ps aux | grep claude during execution → expect --allowedTools and --mcp-config flags present
Trigger epic issue (5+ numbered steps) → ps aux | grep claude during planning → expect --model claude-opus-4-7
Fail a PR 3× with pilot start restart in between → expect pilot-retry-exhausted label, 4th retry blocked
ccusage 1 week post-deploy → daily spend $20–30, no $80+ days; Opus only on planning sessions

Refs

internal/executor/epic.go:194–238 (PlanEpic)
internal/executor/backend_claudecode.go:307–360 (args builder)
internal/executor/runner.go (6 backend.Execute sites)
internal/poller/poller.go:117,270 (retryReadyCount)
Plan: /Users/aleks.petrov/.claude/plans/unified-splashing-kite.md

…H-2432)

Routes Opus to where reasoning matters (epic planning) and Sonnet to verbose execution to address the 4× token spike in Pilot subprocess sessions.

- ClaudeCodeConfig gains AllowedTools + MCPConfigPath; ExecuteOptions plumbs them through and the Claude Code args builder emits --allowedTools and --mcp-config when set. Default execution toolbox excludes MCPs (Read, Write, Edit, Bash, Grep, Glob, Task) to drop per-turn context bloat.
- New PlanningConfig (default model claude-opus-4-7). PlanEpic now passes --model + --allowedTools=Read,Grep,Glob and exports ANTHROPIC_MODEL so Pilot's global Sonnet env doesn't override it on Node's last-write lookup.
- model_routing.complex default flipped from claude-opus-4-6 to claude-sonnet-4-6; Opus is reserved for planning only.
- Retry counter persisted via labels (pilot-retry-1, pilot-retry-2, pilot-retry-exhausted). Survives `pilot start` restarts; previous in-memory retryReadyCount silently reset on restart and let pathological issues consume Opus indefinitely. Auto-merger strips retry labels on successful merge so a future regression starts fresh.
- HooksConfig.RunTestsOnStop default flipped to false: Stop-hook tests forced long unproductive turns into the Claude session.

codecov-commenter · 2026-04-29T20:33:06Z


Codecov Report

❌ Patch coverage is 44.23077% with 58 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
internal/adapters/github/poller.go	38.63%	23 Missing and 4 partials ⚠️
internal/executor/runner.go	31.25%	22 Missing ⚠️
internal/executor/backend_claudecode.go	0.00%	2 Missing and 2 partials ⚠️
internal/autopilot/auto_merger.go	40.00%	2 Missing and 1 partial ⚠️
internal/executor/epic.go	75.00%	1 Missing and 1 partial ⚠️


alekspetrov mentioned this pull request Apr 29, 2026

feat(executor): "Opus plans, Sonnet executes" — cut Pilot token spend ~70% #2432

Closed

15 tasks

alekspetrov merged commit 5a965bc into main Apr 29, 2026
4 checks passed

alekspetrov deleted the pilot/GH-2432 branch April 29, 2026 20:34

alekspetrov mentioned this pull request Apr 30, 2026

docs(content): sync to v2.102.2 — planning split, OpenCode attached, ExecuteOptions, version component #2441

Closed
