feat(executor): "Opus plans, Sonnet executes" — cut Pilot token spend ~70% #2432

## Context

Pilot is consuming 4× baseline tokens on spike days ($80–110/day vs $20–25 baseline). Root cause analysis confirmed via ccusage + pilot.db audit: **session names match GH issue numbers** — Pilot subprocess executions are the spend, not interactive work. Cache-read tokens dominate (~93%), confirming massive system context replay across thousands of turns and dozens of subprocess restarts.

GLM/Z.AI is already disabled; Anthropic-only. Goal: **cut Pilot's per-task cost ~70%** by routing Opus to where reasoning matters (epic planning) and Sonnet to verbose execution.

## Acceptance Criteria

1. [ ] All `backend.Execute` runner subprocesses default to `claude-sonnet-4-6`
2. [ ] `PlanEpic` (`epic.go:194`) passes `--model claude-opus-4-7` explicitly + sets `ANTHROPIC_MODEL` env per-invocation
3. [ ] New config field `executor.planning.model` (default `claude-opus-4-7`)
4. [ ] `model_routing.complex` defaults to `claude-sonnet-4-6` (was opus)
5. [ ] `ExecuteOptions` gains `AllowedTools []string` and `MCPConfigPath string` fields
6. [ ] Subprocess args builder emits `--allowedTools` and `--mcp-config` when set
7. [ ] All 6 `backend.Execute` call sites + `PlanEpic` pass these fields
8. [ ] Default `AllowedTools` for execution: `[Read, Write, Edit, Bash, Grep, Glob, Task]`
9. [ ] Default `AllowedTools` for planning: `[Read, Grep, Glob]` (planning shouldn't write code)
10. [ ] Default `MCPConfigPath` empty (no MCP servers loaded by subprocess)
11. [ ] Retry counter persisted via GitHub labels (`pilot-retry-1/2/exhausted`) — survives `pilot start` restart
12. [ ] Retry labels cleaned up on PR merge in `auto_merger.go`
13. [ ] `hooks.RunTestsOnStop` default flipped to `false` in `hooks.go:54`
14. [ ] Tests added for: planning model routing, AllowedTools/MCPConfigPath wiring, retry-label state
15. [ ] `go build ./... && go test ./...` pass

## Implementation

### Phase 1 — Config additions

**`internal/executor/config.go`** — extend `ClaudeCodeConfig`:
```go
type ClaudeCodeConfig struct {
    // ... existing fields ...
    AllowedTools  []string `yaml:"allowed_tools"`
    MCPConfigPath string   `yaml:"mcp_config_path"`
}
```
Add new top-level `PlanningConfig`:
```go
type PlanningConfig struct {
    Model string `yaml:"model"`
}
```
Add `Planning *PlanningConfig` to `ExecutorConfig`.

Defaults:
- `AllowedTools` → `["Read", "Write", "Edit", "Bash", "Grep", "Glob", "Task"]`
- `MCPConfigPath` → `""` (no MCPs)
- `Planning.Model` → `"claude-opus-4-7"`
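
A minimal sketch of how these defaults might be applied when the user's config leaves them unset. The helper name `applyDefaults`, the `claude_code` YAML key, and the trimmed struct shapes are assumptions for illustration, not the existing Pilot code.

```go
package main

import "fmt"

// Trimmed copies of the config types from this phase; the real structs carry more fields.
type ClaudeCodeConfig struct {
	AllowedTools  []string `yaml:"allowed_tools"`
	MCPConfigPath string   `yaml:"mcp_config_path"`
}

type PlanningConfig struct {
	Model string `yaml:"model"`
}

type ExecutorConfig struct {
	ClaudeCode *ClaudeCodeConfig `yaml:"claude_code"` // key name is an assumption
	Planning   *PlanningConfig   `yaml:"planning"`
}

// applyDefaults (hypothetical helper) fills unset fields with the defaults listed above.
func applyDefaults(cfg *ExecutorConfig) {
	if cfg.ClaudeCode == nil {
		cfg.ClaudeCode = &ClaudeCodeConfig{}
	}
	// Only a nil slice is treated as unset.
	if cfg.ClaudeCode.AllowedTools == nil {
		cfg.ClaudeCode.AllowedTools = []string{"Read", "Write", "Edit", "Bash", "Grep", "Glob", "Task"}
	}
	// MCPConfigPath deliberately stays "" unless the user sets it.
	if cfg.Planning == nil {
		cfg.Planning = &PlanningConfig{}
	}
	if cfg.Planning.Model == "" {
		cfg.Planning.Model = "claude-opus-4-7"
	}
}

func main() {
	cfg := &ExecutorConfig{}
	applyDefaults(cfg)
	fmt.Println(cfg.ClaudeCode.AllowedTools, cfg.Planning.Model)
}
```

A user-supplied `Planning.Model` is left untouched, so the config file still wins over the built-in default.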

### Phase 2 — Backend wiring

**`internal/executor/backend.go:25–72`** — extend `ExecuteOptions`:
```go
type ExecuteOptions struct {
    // ... existing fields ...
    AllowedTools  []string
    MCPConfigPath string
}
```

**`internal/executor/backend_claudecode.go:336–358`** — append flags:
```go
if len(opts.AllowedTools) > 0 {
    args = append(args, "--allowedTools", strings.Join(opts.AllowedTools, ","))
}
if opts.MCPConfigPath != "" {
    args = append(args, "--mcp-config", opts.MCPConfigPath)
}
```
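
To make the flag logic easy to unit-test (criterion 14), it could be pulled into a pure function. The sketch below assumes a hypothetical helper named `appendToolFlags` and a trimmed `ExecuteOptions`; the real args builder has more inputs.

```go
package main

import (
	"fmt"
	"strings"
)

// ExecuteOptions is trimmed to the two fields added in this phase.
type ExecuteOptions struct {
	AllowedTools  []string
	MCPConfigPath string
}

// appendToolFlags (hypothetical name) mirrors the snippet above as a pure function:
// each flag is emitted only when its option is set.
func appendToolFlags(args []string, opts ExecuteOptions) []string {
	if len(opts.AllowedTools) > 0 {
		args = append(args, "--allowedTools", strings.Join(opts.AllowedTools, ","))
	}
	if opts.MCPConfigPath != "" {
		args = append(args, "--mcp-config", opts.MCPConfigPath)
	}
	return args
}

func main() {
	base := []string{"--model", "claude-sonnet-4-6"}
	withTools := ExecuteOptions{AllowedTools: []string{"Read", "Grep", "Glob"}}
	fmt.Println(appendToolFlags(base, withTools))
	fmt.Println(appendToolFlags(base, ExecuteOptions{}))
}
```

With the default empty `MCPConfigPath`, this logic emits no `--mcp-config` flag at all.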

### Phase 3 — Runner call sites

**`internal/executor/runner.go`** — at all 6 `backend.Execute` sites (lines 1606, 1854, 2148, 2419, 2695, 3308), populate `AllowedTools` and `MCPConfigPath` from `r.config.ClaudeCode`. Single helper would be cleaner — extract `r.buildExecutorOptions(...)`.
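
One possible shape for that helper, as a sketch only. The parameter list of `buildExecutorOptions` is not specified above, so the signature here (take the per-call options, return them with the config-driven fields filled in) is an assumption, as are the trimmed `Runner`, `Config`, and `ExecuteOptions` types.

```go
package main

import "fmt"

// Trimmed stand-ins for the real types.
type ClaudeCodeConfig struct {
	AllowedTools  []string
	MCPConfigPath string
}

type Config struct {
	ClaudeCode *ClaudeCodeConfig
}

type ExecuteOptions struct {
	Prompt        string // stand-in for the existing per-call fields
	AllowedTools  []string
	MCPConfigPath string
}

type Runner struct {
	config *Config
}

// buildExecutorOptions copies the tool and MCP settings from config onto the
// per-call options, so each backend.Execute site sets them the same way.
func (r *Runner) buildExecutorOptions(base ExecuteOptions) ExecuteOptions {
	if r.config == nil || r.config.ClaudeCode == nil {
		return base
	}
	cc := r.config.ClaudeCode
	// Copy the slice so one call site cannot mutate the shared config list.
	base.AllowedTools = append([]string(nil), cc.AllowedTools...)
	base.MCPConfigPath = cc.MCPConfigPath
	return base
}

func main() {
	r := &Runner{config: &Config{ClaudeCode: &ClaudeCodeConfig{
		AllowedTools: []string{"Read", "Write", "Edit", "Bash", "Grep", "Glob", "Task"},
	}}}
	opts := r.buildExecutorOptions(ExecuteOptions{Prompt: "fix the flaky test"})
	fmt.Printf("%+v\n", opts)
}
```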

### Phase 4 — Planning gets Opus

**`internal/executor/epic.go:194–238`** — modify `PlanEpic`:
```go
// Config value wins; otherwise fall back to the built-in default.
planningModel := "claude-opus-4-7"
if r.config != nil && r.config.Planning != nil && r.config.Planning.Model != "" {
    planningModel = r.config.Planning.Model
}
args := []string{
    "--print", "-p", prompt,
    "--model", planningModel,
    "--allowedTools", "Read,Grep,Glob", // read-only: planning shouldn't write code
}
cmd := exec.CommandContext(ctx, claudeCmd, args...)
// Appended last so it overrides any inherited ANTHROPIC_MODEL.
cmd.Env = append(os.Environ(), "ANTHROPIC_MODEL="+planningModel)
```
The env override is critical: Pilot's global `ANTHROPIC_MODEL=sonnet` would otherwise win on Node's last-write lookup inside Claude Code (per `backend_claudecode.go:367–370` comment).
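
On the Go side, appending is enough: the `os/exec` documentation states that when `Cmd.Env` contains duplicate keys, only the last value for each key is used. The sketch below illustrates this with a hypothetical helper `envWithModel` and a simulated parent environment. The command is never started; `Cmd.Environ` (Go 1.19+) is used only to inspect what the child would receive.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// envWithModel (hypothetical helper) copies base and appends ANTHROPIC_MODEL last,
// so the per-invocation value is the final entry for that key.
func envWithModel(base []string, model string) []string {
	env := make([]string, 0, len(base)+1)
	env = append(env, base...)
	return append(env, "ANTHROPIC_MODEL="+model)
}

func main() {
	// Simulated parent environment carrying Pilot's global setting.
	parent := []string{"PATH=/usr/bin", "ANTHROPIC_MODEL=sonnet"}

	cmd := exec.Command("claude", "--version") // never started; used only to inspect the env
	cmd.Env = envWithModel(parent, "claude-opus-4-7")

	// Print whichever ANTHROPIC_MODEL entries the child process would see.
	for _, kv := range cmd.Environ() {
		if strings.HasPrefix(kv, "ANTHROPIC_MODEL=") {
			fmt.Println(kv)
		}
	}
}
```

Per that documented behavior, the loop should print only the appended Opus entry, which is the value the planning subprocess would inherit.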

### Phase 5 — Retry-counter via GitHub labels

**`internal/github/labels.go`** — add constants:
```go
LabelRetry1         = "pilot-retry-1"
LabelRetry2         = "pilot-retry-2"
LabelRetryExhausted = "pilot-retry-exhausted"
```

**`internal/poller/poller.go:117,270`** — replace `retryReadyCount` map lookup with `gh issue view --json labels` check. Logic:
- See `pilot-retry-ready`, no retry-N → add `pilot-retry-1`, dispatch
- See `pilot-retry-1` → replace with `pilot-retry-2`, dispatch
- See `pilot-retry-2` → replace with `pilot-retry-exhausted`, **skip dispatch**
- See `pilot-retry-exhausted` → skip dispatch (terminal)
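
The four rules above form a small state machine that can be kept pure and tested without GitHub. In this sketch `decideRetry` and `RetryDecision` are hypothetical names. The rules do not say whether `pilot-retry-ready` must also be present for the later transitions, so the sketch assumes the counter labels are checked on their own.

```go
package main

import "fmt"

const (
	LabelRetryReady     = "pilot-retry-ready" // trigger label named in the rules above
	LabelRetry1         = "pilot-retry-1"
	LabelRetry2         = "pilot-retry-2"
	LabelRetryExhausted = "pilot-retry-exhausted"
)

// RetryDecision describes the label change to apply and whether to dispatch.
type RetryDecision struct {
	Add      string // label to add, "" for none
	Remove   string // label to remove, "" for none
	Dispatch bool
}

func hasLabel(labels []string, want string) bool {
	for _, l := range labels {
		if l == want {
			return true
		}
	}
	return false
}

// decideRetry (hypothetical) maps the issue's current labels to the next step.
// The most advanced state is checked first so stale lower labels cannot rewind the counter.
func decideRetry(labels []string) RetryDecision {
	switch {
	case hasLabel(labels, LabelRetryExhausted):
		return RetryDecision{} // terminal: nothing to change, no dispatch
	case hasLabel(labels, LabelRetry2):
		return RetryDecision{Add: LabelRetryExhausted, Remove: LabelRetry2}
	case hasLabel(labels, LabelRetry1):
		return RetryDecision{Add: LabelRetry2, Remove: LabelRetry1, Dispatch: true}
	case hasLabel(labels, LabelRetryReady):
		return RetryDecision{Add: LabelRetry1, Dispatch: true}
	default:
		return RetryDecision{}
	}
}

func main() {
	fmt.Printf("%+v\n", decideRetry([]string{"pilot", LabelRetryReady}))
	fmt.Printf("%+v\n", decideRetry([]string{"pilot", LabelRetry2}))
}
```

Because the state lives in the labels rather than in an in-memory map, the decision is the same before and after a `pilot start` restart.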

**`internal/autopilot/auto_merger.go`** — on successful merge, strip all retry labels from the issue.
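
A sketch of the cleanup step, assuming a hypothetical helper `retryLabelsToRemove` that computes which retry labels are actually present, so the merger only issues removals that are needed. Whether `pilot-retry-ready` should also be stripped is not stated above, so it is left alone here.

```go
package main

import "fmt"

// The three retry-counter labels introduced in this phase.
var retryLabels = []string{"pilot-retry-1", "pilot-retry-2", "pilot-retry-exhausted"}

// retryLabelsToRemove (hypothetical helper) returns the retry labels currently
// on the issue, in a stable order, leaving all other labels alone.
func retryLabelsToRemove(issueLabels []string) []string {
	present := make(map[string]bool, len(issueLabels))
	for _, l := range issueLabels {
		present[l] = true
	}
	var out []string
	for _, l := range retryLabels {
		if present[l] {
			out = append(out, l)
		}
	}
	return out
}

func main() {
	fmt.Println(retryLabelsToRemove([]string{"bug", "pilot-retry-2", "pilot"}))
}
```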

### Phase 6 — Stop hook default

**`internal/executor/hooks.go:54`** — flip `RunTestsOnStop` default from `true` to `false`.

## Out of Scope

- Self-review consolidation (kept as-is)
- Effort routing change (`complex: high` stays — Sonnet's `high` already cheap)
- Pre-existing telemetry gaps (separate issues #2428, #2429)
- User-side config edits (`~/.pilot/config.yaml`) and `MEMORY.md` trim — manual

## Verification

After merge + `pilot start` restart:
1. Trigger small fix issue → `sqlite3 ~/.pilot/data/pilot.db "SELECT model_name FROM executions ORDER BY created_at DESC LIMIT 1"` → expect `claude-sonnet-4-6`
2. `ps aux | grep claude` during execution → expect `--allowedTools` present; `--mcp-config` appears only when `mcp_config_path` is set (empty by default, so absent)
3. Trigger epic issue (5+ numbered steps) → `ps aux | grep claude` during planning → expect `--model claude-opus-4-7`
4. Fail a PR 3× (initial attempt + 2 retries) with `pilot start` restart in between → expect `pilot-retry-exhausted` label, 4th attempt blocked
5. ccusage 1 week post-deploy → daily spend $20–30, no $80+ days; Opus only on planning sessions

## Refs

- `internal/executor/epic.go:194–238` (PlanEpic)
- `internal/executor/backend_claudecode.go:307–360` (args builder)
- `internal/executor/runner.go` (6 backend.Execute sites)
- `internal/poller/poller.go:117,270` (retryReadyCount)
- Plan: `/Users/aleks.petrov/.claude/plans/unified-splashing-kite.md`
