fix(executor): token/cost telemetry gap — 65% of executions report 0 tokens #2428

@alekspetrov

Description

Problem

Audit of ~/.pilot/data/pilot.db for Apr 17–27 shows 110 of 168 executions (65%) record tokens_total=0 and estimated_cost_usd=0, including 94 completed runs with valid commit_sha / pr_url. Real work, no usage data.

bucket       runs  status
-----------  ----  ---------
no_tokens     94   completed
no_tokens     16   failed
with_tokens   44   completed
with_tokens   14   failed

These 0-token rows also have empty model_name, suggesting the SDK message stream never produced a result event with token usage that runner.go could parse.

Likely root cause

Pilot now runs alternate backends (OpenCode → GLM/Z.AI; see recent commits such as 0bc54c6f "fix(executor): send OpenCode model as {providerID, modelID}"). The token-aggregation code paths in internal/executor/runner.go (lines 1642, 1942, 2010, 2148, 2487, 2687) appear tied to the Claude Code SDK message format. If the OpenCode adapter doesn't emit equivalent usage events, or emits them in a different shape, state.tokensInput/tokensOutput stay at 0 and SaveExecutionMetrics writes zeros.

Sample affected runs (real work, 0 tokens):

exec_id     task     duration  commit
----------  -------  --------  -------
87d57fa4…   GH-2401  17m       457d79b
a52f2110…   GH-2396
adf04ea8…   GH-2392

Why it matters

  • Cost reporting (bench-status.py, dashboard, lifetime tokens via #533 "fix(dashboard): metrics cards show inconsistent data across restarts") understates actual spend. Recorded $104.85 vs. account-level ccusage $574 over the same window — Pilot looks like 18% of spend when its actual share is much higher.
  • Pattern-learning model-outcome scoring uses estimated_cost_usd for cost-per-PR — biased toward backends that don't report.
  • Budget enforcement (gateway/budget.go) can't enforce per-task limits on backends with no token telemetry.

Suggested fix

  1. Audit internal/executor/runner.go token-capture sites against each backend (Claude Code SDK, OpenCode, Codex). Confirm where state.tokensInput/Output and cacheCreationInputTokens are set per backend.
  2. For OpenCode/GLM: parse the backend's usage event format and map to state.tokens* fields. If the backend doesn't emit usage, tokenize the prompt + output locally as a fallback estimate (mark as estimated).
  3. Set result.ModelName even when token usage is missing, so we can filter "telemetry-missing" runs from "true-zero" runs.
  4. Add a startup check: if last N completed executions all have tokens_total=0 for a given backend, log a warning.

Verification

After fix: rerun a known issue against each backend, confirm tokens_total > 0 and model_name != '' for all completed executions. Backfill not required.

Refs

  • internal/executor/runner.go:1642,1942,2010,2148,2487,2687
  • internal/executor/dispatcher.go:670 (SaveExecutionMetrics call site)
  • internal/memory/metrics.go:169
  • Recent commit 0bc54c6f (OpenCode model object fix)
