
GH-2428: fix(executor): token/cost telemetry gap — 65% of executions report 0 tokens#2430

Merged
alekspetrov merged 1 commit into main from pilot/GH-2428 on Apr 27, 2026

Conversation

@alekspetrov
Collaborator

Summary

Automated PR created by Pilot for task GH-2428.

Closes #2428

Changes

GitHub Issue #2428: fix(executor): token/cost telemetry gap — 65% of executions report 0 tokens

Problem

Audit of ~/.pilot/data/pilot.db for Apr 17–27 shows 110 of 168 executions (65%) record tokens_total=0 and estimated_cost_usd=0, including 94 completed runs with valid commit_sha / pr_url. Real work, no usage data.

bucket       runs  status
-----------  ----  ---------
no_tokens     94   completed
no_tokens     16   failed
with_tokens   44   completed
with_tokens   14   failed

These 0-token rows also have empty model_name, suggesting the SDK message stream never produced a result event with token usage that runner.go could parse.

Likely root cause

Pilot now runs alternate backends (OpenCode → GLM/Z.AI, per recent commits such as 0bc54c6f "fix(executor): send OpenCode model as {providerID, modelID}"). The token-aggregation code paths in internal/executor/runner.go (lines 1642, 1942, 2010, 2148, 2487, 2687) appear tied to the Claude-Code SDK message format. If the OpenCode adapter doesn't emit equivalent usage events — or emits them in a different shape — state.tokensInput/tokensOutput stay at 0 and SaveExecutionMetrics writes zeros.
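To make the suspected failure mode concrete, here is a minimal sketch of usage-event parsing that tolerates two message shapes instead of one. The struct and function names (tokenUsage, parseUsage) are hypothetical, not the actual runner.go code; the point is that an unrecognized shape should return ok=false so the caller can flag a telemetry gap rather than silently persist zeros.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// tokenUsage is a hypothetical normalized shape for per-message usage.
type tokenUsage struct {
	Input  int
	Output int
}

// parseUsage tolerates two shapes a backend might emit:
//   wrapped: {"usage":{"input_tokens":N,"output_tokens":M}}
//   flat:    {"input_tokens":N,"output_tokens":M}
// If neither matches, ok=false lets the caller log a telemetry gap
// instead of silently writing zeros to the metrics store.
func parseUsage(raw []byte) (tokenUsage, bool) {
	var wrapped struct {
		Usage struct {
			Input  int `json:"input_tokens"`
			Output int `json:"output_tokens"`
		} `json:"usage"`
	}
	if err := json.Unmarshal(raw, &wrapped); err == nil &&
		(wrapped.Usage.Input > 0 || wrapped.Usage.Output > 0) {
		return tokenUsage{wrapped.Usage.Input, wrapped.Usage.Output}, true
	}
	var flat struct {
		Input  int `json:"input_tokens"`
		Output int `json:"output_tokens"`
	}
	if err := json.Unmarshal(raw, &flat); err == nil &&
		(flat.Input > 0 || flat.Output > 0) {
		return tokenUsage{flat.Input, flat.Output}, true
	}
	return tokenUsage{}, false
}

func main() {
	u, ok := parseUsage([]byte(`{"usage":{"input_tokens":120,"output_tokens":45}}`))
	fmt.Println(u.Input, u.Output, ok) // 120 45 true

	// An event with no usage payload must not be mistaken for zero real usage.
	_, ok = parseUsage([]byte(`{"type":"step_finished"}`))
	fmt.Println(ok) // false
}
```

A parser shaped this way makes "the backend never reported" distinguishable from "the run genuinely used zero tokens", which is exactly the distinction the audit above could not make.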

Sample affected runs (real work, 0 tokens):

exec_id task duration commit
87d57fa4… GH-2401 17m 457d79b
a52f2110… GH-2396
adf04ea8… GH-2392

Why it matters

  • Cost reporting (bench-status.py, the dashboard, and lifetime tokens via #533 "fix(dashboard): metrics cards show inconsistent data across restarts") understates actual spend: $104.85 recorded vs. $574 from account-level ccusage over the same window, so Pilot looks like 18% of spend when its actual share is much higher.
  • Pattern-learning model-outcome scoring uses estimated_cost_usd for cost-per-PR, which biases scores toward backends that don't report usage.
  • Budget enforcement (gateway/budget.go) can't enforce per-task limits on backends with no token telemetry.

Suggested fix

  1. Audit internal/executor/runner.go token-capture sites against each backend (Claude Code SDK, OpenCode, Codex). Confirm where state.tokensInput/Output and cacheCreationInputTokens are set per backend.
  2. For OpenCode/GLM: parse the backend's usage event format and map to state.tokens* fields. If the backend doesn't emit usage, tokenize the prompt + output locally as a fallback estimate (mark as estimated).
  3. Set result.ModelName even when token usage is missing, so we can filter "telemetry-missing" runs from "true-zero" runs.
  4. Add a startup check: if last N completed executions all have tokens_total=0 for a given backend, log a warning.
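Step 4 can be sketched in a few lines. This is illustrative only — completedRun and telemetryGap are hypothetical names, and the real check would read from pilot.db via the metrics store — but it shows the intended filter: sample only commit-bearing completed runs (real work) and warn when the zero-token share crosses a threshold.

```go
package main

import "fmt"

// completedRun is a hypothetical projection of one pilot.db execution row.
type completedRun struct {
	TokensTotal int
	CommitSHA   string
}

// telemetryGap reports whether at least `threshold` of the sampled
// commit-bearing completed runs recorded zero tokens — the early-warning
// signal that a backend's usage events aren't being parsed.
func telemetryGap(runs []completedRun, threshold float64) bool {
	var sampled, zero int
	for _, r := range runs {
		if r.CommitSHA == "" {
			continue // rows without commits (e.g. orchestrator-only) are excluded
		}
		sampled++
		if r.TokensTotal == 0 {
			zero++
		}
	}
	if sampled == 0 {
		return false
	}
	return float64(zero)/float64(sampled) >= threshold
}

func main() {
	runs := []completedRun{
		{0, "457d79b"},  // real work, no telemetry
		{0, "a1b2c3d"},  // real work, no telemetry
		{5123, "9e8f7a6"}, // healthy run
		{0, ""},         // no commit: excluded from the sample
	}
	fmt.Println(telemetryGap(runs, 0.5)) // true: 2 of 3 sampled runs report 0
}
```

At the Apr 17–27 rate (94 of 138 completed runs at zero), this check would have fired well before the gap reached ten days of data.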

Verification

After fix: rerun a known issue against each backend, confirm tokens_total > 0 and model_name != '' for all completed executions. Backfill not required.

Refs

  • internal/executor/runner.go:1642,1942,2010,2148,2487,2687
  • internal/executor/dispatcher.go:670 (SaveExecutionMetrics call site)
  • internal/memory/metrics.go:169
  • Recent commit 0bc54c6f (OpenCode model object fix)

… runs (GH-2428)

65% of executions on Apr 17–27 wrote tokens_total=0 and model_name=''
to ~/.pilot/data/pilot.db, including completed runs with real commits.
Recorded $104.85 vs. ccusage $574 over the same window — Pilot looked
like 18% of spend when actual share is much higher. Cost reporting,
model-outcome scoring, and per-task budget enforcement all degraded.

Three concrete sources:
1. Epic parent path returned epicResult with no ModelName/TokensTotal
   (runner.go ~1243). The orchestrator never calls a backend, but the
   row is indistinguishable from "telemetry missing" downstream.
2. Hardcoded "claude-opus-4-6" fallback in runner.go (lines 1647, 2007)
   was stale (real CC runs report 4-7) and silently labelled OpenCode/
   GLM runs as Claude Opus, biasing model-outcome cost-per-PR.
3. OpenCode SSE path (parseSSEStream) accumulated TokensInput/Output
   but dropped CacheCreationInputTokens / CacheReadInputTokens —
   parity gap vs. parseAssistantResponse for v1.4.x usage events.

Changes:
- Add Runner.fallbackModelName() — config.DefaultModel → OpenCode.Model
  → backend type. Used in epic result and the two stale hardcoded sites.
- Set ModelName on epic orchestrator result so audit queries can split
  "epic, no backend call" from "telemetry missing".
- Accumulate cache tokens in OpenCode SSE; openCodeUsage now accepts
  both flat (cache_creation_input_tokens) and nested (cache.{read,write})
  layouts that different OpenCode builds emit.
- Add Store.RecentCompletedTelemetryStats + Dispatcher.checkTelemetryGap:
  on startup, sample the last 50 completed runs (with commit_sha) and
  log a warning when ≥50% report tokens_total=0 — early signal that the
  backend's usage events aren't being parsed.
- Tests: fallbackModelName resolution, SSE cache-token capture (flat +
  nested), telemetry-gap query (commit-only filter excludes epics).

Verification: go build ./... + go test ./... clean.
@codecov-commenter

⚠️ Please install the Codecov GitHub app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 63.01370% with 27 lines in your changes missing coverage. Please review.

Files with missing lines         Patch %  Lines
internal/executor/dispatcher.go  31.81%   14 missing, 1 partial ⚠️
internal/executor/runner.go      50.00%   8 missing ⚠️
internal/memory/metrics.go       80.95%   2 missing, 2 partials ⚠️


@alekspetrov alekspetrov merged commit 8f3a4e7 into main Apr 27, 2026
4 checks passed
@alekspetrov alekspetrov deleted the pilot/GH-2428 branch April 27, 2026 11:27


Development

Successfully merging this pull request may close these issues.

fix(executor): token/cost telemetry gap — 65% of executions report 0 tokens