
GH-2428: fix(executor): token/cost telemetry gap — 65% of executions report 0 tokens#2430

Merged
alekspetrov merged 1 commit into main from pilot/GH-2428 on Apr 27, 2026

Conversation

@alekspetrov
Collaborator

Summary

Automated PR created by Pilot for task GH-2428.

Closes #2428

Changes

GitHub Issue #2428: fix(executor): token/cost telemetry gap — 65% of executions report 0 tokens

Problem

Audit of ~/.pilot/data/pilot.db for Apr 17–27 shows 110 of 168 executions (65%) record tokens_total=0 and estimated_cost_usd=0, including 94 completed runs with valid commit_sha / pr_url. Real work, no usage data.

bucket       runs  status
-----------  ----  ---------
no_tokens     94   completed
no_tokens     16   failed
with_tokens   44   completed
with_tokens   14   failed

These 0-token rows also have empty model_name, suggesting the SDK message stream never produced a result event with token usage that runner.go could parse.

Likely root cause

Pilot now runs alternate backends (OpenCode → GLM/Z.AI, per recent commits such as 0bc54c6f "fix(executor): send OpenCode model as {providerID, modelID}"). The token-aggregation code paths in internal/executor/runner.go (lines 1642, 1942, 2010, 2148, 2487, 2687) appear tied to the Claude-Code SDK message format. If the OpenCode adapter doesn't emit equivalent usage events — or emits them in a different shape — state.tokensInput/tokensOutput stay at 0 and SaveExecutionMetrics writes zeros.
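To make the suspected failure mode concrete, here is a minimal sketch of usage-event parsing that tolerates two message shapes instead of one. The struct and function names (tokenUsage, parseUsage) are hypothetical, not the actual runner.go code; the point is that an unrecognized shape should return ok=false so the caller can flag a telemetry gap rather than silently persist zeros.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// tokenUsage is a hypothetical normalized shape for per-message usage.
type tokenUsage struct {
	Input  int
	Output int
}

// parseUsage tolerates two shapes a backend might emit:
//   wrapped: {"usage":{"input_tokens":N,"output_tokens":M}}
//   flat:    {"input_tokens":N,"output_tokens":M}
// If neither matches, ok=false lets the caller log a telemetry gap
// instead of silently writing zeros to the metrics store.
func parseUsage(raw []byte) (tokenUsage, bool) {
	var wrapped struct {
		Usage struct {
			Input  int `json:"input_tokens"`
			Output int `json:"output_tokens"`
		} `json:"usage"`
	}
	if err := json.Unmarshal(raw, &wrapped); err == nil &&
		(wrapped.Usage.Input > 0 || wrapped.Usage.Output > 0) {
		return tokenUsage{wrapped.Usage.Input, wrapped.Usage.Output}, true
	}
	var flat struct {
		Input  int `json:"input_tokens"`
		Output int `json:"output_tokens"`
	}
	if err := json.Unmarshal(raw, &flat); err == nil &&
		(flat.Input > 0 || flat.Output > 0) {
		return tokenUsage{flat.Input, flat.Output}, true
	}
	return tokenUsage{}, false
}

func main() {
	u, ok := parseUsage([]byte(`{"usage":{"input_tokens":120,"output_tokens":45}}`))
	fmt.Println(u.Input, u.Output, ok) // 120 45 true

	// An event with no usage payload must not be mistaken for zero real usage.
	_, ok = parseUsage([]byte(`{"type":"step_finished"}`))
	fmt.Println(ok) // false
}
```

A parser shaped this way makes "the backend never reported" distinguishable from "the run genuinely used zero tokens", which is exactly the distinction the audit above could not make.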

Sample affected runs (real work, 0 tokens):

exec_id task duration commit
87d57fa4… GH-2401 17m 457d79b
a52f2110… GH-2396
adf04ea8… GH-2392

Why it matters

  • Cost reporting (bench-status.py, the dashboard, and lifetime tokens via #533 "fix(dashboard): metrics cards show inconsistent data across restarts") understates actual spend: $104.85 recorded vs. $574 from account-level ccusage over the same window, so Pilot looks like 18% of spend when its actual share is much higher.
  • Pattern-learning model-outcome scoring uses estimated_cost_usd for cost-per-PR, which biases scores toward backends that don't report usage.
  • Budget enforcement (gateway/budget.go) can't enforce per-task limits on backends with no token telemetry.

Suggested fix

  1. Audit internal/executor/runner.go token-capture sites against each backend (Claude Code SDK, OpenCode, Codex). Confirm where state.tokensInput/Output and cacheCreationInputTokens are set per backend.
  2. For OpenCode/GLM: parse the backend's usage event format and map to state.tokens* fields. If the backend doesn't emit usage, tokenize the prompt + output locally as a fallback estimate (mark as estimated).
  3. Set result.ModelName even when token usage is missing, so we can filter "telemetry-missing" runs from "true-zero" runs.
  4. Add a startup check: if last N completed executions all have tokens_total=0 for a given backend, log a warning.
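Step 4 can be sketched in a few lines. This is illustrative only — completedRun and telemetryGap are hypothetical names, and the real check would read from pilot.db via the metrics store — but it shows the intended filter: sample only commit-bearing completed runs (real work) and warn when the zero-token share crosses a threshold.

```go
package main

import "fmt"

// completedRun is a hypothetical projection of one pilot.db execution row.
type completedRun struct {
	TokensTotal int
	CommitSHA   string
}

// telemetryGap reports whether at least `threshold` of the sampled
// commit-bearing completed runs recorded zero tokens — the early-warning
// signal that a backend's usage events aren't being parsed.
func telemetryGap(runs []completedRun, threshold float64) bool {
	var sampled, zero int
	for _, r := range runs {
		if r.CommitSHA == "" {
			continue // rows without commits (e.g. orchestrator-only) are excluded
		}
		sampled++
		if r.TokensTotal == 0 {
			zero++
		}
	}
	if sampled == 0 {
		return false
	}
	return float64(zero)/float64(sampled) >= threshold
}

func main() {
	runs := []completedRun{
		{0, "457d79b"},  // real work, no telemetry
		{0, "a1b2c3d"},  // real work, no telemetry
		{5123, "9e8f7a6"}, // healthy run
		{0, ""},         // no commit: excluded from the sample
	}
	fmt.Println(telemetryGap(runs, 0.5)) // true: 2 of 3 sampled runs report 0
}
```

At the Apr 17–27 rate (94 of 138 completed runs at zero), this check would have fired well before the gap reached ten days of data.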

Verification

After fix: rerun a known issue against each backend, confirm tokens_total > 0 and model_name != '' for all completed executions. Backfill not required.

Refs

  • internal/executor/runner.go:1642,1942,2010,2148,2487,2687
  • internal/executor/dispatcher.go:670 (SaveExecutionMetrics call site)
  • internal/memory/metrics.go:169
  • Recent commit 0bc54c6f (OpenCode model object fix)

… runs (GH-2428)

65% of executions on Apr 17–27 wrote tokens_total=0 and model_name=''
to ~/.pilot/data/pilot.db, including completed runs with real commits.
Recorded $104.85 vs. ccusage $574 over the same window — Pilot looked
like 18% of spend when actual share is much higher. Cost reporting,
model-outcome scoring, and per-task budget enforcement all degraded.

Three concrete sources:
1. Epic parent path returned epicResult with no ModelName/TokensTotal
   (runner.go ~1243). The orchestrator never calls a backend, but the
   row is indistinguishable from "telemetry missing" downstream.
2. Hardcoded "claude-opus-4-6" fallback in runner.go (lines 1647, 2007)
   was stale (real CC runs report 4-7) and silently labelled OpenCode/
   GLM runs as Claude Opus, biasing model-outcome cost-per-PR.
3. OpenCode SSE path (parseSSEStream) accumulated TokensInput/Output
   but dropped CacheCreationInputTokens / CacheReadInputTokens —
   parity gap vs. parseAssistantResponse for v1.4.x usage events.

Changes:
- Add Runner.fallbackModelName() — config.DefaultModel → OpenCode.Model
  → backend type. Used in epic result and the two stale hardcoded sites.
- Set ModelName on epic orchestrator result so audit queries can split
  "epic, no backend call" from "telemetry missing".
- Accumulate cache tokens in OpenCode SSE; openCodeUsage now accepts
  both flat (cache_creation_input_tokens) and nested (cache.{read,write})
  layouts that different OpenCode builds emit.
- Add Store.RecentCompletedTelemetryStats + Dispatcher.checkTelemetryGap:
  on startup, sample the last 50 completed runs (with commit_sha) and
  log a warning when ≥50% report tokens_total=0 — early signal that the
  backend's usage events aren't being parsed.
- Tests: fallbackModelName resolution, SSE cache-token capture (flat +
  nested), telemetry-gap query (commit-only filter excludes epics).

Verification: go build ./... + go test ./... clean.
@codecov-commenter

⚠️ Please install the Codecov GitHub app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 63.01370% with 27 lines in your changes missing coverage. Please review.

Files with missing lines         Patch %  Lines
internal/executor/dispatcher.go  31.81%   14 missing, 1 partial ⚠️
internal/executor/runner.go      50.00%   8 missing ⚠️
internal/memory/metrics.go       80.95%   2 missing, 2 partials ⚠️


@alekspetrov alekspetrov merged commit 8f3a4e7 into main Apr 27, 2026
4 checks passed
@alekspetrov alekspetrov deleted the pilot/GH-2428 branch April 27, 2026 11:27


Development

Successfully merging this pull request may close these issues.

fix(executor): token/cost telemetry gap — 65% of executions report 0 tokens