GH-2428: fix(executor): token/cost telemetry gap — 65% of executions report 0 tokens #2430
Merged
alekspetrov merged 1 commit into main, Apr 27, 2026
Conversation
… runs (GH-2428)

65% of executions on Apr 17–27 wrote `tokens_total=0` and `model_name=''` to `~/.pilot/data/pilot.db`, including completed runs with real commits. Recorded $104.85 vs. ccusage $574 over the same window — Pilot looked like 18% of spend when its actual share is much higher. Cost reporting, model-outcome scoring, and per-task budget enforcement all degraded.

Three concrete sources:

1. The epic parent path returned `epicResult` with no `ModelName`/`TokensTotal` (`runner.go` ~1243). The orchestrator never calls a backend, but the row is indistinguishable from "telemetry missing" downstream.
2. The hardcoded `"claude-opus-4-6"` fallback in `runner.go` (lines 1647, 2007) was stale (real CC runs report 4-7) and silently labelled OpenCode/GLM runs as Claude Opus, biasing model-outcome cost-per-PR.
3. The OpenCode SSE path (`parseSSEStream`) accumulated `TokensInput`/`Output` but dropped `CacheCreationInputTokens`/`CacheReadInputTokens` — a parity gap vs. `parseAssistantResponse` for v1.4.x usage events.

Changes:

- Add `Runner.fallbackModelName()` — `config.DefaultModel` → `OpenCode.Model` → backend type. Used in the epic result and the two stale hardcoded sites.
- Set `ModelName` on the epic orchestrator result so audit queries can split "epic, no backend call" from "telemetry missing".
- Accumulate cache tokens in OpenCode SSE; `openCodeUsage` now accepts both flat (`cache_creation_input_tokens`) and nested (`cache.{read,write}`) layouts that different OpenCode builds emit.
- Add `Store.RecentCompletedTelemetryStats` + `Dispatcher.checkTelemetryGap`: on startup, sample the last 50 completed runs (with `commit_sha`) and log a warning when ≥50% report `tokens_total=0` — an early signal that the backend's usage events aren't being parsed.
- Tests: `fallbackModelName` resolution, SSE cache-token capture (flat + nested), telemetry-gap query (commit-only filter excludes epics).

Verification: `go build ./...` + `go test ./...` clean.
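The fallback chain above can be sketched as a small resolver. A minimal sketch: the method name and resolution order (`config.DefaultModel` → `OpenCode.Model` → backend type) come from this PR, but the surrounding `Config` and `Runner` struct shapes are illustrative assumptions, not the project's actual types.

```go
package main

import "fmt"

// Illustrative config shapes; only DefaultModel and OpenCode.Model
// are named in the PR, the rest is assumed for the sketch.
type OpenCodeConfig struct {
	Model string
}

type Config struct {
	DefaultModel string
	OpenCode     OpenCodeConfig
}

type Runner struct {
	cfg         Config
	backendType string // e.g. "opencode", "claude-code"
}

// fallbackModelName resolves a model label when the backend reported
// none: config.DefaultModel → OpenCode.Model → backend type.
func (r *Runner) fallbackModelName() string {
	if r.cfg.DefaultModel != "" {
		return r.cfg.DefaultModel
	}
	if r.cfg.OpenCode.Model != "" {
		return r.cfg.OpenCode.Model
	}
	return r.backendType
}

func main() {
	r := &Runner{backendType: "opencode"}
	fmt.Println(r.fallbackModelName()) // opencode

	r.cfg.OpenCode.Model = "glm-4.5"
	fmt.Println(r.fallbackModelName()) // glm-4.5

	r.cfg.DefaultModel = "claude-sonnet-4"
	fmt.Println(r.fallbackModelName()) // claude-sonnet-4
}
```

The point of resolving to the backend type as a last resort is that a row labelled `opencode` is still auditable, unlike an empty `model_name` or a stale hardcoded Claude label.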
Summary
Automated PR created by Pilot for task GH-2428.
Closes #2428
Changes
GitHub Issue #2428: fix(executor): token/cost telemetry gap — 65% of executions report 0 tokens
Problem
Audit of `~/.pilot/data/pilot.db` for Apr 17–27 shows 110 of 168 executions (65%) record `tokens_total=0` and `estimated_cost_usd=0`, including 94 completed runs with valid `commit_sha`/`pr_url`. Real work, no usage data.

These 0-token rows also have an empty `model_name`, suggesting the SDK message stream never produced a `result` event with token usage that `runner.go` could parse.

Likely root cause
Pilot now runs alternate backends (OpenCode → GLM/Z.AI per recent commits like `0bc54c6f` fix(executor): send OpenCode model as `{providerID, modelID}`). The token-aggregation code paths in `internal/executor/runner.go` (lines 1642, 1942, 2010, 2148, 2487, 2687) appear tied to the Claude Code SDK message format. If the OpenCode adapter doesn't emit equivalent usage events — or emits them in a different shape — `state.tokensInput`/`tokensOutput` stay at 0 and `SaveExecutionMetrics` writes zeros.

Sample affected runs (real work, 0 tokens):
Why it matters
- Cost reporting (`bench-status.py`, dashboard, lifetime tokens via fix(dashboard): metrics cards show inconsistent data across restarts #533) understates actual spend. Recorded $104.85 vs. account-level ccusage $574 over the same window — Pilot looks like 18% of spend when actual share is much higher.
- Model-outcome scoring uses `estimated_cost_usd` for cost-per-PR — biased toward backends that don't report.
- Budget enforcement (`gateway/budget.go`) can't enforce per-task limits on backends with no token telemetry.

Suggested fix
- Audit the `internal/executor/runner.go` token-capture sites against each backend (Claude Code SDK, OpenCode, Codex). Confirm where `state.tokensInput`/`Output` and `cacheCreationInputTokens` are set per backend.
- Ensure every backend path populates the `state.tokens*` fields. If the backend doesn't emit usage, tokenize the prompt + output locally as a fallback estimate (mark it as estimated).
- Record `result.ModelName` even when token usage is missing, so we can filter "telemetry-missing" runs from "true-zero" runs.
- When completed runs record `tokens_total=0` for a given backend, log a warning.
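The warning in the last point became `Dispatcher.checkTelemetryGap` in the merged change: sample recent completed runs and warn when at least half report zero tokens. A minimal sketch of the threshold logic only; the sample size and ≥50% cutoff come from this PR, while the `runStats` type and function signature are illustrative stand-ins (the real code queries `Store.RecentCompletedTelemetryStats`).

```go
package main

import (
	"fmt"
	"log"
)

// runStats is a stand-in for one sampled completed run (only runs
// with a commit_sha are sampled, which excludes epic parents).
type runStats struct {
	TokensTotal int
}

// checkTelemetryGap returns true and logs a warning when ≥50% of the
// sampled completed runs recorded zero tokens — an early signal that
// the backend's usage events aren't being parsed.
func checkTelemetryGap(recent []runStats) bool {
	if len(recent) == 0 {
		return false
	}
	zero := 0
	for _, r := range recent {
		if r.TokensTotal == 0 {
			zero++
		}
	}
	gap := zero*2 >= len(recent) // integer form of zero/len ≥ 0.5
	if gap {
		log.Printf("telemetry gap: %d of %d completed runs report tokens_total=0", zero, len(recent))
	}
	return gap
}

func main() {
	healthy := []runStats{{1200}, {900}, {0}, {4300}}
	degraded := []runStats{{0}, {0}, {0}, {1200}}
	fmt.Println(checkTelemetryGap(healthy))  // false
	fmt.Println(checkTelemetryGap(degraded)) // true
}
```

Filtering the sample to rows with a `commit_sha` matters: epic orchestrator rows legitimately carry no backend tokens, and including them would trip the threshold on healthy installs.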
After fix: rerun a known issue against each backend, confirm `tokens_total > 0` and `model_name != ''` for all completed executions. Backfill not required.

Refs
- `internal/executor/runner.go:1642,1942,2010,2148,2487,2687`
- `internal/executor/dispatcher.go:670` (SaveExecutionMetrics call site)
- `internal/memory/metrics.go:169`
- `0bc54c6f` (OpenCode model object fix)