fix(executor): token/cost telemetry gap — 65% of executions report 0 tokens #2428

@alekspetrov

Description

Problem

Audit of ~/.pilot/data/pilot.db for Apr 17–27 shows 110 of 168 executions (65%) record tokens_total=0 and estimated_cost_usd=0, including 94 completed runs with valid commit_sha / pr_url. Real work, no usage data.

bucket       runs  status
-----------  ----  ---------
no_tokens     94   completed
no_tokens     16   failed
with_tokens   44   completed
with_tokens   14   failed

These 0-token rows also have empty model_name, suggesting the SDK message stream never produced a result event with token usage that runner.go could parse.

Likely root cause

Pilot now runs alternate backends (OpenCode → GLM/Z.AI; see recent commits such as 0bc54c6f "fix(executor): send OpenCode model as {providerID, modelID}"). The token-aggregation code paths in internal/executor/runner.go (lines 1642, 1942, 2010, 2148, 2487, 2687) appear tied to the Claude Code SDK message format. If the OpenCode adapter doesn't emit equivalent usage events, or emits them in a different shape, state.tokensInput/tokensOutput stay at 0 and SaveExecutionMetrics writes zeros.

Sample affected runs (real work, 0 tokens):

exec_id     task     duration  commit
----------  -------  --------  -------
87d57fa4…   GH-2401  17m       457d79b
a52f2110…   GH-2396
adf04ea8…   GH-2392

Why it matters

  • Cost reporting (bench-status.py, dashboard, lifetime tokens via #533 "fix(dashboard): metrics cards show inconsistent data across restarts") understates actual spend. Recorded $104.85 vs. account-level ccusage $574 over the same window — Pilot looks like 18% of spend when its actual share is much higher.
  • Pattern-learning model-outcome scoring uses estimated_cost_usd for cost-per-PR — biased toward backends that don't report.
  • Budget enforcement (gateway/budget.go) can't enforce per-task limits on backends with no token telemetry.

Suggested fix

  1. Audit internal/executor/runner.go token-capture sites against each backend (Claude Code SDK, OpenCode, Codex). Confirm where state.tokensInput/Output and cacheCreationInputTokens are set per backend.
  2. For OpenCode/GLM: parse the backend's usage event format and map to state.tokens* fields. If the backend doesn't emit usage, tokenize the prompt + output locally as a fallback estimate (mark as estimated).
  3. Set result.ModelName even when token usage is missing, so we can filter "telemetry-missing" runs from "true-zero" runs.
  4. Add a startup check: if last N completed executions all have tokens_total=0 for a given backend, log a warning.

Verification

After fix: rerun a known issue against each backend, confirm tokens_total > 0 and model_name != '' for all completed executions. Backfill not required.

Refs

  • internal/executor/runner.go:1642,1942,2010,2148,2487,2687
  • internal/executor/dispatcher.go:670 (SaveExecutionMetrics call site)
  • internal/memory/metrics.go:169
  • Recent commit 0bc54c6f (OpenCode model object fix)
