feat(chat): show real context token usage in the frontend and sync /usage output with the CLI format #1136

Open
GoldenFish123321 wants to merge 32 commits into EKKOLearnAI:main from GoldenFish123321:feature/bridge-real-token-usage

@GoldenFish123321 GoldenFish123321 commented May 29, 2026

Contributor

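Problem

  • In bridge mode, both the frontend progress bar and the /usage command estimate tokens locally with tiktoken instead of using the real consumption returned by the upstream API.
  • /usage prints only a single-line summary, far short of the CLI's detailed format.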

Root cause

The return value at hermes agent conversation_loop.py:4156-4183 already contains the complete token data:

result = {
    "input_tokens": ..., "output_tokens": ...,
    "cache_read_tokens": ..., "cache_write_tokens": ...,
    "reasoning_tokens": ..., "total_tokens": ...,
    "api_calls": ..., "model": ...,
    "estimated_cost_usd": ..., "cost_status": ...,
}

The bridge passes this through unchanged as chunk.result, but the TypeScript side previously only re-estimated the counts with the local countTokens().

Result

image

What changed

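Extract real API data: a new extractBridgeUsage() pulls the upstream token consumption out of chunk.result and writes it to BridgeUsageState. The data is sent to the frontend via the run.completed / run.failed events and is also persisted to the sessions table, so it can be restored after a page refresh.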
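
As a rough illustration of the extraction step, the sketch below maps the snake_case fields listed under the root cause onto a camelCase state object. The PR names BridgeUsageState and extractBridgeUsage() but does not show their definitions, so the field list, the signature, and the num() helper are assumptions, not the actual implementation.

```typescript
// Assumed shape: the PR names this interface but does not show its fields.
interface BridgeUsageState {
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens: number;
  cacheWriteTokens: number;
  reasoningTokens: number;
  totalTokens: number;
  apiCalls: number;
  model: string | undefined;
  estimatedCostUsd: number | undefined;
  costStatus: string | undefined;
}

// Hypothetical helper: coerce an unknown value to a finite number, defaulting to 0.
function num(value: unknown): number {
  return typeof value === "number" && Number.isFinite(value) ? value : 0;
}

// Assumed signature: returns null when the chunk carries no usable token data.
function extractBridgeUsage(result: unknown): BridgeUsageState | null {
  if (typeof result !== "object" || result === null) return null;
  const r = result as Record<string, unknown>;
  if (!("input_tokens" in r) && !("total_tokens" in r)) return null;
  const cost = r["estimated_cost_usd"];
  const model = r["model"];
  const status = r["cost_status"];
  return {
    inputTokens: num(r["input_tokens"]),
    outputTokens: num(r["output_tokens"]),
    cacheReadTokens: num(r["cache_read_tokens"]),
    cacheWriteTokens: num(r["cache_write_tokens"]),
    reasoningTokens: num(r["reasoning_tokens"]),
    totalTokens: num(r["total_tokens"]),
    apiCalls: num(r["api_calls"]),
    model: typeof model === "string" ? model : undefined,
    // Cost stays undefined (not 0) when the upstream result lacks it.
    estimatedCostUsd: typeof cost === "number" ? cost : undefined,
    costStatus: typeof status === "string" ? status : undefined,
  };
}
```

Keeping a missing cost as undefined rather than 0 lets later code tell "free" apart from "unknown".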

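Fix the progress bar: the session_prompt_tokens value returned by hermes agent re-accumulates the shared context on every tool call in a multi-call turn, so the progress bar showed N times the actual usage. It now uses last_prompt_tokens (the real prompt consumption of a single call), so the progress bar reflects the actual context usage.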
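
The arithmetic behind the inflated value can be sketched as follows. The per-call numbers and the window size are illustrative only, not measurements from this PR.

```typescript
// Prompt size reported by each API call within one turn (illustrative numbers).
const promptTokensPerCall: number[] = [17_000, 17_000];

// A cumulative counter adds the shared context once per call.
const sessionPromptTokens = promptTokensPerCall.reduce((sum, n) => sum + n, 0);

// The most recent call's prompt size is what actually occupies the window.
const lastPromptTokens = promptTokensPerCall[promptTokensPerCall.length - 1] ?? 0;

const contextWindow = 128_000; // hypothetical window size
const percent = (tokens: number): number => Math.round((tokens / contextWindow) * 100);

console.log(sessionPromptTokens, percent(sessionPromptTokens)); // 34000 27
console.log(lastPromptTokens, percent(lastPromptTokens)); // 17000 13
```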

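Rewrite /usage: the output format is fully aligned with the TUI CLI, including the complete field list, cost estimate, context usage percentage, message count, and compression count. Data is read from the live bridge first, then restored from the DB, and finally falls back to the local tiktoken estimate.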
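
The three-tier lookup could be expressed as a single ?? chain, roughly as below. UsageSnapshot, resolveUsage(), and the loader callbacks are hypothetical names used only for this sketch.

```typescript
// Hypothetical snapshot type; `source` records which tier produced the data.
interface UsageSnapshot {
  source: "live" | "db" | "estimate";
  promptTokens: number;
}

// live bridge state -> persisted session row -> local estimate.
// `??` short-circuits, so the DB is not read when live data exists.
function resolveUsage(
  live: UsageSnapshot | null,
  loadFromDb: () => UsageSnapshot | null,
  estimateLocally: () => number,
): UsageSnapshot {
  return (
    live ??
    loadFromDb() ??
    { source: "estimate", promptTokens: estimateLocally() }
  );
}
```

Passing the lower tiers as callbacks keeps the DB read and the tokenizer run lazy.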

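Prevent data pollution: the session_* values obtained on DB fallback are cumulative totals from the previous turn (possibly millions of tokens) and must not be written straight into the progress bar. A lastPromptTokens != null guard strictly separates the two kinds of data: data with lastPromptTokens is live and safe to use; data without it is DB fallback data, used only for the static /usage display, and never overwrites the live progress bar. The guard covers every injection path: the run.completed/usage handler, the resume on session switch, and the resume on WebSocket reconnect.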
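
A minimal sketch of such a gate, assuming simplified types: BridgeUsage, SessionLike, and applyBridgeUsage() are hypothetical names for this example and differ from the PR's actual applyApiUsage() helper.

```typescript
// Simplified, assumed shapes for illustration.
interface BridgeUsage {
  inputTokens: number;
  cacheReadTokens: number;
  lastPromptTokens?: number | null;
}

interface SessionLike {
  apiUsage?: BridgeUsage;
}

// Returns true only when the usage was accepted as live data.
function applyBridgeUsage(session: SessionLike, bu: BridgeUsage | null | undefined): boolean {
  // `!= null` rejects both null and undefined but accepts 0.
  if (bu == null || bu.lastPromptTokens == null) return false;
  session.apiUsage = { ...bu };
  return true;
}
```

With this shape, a DB-restored object with a multi-million cache_read count is ignored by the progress bar because it lacks lastPromptTokens.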

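Cleaned up several edge cases: bridgeUsage is cleared at the start of every run so stale data from the previous run cannot leak into a failed run; updateSession() skips undefined parameters to avoid a SQLite binding error; loadSessions() keeps the in-memory apiUsage on refresh; run.failed also extracts the real API data; and duration shows N/A when started_at is missing instead of a bogus value derived from Date.now().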

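Files touched

| File | What changed |
| --- | --- |
| types.ts | New BridgeUsageState interface |
| handle-bridge-run.ts | extractBridgeUsage() extracts tokens, updateSession() persists them, stale data is cleared on every run |
| session-command.ts | /usage rewritten: three-tier fallback live → DB → tiktoken, format aligned with the CLI |
| index.ts | resume payload carries bridgeUsage |
| schemas.ts | sessions table gains a last_prompt_tokens column (auto-migrated) |
| session-store.ts | updateSession() skips undefined parameters |
| chat.ts (client) | apiUsage field, extracted applyApiUsage(), unified restore across the resume/reconnect paths |
| chat.ts (api) | ResumeSessionPayload gains a bridgeUsage field |
| ChatInput.vue | progress bar reads apiUsage.lastPromptTokens first |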

Compatibility

  • Without bridge data, it automatically falls back to the existing tiktoken estimate.
  • Progress bar fallback chain: lastPromptTokens → inputTokens+cache → contextTokens → inputTokens+outputTokens
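
The fallback chain above might look like the following in code. This is a sketch under assumed field names: "inputTokens+cache" is read here as input plus cache read plus cache write, and progressBarTokens() is a hypothetical stand-in for the computed value in ChatInput.vue.

```typescript
// Assumed shapes; the real store types are not shown in the PR.
interface ApiUsage {
  lastPromptTokens?: number;
  inputTokens?: number;
  cacheReadTokens?: number;
  cacheWriteTokens?: number;
}

interface SessionTokens {
  apiUsage?: ApiUsage;
  contextTokens?: number;
  inputTokens?: number;
  outputTokens?: number;
}

function progressBarTokens(s: SessionTokens): number {
  const api = s.apiUsage;
  // 1. Single-call prompt size from the upstream API.
  if (api?.lastPromptTokens != null && api.lastPromptTokens > 0) return api.lastPromptTokens;
  // 2. Input plus cache tokens (input alone can be 0 when everything is cached).
  const apiPrompt =
    (api?.inputTokens ?? 0) + (api?.cacheReadTokens ?? 0) + (api?.cacheWriteTokens ?? 0);
  if (apiPrompt > 0) return apiPrompt;
  // 3. Local context estimate.
  if (s.contextTokens != null && s.contextTokens > 0) return s.contextTokens;
  // 4. Local input + output estimate.
  return (s.inputTokens ?? 0) + (s.outputTokens ?? 0);
}
```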

… command

- Extract real API token consumption from hermes agent's
  run_conversation() result (conversation_loop.py:4156-4183)
  instead of local tiktoken estimation
- Add BridgeUsageState interface for typed usage data
- Enhanced /usage command output to match hermes CLI format:
  model, input/output/cache/reasoning tokens, cost info,
  context window, messages, compressions
- Frontend now displays actual API input/output tokens
  when bridge mode apiUsage data is available
- Fallback to local tiktoken estimate when bridge data
  is not yet available (old sessions, error cases)

When upstream API usage data is available (bridge mode), use
apiUsage.inputTokens as contextTokens for the progress bar display
instead of the local tiktoken estimate. API input_tokens already
includes system prompt + tools + messages — the actual context
window consumption.

Also guard all mid-run contextTokens updates (usage.updated,
compression.completed) against overwriting apiUsage-based values.

Instead of patching contextTokens in 6 event handlers with fragile
guards, simply make the progress bar (ChatInput.vue totalTokens)
check apiUsage.inputTokens first. When upstream API data is available,
it's the authoritative context window consumption, so no syncing is needed.

Previously the 'Current context' line showed the local tiktoken
estimate (e.g. 15,073), which didn't match the API input_tokens
displayed above (e.g. 16,928). Now it prefers bu.inputTokens.

- After run.completed, update session store with bridge API
  token values (input/output/cache/reasoning/cost) so they
  survive page reloads.
- When /usage falls back to DB because bridgeUsage is null
  (e.g. after reload), reconstruct BridgeUsageState from
  the persisted session row. Missing fields (prompt_tokens,
  completion_tokens, api_calls, cost_source) are omitted
  from output rather than shown as zero.

- Add BridgeUsageState type annotation on dbBu
- Add Prompt/Completion/API calls/Cost source lines (N/A)
  to keep output format consistent with live bridge path

Extract resolveBridgeUsageFromDb() and formatCost() helpers.
Resolve bu via a single ?? chain: live state → DB fallback.
Cut 55 lines of duplicated message construction.
…inding error

When estimatedCostUsd or actualCostUsd is undefined (bridge result
lacks cost data), updateSession silently pushed undefined into the
SQLite parameter array, causing 'Provided value cannot be bound'.

When all input tokens are cache hits, input_tokens can be 0 but
prompt_tokens reflects the real context window consumption including
cache reads. Use promptTokens as the primary context display value.

- Fix ||/?? mixing syntax error in session-command.ts
- ChatInput.vue: compute promptTokens from input+cache_read+cache_write
  instead of checking apiUsage.inputTokens > 0 (fails when all cached)
…ssion

- Drop !row.input_tokens check: input_tokens can legitimately be 0
  when the entire prompt is served from cache. cost_status alone
  is sufficient to detect persisted bridge data.
- Use ?? instead of || for estimatedCostUsd to preserve 0.
- Reuse sessionRow from the ?? chain to avoid duplicate DB read.
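
The difference between || and ?? for a legitimately zero cost can be seen in a small example. The label helpers and the "n/a" text are hypothetical and are not the PR's formatCost().

```typescript
const formatUsd = (cost: number): string => `$${cost.toFixed(4)}`;

// `||` treats 0 as missing, so a free run is reported as unknown.
function costLabelWithOr(cost: number | null | undefined): string {
  const value = cost || null;
  return value === null ? "n/a" : formatUsd(value);
}

// `??` only falls back on null/undefined, so 0 is preserved.
function costLabelWithNullish(cost: number | null | undefined): string {
  const value = cost ?? null;
  return value === null ? "n/a" : formatUsd(value);
}

console.log(costLabelWithOr(0)); // n/a
console.log(costLabelWithNullish(0)); // $0.0000
```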
When a run in the same session fails before any chunk is processed
(e.g. bridge connection drops), state.bridgeUsage still holds the
previous successful run's data. The run.failed event then includes
stale apiUsage, misleading the frontend.

Clear bridgeUsage along with the other bridge state in the reset block.
… duration

- Extract applyApiUsage() helper to eliminate duplicated token
  extraction across 4 run.* handlers (2 completed, 2 failed)
- Add apiUsage handling to both run.failed handlers so failed
  runs that do have bridge usage data (e.g. terminal error after
  chunk processing) surface real API token consumption
- Fix session duration showing bogus 0s when started_at is null;
  now shows 'N/A' instead of computing from Date.now() fallback
- Document resolveBridgeUsageFromDb hardcoded-zero fields and
  state.bridgeUsage caching behavior
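
A duration formatter with this behaviour might look like the sketch below. The function name and the "1h 1m" / "1m 5s" output style are assumptions; only the 'N/A' result for a missing started_at comes from the commit message.

```typescript
// Returns 'N/A' instead of inventing a start time when started_at is missing.
function formatSessionDuration(startedAtMs: number | null | undefined, nowMs: number): string {
  if (startedAtMs == null || !Number.isFinite(startedAtMs) || startedAtMs > nowMs) {
    return "N/A";
  }
  const totalSeconds = Math.floor((nowMs - startedAtMs) / 1000);
  const hours = Math.floor(totalSeconds / 3600);
  const minutes = Math.floor((totalSeconds % 3600) / 60);
  const seconds = totalSeconds % 60;
  if (hours > 0) return `${hours}h ${minutes}m`;
  if (minutes > 0) return `${minutes}m ${seconds}s`;
  return `${seconds}s`;
}
```

Taking nowMs as a parameter rather than calling Date.now() inside keeps the function deterministic and easy to test.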
RunEvent is an interface without an index signature, incompatible
with Record<string,unknown> under strict TypeScript checks. Use any
to match the existing pattern used by all event handlers in chat.ts.

Also remove redundant (evt as any) casts now that evt is already any.

loadSessions() recreates all Session objects from the API response,
only preserving messages and contextTokens from old objects. This
causes apiUsage (upstream API token data) to be lost when switching
sessions or refreshing the session list, making the progress bar
fall back to local tiktoken estimates.

Add apiUsage to the runtime preservation map so it survives session
list refreshes.
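
The preservation step can be sketched as a merge between the old in-memory sessions and the fresh API response. The Session shape and mergeSessions() are simplified assumptions, not the store's real code.

```typescript
// Simplified, assumed session shape.
interface Session {
  id: string;
  title: string;
  messages?: string[];
  contextTokens?: number;
  apiUsage?: { lastPromptTokens?: number };
}

// Rebuild the list from the API response, carrying runtime-only fields over
// from the previous in-memory objects (including apiUsage).
function mergeSessions(previous: Session[], fresh: Session[]): Session[] {
  const runtime = new Map(previous.map((s) => [s.id, s] as const));
  return fresh.map((s) => {
    const old = runtime.get(s.id);
    if (!old) return s;
    return {
      ...s,
      messages: old.messages,
      contextTokens: old.contextTokens,
      apiUsage: old.apiUsage,
    };
  });
}
```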
…on client

Server's resumeSession() sent inputTokens/outputTokens/contextTokens
(local tiktoken estimates) but omitted bridgeUsage (real API data).
When the client switched sessions or refreshed the page, apiUsage was
lost and the progress bar fell back to local estimates.

Now:
- Server includes state.bridgeUsage in the resumed payload
- Client rebuilds apiUsage from bridgeUsage on resume, keeping the
  progress bar accurate without needing /usage to repair it
…counting

session_prompt_tokens is a cumulative accumulator across all API
calls within a turn. When there are multiple tool calls, it sums
the shared context N times (e.g. 17k + 17k = 34k instead of 17k).

hermes-agent already exposes last_prompt_tokens (line 4179 in
conversation_loop.py) — the prompt cost of the single most recent
API call, which IS the true context window consumption.

- Add lastPromptTokens to BridgeUsageState and client apiUsage
- Extract r.last_prompt_tokens in extractBridgeUsage
- ChatInput.vue: prefer apiUsage.lastPromptTokens > cumulative sum
- Thread through /usage emit and resume handler

Same issue as the progress bar: session_prompt_tokens is cumulative
across all API calls within a turn, double-counting shared context.
Prefer lastPromptTokens (single-call) for the context display line.
…lback

|| treats 0 as falsy, so lastPromptTokens=0 would incorrectly
fall through to the cumulative promptTokens. Use an explicit
!= null && > 0 guard so only undefined/null/0 triggers the
fallback chain.

- Remove Prompt tokens (total): redundant, = input + cache
- Remove Completion tokens: redundant, = output_tokens
- Remove Cost status / Cost source: merged into the Cost line
- Remove Pricing unknown note: the Cost line already shows n/a
- Remove separator line before Current context
- Cache read/write tokens only shown when >0 (like CLI)
- Compressions only shown when >0 (like CLI)
- Session duration only shown when valid
- formatCost: drop '(estimated)' suffix, add 'included' status
- Keep Reasoning tokens, Messages, Session duration as useful extras

CLI shows Prompt tokens (total), Completion tokens, Cost status/source,
separator lines, and always-displayed cache rows. Restore all of them.

Use column-aligned padding matching CLI's wide-column layout.
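
Column alignment of this kind is commonly done with padEnd/padStart, as in the sketch below. The labels, the column widths, and formatUsageRows() are illustrative assumptions; the CLI's actual widths are not given here.

```typescript
// Left-align labels and right-align thousands-separated values.
function formatUsageRows(
  rows: Array<[string, number]>,
  labelWidth = 28,
  valueWidth = 10,
): string {
  return rows
    .map(
      ([label, value]) =>
        label.padEnd(labelWidth) + value.toLocaleString("en-US").padStart(valueWidth),
    )
    .join("\n");
}

console.log(
  formatUsageRows([
    ["Input tokens:", 16928],
    ["Output tokens:", 512],
  ]),
);
```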
… data

DB fallback carries cumulative/inflated session_* counters from the
previous turn (e.g. cache_read_tokens=5.3M). When the /usage handler
blindly builds apiUsage from these, the progress bar explodes.

Use isLive = (lastPromptTokens != null) as the gate. DB fallback
does not set lastPromptTokens, so it never overwrites apiUsage.
Live bridge data (which has lastPromptTokens) passes through.

- Reasoning tokens: only shown if >0, labeled '↳ Reasoning (subset):'
- Add 'Note: Pricing unknown for {model}' at bottom when applicable

- Add last_prompt_tokens column to sessions table schema
- Add field to HermesSessionRow interface
- Write lastPromptTokens in updateSession() call
- Read it back in resolveBridgeUsageFromDb() so the progress bar
  and /usage Current context survive page reload without falling
  back to cumulative session_* values
…teSession

These construct HermesSessionRow objects and were missing the
newly added last_prompt_tokens field, causing TS2741 errors.
GoldenFish123321 commented May 30, 2026

Contributor Author

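Fixed a few things


Progress bar fix

The session_prompt_tokens value returned by hermes agent re-accumulates the shared context on every tool call in a multi-call turn, so the progress bar showed N times the actual usage. It now takes last_prompt_tokens (the real prompt consumption of the most recent API call), so the progress bar reflects the actual context usage. The Current context line of /usage received the same fix.

The DB also gained a last_prompt_tokens column, so the value no longer degrades to the cumulative total after a page refresh.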

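Data source isolation

Live bridge data and DB fallback data must not be mixed. The session_* values obtained on DB fallback are cumulative totals from the previous turn (possibly millions of tokens); writing them straight into the progress bar blows it up.

Every apiUsage injection point (the run.completed/usage handler, the resume on session switch, and the resume on WebSocket reconnect) uses lastPromptTokens != null as the gate: if it is present, the data is live and apiUsage is updated; if not, apiUsage is left untouched and the progress bar takes the existing fallback.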

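DB changes

The sessions table gained a last_prompt_tokens column (auto-migrated by syncTable). updateSession() also fixes a binding error: the cost fields returned by the bridge can be undefined, and passing them straight to a SQLite prepared statement throws; the loop now skips any undefined value.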
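
Skipping undefined values while building the statement could look roughly like this. buildUpdate() is a hypothetical helper, not the project's updateSession(); it only builds the SQL text and parameter array and assumes column names come from trusted code, never from user input.

```typescript
type SqlValue = string | number | null;

// Build an UPDATE whose SET clause contains only the defined fields, so no
// `undefined` ever reaches the prepared statement's parameter array.
function buildUpdate(
  table: string,
  id: string,
  fields: Record<string, SqlValue | undefined>,
): { sql: string; params: SqlValue[] } | null {
  const sets: string[] = [];
  const params: SqlValue[] = [];
  for (const [column, value] of Object.entries(fields)) {
    if (value === undefined) continue; // skip instead of binding undefined
    sets.push(`${column} = ?`);
    params.push(value);
  }
  if (sets.length === 0) return null; // nothing to update
  params.push(id);
  return { sql: `UPDATE ${table} SET ${sets.join(", ")} WHERE id = ?`, params };
}
```

An explicit null is still bound (and stored as NULL); only undefined is dropped.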

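Restore paths

All restore paths now handle apiUsage correctly:

  • Session switch: the resume callback in switchSession rebuilds it from bridgeUsage
  • WS reconnect: applyReconnectResume does the same (this path was previously missed)
  • Tab restore: only refreshes messages and does not touch tokens (this was never a problem)
  • loadSessions refresh: keeps the old session's apiUsage

The ResumeSessionPayload interface also gained a bridgeUsage field; previously it was read via (data as any).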

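/usage format alignment

The output format is fully aligned with the TUI CLI (cli.py:10206-10234), including the complete field list, cost estimate, and context usage percentage. In detail, Reasoning (subset), the cache rows, and compression are shown only when they have a value, and a note is appended when pricing is unknown.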

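Edge-case fixes

  • bridgeUsage is cleared at the start of every run so stale data from the previous run cannot leak into a failed run
  • run.failed now also extracts the real API data (previously it only used the local estimate)
  • loadSessions() keeps apiUsage on refresh (previously the progress bar jumped after a refresh)
  • duration shows N/A when started_at is missing instead of a bogus value derived from Date.now()
  • /usage multi-line output uses \x0a instead of \n to avoid double JSON escaping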

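GoldenFish123321 changed the title from "feat(bridge): show real context token usage in the frontend and sync /usage output with the CLI format" to "feat(chat): show real context token usage in the frontend and sync /usage output with the CLI format" on May 30, 2026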

Same issue as the /usage handler — resume payload includes
bridgeUsage which may carry stale cumulative session_* values
without lastPromptTokens. Only rebuild apiUsage when
lastPromptTokens is available (live bridge data).
… resume

applyReconnectResume() was missing the bridgeUsage → apiUsage
rebuild that switchSession's resume callback has. Add it with
the same lastPromptTokens guard to prevent stale cumulative
values from polluting the progress bar.