Skip to content

fix(chat): align WebUI context with messaging sessions#2547

Open
AJV20 wants to merge 7 commits into
nesquena:masterfrom
AJV20:fix/webui-context-parity
Open

fix(chat): align WebUI context with messaging sessions#2547
AJV20 wants to merge 7 commits into
nesquena:masterfrom
AJV20:fix/webui-context-parity

Conversation

@AJV20
Copy link
Copy Markdown
Contributor

@AJV20 AJV20 commented May 18, 2026

Summary

  • Adds WebUI browser-surface metadata as ephemeral context so the agent can distinguish WebUI sessions from messaging-platform transcripts without persisting that metadata to history.
  • Loads configured prefill_messages_file or prefill_messages_script for WebUI chat turns and passes it to Hermes Agent as ephemeral prefill context, matching the recall behavior operators expect from other Hermes surfaces.
  • Adjusts WebUI progress guidance to preserve the normal Hermes messaging style and avoid extra browser-only status chatter while still allowing concise progress updates for long tool runs.

Test Plan

  • python3 -m py_compile api/streaming.py api/routes.py api/config.py
  • python3 -m pytest tests/test_webui_surface_context.py tests/test_webui_prefill_context.py tests/test_sprint42.py -q (35 passed)

Notes

This is a focused context/tone parity PR for browser-originated chat turns. It keeps prefill message bodies out of the browser-visible context status event and out of saved WebUI transcripts.

@nesquena-hermes
Copy link
Copy Markdown
Collaborator

Read the PR head (540292a) end to end: api/streaming.py diff at lines 174–330 / 3653–3663 / 4027–4039, static/messages.js:1523-1538 + the run‑journal event list update at 2014, the two new test files, and the agent‑side contracts at agent/agent_init.py:118 / 397, agent/conversation_loop.py:803-814 / 1730-1734, gateway/run.py:2254-2266, hermes_cli/config.py:1239 / 2796. Wiring is clean and the agent side already accepts prefill_messages= at AIAgent construction plus tolerates older builds via the _agent_kwargs.pop('prefill_messages', None) guard at streaming.py:3796 — good defensive layering.

Three points to consider before merge.

1. prefill_messages_script executes on every turn — cache or gate it

_load_webui_prefill_context() is called inside _run_agent_streaming at streaming.py:3655 for every chat turn, and the script branch runs subprocess.run([exe, path], …, timeout=20.0). For a script that reads Joplin notes or hits a remote memory service, that's a 20s‑bounded synchronous call per assistant turn on the streaming worker before the SSE starts producing tokens. Two follow‑ups worth doing in this PR (or a fast follow):

  • Add a small TTL cache keyed on (path, mtime) so a recall.py that's cheap‑idempotent doesn't re‑shell per turn. 5–10s is plenty for typing cadence.
  • Document the perf contract in the prompt block (file vs script) and in CHANGELOG: file = cheap each turn, script = caller pays per‑turn cost.
# api/streaming.py:282-308 — runs unconditionally per turn
if script_raw:
    path = _resolve_prefill_path(script_raw)
    ...
    proc = subprocess.run(cmd, cwd=str(path.parent), env=env, ..., timeout=timeout)

2. env=None inherits the full WebUI process environment

_load_webui_prefill_context is called at _run_agent_streaming with no env= kwarg, so the subprocess inherits the WebUI server's entire environment — including every provider API key, OAuth token, gateway secret, and HERMES_* knob loaded into the parent. The script is user‑configured, so this is "trusted code" by definition, but the test you added at tests/test_webui_prefill_context.py:18-22 already demonstrates the safer pattern (env={"PATH": os.environ.get("PATH","")}). I'd promote that to the production call:

# api/streaming.py:3655 — current
_prefill_context = _load_webui_prefill_context(_cfg)

# suggested
_prefill_context = _load_webui_prefill_context(
    _cfg,
    env={"PATH": os.environ.get("PATH", ""), "HOME": os.environ.get("HOME", "")},
)

That keeps recall scripts from accidentally logging the parent's API keys into their own debug output, and matches the existing AGENTS.md guidance about not printing tokens through tool output.

3. _SECRET_SHAPED_RE is re.compile(...), please confirm rendering

The diff display shows the assignment compressed (_SECRET_SHAPED_RE=*** in some viewers) — I confirmed via raw bytes that the source is _SECRET_SHAPED_RE = re.compile(\n r"(?i)(api[_-]?key|...)..."\n) so this is just a GitHub diff‑renderer quirk, but worth knowing. Two minor regex notes:

  • Lone bearer‑style keys like a single sk-… string aren't matched — only when preceded by api[_-]?key|token|password|secret\s*[:=], or when shaped like a 3‑part JWT. A leaked sk-proj-… in a script's stderr would pass through.
  • The status text is already truncated to 240 chars by _redact_prefill_status_text, which limits exposure surface even if the regex misses.

Not blocking; the redactor is already much better than echoing raw stderr.

Other reads that look good

  • surface_context block at streaming.py:213-220 correctly skips empty Workspace/Profile lines, matching tests/test_webui_surface_context.py:28-37. The "not the same live transcript as Telegram, Discord, Slack" guardrail is well‑targeted for the "WebUI thinks it's the gateway" failure mode.
  • context_status is included in the run‑journal cursor list at static/messages.js:2014, so a reconnecting browser doesn't see a stale composer status. Good.
  • The ephemeral_system_prompt change is non‑persistent (verified at agent/system_prompt.py:220 and conversation_loop.py:803-806), so the new surface block won't bloat saved transcripts.
  • The prefill_messages injection at agent/conversation_loop.py:810-814 inserts after the system prompt and before history with pfm.copy() — no mutation hazard.

Test plan beyond the 35 already passing: (a) configure prefill_messages_script pointing to a 21s‑sleep script and verify the 20s timeout produces a status="error" payload (not a worker hang); (b) configure a prefill_messages_file whose JSON has a mix of valid and invalid roles and confirm _valid_prefill_messages drops the bad ones without poisoning the good ones; (c) verify _public_prefill_context_status never carries messages (already covered in test 2 by negative absence, but worth an explicit assertion).

@nesquena-hermes
Copy link
Copy Markdown
Collaborator

Phase 0 re-assessment — same subprocess surface concern as closed #2526

Thanks @AJV20 for the consolidated take that combines the surface-context block from the original #2523 with the prefill loader from #2526 into a single coherent PR. The agent review on this head (540292a) caught three good performance/security notes that should still be addressed (per-turn subprocess cost, env=None leaks the parent's keys, regex completeness).

I want to re-raise the architectural concern I raised on the closed #2526, because the surface here is the same — and the three flagged review notes are downstream symptoms of that surface choice:

_load_webui_prefill_context() runs subprocess.run([exe, path], …) on every WebUI chat turn based on whatever path is in prefill_messages_script in config.yaml or HERMES_PREFILL_MESSAGES_SCRIPT in the env. That's a new always-on per-turn shell-out path off a config value. Three things follow:

  1. Anyone with write access to config.yaml (or the env) gets code execution on every turn. WebUI already has the MCP layer for "run a thing when the agent asks" — adding a second always-on, per-turn, untyped shell-out off config means there are now two surfaces a misconfiguration or malicious config edit can hit.

  2. The agent's review notes — TTL caching, env= scoping, secret-regex completeness — are mitigations for a surface we probably shouldn't have added in this shape. Caching reduces the cost but doesn't address the surface; scoping env= reduces leak but the surface remains. If we do take this path, we need all three mitigations plus documentation that this is a power-user feature.

  3. The agent contract change (prefill_messages= kwarg) is still cross-repo coupling. The defensive _agent_kwargs.pop('prefill_messages', None) guard at streaming.py:3796 catches the absence gracefully, but it means the feature is silently dead-code on hermes-webui master until a paired hermes-agent release ships. We've avoided that shape before (see how MCP, memory-providers, and skills are wired) — they go through stable hermes-agent ABCs that already exist.

Suggested paths

  1. MCP path: route session recall through an MCP server (the standard "do a thing on demand" surface) instead of a per-turn subprocess. The agent already wires MCP tools; the user configures Joplin/equivalent as an MCP server. Surface-context still ships separately as the small ephemeral prompt block, no subprocess.
  2. File-only path: keep prefill_messages_file (bounded read, no execution), drop prefill_messages_script. Also keep the surface-context block. This is the minimal viable slice.
  3. Reject the prefill loader, keep the surface-context block: split fix(chat): align WebUI context with messaging sessions #2547 into two PRs. The surface-context block (4 fields, no execution surface) is a clean self-contained ship; the prefill loader gets its own design conversation.

Applying maintainer-review and pinging @nesquena for direction. The surface-context portion of this PR is a clean ship in any path forward — the question is just whether the prefill subprocess path goes with it.

Note: this isn't a re-litigation of feedback that's been addressed — the three agent review notes are valid and would still need to land. It's a layer above those: are we adding the right surface at all?

@nesquena-hermes nesquena-hermes added the maintainer-review Maintainer fit-assessment needed — may not merge even with fixes label May 18, 2026
@AJV20
Copy link
Copy Markdown
Contributor Author

AJV20 commented May 19, 2026

Updated this branch in 35da27b after the latest master merge.

What changed:

  • Resolved the CHANGELOG.md merge conflict against current origin/master.
  • Kept the existing prefill/surface-context hardening changes intact.

Verification:

  • python3 -m py_compile api/streaming.py
  • python3 -m pytest tests/test_webui_prefill_context.py tests/test_webui_surface_context.py tests/test_sprint42.py -q -o addopts= → 38 passed
  • GitHub Actions matrix is green for 3.11 / 3.12 / 3.13

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

hold maintainer-review Maintainer fit-assessment needed — may not merge even with fixes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants