Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions ARCHITECTURE.md
Original file line number Diff line number Diff line change
Expand Up @@ -122,6 +122,9 @@ Environment variables controlling behavior:
HERMES_WEBUI_DEFAULT_MODEL Optional model override; unset means provider default
HERMES_WEBUI_PASSWORD Optional: enable password auth (off by default)
HERMES_WEBUI_SKIP_ONBOARDING Optional: bypass the first-run onboarding wizard
HERMES_PREFILL_MESSAGES_FILE Optional JSON message list for browser-turn prefill context
HERMES_WEBUI_PREFILL_MESSAGES_SCRIPT Optional command that prints JSON messages or text prefill context
HERMES_WEBUI_PREFILL_MESSAGES_SCRIPT_TIMEOUT Optional script timeout in seconds (default 5, max 30)
HERMES_HOME Base directory for Hermes state (~/.hermes by default)

Test isolation environment variables (set by conftest.py):
Expand Down
16 changes: 16 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,17 @@

## [Unreleased]

## [v0.51.141] — 2026-05-26 — Release DM (stage-batch23 — 4-PR second hold-bucket pass)

### Added

- WebUI can now opt into a `webui_prefill_messages_script` / `HERMES_WEBUI_PREFILL_MESSAGES_SCRIPT` hook for dynamic browser-turn prefill context from local notes or recall systems. The script output is capped at 256 KiB, normalized to ephemeral prefill messages, and browser status still hides message bodies while redacting script errors.
- Added a read-only WebUI/CLI session source switch in the chat sidebar when agent session sync is enabled. WebUI conversations stay in the default list, while imported CLI/agent sessions are surfaced under a separate `CLI sessions` tab with counts so large CLI histories do not clutter the normal conversation list. (Refs #2351)

### Fixed

- Compact tool activity now keeps visible interim assistant progress in the live Session timeline instead of making that progress effectively collapsed-only inside Activity details. The interim assistant stream path creates and flushes a visible assistant segment before resetting for later tool/compression activity.

## [v0.51.140] — 2026-05-26 — Release DL (stage-batch22 — 5-PR hold-bucket reassessment)

### Fixed
Expand Down Expand Up @@ -113,13 +124,18 @@
## [v0.51.132] — 2026-05-24 — Release DD (stage-batch14 — 4-PR replayed-context + interrupted-response + shutdown affordance + passkey opt-in)

### Added
- **Cursor ACP provider integration** — Add `cursor-acp` to the WebUI model picker and route slash model IDs (for example `cursor/composer-2.5`) through explicit `@cursor-acp:` provider hints so they do not fall through to the configured default HTTP provider.

- **PR #2859** by @AJV20 — Optional passkey/WebAuthn sign-in for password-protected WebUI instances. Authenticated users can register/remove passkeys from Settings -> System, and `/login` shows a passwordless sign-in button only after a passkey exists. Password auth remains the default-off bootstrap and recovery path. **Opt-in default-off behind `HERMES_WEBUI_PASSKEY=1` env var or `webui_passkey_enabled: true` config flag** — when disabled, the UI block hides, all 6 `/api/auth/passkey/*` endpoints return 404, and `is_auth_enabled()` ignores any pre-existing credential file so the auth posture cannot silently flip if the flag is unset later.

- **PR #2824** by @gavinssr — A "Stop server" affordance in Settings → System that gracefully shuts down the local WebUI server. Useful when WebUI was launched via `./ctl.sh start` or the native macOS/Windows app and the user wants to stop it without context-switching to a terminal. Confirmation dialog before the actual shutdown. The `/api/shutdown` route is CSRF-gated and intended for local-loopback use. Originally a title-bar button; relocated to Settings per the project's deep-UX rule (default-hidden for niche destructive actions on always-visible surfaces).

### Fixed

### Fixed
- **Reasoning effort chip visibility** — `/api/reasoning` now accepts `model` and `provider` query params and returns `supported_efforts` so the composer chip hides for models without configurable reasoning levels (for example Cursor Composer) while remaining available for models like GPT-5.5. Model picker changes now re-sync the chip after the session model/provider update instead of querying with stale session state. Composer dropdown selections now pass the provider id into `selectModelFromDropdown()` so duplicate bare model ids (for example `gpt-5.5` under OpenAI Codex vs OpenRouter) no longer fall back to the profile default provider when refreshing the chip.
- **Cursor ACP routing and new-chat defaults** — New conversations now carry the visible composer picker selection into `POST /api/session/new`, persist model changes before a session exists, and evict cached session agents when the model/provider changes mid-session.

- **PR #2685** by @LumenYoung — Prevent replayed context in chat reconciliation and metering. When a WebUI session is recovered (e.g., after a process restart, network drop, or browser reload), the sidebar/`state.db` reconciliation logic walks the sidecar transcript in order and only skips rows that can actually be aligned with the remaining sidecar context. The prior set-membership check was too broad: a legitimate fresh message that happened to share a key with any older repeated short message in the sidecar was mis-classified as already-seen and dropped from the replay, leading to lost context and inconsistent metering. Also caps the per-turn live-tool-prompt token estimate at 12,000 to prevent unbounded growth on bursts of large tool reads before exact provider accounting overrides.

- **PR #2739** by @ai-ag2026 — Clarify `Response interrupted` recovery markers so they report that the live response stream stopped instead of asserting that the WebUI process restarted. The recovery path now records distinct interruption causes for real process restarts, stream/run split-brain, and lost worker bookkeeping; browser-side SSE transport failures show a separate `Connection interrupted` message, client-side `BrokenPipeError` disconnects no longer get logged as server 500s, and chat/gateway SSE errors emit rate-limited (30 events / 60s / 4KB body cap), sanitized client diagnostics to `/api/client-events/log` for future root-cause checks. The stream-status `terminal_state` value for lost-worker bookkeeping changes from `stale-from-restart` to `lost-worker-bookkeeping`, matching the new non-restart wording.
Expand Down
38 changes: 38 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -121,6 +121,44 @@ For self-hosted VM or homelab installs, `ctl.sh` wraps the common daemon lifecyc

`ctl.sh start` runs the bootstrap in foreground/no-browser mode behind the daemon wrapper, writes logs to `~/.hermes/webui.log`, and respects `.env` plus inline overrides such as `HERMES_WEBUI_HOST=0.0.0.0 ./ctl.sh start`.

### Optional session recall prefill

WebUI can attach ephemeral prefill messages to new browser-originated
agent turns. This is useful when a deployment already has a local recall or
router script for Joplin, Obsidian, Notion, llm-wiki, or another third-party
notes source and wants browser chat to know where durable context lives.

Prefer a compact router-style prefill (for example, "Joplin has the durable
project context; use the available notes/search tools before answering
detail-dependent questions") instead of dumping the full note corpus into every
new browser session. The prefill should point the agent toward retrieval; the
notes/search tools should provide the specific facts on demand.

Static JSON remains supported through `prefill_messages_file` or
`HERMES_PREFILL_MESSAGES_FILE`. For dynamic recall, opt in explicitly with a
WebUI-specific script hook:

```yaml
webui_prefill_messages_script:
- python3
- /path/to/notes_recall.py
webui_prefill_messages_script_timeout: 5
```

or:

```bash
HERMES_WEBUI_PREFILL_MESSAGES_SCRIPT="python3 /path/to/notes_recall.py" \
HERMES_WEBUI_PREFILL_MESSAGES_SCRIPT_TIMEOUT=5 \
./ctl.sh restart
```

The script may print either an OpenAI-style JSON message list, a JSON object with
a `messages` list, or plain text; plain text is wrapped as one `system` prefill
message. Script output is capped at 256 KiB before parsing. The browser only
receives a compact status event (`source`, `label`, message count, and redacted
errors), never the prefill message bodies.

The bootstrap will:

1. Detect Hermes Agent and, if missing, attempt the official installer (`curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash`).
Expand Down
142 changes: 141 additions & 1 deletion api/config.py
Original file line number Diff line number Diff line change
Expand Up @@ -709,6 +709,7 @@ def _resolve_cli_toolsets(cfg=None):
"openai-codex": "OpenAI Codex",
"xai-oauth": "xAI Grok OAuth",
"copilot": "GitHub Copilot",
"cursor-acp": "Cursor ACP",
"zai": "Z.AI / GLM",
"kimi-coding": "Kimi / Moonshot",
"deepseek": "DeepSeek",
Expand Down Expand Up @@ -1131,6 +1132,13 @@ def _named_custom_provider_slug_for_base_url(
{"id": "claude-sonnet-4.6", "label": "Claude Sonnet 4.6"},
{"id": "gemini-3-flash-preview", "label": "Gemini 3 Flash Preview"},
],
# Cursor ACP — models served via Cursor CLI agent acp
"cursor-acp": [
{"id": "cursor/composer-2.5", "label": "Composer 2.5"},
{"id": "cursor/composer-2", "label": "Composer 2"},
{"id": "cursor/default", "label": "Default"},
{"id": "cursor-acp", "label": "Cursor ACP"},
],
# OpenCode Zen — curated models via opencode.ai/zen (pay-as-you-go credits)
"opencode-zen": [
{"id": "gpt-5.4-pro", "label": "GPT-5.4 Pro"},
Expand Down Expand Up @@ -1987,6 +1995,12 @@ def _resolve_key(raw_api_key, raw_key_env, provider_hint=None) -> str | None:
return None, None


# Subprocess ACP transports (Cursor/Copilot CLI). Model IDs often contain '/'
# but must still route via explicit @provider:model so they do not fall through
# to the configured default HTTP provider (e.g. openai-codex).
_ACP_SUBPROCESS_PROVIDERS = frozenset({"cursor-acp", "copilot-acp"})


def model_with_provider_context(model_id: str, model_provider: str | None = None) -> str:
"""Return the model string to pass to ``resolve_model_provider()``.

Expand All @@ -2006,6 +2020,11 @@ def model_with_provider_context(model_id: str, model_provider: str | None = None
if isinstance(model_cfg, dict):
config_provider = str(model_cfg.get("provider") or "").strip().lower()

# ACP subprocess providers always need the explicit hint — their slash IDs
# are not OpenRouter paths and must not inherit config_provider routing.
if provider in _ACP_SUBPROCESS_PROVIDERS:
return f"@{provider}:{model}"

# If the selected provider is already the configured provider, leaving the
# model bare preserves provider-specific base_url/proxy settings.
if provider == config_provider:
Expand Down Expand Up @@ -2069,7 +2088,121 @@ def parse_reasoning_effort(effort):
return None


def get_reasoning_status() -> dict:
def _strip_provider_hint_for_reasoning(model_id: str) -> str:
"""Remove WebUI routing hints before provider-specific capability lookup."""
model = str(model_id or "").strip()
if model.startswith("@") and ":" in model:
return model.split(":", 1)[1]
return model


def _heuristic_reasoning_efforts(model_id: str, provider_id: str) -> list[str]:
"""Fallback when hermes_cli is unavailable."""
model = _strip_provider_hint_for_reasoning(model_id).lower()
provider = _resolve_provider_alias(str(provider_id or "").strip().lower())
if not model or provider in {"cursor-acp", "copilot-acp"}:
return []
bare = model.rsplit("/", 1)[-1]
if provider == "openai-codex" and bare.startswith(("gpt-5", "o1", "o3", "o4")):
if bare.startswith(("o1", "o3", "o4")):
return ["low", "medium", "high"]
return list(VALID_REASONING_EFFORTS)
if provider in {"copilot", "github-copilot"}:
if bare.startswith(("gpt-5", "o1", "o3", "o4")):
if bare.startswith(("o1", "o3", "o4")):
return ["low", "medium", "high"]
return list(VALID_REASONING_EFFORTS)
prefixes = (
"deepseek/",
"anthropic/",
"openai/",
"x-ai/",
"google/gemini-2",
"google/gemma-4",
"qwen/qwen3",
"tencent/hy3-preview",
"xiaomi/",
)
if any(model.startswith(prefix) for prefix in prefixes):
return list(VALID_REASONING_EFFORTS)
return []


def resolve_model_reasoning_efforts(
model_id: str | None = None,
provider_id: str | None = None,
base_url: str | None = None,
) -> list[str]:
"""Return supported reasoning-effort levels for *model_id*, or [] if none."""
model = str(model_id or "").strip()
if not model:
return []

provider = str(provider_id or "").strip().lower() if provider_id else ""
resolved_base_url = str(base_url or "").strip() or None
if not provider:
try:
_, provider, resolved_base_url = resolve_model_provider(model)
except Exception:
provider = str((cfg.get("model") or {}).get("provider") or "").strip().lower()

provider = _resolve_provider_alias(provider)
if provider in {"cursor-acp", "copilot-acp"}:
return []

try:
from hermes_cli.models import (
github_model_reasoning_efforts,
lmstudio_model_reasoning_options,
)
except Exception:
return _heuristic_reasoning_efforts(model, provider)

hinted_model = _strip_provider_hint_for_reasoning(model)
if provider in {"copilot", "github-copilot"}:
return github_model_reasoning_efforts(hinted_model)

if provider == "openai-codex":
bare = hinted_model.rsplit("/", 1)[-1]
return github_model_reasoning_efforts(bare)

if provider == "lmstudio":
probe_base = resolved_base_url or _get_provider_base_url(provider)
opts = lmstudio_model_reasoning_options(model, probe_base)
normalized = [str(opt).strip().lower() for opt in opts if str(opt).strip()]
if not normalized or set(normalized).issubset({"off"}):
return []
level_opts = [opt for opt in normalized if opt in VALID_REASONING_EFFORTS]
if level_opts:
return list(dict.fromkeys(level_opts))
if set(normalized).issubset({"off", "on"}):
return []
return []

model_lower = model.lower()
prefixes = (
"deepseek/",
"anthropic/",
"openai/",
"x-ai/",
"google/gemini-2",
"google/gemma-4",
"qwen/qwen3",
"tencent/hy3-preview",
"xiaomi/",
)
if any(model_lower.startswith(prefix) for prefix in prefixes):
return list(VALID_REASONING_EFFORTS)

return []


def get_reasoning_status(
*,
model_id: str | None = None,
provider_id: str | None = None,
base_url: str | None = None,
) -> dict:
"""Return current reasoning configuration from the active profile's
config.yaml — the same source of truth the CLI reads from.

Expand All @@ -2082,10 +2215,17 @@ def get_reasoning_status() -> dict:
agent_cfg = config_data.get("agent") or {}
show_raw = display_cfg.get("show_reasoning") if isinstance(display_cfg, dict) else None
effort_raw = agent_cfg.get("reasoning_effort") if isinstance(agent_cfg, dict) else None
supported_efforts = resolve_model_reasoning_efforts(
model_id,
provider_id=provider_id,
base_url=base_url,
)
return {
# Match CLI default (True if unset in config.yaml)
"show_reasoning": bool(show_raw) if isinstance(show_raw, bool) else True,
"reasoning_effort": str(effort_raw or "").strip().lower(),
"supported_efforts": supported_efforts,
"supports_reasoning_effort": bool(supported_efforts),
}


Expand Down
16 changes: 15 additions & 1 deletion api/routes.py
Original file line number Diff line number Diff line change
Expand Up @@ -4019,7 +4019,18 @@ def handle_get(handler, parsed) -> bool:
# Current reasoning config (shared source of truth with the CLI —
# reads display.show_reasoning and agent.reasoning_effort from
# the active profile's config.yaml).
return j(handler, get_reasoning_status())
query = parse_qs(parsed.query)
model_id = (query.get("model", [""])[0] or "").strip() or None
provider_id = (query.get("provider", [""])[0] or "").strip() or None
base_url = (query.get("base_url", [""])[0] or "").strip() or None
return j(
handler,
get_reasoning_status(
model_id=model_id,
provider_id=provider_id,
base_url=base_url,
),
)

if parsed.path == "/api/onboarding/status":
return j(handler, get_onboarding_status())
Expand Down Expand Up @@ -5416,6 +5427,9 @@ def handle_post(handler, parsed) -> bool:
)
s.threshold_tokens = 0
s.last_prompt_tokens = 0
from api.config import _evict_session_agent

_evict_session_agent(body["session_id"])
s.save()
if str(old_ws or "") != str(new_ws or ""):
try:
Expand Down
Loading
Loading