Skip to content

Release P — v0.51.40 — 4-PR contributor batch (quota subprocess hardening + env-lock prewarm + cron one-shot warning + Xiaomi env key)#2037

Merged
nesquena-hermes merged 9 commits into
masterfrom
stage-334
May 11, 2026
Merged

Release P — v0.51.40 — 4-PR contributor batch (quota subprocess hardening + env-lock prewarm + cron one-shot warning + Xiaomi env key)#2037
nesquena-hermes merged 9 commits into
masterfrom
stage-334

Conversation

@nesquena-hermes
Copy link
Copy Markdown
Collaborator

Release P — v0.51.40 — 4-PR contributor batch

Theme: Quota subprocess hardening + env-lock prewarm + cron one-shot warning + Xiaomi env key. Mixed-author, mixed-surface batch.

PRs included

Verification

  • Pytest: 5100 passed, 8 skipped, 1 xfailed, 2 xpassed in 151.94s (Python 3.11, HERMES_HOME isolation)
  • Diff: ~769 insertions / 32 deletions across 20 files (mostly tests)
  • node --check clean on all touched .js files
  • Python AST clean on all touched .py files
  • No merge-conflict markers in any file

File collisions (stage merge)

Opus advisor verdict

SHIP-WITH-CAVEATS — code correct, well-isolated, tested. Caveats are follow-ups, not blockers. Full review posted as a separate comment.

Tests

5082 → 5100 (+18 net new across 4 new test files).

Holds untouched

7 PRs with hold label left untouched per explicit non-hold scope: #1418, #1721, #1884, #1924, #1970, #1975, #1997.

cc @nesquena — requesting independent review per self-built work policy.

@nesquena-hermes nesquena-hermes requested a review from nesquena May 11, 2026 00:14
@nesquena-hermes
Copy link
Copy Markdown
Collaborator Author

Opus Advisor review — stage-334 (Release P)

Independent review by Claude Opus 4.7 (max thinking) on the merged stage diff (commit e5dc58b, 4 PRs, +769/-32). Pulled actual files; verified line-by-line against my brief.

Verification (Opus pulled actual files at e5dc58b)

PR #2030 (api/providers.py)

  • Semaphore cap is hardcoded _MAX_CONCURRENT_ACCOUNT_USAGE_PROBES = 2 at api/providers.py:49 (not min(4, cpu_count) as my brief claimed)
  • POSIX guard is if hasattr(os, "fork") at api/providers.py:621 (not sys.platform.startswith('linux')). macOS would also try to call prctl via _account_usage_preexec_fn, which fails inside the try/except → no-op. Safe.
  • ctypes.CDLL(None) (line 108 + bootstrap at line 76) loads main-program symbols, not libc.so.6. Better than the brief's version (works on Alpine/musl too)
  • No argtypes set on prctl; bare except: pass swallows any ABI mismatch. Acceptable for best-effort cleanup hook
  • Semaphore is held around the entire subprocess.run() (up to 35s). get_provider_quota is called from request-handler threads on a ThreadingHTTPServer. With cap=2, the 3rd+ concurrent quota poll waits up to 35s. Not a deadlock, but a UX caveat.

PR #2032 (api/streaming.py) — helper at lines 44-63, call site at 2270. Helper does try: __import__ except ImportError: pass. Net effect on non-ImportError failure is strictly better than before — exception raised before lock is taken, no lock-held-during-failure risk. Post-prewarm body uses sys.modules.get(...) with if _sk is not None guards. Headless tests are fine.

PR #2033 (static/panels.js) — verified at lines 234-251. Classifier named _cronScheduleKindForInput (brief said _classifyCronOneShot). Tests assertions match what's actually there. Listeners added inside _renderCronForm are safe because the input element is recreated on each render — no duplicate handler accumulation. UI-only, low blast radius.

PR #2034 — verified xiaomi exists in _PROVIDER_MODELS at api/config.py:1073, display map line 660, alias map lines 711-712, new env-var paths. api/providers.py:105 adds "xiaomi": "XIAOMI_API_KEY" — disjoint from #2030's hunks (49/67/85/621/664). Mirrors existing patterns cleanly.

Caveats to track (not blockers)

  1. Cap=2 is conservative. With 5-6 OAuth-backed providers and a Settings page that polls all of them, two threads will block up to 35s waiting on a slot. Follow-up: raise cap, make probe asyncio-based, or coalesce in-flight probes per provider. File against Subprocess pool refactor for profile-isolated quota probes (follow-up to #1873) #1912 follow-ups.
  2. Clarify one-shot cron schedule behavior #2033 has only string-presence tests — flagged in test-augmentation-and-validation-pitfalls.md. Behavioral risk is small for a hint div, but a JSDOM/Playwright check of display:none ↔ display:'' toggling on input is a worthwhile follow-up.
  3. fix: harden quota probe subprocess handling #2030 prewarm/preexec ctypes path is best-effort by design — if prctl ever changes ABI, you get orphan probes (current behavior pre-PR), not a crash. Document in runbook so on-call doesn't chase a regression.

Verdict

SHIP-WITH-CAVEATS → tag v0.51.40. Code is correct, well-isolated, tested. Caveats above are follow-ups for the next batch (especially the cap=2 UX question), not gates on this release.

Copy link
Copy Markdown
Owner

@nesquena nesquena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review — end-to-end ✅ (clean APPROVE, no fix pushed)

What this ships

Release P (v0.51.40) — 4-PR contributor batch, mixed surfaces:

  • #2030 (@Michaelyklam) — Hardens the account-usage quota probe subprocess path. Adds module-level BoundedSemaphore(_MAX_CONCURRENT_ACCOUNT_USAGE_PROBES=2), stdin=subprocess.DEVNULL, POSIX preexec_fn that calls prctl(PR_SET_PDEATHSIG, SIGTERM) via ctypes.CDLL(None), and a child-side _ACCOUNT_USAGE_PARENT_DEATHSIG_BOOTSTRAP prepended to the python -c payload. Slice 1 of N for #1912.
  • #2032 (@Michaelyklam) — Moves tools.skills_tool / tools.skill_manager_tool imports out of _ENV_LOCK. New _prewarm_skill_tool_modules() runs before the lock; in-lock path uses sys.modules.get(...) plus existing HERMES_HOME / SKILLS_DIR attribute patching. Closes #2024.
  • #2033 (@franksong2702) — Surfaces one-shot cron semantics in the Scheduled Jobs form. New _cronScheduleKindForInput() classifier, hidden warning div, live input/change listeners, 8 locale strings + 1 CSS class. Refs #2031.
  • #2034 (@franksong2702) — Closes the Xiaomi XIAOMI_API_KEY env-detection gap (issue #2025). Updates get_available_models() configured-badge env list + detector, _PROVIDER_ENV_VAR map, onboarding metadata. 6 existing test scrub-lists extended.

~769 insertions / 32 deletions across 20 files. CI green on 3.11/3.12/3.13.

Traced against upstream hermes-agent

Pulled a fresh tarball (/tmp/hermes-agent-fresh). Verified the load-bearing pieces:

  • Xiaomihermes_cli/auth.py:377-383 declares ProviderConfig(id="xiaomi", api_key_env_vars=("XIAOMI_API_KEY",), inference_base_url="https://api.xiaomimimo.com/v1"). hermes_cli/config.py:1799 lists XIAOMI_API_KEY in the env-var help registry. agent/model_metadata.py:361-362 maps both api.xiaomimimo.com and xiaomimimo.com to xiaomi. The webui's new env-var is what the CLI already expects — perfect round-trip. No webui-internal convention introduced.
  • Skill toolsagent/skill_commands.py:60,147,251 and agent/display.py:314 import tools.skills_tool / tools.skill_manager_tool from within the agent. The webui's prewarm just primes sys.modules for the same modules; nothing about the agent's expectations changes.

End-to-end trace

#2030 (api/providers.py)

  • Semaphore declaration at api/providers.py:49 (_MAX_CONCURRENT_ACCOUNT_USAGE_PROBES = 2), lazy bootstrap at api/providers.py:88-94.
  • _account_usage_preexec_fn() at api/providers.py:104-109 — bare except Exception: pass makes it safe on platforms without prctl (Alpine/musl works via ctypes.CDLL(None) which loads main-program symbols).
  • Subprocess kwargs assembled at api/providers.py:609-621. stdin=subprocess.DEVNULL is set unconditionally; preexec_fn only attached when hasattr(os, "fork") is True — Windows safely skipped.
  • Subprocess invocation at api/providers.py:622-630. The bootstrap _ACCOUNT_USAGE_PARENT_DEATHSIG_BOOTSTRAP is concatenated to the existing _ACCOUNT_USAGE_SUBPROCESS_CODE string and passed as the single -c payload. Bootstrap is a frozen string literal — no user input flows into the -c arg.
  • Semaphore acquired around the entire subprocess.run() at api/providers.py:664-670 (inside _fetch_account_usage_with_profile_context).

#2032 (api/streaming.py)

  • Helper at api/streaming.py:44-63__import__(name) with hardcoded module names, swallows ImportError only.
  • Called at api/streaming.py:2270before the first with _ENV_LOCK: at line 2275. AST regression test enforces this ordering invariant.
  • In-lock body at api/streaming.py:2294-2310 — replaces previous import tools.skills_tool as _sk (held the lock during first-import) with _sys.modules.get('tools.skills_tool') then guarded attribute patching. O(1) dict lookup, no import machinery under the lock.
  • Teardown lock at api/streaming.py:3753 is env-restore only — no imports there either, as the AST test confirms.

#2033 (static/panels.js)

  • Classifier at static/panels.js:234-244. Branch order: trim → empty → every prefix → @ prefix → 5-field cron → ISO date → bare duration → fallthrough.
  • _syncCronScheduleWarning() at static/panels.js:247-252. Idempotent: toggles display: '' / 'none' on cronFormScheduleOnceWarning.
  • Hidden warning div at static/panels.js:745 — text is esc(t('cron_schedule_once_warning') || "Duration forms like '30m' run once and are removed after running. Use 'every 30m' to keep a recurring job."). XSS-safe (esc() on the fallback literal).
  • Listeners attached at static/panels.js:785-787 inside _renderCronForm. Input element is recreated on each render, so no listener accumulation.

#2034 (api/config.py, api/onboarding.py, api/providers.py)

  • _build_configured_model_badges() inner function (nested in get_available_models() near api/config.py:2514) — XIAOMI_API_KEY added to env-tracked list at api/config.py:2763, detector at api/config.py:2799-2800.
  • _PROVIDER_ENV_VAR map at api/providers.py:177 adds "xiaomi": "XIAOMI_API_KEY". Disjoint from #2030's hunks (49/85/104/606-670).
  • Onboarding setup at api/onboarding.py:142-150label="Xiaomi MiMo", env_var="XIAOMI_API_KEY", default_model="mimo-v2.5-pro", default_base_url="https://api.xiaomimimo.com/v1" — matches the CLI's inference_base_url exactly.

Other audit — things that are correct already

Security

  • _ACCOUNT_USAGE_PARENT_DEATHSIG_BOOTSTRAP is a frozen string literal — no user data is interpolated into the -c argument. Concatenation with _ACCOUNT_USAGE_SUBPROCESS_CODE happens at module level. api_key continues to be passed as argv[2], same as before — not a regression.
  • New _account_usage_preexec_fn() swallows all exceptions silently. On macOS / non-Linux POSIX, prctl is missing → no-op → safe.
  • _prewarm_skill_tool_modules() only catches ImportError (not bare Exception). A SyntaxError in tools.skills_tool would now surface before the lock, which is strictly better than the pre-fix behaviour.
  • _cronScheduleKindForInput consumes a user-controlled string. All checks are string-shape predicates; no eval, no innerHTML, no template expansion of input. Warning text is esc()'d.
  • Xiaomi env-var is pure additive — adds a new fallback to a frozen allowlist; can't introduce a routing-shape regression.

Thread safety

  • _get_account_usage_probe_semaphore() lazily creates the singleton. Race on first creation could in theory create two semaphores; benign because the second _account_usage_probe_semaphore would simply replace the first and either object has the right cap. Worst case: a momentary cap of 3 instead of 2 during startup. Acceptable.
  • _prewarm_skill_tool_modules() calls __import__ — Python's import lock (_imp.acquire_lock()) serialises concurrent first-imports of the same module; safe.
  • _syncCronScheduleWarning() runs on the UI thread only.

Test isolation

  • 6 tests already scrubbing provider envs add XIAOMI_API_KEY to their scrub-lists. Tests now reproducible regardless of host env.

Behavioural harnesses

#2030 semaphore — spawned 4 threads contending for _get_account_usage_probe_semaphore(). Max concurrent holders observed: 2. _account_usage_preexec_fn() invoked directly returned silently. hasattr(os, 'fork') confirmed True on host.

#2033 cron classifier — 19 inputs against the extracted JS function:

✓ "30m"                  → once       ✓ "every 30m"             → interval
✓ "2h"                   → once       ✓ "Every 2h"              → interval
✓ "1 day"                → once       ✓ "every 1 hour"          → interval
✓ "45 mins"              → once       ✓ "EVERY 30m"             → interval
✓ "2026-05-11"           → once       ✓ "0 9 * * *"             → cron
✓ "2026-05-11T08:00"     → once       ✓ "*/15 * * * *"          → cron
                                       ✓ "@daily"                → cron
✓ ""                     → ""         ✓ "@hourly"               → cron
✓ "gibberish"            → ""
✓ "100"                  → ""         (bare number — no false-positive warning)
✓ "  "                   → ""

ALL PASS (19/19)

Edge-case matrix

Scenario Expected Actual
4 concurrent quota probes, cap=2 only 2 in-flight ✅ verified
_account_usage_preexec_fn() on macOS silent no-op ✅ verified
_account_usage_preexec_fn() on Linux calls prctl, swallow on ABI mismatch ✅ code review
_prewarm_skill_tool_modules() when modules missing silently catches ImportError ✅ confirmed
_ENV_LOCK body re-entered after fix no import tools.skills_tool inside ✅ AST test
Stale listener after _renderCronForm rerender input element recreated, no leak ✅ DOM contract
cron_schedule_once_warning XSS esc() on fallback literal ✅ source
XIAOMI_API_KEY round-trip CLI ↔ webui CLI expects same var ✅ hermes_cli/auth.py:382
XIAOMI_API_KEY not detected when env unset _provider_has_key('xiaomi') False ✅ test #2025
5-field cron * with comma list 0,15,30,45 * * * * → 'cron' ✅ harness
@daily / @hourly → 'cron' ✅ harness

Tests

  • PR-targeted: 28/28 pass (test_issue2024_env_lock_skill_imports.py, test_issue2025_xiaomi_env_key.py, test_issue2031_cron_once_visibility.py, new quota tests in test_provider_quota_status.py).
  • Full suite (local Python 3.14): 4991 passed, 59 skipped, 3 xpassed. 5 failures are pre-existing baseline failures in test_docker_env_readonly_vars.py::TestStartShReadonlyEnvFilterBehavioral (macOS bash 3.2 readonly-filter behaviour). PR does not touch docker_init.bash, docker_start.sh, or tests/test_docker_env_readonly_vars.py — confirmed by git diff master..stage-334.
  • PR's CI: 5100 passed / 8 skipped / 1 xfailed / 2 xpassed in 151.94s on Python 3.11 (per PR body); 3.11 / 3.12 / 3.13 all green.

Minor observations (non-blocking)

  1. Cap=2 vs Settings poll fan-out. Opus advisor already flagged: with 5-6 OAuth-backed providers polling concurrently, the 3rd+ thread blocks up to 35 s waiting on a slot. Comment at api/providers.py:43-47 is honest about this. Follow-up: raise cap or coalesce in-flight probes per provider. Tracked in #1912.
  2. _account_usage_preexec_fn lacks argtypes. libc.prctl(1, signal.SIGTERM) is called without setting prctl.argtypes. On a Linux that ever changes the prctl ABI, the bare except swallows it → orphan probes (same as pre-PR behaviour, not worse). Best-effort by design.
  3. Lock-body from pathlib import Path as _P / import sys as _sys. These remain inside _ENV_LOCK, but both are stdlib and warm in sys.modules long before — first-time cost is zero. Not worth moving.
  4. #2033 has only static + extracted-function coverage — no JSDOM/Playwright display:none ↔ display:'' toggle check. Behavioural risk is small for a hint div; fine as a follow-up.

Recommendation

Approved. Code is correct, well-isolated, cross-tool-consistent. The Opus advisor's SHIP-WITH-CAVEATS verdict matches what I traced: the cap=2 UX question is a follow-up, not a blocker.

✅ Parked at approval — ready for the release agent's merge/tag pipeline.

@nesquena-hermes nesquena-hermes merged commit 50acda3 into master May 11, 2026
3 checks passed
@nesquena-hermes nesquena-hermes deleted the stage-334 branch May 11, 2026 06:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Move tools.skills_tool / tools.skill_manager_tool imports out of _ENV_LOCK in api/streaming.py

3 participants