
fix(onboard): announce and recover declared agent forward_ports#5389

Merged
cv merged 3 commits into main from fix/hermes-openai-api-port-forward
Jun 13, 2026

@laitingsheng

@laitingsheng laitingsheng commented Jun 13, 2026

Contributor

Summary

Hermes onboard declares forward_ports: [18789, 8642], but the dashboard summary printed only the primary port and process recovery re-established only the primary forward. After the OpenShell gateway restarted during policy-preset apply, the secondary OpenAI-compatible API forward on port 8642 was silently dropped and never restored.

Related Issue

Fixes #5206

Changes

  • printDashboardUi now walks agent.forward_ports and emits a labelled block per non-primary entry.
  • checkAndRecoverSandboxProcesses now invokes a new ensureDeclaredAgentForwardPortsHealthy helper in all three branches.
  • Regression tests cover the print output and the recovery loop.
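
The recovery change above can be pictured with a minimal sketch. It is not the code from this PR: the function name `ensureSecondaryForwards`, the `ForwardDeps` shape, and the exact meaning given to the `true`/`false`/`null` results are assumptions made for illustration. Only the idea of probing each non-primary declared port and restarting missing forwards comes from the description.

```typescript
// Hypothetical stand-ins for the real probe and forward-start helpers,
// whose signatures are not shown on this page.
type ForwardDeps = {
  isHealthy: (port: number) => boolean; // true when the forward is already up
  startForward: (port: number) => boolean; // true when the forward was started
};

// Assumed result semantics: null when there is no secondary port to check,
// true when every secondary forward is healthy or was restarted,
// false when at least one restart failed.
function ensureSecondaryForwards(
  forwardPorts: number[],
  primaryPort: number,
  deps: ForwardDeps,
): boolean | null {
  // Deduplicate, drop invalid entries, and skip the primary port.
  const secondary = Array.from(new Set(forwardPorts)).filter(
    (p) => Number.isInteger(p) && p > 0 && p <= 65535 && p !== primaryPort,
  );
  if (secondary.length === 0) return null;

  let allOk = true;
  for (const port of secondary) {
    if (deps.isHealthy(port)) continue; // nothing to do for a live forward
    if (!deps.startForward(port)) {
      console.error(`failed to re-establish forward for port ${port}`);
      allOk = false; // keep going so other ports still get a chance
    }
  }
  return allOk;
}

// Usage: every forward is reported down, so only 8642 should be restarted.
const started: number[] = [];
const result = ensureSecondaryForwards([18789, 8642], 18789, {
  isHealthy: () => false,
  startForward: (port) => {
    started.push(port);
    return true;
  },
});
```

Injecting the probe and start functions keeps the loop testable without a running gateway, which is roughly the scenario the regression tests are described as covering.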

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • npx prek run --all-files passes
  • npm test passes
  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed
  • Docs updated for user-facing behavior changes
  • npm run docs builds without warnings (doc changes only)
  • Doc pages follow the style guide (doc changes only)
  • New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

Summary by CodeRabbit

  • New Features

    • Sandbox recovery now re-establishes secondary agent-declared port forwards in addition to the primary dashboard forward.
    • Dashboard/onboarding output now shows forwarded OpenAI-compatible API and other secondary endpoint URLs with forwarding notes.
  • Tests

    • Added tests verifying secondary forwards are detected and re-established when missing and that onboarding prints secondary forward URLs correctly.
  • Documentation

    • Quickstart and troubleshooting guides updated to show the forwarded OpenAI-compatible API and how to verify/recover missing forwards.

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
@coderabbitai

coderabbitai Bot commented Jun 13, 2026

Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 4e49cadd-2cdd-4d22-bf72-86b6a5efd6f5

📥 Commits

Reviewing files that changed from the base of the PR and between 4883743 and 6ea01a9.

📒 Files selected for processing (4)
  • docs/reference/troubleshooting.mdx
  • src/lib/agent/onboard.test.ts
  • src/lib/agent/onboard.ts
  • test/process-recovery.test.ts
✅ Files skipped from review due to trivial changes (1)
  • docs/reference/troubleshooting.mdx
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/lib/agent/onboard.ts
  • test/process-recovery.test.ts

📝 Walkthrough

Walkthrough

Recovery now re-establishes every non-primary manifest-declared agent forward port during gateway-related sandbox recovery. Onboard output prints all non-primary declared ports (labeling API ports specially) with per-port forwarding URLs and /v1 API path where applicable.

Changes

Agent Forward Port Recovery and Announcement

| Layer / File(s) | Summary |
| --- | --- |
| Forward port recovery helper and integration: src/lib/actions/sandbox/process-recovery.ts, test/process-recovery.test.ts | Adds ensureDeclaredAgentForwardPortsHealthy to probe and restart non-primary agent-declared forward_ports. Invoked across gateway-alive (dashboard-missing), gateway-alive (non-occupied), and gateway-restart paths; failures are logged and aggregated into the function's forwardRecovered result. Includes test verifying secondary forward start and skipping primary. |
| Onboard UI forward port announcement: src/lib/agent/onboard.ts, src/lib/agent/onboard.test.ts, docs/get-started/quickstart-hermes.mdx, docs/reference/troubleshooting.mdx | Adds printAdditionalForwardPorts to validate and display non-primary agent.forward_ports. Ports matching agent.healthProbe.port are labeled "OpenAI-compatible API" and rendered with /v1; URLs are normalized, deduplicated, and integrated into all dashboard announcement branches. Tests and docs updated. |
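
As a rough illustration of the announcement rule summarized above, the sketch below labels the port that matches `healthProbe.port` as the OpenAI-compatible API and appends `/v1`, and lists any other non-primary port as an additional port. The function name `describeAdditionalForwards`, the `AgentManifest` type, the `127.0.0.1` host, and the output line format are assumptions for illustration, not the PR's actual `printAdditionalForwardPorts` implementation.

```typescript
// Hypothetical subset of the agent manifest; only the fields named in the
// walkthrough are modeled here.
type AgentManifest = {
  forward_ports: number[];
  healthProbe?: { port: number };
};

// Builds one line per non-primary declared port, skipping duplicates.
function describeAdditionalForwards(
  agent: AgentManifest,
  primaryPort: number,
): string[] {
  const seen = new Set<number>();
  const lines: string[] = [];
  for (const port of agent.forward_ports) {
    if (port === primaryPort || seen.has(port)) continue;
    seen.add(port);
    // The health-probe port is treated as the OpenAI-compatible API endpoint.
    const isApi = agent.healthProbe?.port === port;
    const label = isApi ? "OpenAI-compatible API" : "Additional port";
    const url = `http://127.0.0.1:${port}${isApi ? "/v1" : ""}`;
    lines.push(`${label}: ${url}`);
  }
  return lines;
}

// Usage with the ports from the PR summary plus one made-up extra port (9000).
const announced = describeAdditionalForwards(
  { forward_ports: [18789, 8642, 8642, 9000], healthProbe: { port: 8642 } },
  18789,
);
```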

Sequence Diagram(s)

sequenceDiagram
  participant checkAndRecoverSandboxProcesses
  participant ensureDeclaredAgentForwardPortsHealthy
  participant forwardHealthProbe
  participant OpenShell
  participant result

  checkAndRecoverSandboxProcesses->>ensureDeclaredAgentForwardPortsHealthy: inspect active agent forward_ports
  ensureDeclaredAgentForwardPortsHealthy->>forwardHealthProbe: probe each non-primary port
  alt port missing or unhealthy
    ensureDeclaredAgentForwardPortsHealthy->>OpenShell: forward start for port
    OpenShell-->>ensureDeclaredAgentForwardPortsHealthy: forward started
  end
  ensureDeclaredAgentForwardPortsHealthy-->>checkAndRecoverSandboxProcesses: return true/false/null
  checkAndRecoverSandboxProcesses->>result: include declaredForwardsRecovered in forwardRecovered
sequenceDiagram
  participant printDashboardUi
  participant printAdditionalForwardPorts
  participant buildControlUiUrls
  participant output

  printDashboardUi->>printAdditionalForwardPorts: validate agent.forward_ports
  printAdditionalForwardPorts->>buildControlUiUrls: generate per-port control UI URLs
  buildControlUiUrls-->>printAdditionalForwardPorts: dashboard link(s)
  alt port == agent.healthProbe.port
    printAdditionalForwardPorts->>output: print "OpenAI-compatible API" block with URLs (/v1)
  else
    printAdditionalForwardPorts->>output: print "additional port" block with URLs
  end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

area: sandbox, v0.0.65

Suggested reviewers

  • cv
  • sandl99

Poem

🐰 A rabbit hops through ports both new and old,
Recovery nudges forwards back into hold,
Eight-six-four-two now peeks through the door,
Announced with a path and a little bit more,
Hop—connect, call /v1, and explore!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 71.43% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main changes: announcing and recovering declared agent forward_ports in the onboard flow. |
| Linked Issues check | ✅ Passed | The PR fully addresses issue #5206 by implementing port 8642 announcement in dashboard output and recovery logic for declared forward ports. |
| Out of Scope Changes check | ✅ Passed | All changes are directly scoped to addressing issue #5206: process recovery, onboard UI rendering, documentation, and test coverage for declared forward ports. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@github-code-quality

github-code-quality Bot commented Jun 13, 2026


Code Coverage Overview

Languages: TypeScript

TypeScript / code-coverage/plugin

The overall coverage in the branch is 96%. Coverage data for the branch is not yet available.

Show a code coverage summary of the most covered files.
File 6ea01a9 +/-
nemoclaw/src/se...cret-scanner.ts 100%
nemoclaw/src/commands/slash.ts 100%
nemoclaw/src/li...bprocess-env.ts 100%
nemoclaw/src/bl...eprint/state.ts 98%
nemoclaw/src/onboard/config.ts 98%
nemoclaw/src/bl...int/snapshot.ts 97%
nemoclaw/src/bl...print/runner.ts 95%
nemoclaw/src/co...ration-state.ts 94%
nemoclaw/src/bl...ate-networks.ts 94%
nemoclaw/src/index.ts 94%

TypeScript / code-coverage/cli

The overall coverage in the branch is 44%. Coverage data for the branch is not yet available.

Show a code coverage summary of the most covered files.
File 6ea01a9 +/-
src/lib/state/o...oard-session.ts 90%
src/lib/inference/local.ts 77%
src/lib/sandbox/config.ts 72%
src/lib/inference/nim.ts 72%
src/lib/onboard/preflight.ts 64%
src/lib/state/sandbox.ts 55%
src/lib/onboard...er-gpu-patch.ts 50%
src/lib/actions...licy-channel.ts 49%
src/lib/policy/index.ts 48%
src/lib/onboard.ts 17%

Updated June 13, 2026 14:41 UTC

@laitingsheng laitingsheng added integration: hermes Hermes integration behavior area: onboarding Onboarding FSM, provider setup, sandbox launch, or first-run flow bug-fix PR fixes a bug or regression labels Jun 13, 2026
@github-actions

github-actions Bot commented Jun 13, 2026

Contributor

PR Review Advisor

Findings: 0 needs attention, 1 worth checking, 0 nice ideas
Top item: PR review advisor unavailable

Review findings

🛠️ Needs attention

  • None.

🔎 Worth checking

  • PR review advisor unavailable: The automated advisor could not complete: PR review advisor SDK provider error: orient-drift: 403 Forbidden; security: 403 Forbidden; acceptance-correctness-tests: 403 Forbidden; synthesize-json: 403 Forbidden
    • Recommendation: Re-run the PR Review Advisor or perform a manual review.
    • Evidence: PR review advisor SDK provider error: orient-drift: 403 Forbidden; security: 403 Forbidden; acceptance-correctness-tests: 403 Forbidden; synthesize-json: 403 Forbidden

🌱 Nice ideas

  • None.
Consider writing more tests for
  • Runtime validation: Add or identify targeted runtime/integration validation for the changed behavior; do not report external E2E job pass/fail here. Runtime/sandbox/infrastructure paths need behavioral runtime validation: docs/get-started/quickstart-hermes.mdx, docs/reference/troubleshooting.mdx, src/lib/actions/sandbox/process-recovery.ts, src/lib/agent/onboard.ts.

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/lib/actions/sandbox/process-recovery.ts`:
- Around line 434-476: The loop in ensureDeclaredAgentForwardPortsHealthy
currently only skips primaryPort but must also skip the optional Hermes web
dashboard port so it isn't redundantly managed; retrieve the Hermes dashboard
port using the same helper/logic used elsewhere for Hermes (the code path that
ensures Hermes' dashboard port, referenced by
ensureHermesDashboardPortForwardIfEnabled) for the given sandboxName (or call
the existing helper that returns that port) and add a check in the for loop to
continue when candidate === hermesDashboardPort; keep the other validations and
return behavior unchanged (use agentRuntime.getSessionAgent,
isSandboxPortForwardHealthy, and ensureSandboxPortForwardForPort as before).
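
The filter this review comment asks for might look like the sketch below. It is only an illustration of the suggestion, not code from the PR: the function name `portsToRecover` is hypothetical, the page does not show how the Hermes dashboard port is actually looked up, and `9119` is a made-up placeholder value.

```typescript
// Returns the declared ports that the generic recovery loop should manage,
// leaving out the primary port and, when known, the Hermes dashboard port
// that is assumed to be handled by its own dedicated helper.
function portsToRecover(
  declared: number[],
  primaryPort: number,
  hermesDashboardPort?: number, // undefined when the dashboard is not enabled
): number[] {
  const skip = new Set<number>([primaryPort]);
  if (hermesDashboardPort !== undefined) skip.add(hermesDashboardPort);
  return declared.filter((candidate) => !skip.has(candidate));
}

// Usage: 9119 stands in for a dashboard port and is purely illustrative.
const managed = portsToRecover([18789, 8642, 9119], 18789, 9119);
```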

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: a1496e13-5b2d-4780-aec5-0d0471e68f51

📥 Commits

Reviewing files that changed from the base of the PR and between 158f575 and bfb41f7.

📒 Files selected for processing (4)
  • src/lib/actions/sandbox/process-recovery.ts
  • src/lib/agent/onboard.test.ts
  • src/lib/agent/onboard.ts
  • test/process-recovery.test.ts

Comment thread src/lib/actions/sandbox/process-recovery.ts
@github-actions

github-actions Bot commented Jun 13, 2026

Contributor

E2E Advisor Recommendation

Required E2E: hermes-e2e-vitest, sandbox-survival-vitest
Optional E2E: gateway-guard-recovery, rebuild-hermes-vitest

Dispatch hint: hermes-e2e-vitest,sandbox-survival-vitest

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

  • hermes-e2e-vitest (medium): Validates the Hermes live user flow affected by this PR: nemohermes onboarding, dashboard/API URL reporting, OpenShell forward list entries for Hermes ports including 8642, and host-side API health access.
  • sandbox-survival-vitest (medium): Exercises live sandbox restart/survival and gateway recovery behavior through install/onboard and recovery paths. The PR modifies the generic process recovery function used by sandbox lifecycle commands, so this is merge-blocking coverage for regressions outside Hermes.

Optional E2E

  • gateway-guard-recovery (medium): Adjacent confidence for recovery internals: it runs a live gateway recovery scenario through the same connect/probe recovery path, but focuses on guard-chain restoration rather than manifest-declared secondary forwards.
  • rebuild-hermes-vitest (medium): Useful adjacent Hermes lifecycle coverage if maintainers want extra confidence that Hermes port and dashboard/API state still survive rebuild-related lifecycle flows, but the PR primarily changes recovery and onboarding summary behavior.

New E2E recommendations

  • Hermes secondary forward recovery (high): The existing Hermes E2E appears to check that port 8642 is initially forwarded and healthy, but the PR's core behavior is recovery of manifest-declared non-primary forward_ports after they go missing while the primary dashboard forward remains healthy.
    • Suggested test: Add a live Hermes recovery scenario that onboards Hermes, stops/removes only the OpenShell forward for 8642, runs nemohermes <sandbox> connect --probe-only or nemohermes <sandbox> recover, then asserts openshell forward list shows 8642 for that sandbox and curl -sf http://127.0.0.1:8642/health succeeds without restarting or stealing the primary dashboard forward.

Dispatch hint

  • Workflow: E2E / Vitest Scenarios
  • jobs input: hermes-e2e-vitest,sandbox-survival-vitest

@github-actions

github-actions Bot commented Jun 13, 2026

Contributor

Vitest E2E Scenario Recommendation

Required Vitest E2E scenarios: None
Optional Vitest E2E scenarios: None

Workflow run

Full Vitest E2E advisor summary

Vitest E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required Vitest E2E scenarios

  • None. Advisor reported no Vitest E2E scenario impact.

Optional Vitest E2E scenarios

  • None.

Relevant changed files

  • src/lib/actions/sandbox/process-recovery.ts
  • src/lib/agent/onboard.ts

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
@github-actions

Contributor

…overage

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
@laitingsheng laitingsheng added the v0.0.65 Release target label Jun 13, 2026
@cv cv merged commit 720bee9 into main Jun 13, 2026
46 checks passed
@cv cv deleted the fix/hermes-openai-api-port-forward branch June 13, 2026 17:33

Labels

area: onboarding Onboarding FSM, provider setup, sandbox launch, or first-run flow bug-fix PR fixes a bug or regression integration: hermes Hermes integration behavior v0.0.65 Release target

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Ubuntu 24.04][Onboard] nemohermes onboard does not forward OpenAI-compatible API on port 8642

2 participants