
fix(onboard): allocate dashboard ports across NemoClaw gateways#5379

Merged
cv merged 4 commits into main from feat/multi-gateway-dashboard-binding
Jun 13, 2026

@laitingsheng laitingsheng commented Jun 13, 2026

Contributor

Summary

NemoClaw allocated dashboard ports per gateway by parsing the output of openshell forward list, which only shows forwards owned by the currently selected gateway. A second sandbox onboarded against a non-default NEMOCLAW_GATEWAY_PORT could not see the first gateway's allocations, so the allocator handed out the same dashboard port again: both sandboxes ended up reporting the same http://127.0.0.1:18789/ URL, and the first sandbox became unreachable with a raw gRPC "sandbox has no spec" error. The persisted sandbox registry already records each sandbox's dashboardPort at host scope; the allocator now consults that view as a supplementary signal so a fresh onboard on a sibling gateway cannot collide with an existing sandbox's port.

Related Issue

Fixes #4865
Fixes #5359

Changes

  • src/lib/onboard/dashboard-port.ts: findAvailableDashboardPort accepts an additional registryOccupiedPorts view; the private mergeOccupiedPorts lets the active gateway's forward-list entry win when both views see the same port. The allocator defaults registryOccupiedPorts to an empty map so its unit tests stay independent of whatever sandboxes happen to live in the test runner's ~/.nemoclaw/sandboxes.json. The new exported helper getRegistryOccupiedDashboardPorts(currentSandboxName, listSandboxesFn?) reads ~/.nemoclaw/sandboxes.json and returns a port → sandbox map that excludes the sandbox currently being allocated for; it lets listSandboxes() handle missing or unparseable registry files and propagates any other error (e.g. permission-denied) instead of swallowing it. resolveCreateSandboxDashboardPort defaults input.registryOccupiedPorts to the registry-derived map internally, so the create-time call site keeps its existing shape without growing src/lib/onboard.ts.
  • src/lib/onboard/dashboard.ts: ensureDashboardForward passes getRegistryOccupiedDashboardPorts(sandboxName) through to findAvailableDashboardPort so the post-build forward-setup path applies the same cross-gateway view.
  • src/lib/actions/sandbox/gateway-state.ts: printGatewayLifecycleHint adds a clause that recognises the gateway-side "sandbox has no spec" gRPC reply and surfaces a concrete openshell gateway select <owning> hint with the sandbox's recorded per-port gateway name, instead of letting the raw gRPC string be the last word.
  • src/lib/onboard/dashboard-port.test.ts: 8 new tests. 5 cover the allocator's cross-gateway behaviour (registry-occupied ports block reuse, the current sandbox can still reclaim its own port, registry entries with null / non-numeric ports are ignored, exhaustion errors include registry-owned ports, the active gateway's forward-list entry wins). 3 cover getRegistryOccupiedDashboardPorts.
  • src/lib/actions/sandbox/gateway-state-hints.test.ts: new file, 3 tests covering the new "sandbox has no spec" hint clause across default and per-port gateway names, plus a non-match check on unrelated lifecycle output.
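
The allocator behaviour listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in src/lib/onboard/dashboard-port.ts: the `Sketch`-suffixed function names, the `Map<number, string>` shape, and the candidate range (eight ports starting at 18789) are all hypothetical.

```typescript
// Hypothetical sketch: names, signatures, and the port range are assumptions.
type OccupiedPorts = Map<number, string>; // port -> owning sandbox name

// Merge both views; the active gateway's forward-list entry wins on conflict.
function mergeOccupiedPortsSketch(
  forwardList: OccupiedPorts,
  registry: OccupiedPorts = new Map<number, string>(),
): OccupiedPorts {
  const merged = new Map<number, string>(registry);
  forwardList.forEach((owner, port) => merged.set(port, owner));
  return merged;
}

function findAvailableDashboardPortSketch(
  sandboxName: string,
  forwardList: OccupiedPorts,
  registry: OccupiedPorts = new Map<number, string>(),
  start = 18789, // assumed first candidate port
  count = 8, // assumed number of candidates
): number {
  const occupied = mergeOccupiedPortsSketch(forwardList, registry);
  for (let port = start; port < start + count; port++) {
    const owner = occupied.get(port);
    // A port is usable when nobody owns it, or when this sandbox already owns it.
    if (owner === undefined || owner === sandboxName) return port;
  }
  // Report every occupied port, including registry-owned ones.
  const taken: string[] = [];
  occupied.forEach((owner, port) => taken.push(`${port} (${owner})`));
  throw new Error(`No dashboard port available; occupied: ${taken.join(", ")}`);
}
```

With a registry view of `{18789 → "alpha"}` and an empty forward list, a sandbox named "beta" would be given 18790 in this sketch, while "alpha" could still reclaim 18789.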

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • npx prek run --all-files passes
  • npm test passes
  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed
  • Docs updated for user-facing behavior changes
  • npm run docs builds without warnings (doc changes only)
  • Doc pages follow the style guide (doc changes only)
  • New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

Summary by CodeRabbit

  • Bug Fixes

    • Clearer, user-facing guidance when a sandbox lacks a spec — prompts selecting the owning gateway and retrying instead of surfacing raw gRPC output.
  • New Features

    • Dashboard port allocation now accounts for ports persisted by sibling sandboxes across gateways to reduce conflicts and prefer appropriate owners.
  • Tests

    • Added unit coverage for gateway hint behavior and multi-gateway dashboard-port allocation scenarios.

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
@coderabbitai

coderabbitai Bot commented Jun 13, 2026

Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 48234cd7-0d4c-4624-bc27-4b5096678968

📥 Commits

Reviewing files that changed from the base of the PR and between c90427d and 390c7cb.

📒 Files selected for processing (2)
  • src/lib/onboard/dashboard-port.ts
  • src/lib/onboard/dashboard.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/lib/onboard/dashboard-port.ts

📝 Walkthrough

Walkthrough

Reads persisted sandbox registry to build cross-gateway dashboard-port occupancy, merges that with the active gateway forward-list when selecting ports, wires the registry map through create/resolve flows and tests, and emits specific guidance when a sandbox returns "sandbox has no spec".
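
As a rough illustration of the registry-derived occupancy map, the sketch below builds a port → sandbox map from registry entries supplied by an injected list function. The entry shape, the function names, and the integer check are assumptions for illustration; the real SandboxRegistryEntry and getRegistryOccupiedDashboardPorts may differ.

```typescript
// Hypothetical shapes: the real registry entry type may carry more fields.
interface RegistryEntrySketch {
  name: string;
  dashboardPort?: unknown;
}
type ListSandboxesSketch = () => { sandboxes: RegistryEntrySketch[] };

function registryOccupiedPortsSketch(
  currentSandboxName: string,
  listSandboxes: ListSandboxesSketch,
): Map<number, string> {
  const occupied = new Map<number, string>();
  // No try/catch: any error thrown by listSandboxes propagates to the caller.
  for (const entry of listSandboxes().sandboxes) {
    // The sandbox being allocated for must not block its own port.
    if (entry.name === currentSandboxName) continue;
    const port = entry.dashboardPort;
    // Ignore null, missing, and non-numeric ports (integer check is an assumption).
    if (typeof port !== "number" || !Number.isInteger(port)) continue;
    occupied.set(port, entry.name);
  }
  return occupied;
}
```

Injecting the list function keeps a unit test independent of whatever registry file exists on the machine running it.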

Changes

Multi-instance Sandbox Support

| Layer / File(s) | Summary |
|---|---|
| Registry occupancy types and retrieval (src/lib/onboard/dashboard-port.ts) | Adds SandboxRegistryEntry/ListSandboxesFn types and getRegistryOccupiedDashboardPorts; builds a port→sandbox map excluding the current sandbox and invalid ports. |
| Dashboard port allocation wiring (src/lib/onboard/dashboard-port.ts) | Adds optional registryOccupiedPorts to CreateSandboxDashboardPortInput; resolveCreateSandboxDashboardPort now fetches registry occupancy when not supplied and forwards it into allocation. |
| Allocation logic: merge & availability (src/lib/onboard/dashboard-port.ts) | Implements mergeOccupiedPorts and extends findAvailableDashboardPort to consult merged occupancy (forward-list precedence) so ports owned by other sandboxes are skipped while allowing self-reuse. |
| Multi-gateway dashboard port allocation tests (src/lib/onboard/dashboard-port.test.ts) | Imports and tests getRegistryOccupiedDashboardPorts; extends findAvailableDashboardPort tests for sibling registry occupancy, self-reuse, invalid entry filtering, exhaustion reporting, and forward-list precedence. |
| Dashboard ensure wiring (src/lib/onboard/dashboard.ts) | Imports getRegistryOccupiedDashboardPorts and passes the registry map into findAvailableDashboardPort when ensuring dashboard forwards. |
| Error guidance for orphaned sandbox (src/lib/actions/sandbox/gateway-state.ts, src/lib/actions/sandbox/gateway-state-hints.test.ts) | printGatewayLifecycleHint now detects "sandbox has no spec" and emits instructions to select the owning nemoclaw-<port> gateway and retry; tests validate per-port gateway naming and negative cases. |
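
The "sandbox has no spec" guidance might look something like the sketch below. The helper name, its signature, and the hint wording are hypothetical; only the matched error string and the openshell gateway select command come from this PR's description. The owning gateway name is passed in rather than derived, since the naming rules are not shown here.

```typescript
// Hypothetical helper: the real printGatewayLifecycleHint has its own signature and wording.
function noSpecHintSketch(
  output: string,
  sandboxName: string,
  owningGateway: string,
): string | null {
  // Only react to the gateway-side reply described in the PR.
  if (!/sandbox has no spec/.test(output)) return null;
  return [
    `Sandbox '${sandboxName}' is not known to the currently selected gateway.`,
    "Select the gateway that owns it and retry:",
    `  openshell gateway select ${owningGateway}`,
  ].join("\n");
}
```

Returning `null` for unrelated lifecycle output mirrors the non-match check mentioned in the test summary.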

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • NVIDIA/NemoClaw#5130: Overlaps at src/lib/onboard/dashboard-port.ts where resolve/create port logic was refactored; related changes to allocation flow.
  • NVIDIA/NemoClaw#5128: Earlier changes to dashboard-port creation wiring and resolveCreateSandboxDashboardPort that this PR extends.
  • NVIDIA/NemoClaw#5225: Related gateway-binding and lifecycle hint adjustments that intersect with printGatewayLifecycleHint behavior.

Suggested labels

area: onboarding, v0.0.64

Suggested reviewers

  • cv

🐰 I poked the registry, checked each gate,
Assigned ports so sandboxes wait no more;
When a sandbox cries "no spec" in fright,
Select its gateway — and try again tonight.
Hoppy deploys and happy ports!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 55.56% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'fix(onboard): allocate dashboard ports across NemoClaw gateways' clearly and concisely describes the main change: fixing dashboard port allocation to work across multiple NemoClaw gateways. |
| Linked Issues check | ✅ Passed | The PR comprehensively addresses all primary coding objectives from issues #4865 and #5359: prevents cross-gateway dashboard port reuse via registry consultation, adds 'sandbox has no spec' error hint in gateway-state.ts, and includes comprehensive test coverage for multi-gateway scenarios. |
| Out of Scope Changes check | ✅ Passed | All changes in dashboard-port.ts, dashboard.ts, gateway-state.ts, and their tests directly support the core objectives of fixing cross-gateway port allocation and improving error diagnostics; no out-of-scope refactoring or unrelated changes detected. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

github-actions Bot commented Jun 13, 2026

Contributor

E2E Advisor Recommendation

Required E2E: None
Optional E2E: None

Workflow run

Full advisor summary

E2E Recommendation Advisor

Failed: Could not parse JSON from advisor output; see /home/runner/work/NemoClaw/NemoClaw/artifacts/e2e-advisor/e2e-advisor-raw-output.txt

@github-actions

github-actions Bot commented Jun 13, 2026

Contributor

Vitest E2E Scenario Recommendation

Required Vitest E2E scenarios: None
Optional Vitest E2E scenarios: None

Workflow run

Full Vitest E2E advisor summary

Vitest E2E Scenario Advisor

Failed: Could not parse JSON from advisor output; see /home/runner/work/NemoClaw/NemoClaw/artifacts/e2e-advisor/e2e-scenario-advisor-raw-output.txt

@github-actions

github-actions Bot commented Jun 13, 2026

Contributor

PR Review Advisor

Findings: 0 needs attention, 1 worth checking, 0 nice ideas
Top item: PR review advisor unavailable

Review findings

🛠️ Needs attention

  • None.

🔎 Worth checking

  • PR review advisor unavailable: The automated advisor could not complete: Could not parse JSON from PR review advisor output; see /home/runner/work/NemoClaw/NemoClaw/artifacts/pr-review-advisor/pr-review-advisor-raw-output.txt
    • Recommendation: Re-run the PR Review Advisor or perform a manual review.
    • Evidence: Could not parse JSON from PR review advisor output; see /home/runner/work/NemoClaw/NemoClaw/artifacts/pr-review-advisor/pr-review-advisor-raw-output.txt

🌱 Nice ideas

  • None.
Consider writing more tests for
  • **Runtime validation**: Add or identify targeted runtime/integration validation for the changed behavior; do not report external E2E job pass/fail here. Runtime/sandbox/infrastructure paths need behavioral runtime validation: src/lib/actions/sandbox/gateway-state.ts, src/lib/onboard/dashboard-port.ts, src/lib/onboard/dashboard.ts.

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

@laitingsheng added the area: sandbox and bug-fix labels Jun 13, 2026

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/lib/onboard.ts (1)

2541-2550: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Move this lookup behind dashboard-port.ts so src/lib/onboard.ts stays within the growth budget.

This extra wiring is what pushed src/lib/onboard.ts over the entrypoint guardrail in CI. If resolveCreateSandboxDashboardPort() computes getRegistryOccupiedDashboardPorts(input.sandboxName) when registryOccupiedPorts is omitted, this call site can keep its old shape without losing the new behavior.
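
The suggestion above amounts to an optional-parameter fallback. The sketch below shows one way that could look; the `Sketch`-suffixed types, the function name, and the injected `loadRegistryPorts` and `allocate` callbacks are hypothetical stand-ins, not the PR's actual signatures.

```typescript
// Hypothetical sketch of an optional-parameter fallback; not the PR's real types.
type PortOwnersSketch = Map<number, string>;

interface CreatePortInputSketch {
  sandboxName: string;
  registryOccupiedPorts?: PortOwnersSketch; // optional: computed when omitted
}

function resolveCreatePortSketch(
  input: CreatePortInputSketch,
  loadRegistryPorts: (sandboxName: string) => PortOwnersSketch,
  allocate: (sandboxName: string, registry: PortOwnersSketch) => number,
): number {
  // Callers that pass nothing get the registry-derived map; tests can still inject one.
  const registry =
    input.registryOccupiedPorts ?? loadRegistryPorts(input.sandboxName);
  return allocate(input.sandboxName, registry);
}
```

Because the parameter stays optional, an existing call site that passes only the sandbox name would keep compiling and pick up the cross-gateway view without any extra wiring.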

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lib/onboard.ts` around lines 2541 - 2550, The call site currently
computes getRegistryOccupiedDashboardPorts(sandboxName) and passes it into
resolveCreateSandboxDashboardPort, which bloats src/lib/onboard.ts; instead
remove the registryOccupiedPorts argument from this call site (stop invoking
getRegistryOccupiedDashboardPorts here) and make
resolveCreateSandboxDashboardPort responsible for computing registry-occupied
ports when its registryOccupiedPorts parameter is omitted/undefined. Update
resolveCreateSandboxDashboardPort's implementation to import/use
getRegistryOccupiedDashboardPorts(sandboxName) internally as the fallback, keep
the function signature backward-compatible (optional param), and ensure existing
behavior and tests remain unchanged.

Source: Pipeline failures

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/lib/onboard/dashboard-port.ts`:
- Around line 209-213: The blanket catch around entries = list().sandboxes is
hiding real IO/permission errors; instead either call the existing
listSandboxes() helper (which already degrades for missing/unparseable registry
files) or narrow the catch to only handle the safe fallback cases (e.g.,
error.code === 'ENOENT' or JSON parsing errors) and rethrow any other errors
(permission/unreadable file) so they abort; reference list(), listSandboxes(),
and readConfigFile to locate the logic and implement the safer error handling
(i.e., remove the unconditional catch or replace it with conditional checks and
rethrow).

---

Outside diff comments:
In `@src/lib/onboard.ts`:
- Around line 2541-2550: The call site currently computes
getRegistryOccupiedDashboardPorts(sandboxName) and passes it into
resolveCreateSandboxDashboardPort, which bloats src/lib/onboard.ts; instead
remove the registryOccupiedPorts argument from this call site (stop invoking
getRegistryOccupiedDashboardPorts here) and make
resolveCreateSandboxDashboardPort responsible for computing registry-occupied
ports when its registryOccupiedPorts parameter is omitted/undefined. Update
resolveCreateSandboxDashboardPort's implementation to import/use
getRegistryOccupiedDashboardPorts(sandboxName) internally as the fallback, keep
the function signature backward-compatible (optional param), and ensure existing
behavior and tests remain unchanged.
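
The inline comment on src/lib/onboard/dashboard-port.ts asks for a narrower catch. One possible shape is sketched below, assuming the reader throws Node-style errors with a `code` property for filesystem failures and a `SyntaxError` for malformed JSON; the helper name and the injected `read` callback are hypothetical.

```typescript
// Hypothetical sketch of a narrowed catch; the real code may delegate to listSandboxes() instead.
interface ErrnoLikeSketch {
  code?: string;
}

function readRegistryOrEmptySketch<T>(read: () => T[]): T[] {
  try {
    return read();
  } catch (error) {
    const code = (error as ErrnoLikeSketch | null)?.code;
    // Safe fallbacks: the registry file does not exist yet, or it is not valid JSON.
    if (code === "ENOENT" || error instanceof SyntaxError) return [];
    // Anything else (for example a permission error) should abort, not be hidden.
    throw error;
  }
}
```

The alternative named in the comment, calling the existing listSandboxes() helper and dropping the catch entirely, avoids duplicating this classification logic.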

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 61beb437-b077-4a3c-8299-ad174f026e5d

📥 Commits

Reviewing files that changed from the base of the PR and between 1467e84 and 3c70f45.

📒 Files selected for processing (6)
  • src/lib/actions/sandbox/gateway-state-hints.test.ts
  • src/lib/actions/sandbox/gateway-state.ts
  • src/lib/onboard.ts
  • src/lib/onboard/dashboard-port.test.ts
  • src/lib/onboard/dashboard-port.ts
  • src/lib/onboard/dashboard.ts

Comment thread src/lib/onboard/dashboard-port.ts Outdated
…rt module

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
…st registry state

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
@cv merged commit f34ac4c into main Jun 13, 2026
46 checks passed
@cv deleted the feat/multi-gateway-dashboard-binding branch June 13, 2026 08:21
@cv added the v0.0.65 (Release target) label Jun 13, 2026

Labels

  • area: sandbox (OpenShell sandbox lifecycle, runtime, config, or recovery)
  • bug-fix (PR fixes a bug or regression)
  • v0.0.65 (Release target)

2 participants