fix(onboard): allocate dashboard ports across NemoClaw gateways#5379
Conversation
Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughReads persisted sandbox registry to build cross-gateway dashboard-port occupancy, merges that with the active gateway forward-list when selecting ports, wires the registry map through create/resolve flows and tests, and emits specific guidance when a sandbox returns "sandbox has no spec". ChangesMulti-instance Sandbox Support
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
E2E Advisor RecommendationRequired E2E: None Full advisor summaryE2E Recommendation AdvisorFailed: Could not parse JSON from advisor output; see /home/runner/work/NemoClaw/NemoClaw/artifacts/e2e-advisor/e2e-advisor-raw-output.txt |
Vitest E2E Scenario RecommendationRequired Vitest E2E scenarios: None Full Vitest E2E advisor summaryVitest E2E Scenario AdvisorFailed: Could not parse JSON from advisor output; see /home/runner/work/NemoClaw/NemoClaw/artifacts/e2e-advisor/e2e-scenario-advisor-raw-output.txt |
PR Review AdvisorFindings: 0 needs attention, 1 worth checking, 0 nice ideas Review findings🛠️ Needs attention
🔎 Worth checking
🌱 Nice ideas
Consider writing more tests for
This is an automated advisory review. A human maintainer must make the final merge decision. |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/lib/onboard.ts (1)
2541-2550: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick winMove this lookup behind
dashboard-port.tssosrc/lib/onboard.tsstays within the growth budget.This extra wiring is what pushed
src/lib/onboard.tsover the entrypoint guardrail in CI. IfresolveCreateSandboxDashboardPort()computesgetRegistryOccupiedDashboardPorts(input.sandboxName)whenregistryOccupiedPortsis omitted, this call site can keep its old shape without losing the new behavior.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/lib/onboard.ts` around lines 2541 - 2550, The call site currently computes getRegistryOccupiedDashboardPorts(sandboxName) and passes it into resolveCreateSandboxDashboardPort, which bloats src/lib/onboard.ts; instead remove the registryOccupiedPorts argument from this call site (stop invoking getRegistryOccupiedDashboardPorts here) and make resolveCreateSandboxDashboardPort responsible for computing registry-occupied ports when its registryOccupiedPorts parameter is omitted/undefined. Update resolveCreateSandboxDashboardPort's implementation to import/use getRegistryOccupiedDashboardPorts(sandboxName) internally as the fallback, keep the function signature backward-compatible (optional param), and ensure existing behavior and tests remain unchanged.Source: Pipeline failures
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/lib/onboard/dashboard-port.ts`:
- Around line 209-213: The blanket catch around entries = list().sandboxes is
hiding real IO/permission errors; instead either call the existing
listSandboxes() helper (which already degrades for missing/unparseable registry
files) or narrow the catch to only handle the safe fallback cases (e.g.,
error.code === 'ENOENT' or JSON parsing errors) and rethrow any other errors
(permission/unreadable file) so they abort; reference list(), listSandboxes(),
and readConfigFile to locate the logic and implement the safer error handling
(i.e., remove the unconditional catch or replace it with conditional checks and
rethrow).
---
Outside diff comments:
In `@src/lib/onboard.ts`:
- Around line 2541-2550: The call site currently computes
getRegistryOccupiedDashboardPorts(sandboxName) and passes it into
resolveCreateSandboxDashboardPort, which bloats src/lib/onboard.ts; instead
remove the registryOccupiedPorts argument from this call site (stop invoking
getRegistryOccupiedDashboardPorts here) and make
resolveCreateSandboxDashboardPort responsible for computing registry-occupied
ports when its registryOccupiedPorts parameter is omitted/undefined. Update
resolveCreateSandboxDashboardPort's implementation to import/use
getRegistryOccupiedDashboardPorts(sandboxName) internally as the fallback, keep
the function signature backward-compatible (optional param), and ensure existing
behavior and tests remain unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 61beb437-b077-4a3c-8299-ad174f026e5d
📒 Files selected for processing (6)
src/lib/actions/sandbox/gateway-state-hints.test.tssrc/lib/actions/sandbox/gateway-state.tssrc/lib/onboard.tssrc/lib/onboard/dashboard-port.test.tssrc/lib/onboard/dashboard-port.tssrc/lib/onboard/dashboard.ts
…rt module Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
…st registry state Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
Summary
NemoClaw allocated dashboard ports per gateway by parsing
openshell forward list, which only shows forwards owned by the currently selected gateway. A second sandbox onboarded against a non-defaultNEMOCLAW_GATEWAY_PORTcould not see the first gateway's allocations and re-handed-out the same dashboard port — both sandboxes ended up reporting the samehttp://127.0.0.1:18789/URL, and the first sandbox became unreachable with a raw gRPCsandbox has no specerror. The persisted sandbox registry already records each sandbox'sdashboardPortat host scope; the allocator now consults that view as a supplementary signal so a fresh onboard on a sibling gateway cannot collide with an existing sandbox's port.Related Issue
Fixes #4865
Fixes #5359
Changes
src/lib/onboard/dashboard-port.ts—findAvailableDashboardPortaccepts an additionalregistryOccupiedPortsview; privatemergeOccupiedPortslets the active gateway's forward-list entry win when both views see the same port. The allocator defaultsregistryOccupiedPortsto an empty map so its unit tests stay independent of whatever sandboxes happen to live in the test runner's~/.nemoclaw/sandboxes.json. New exported helpergetRegistryOccupiedDashboardPorts(currentSandboxName, listSandboxesFn?)reads~/.nemoclaw/sandboxes.jsonand returns a port → sandbox map excluding the sandbox currently being allocated for; it letslistSandboxes()handle missing or unparseable registry files and propagates any other error (e.g. permission-denied) instead of swallowing it.resolveCreateSandboxDashboardPortdefaultsinput.registryOccupiedPortsto the registry-derived map internally, so the create-time call site keeps its existing shape without growingsrc/lib/onboard.ts.src/lib/onboard/dashboard.ts—ensureDashboardForwardpassesgetRegistryOccupiedDashboardPorts(sandboxName)through tofindAvailableDashboardPortso the post-build forward-setup path applies the same cross-gateway view.src/lib/actions/sandbox/gateway-state.ts—printGatewayLifecycleHintadds a clause that recognises the gateway-sidesandbox has no specgRPC reply and surfaces a concreteopenshell gateway select <owning>hint with the sandbox's recorded per-port gateway name, instead of letting the raw gRPC string be the last word.src/lib/onboard/dashboard-port.test.ts— 8 new tests: 5 cover the allocator's cross-gateway behaviour (registry-occupied ports block reuse, the current sandbox can still reclaim its own port, registry entries with null / non-numeric ports are ignored, exhaustion errors include registry-owned ports, the active gateway's forward-list entry wins). 3 covergetRegistryOccupiedDashboardPorts.src/lib/actions/sandbox/gateway-state-hints.test.ts— new file, 3 tests covering the newsandbox has no spechint clause across default and per-port gateway names, plus a non-match check on unrelated lifecycle output.Type of Change
Verification
npx prek run --all-filespassesnpm testpassesnpm run docsbuilds without warnings (doc changes only)Signed-off-by: Tinson Lai tinsonl@nvidia.com
Summary by CodeRabbit
Bug Fixes
New Features
Tests