fix(start): drop slow-mode polling on late allowlisted scope upgrades by laitingsheng · Pull Request #5387 · NVIDIA/NemoClaw

laitingsheng · 2026-06-13T10:00:58Z

Summary

The in-sandbox auto-pair watcher transitions to slow-mode keepalive after the first device pairs and a handful of quiet polls. A scope upgrade requested after that point (a fresh `openclaw tui`, `openclaw agent`, or any other allowlisted CLI client) waits up to one slow-poll interval before the watcher revisits the pending list. That wait exceeds the OpenClaw client's tolerance for `scope upgrade pending approval` and forces an embedded-mode fallback. Drop the slow-mode default to 5s and arm a bounded fast-reentry counter on the rising edge of each fresh allowlisted approval attempt, so the override forces 1s polling for the next few iterations. The counter is keyed by `requestId` (garbage-collected against the live pending list), so a sticky failing request bumps once and then yields back to the slow cadence. The override is also capped at the caller's existing default, so fast-reentry never increases latency on a tight retry pass.
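
The rising-edge bookkeeping described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in `scripts/nemoclaw-start.sh`: the `FAST_REENTRY_*` names come from this description, while `contains`, `note_attempts`, the space-separated id lists, and the value 3 are hypothetical.

```shell
# Hypothetical sketch: variable names follow the PR text, the logic is assumed.
FAST_REENTRY_POLLS=3                 # fast polls granted per fresh request (assumed value)
FAST_REENTRY_REMAINING=0             # fast polls still owed
FAST_REENTRY_BUMPED_REQUEST_IDS=""   # space-separated ids that already bumped

# contains LIST ID: succeed if ID is a whole word in the space-separated LIST.
contains() {
  case " $1 " in
    *" $2 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# note_attempts PENDING ATTEMPTED: both arguments are space-separated requestIds.
note_attempts() {
  na_pending=$1
  na_attempted=$2
  na_kept=""
  na_fresh=0
  # Garbage-collect: forget ids that have left the live pending list.
  for na_id in $FAST_REENTRY_BUMPED_REQUEST_IDS; do
    if contains "$na_pending" "$na_id"; then
      na_kept="$na_kept $na_id"
    fi
  done
  # Rising edge: an attempted id that has not bumped the counter yet.
  for na_id in $na_attempted; do
    if ! contains "$na_kept" "$na_id"; then
      na_kept="$na_kept $na_id"
      na_fresh=1
    fi
  done
  FAST_REENTRY_BUMPED_REQUEST_IDS=$na_kept
  # Refresh the bounded window only when something new was attempted.
  if [ "$na_fresh" -eq 1 ] && [ "$FAST_REENTRY_POLLS" -gt 0 ]; then
    FAST_REENTRY_REMAINING=$FAST_REENTRY_POLLS
  fi
}
```

Calling `note_attempts "req-1" "req-1"` on consecutive iterations refreshes the window only the first time. The same id bumps again only after it has dropped out of the pending list and come back.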

Related Issue

Fixes #5343
Refs #5324

#5324 is partially addressed by the polling-cadence fix when the failing non-TUI client requests stay within the allowlisted scopes (`operator.pairing/read/write`). The bug's other symptoms (`Unknown command: openclaw device`, the dashboard Auth did not match rate-limit, and any subcommand whose pairing genuinely needs `operator.admin`) sit outside the auto-pair policy boundary and remain open for a separate operator-driven approval surface.

Changes

- `scripts/nemoclaw-start.sh`: declare `FAST_REENTRY_POLLS` / `FAST_REENTRY_INTERVAL` / `FAST_REENTRY_REMAINING` / `FAST_REENTRY_BUMPED_REQUEST_IDS`, add a `sleep_for_next_poll(default)` helper that caps the override at the caller's default, wire it into every watcher sleep call-site, and bump the counter once on the rising edge of each fresh allowlisted approval attempt, emitting a single `[auto-pair] fast-reentry bumped polls=N approved=N mode=slow|fast` marker. Lower the `SLOW_INTERVAL` default from 30s to 5s so the worst-case latency on the first late upgrade sits below typical OpenClaw client wait windows.
- `test/nemoclaw-start.test.ts`: extract a shared late-CLI fixture (`setupLateCliFixture`) so the existing convergence test and the new fast-reentry assertions share their fake `openclaw` stub, and fold the fast-reentry marker, ordering, and single-rising-edge assertions into the existing late-upgrade test to stay inside the file size budget.
- `test/e2e/test-cli-scope-upgrade-approval.sh` (renamed from `test-issue-4462-scope-upgrade-approval.sh`): drop `AUTO_PAIR_SLOW_INTERVAL_DEFAULT` from 600 to 5 so the in-sandbox watcher is observably exercised by the test, tolerate watcher-wins in Phase 3 via `allow_already_approved=1`, and make Phase 6 strict: it now fails the test if the watcher did not record a slow-mode transition or did not log the fast-reentry marker.
- `test/e2e-script-workflow.test.ts`, `.github/workflows/nightly-e2e.yaml`: update the legacy and nightly script allowlists and CI job script paths to the renamed file, rename the two affected nightly E2E jobs to drop the issue-id prefix (`cli-scope-upgrade-approval-e2e`, `cli-scope-upgrade-legacy-repro-e2e`), and sync the approval job's `NEMOCLAW_AUTO_PAIR_SLOW_INTERVAL_SECS` from 600 to 5.
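
For readers who want a concrete picture of the sleep helper, here is a minimal sketch. It assumes whole-second intervals and assumes the helper itself consumes one fast poll per call. `next_poll_interval` is a hypothetical split-out added for illustration, and the real `sleep_for_next_poll` in `scripts/nemoclaw-start.sh` may differ.

```shell
# Hypothetical sketch; only the FAST_REENTRY_* names and sleep_for_next_poll
# are taken from the PR description.
FAST_REENTRY_INTERVAL=1     # seconds between polls while the window is open
FAST_REENTRY_REMAINING=0    # fast polls still owed

# next_poll_interval DEFAULT: print the number of seconds to wait.
# The override applies only when it is shorter than the caller's default,
# so a tight retry pass is never slowed down.
next_poll_interval() {
  npi_default=$1
  if [ "$FAST_REENTRY_REMAINING" -gt 0 ] &&
     [ "$FAST_REENTRY_INTERVAL" -lt "$npi_default" ]; then
    echo "$FAST_REENTRY_INTERVAL"
  else
    echo "$npi_default"
  fi
}

# sleep_for_next_poll DEFAULT: sleep, consuming one fast poll if any remain.
sleep_for_next_poll() {
  sfnp_secs=$(next_poll_interval "$1")
  if [ "$FAST_REENTRY_REMAINING" -gt 0 ]; then
    FAST_REENTRY_REMAINING=$((FAST_REENTRY_REMAINING - 1))
  fi
  sleep "$sfnp_secs"
}
```

With `FAST_REENTRY_REMAINING=2`, a slow-mode call such as `sleep_for_next_poll 5` would wait 1s instead of 5s. Once the counter reaches zero, the caller's default applies again.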

Type of Change

Code change (feature, bug fix, or refactor)
Code change with doc updates
Doc only (prose changes, no code sample modifications)
Doc only (includes code sample changes)

Verification

npx prek run --all-files passes
npm test passes
Tests added or updated for new or changed behavior
No secrets, API keys, or credentials committed
Docs updated for user-facing behavior changes
npm run docs builds without warnings (doc changes only)
Doc pages follow the style guide (doc changes only)
New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

Summary by CodeRabbit

New Features
- Fast re-entry polling added to accelerate follow-up approval polls and reduce normal slow cadence.
Tests
- E2E jobs and test scripts switched to CLI-scoped variants.
- Approval test script enhanced with additional verification phases (including state log snapshot and fast-reentry detection).
- Late-CLI approval tests refactored to cover concurrent request scenarios and improved assertions.

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

coderabbitai · 2026-06-13T10:01:12Z

No actionable comments were generated in the recent review. 🎉

📥 Commits

Reviewing files that changed from the base of the PR and between cae6c20 and 90026e7.

📒 Files selected for processing (2)

test/e2e/test-cli-scope-upgrade-approval.sh
test/nemoclaw-start.test.ts

🚧 Files skipped from review as they are similar to previous changes (2)

test/e2e/test-cli-scope-upgrade-approval.sh
test/nemoclaw-start.test.ts

Walkthrough

This PR adds a bounded "fast-reentry" polling window to the auto-pair watcher that briefly forces 1-second polling after allowlisted device approvals are attempted, enabling capture of late CLI arrivals. The nightly E2E workflow and tests are updated to use unified CLI scope-upgrade naming instead of issue-4462 prefixes.

Changes

Fast-reentry scope-upgrade polling

Layer / File(s)	Summary
Nightly workflow and allowlist `.github/workflows/nightly-e2e.yaml`, `test/e2e-script-workflow.test.ts`	Replaces issue-4462 job IDs with `cli-scope-upgrade-approval-e2e` and `cli-scope-upgrade-legacy-repro-e2e`; both run the unified CLI approval script; downstream job aggregators updated to reference new IDs; frozen test allowlists updated to expect the shared CLI script.
Fast-reentry constants and sleep helper `scripts/nemoclaw-start.sh`	Introduces FAST_REENTRY_POLLS, FAST_REENTRY_INTERVAL, FAST_REENTRY_REMAINING, and FAST_REENTRY_BUMPED_REQUEST_IDS state; sets SLOW_INTERVAL default to 5s; adds `sleep_for_next_poll(default_seconds)` helper to override polling intervals while fast-reentry counter is active; routes device-list failure and polling sleeps through the helper.
Approval tracking and fast-reentry activation `scripts/nemoclaw-start.sh`	Adds per-iteration tracking of attempted_request_ids and pending_request_ids; marks requestIds before allowlisted approve calls; garbage-collects bumped ids post-processing; on rising-edge new attempted requestIds (when FAST_REENTRY_POLLS > 0), sets FAST_REENTRY_REMAINING to refresh the bounded fast-reentry window.
E2E script timing and instrumentation `test/e2e/test-cli-scope-upgrade-approval.sh`	Switches env var prefixes to NEMOCLAW_CLI_SCOPE_*; lowers slow-interval default to 5s; makes Phase 3 tolerant of already-approved requests; adds Phase 6 to snapshot logs, assert slow-mode keepalive, and check for fast-reentry bumped marker; updates agent session-id labels and pass messaging.
Vitest: late-CLI slow-mode & fast-reentry test `test/nemoclaw-start.test.ts`	Extracts reusable `setupLateCliFixture(prefix)` to build workspace and precomputed device payloads; refactors late-CLI test to use explicit fast-reentry env vars, concurrent late requestIds, and strict marker count/sequencing assertions; updates non-zero approve failure test with fast-reentry env vars and marker occurrence validation.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues

nightly-e2e: issue-4462 gateway-pinned approval E2E flakes on transient gateway unreachability #5377: touches the same approval/polling readiness behavior addressed by this PR — both modify approval test flow and watcher polling to mitigate approval flakes.

Suggested labels

nightly-e2e, area: e2e

Suggested reviewers

jyaunches
cv

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 14.29% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the main behavioral change: implementing a fast-reentry mechanism to prevent slow-mode polling delays on late allowlisted scope upgrades, matching the PR's core fix.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

github-code-quality · 2026-06-13T10:01:34Z

Code Coverage Overview

Languages: TypeScript

TypeScript / code-coverage/plugin

The overall coverage in the fix/scope-upgrade-la... branch is 96%. Coverage data for the main branch is not yet available.

File	main	fix/scope-upgrade-la... `90026e7`	+/-
`nemoclaw/src/se...cret-scanner.ts`	—	100%	—
`nemoclaw/src/commands/slash.ts`	—	100%	—
`nemoclaw/src/li...bprocess-env.ts`	—	100%	—
`nemoclaw/src/bl...eprint/state.ts`	—	98%	—
`nemoclaw/src/onboard/config.ts`	—	98%	—
`nemoclaw/src/bl...int/snapshot.ts`	—	97%	—
`nemoclaw/src/bl...print/runner.ts`	—	95%	—
`nemoclaw/src/co...ration-state.ts`	—	94%	—
`nemoclaw/src/bl...ate-networks.ts`	—	94%	—
`nemoclaw/src/index.ts`	—	94%	—

TypeScript / code-coverage/cli

The overall coverage in the fix/scope-upgrade-la... branch is 44%. Coverage data for the main branch is not yet available.

File	main	fix/scope-upgrade-la... `90026e7`	+/-
`src/lib/state/o...oard-session.ts`	—	90%	—
`src/lib/inference/local.ts`	—	77%	—
`src/lib/sandbox/config.ts`	—	72%	—
`src/lib/inference/nim.ts`	—	72%	—
`src/lib/onboard/preflight.ts`	—	64%	—
`src/lib/state/sandbox.ts`	—	55%	—
`src/lib/onboard...er-gpu-patch.ts`	—	50%	—
`src/lib/actions...licy-channel.ts`	—	49%	—
`src/lib/policy/index.ts`	—	48%	—
`src/lib/onboard.ts`	—	17%	—

Updated June 13, 2026 16:26 UTC

github-actions · 2026-06-13T10:05:15Z

E2E Advisor Recommendation

Required E2E: cli-scope-upgrade-approval-e2e
Optional E2E: cli-scope-upgrade-legacy-repro-e2e, device-auth-health-e2e, sessions-agents-cli-e2e

Dispatch hint: cli-scope-upgrade-approval-e2e

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

cli-scope-upgrade-approval-e2e (high): Directly covers the changed auto-pair watcher and CLI scope-upgrade approval path in a real sandbox, including that an approved OpenClaw agent turn stays on the gateway path and does not fall back to embedded mode.

Optional E2E

cli-scope-upgrade-legacy-repro-e2e (high): Useful diagnostic coverage for the legacy gateway-pinned approval behavior using the same renamed script and updated workflow wiring. It is adjacent to the runtime change but characterized as diagnostic rather than the merge-blocking fix gate.
device-auth-health-e2e (medium): Additional confidence for device-auth and pairing health around gateway/device approval semantics touched by the auto-pair watcher change.
sessions-agents-cli-e2e (high): Adjacent real-sandbox OpenClaw CLI/agent flow coverage that can catch regressions in agent command routing through the gateway after device approval changes.

New E2E recommendations

TUI late scope-upgrade approval (medium): The start-script comments identify late openclaw tui scope upgrades as a target of the slow-mode fast-reentry behavior, but the changed E2E primarily proves the openclaw agent operator.write path. A dedicated TUI late-scope-upgrade check would close that gap if TUI regressions recur.
- Suggested test: Add an E2E that starts an onboarded sandbox, waits for auto-pair slow mode, triggers a late openclaw tui/webchat scope upgrade, and verifies the watcher approves it without client fallback or stuck pending state.

Dispatch hint

Workflow: .github/workflows/nightly-e2e.yaml
jobs input: cli-scope-upgrade-approval-e2e

github-actions · 2026-06-13T10:05:16Z

Vitest E2E Scenario Recommendation

Required Vitest E2E scenarios: ubuntu-repo-cloud-openclaw
Optional Vitest E2E scenarios: None

Dispatch required Vitest E2E scenarios:

gh workflow run e2e-vitest-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-cloud-openclaw

Vitest E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required Vitest E2E scenarios

ubuntu-repo-cloud-openclaw: The PR changes scripts/nemoclaw-start.sh auto-pair watcher behavior used during OpenClaw sandbox startup/onboarding. The ubuntu-repo-cloud-openclaw live Vitest scenario is the smallest live-supported typed scenario that exercises current-repo OpenClaw onboarding, gateway/device pairing setup, and the baseline smoke/inference path affected by this startup script.
- Dispatch: gh workflow run e2e-vitest-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-cloud-openclaw

Optional Vitest E2E scenarios

None.

Relevant changed files

scripts/nemoclaw-start.sh

github-actions · 2026-06-13T10:05:18Z

PR Review Advisor

Findings: 1 needs attention, 3 worth checking, 0 nice ideas
Since last review: 0 prior items resolved, 1 still applies, 2 new items found

Review findings

🛠️ Needs attention

#5343 two-provider `inference.local` routing remains unvalidated (test/e2e/test-cli-scope-upgrade-approval.sh:1099): The prior review finding is partially addressed by the new Phase 7 two-sandbox concurrent `openclaw agent` check, but the linked issue ("[All Platforms][Inference] Multi-sandbox parallel routing test fails: both sandboxes hit gateway scope-upgrade block and fall back to embedded mode") expects two sandboxes with different inference providers routed through `inference.local`. The new test says both sandboxes share the same NVIDIA Cloud provider and does not assert the requested NVIDIA/Ollama models or `inference.local` routing for either sandbox.
- Recommendation: Either add or identify coverage for sandbox-A using NVIDIA Cloud model `nvidia/nemotron-3-super-120b-a12b` and sandbox-B using local Ollama model `qwen3.5:9b`, both through `inference.local` concurrently with no scope/pairing/fallback markers; or narrow the PR/issue claim so that #5343 is only partially addressed.
- Evidence: #5343 Expected Result: `sandbox-A` → `inference.local` → NVIDIA Cloud model `nvidia/nemotron-3-super-120b-a12b`; `sandbox-B` → `inference.local` → local Ollama model `qwen3.5:9b`. Diff evidence: Phase 7 comments state CI has only one provider key and both sandboxes share the same NVIDIA Cloud provider, then checks only concurrent `openclaw agent` output and `OPENCLAW_GATEWAY_URL=ws://`.

🔎 Worth checking

Source-of-truth review needed: scripts/nemoclaw-start.sh fast-reentry auto-pair polling: The advisor marked localized patch analysis as needs_followup.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: `scripts/nemoclaw-start.sh` documents why slow-mode can force embedded fallback and adds fast-reentry, but only the older gateway-env approval workaround has an explicit removal note.
Automatic operator.write approval is faster and longer-lived on spoofable client metadata (scripts/nemoclaw-start.sh:1927): The PR reduces the default slow-mode approval poll interval and adds fast-reentry for allowlisted requests. The policy still limits approvals to `operator.pairing/read/write` and rejects `operator.admin`, which is good, but the watcher comments acknowledge `clientId` and `clientMode` are client-supplied and spoofable. This change makes automatic approval of `operator.write` more responsive during long-running sandbox sessions, so the least-privilege boundary deserves explicit threat-model confirmation.
- Recommendation: Document why automatically approving `operator.write` for `cli`/`webchat` metadata remains acceptable in the gateway device-auth model, or bind late upgrades to a stronger already-paired device identity/provenance if OpenClaw exposes one. Also confirm the 5s default does not create gateway load/DoS risk on multi-sandbox hosts.
- Evidence: `scripts/lib/openclaw_device_approval_policy.py` allows `operator.pairing`, `operator.read`, and `operator.write` for `openclaw-control-ui`, `webchat`, or `cli`; the changed watcher lowers `SLOW_INTERVAL` to 5s and adds bounded fast-reentry; the watcher comment states `clientId/clientMode are client-supplied and spoofable`.
Fast-reentry workaround lacks a clear source-of-truth removal boundary (scripts/nemoclaw-start.sh:1913): The patch handles a late pending scope-upgrade by tuning the NemoClaw watcher polling cadence and fast-reentry behavior. The invalid state and tests are described, but unlike the existing gateway-env approval workaround, the fast-reentry block does not clearly state when this workaround should be removed or what upstream/source behavior would make it obsolete.
- Recommendation: Add a short source-of-truth note for the fast-reentry workaround: where the pending late-upgrade state is created, why OpenClaw/device-auth cannot be fixed in this PR, what upstream behavior would replace polling, and the condition for deleting the workaround.
- Evidence: The changed comments explain that slow-mode polling can exceed the OpenClaw client's tolerance and force embedded fallback, and tests cover the watcher behavior. The nearby older #4462 workaround (issue title: "OpenClaw CLI scope-upgrade approval deadlocks and forces openclaw agent into embedded fallback") explicitly says to remove it when OpenClaw approve can complete scope upgrades through the gateway; the new fast-reentry cadence change does not have an equivalent removal condition.
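
To make the policy boundary in these findings concrete, the allowlist rule quoted in the evidence can be rendered roughly as below. This is a hypothetical shell rendering for illustration only. The real policy lives in `scripts/lib/openclaw_device_approval_policy.py`, which is not shown here, and `is_auto_approvable`, the single client label argument, and the rejection of an empty scope list are all assumptions.

```shell
# is_auto_approvable CLIENT SCOPE...: succeed only if the client label and
# every requested scope are allowlisted. Illustrative sketch, not the real policy.
is_auto_approvable() {
  case "$1" in
    openclaw-control-ui|webchat|cli) ;;   # client labels named in the review
    *) return 1 ;;
  esac
  shift
  [ "$#" -gt 0 ] || return 1              # assumption: empty scope list is rejected
  for iaa_scope in "$@"; do
    case "$iaa_scope" in
      operator.pairing|operator.read|operator.write) ;;
      *) return 1 ;;                      # for example operator.admin
    esac
  done
  return 0
}
```

Under this sketch, `is_auto_approvable cli operator.read operator.write` succeeds while `is_auto_approvable cli operator.admin` fails. Because the client label is client-supplied, a check like this is only as strong as that label, which is the concern the finding raises.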

🌱 Nice ideas

None.

Consider writing more tests for

**Runtime validation:** Two real sandboxes with different configured providers (`sandbox-A` NVIDIA Cloud `nvidia/nemotron-3-super-120b-a12b`, `sandbox-B` Ollama `qwen3.5:9b`) run concurrent `openclaw agent` turns and each request is observed through `inference.local` with the expected provider/model. The changed behavior is in sandbox lifecycle and gateway/device-auth paths. Unit tests provide useful state-machine coverage and the E2E now covers concurrent same-provider agent turns, but the linked issue is a runtime multi-provider `inference.local` routing scenario.
**Runtime validation:** Two sandboxes on distinct `NEMOCLAW_GATEWAY_PORT` values run late concurrent scope-upgrade agent turns and prove each watcher approves only its own sandbox's gateway/state.
**Runtime validation:** A long-lived slow-mode watcher with a sticky failing allowlisted requestId across multiple slow polls emits only one fast-reentry bump until the request disappears and reappears.
**Runtime validation:** If TUI remains in the claimed scope, a late `openclaw tui` connection after slow-mode stays gateway-backed with no `scope upgrade pending approval`, `pairing required`, or embedded-fallback markers.
**Acceptance clause:** Fixes #5343: add test evidence or identify existing coverage. The production watcher change and E2E Phase 7 address late scope-upgrade fallback for concurrent `openclaw agent` calls, but the linked issue's full two-provider `inference.local` routing scenario is not validated.
**Acceptance clause:** Refs #5324 ("[Brev][CLI&UX] Non-TUI openclaw commands (cron, exec, agent) blocked by gateway scope approval with no approval path available"): add test evidence or identify existing coverage. The PR body limits #5324 to allowlisted `operator.pairing/read/write` scope-upgrade behavior. The diff does not claim to fix the `operator.admin` or other non-allowlisted symptoms.
**Acceptance clause:** The documented multi-sandbox routing test requires two NemoClaw sandboxes with different inference providers to handle prompts in parallel via the gateway (`inference.local`): add test evidence or identify existing coverage. Phase 7 creates a second sandbox and runs concurrent agent prompts, but its comment states both sandboxes share the same NVIDIA Cloud provider and it does not assert `inference.local` provider routing.
**Acceptance clause:** `sandbox-A` → `inference.local` → NVIDIA Cloud model `nvidia/nemotron-3-super-120b-a12b`: add test evidence or identify existing coverage. No changed test provisions or asserts this model for sandbox A; the concurrent test only checks the OpenClaw gateway URL and agent success markers.

This is an automated advisory review. A human maintainer must make the final merge decision.

github-actions · 2026-06-13T10:05:26Z

Selective E2E Results — ❌ Some jobs failed

Run: 27463690336
Target ref: 04069f70a3743c2892e409295d79c3e029d1f638
Workflow ref: main
Requested jobs: issue-4462-scope-upgrade-approval-e2e,issue-4462-gateway-pinned-approval-characterization-e2e
Summary: 0 passed, 2 failed, 0 cancelled, 0 skipped

Job	Result
issue-4462-gateway-pinned-approval-characterization-e2e	❌ failure
issue-4462-scope-upgrade-approval-e2e	❌ failure

Failed jobs: issue-4462-gateway-pinned-approval-characterization-e2e, issue-4462-scope-upgrade-approval-e2e. Check run artifacts for logs.

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@scripts/nemoclaw-start.sh`:
- Around line 2050-2061: The fast-reentry counter is being reinitialized every
loop when attempted_approval is true, allowing a single failing pending request
to keep fast mode indefinitely; change the logic in the watcher loop so
FAST_REENTRY_REMAINING is only reset once per approval lifecycle (e.g., on the
rising edge of attempted_approval or when FAST_REENTRY_REMAINING == 0) instead
of every loop iteration: add a small persistent flag like
prev_attempted_approval or check FAST_REENTRY_REMAINING to detect transitions
and only set FAST_REENTRY_REMAINING = FAST_REENTRY_POLLS and print the log when
the transition occurs (leave APPROVED and SLOW_MODE usage as-is).

In `@test/nemoclaw-start.test.ts`:
- Around line 1746-1859: The test added ("drops slow-mode polling back to fast
cadence when a late scope upgrade arrives") exceeds the file line budget;
extract the duplicated late-cli fixture and setup into a small reusable helper
(e.g. create a function like createLateCliFixture that returns { tmpDir,
fakeOpenclaw, stateFile, approveLog } and sets up the script and permissions)
and call that helper from this test (and the adjacent similar test) instead of
inlining the long bash string and env setup, or alternatively fold the
fast-reentry assertions into the existing late-upgrade test to avoid duplicating
the entire fixture; locate references to fakeOpenclaw, stateFile, approveLog,
and buildAutoPairScript() to update call sites to use the helper and remove the
duplicated lines from the test body.

📥 Commits

Reviewing files that changed from the base of the PR and between 39fd60a and 04069f7.

📒 Files selected for processing (5)

.github/workflows/nightly-e2e.yaml
scripts/nemoclaw-start.sh
test/e2e-script-workflow.test.ts
test/e2e/test-cli-scope-upgrade-approval.sh
test/nemoclaw-start.test.ts

github-actions · 2026-06-13T10:07:46Z

Selective E2E Results — ❌ Some jobs failed

Run: 27463736935
Target ref: 1ff7eb0c5a32f8b430d88fc5685923744caa1dc8
Workflow ref: main
Requested jobs: issue-4462-scope-upgrade-approval-e2e,issue-4462-gateway-pinned-approval-characterization-e2e
Summary: 0 passed, 2 failed, 0 cancelled, 0 skipped

Job	Result
issue-4462-gateway-pinned-approval-characterization-e2e	❌ failure
issue-4462-scope-upgrade-approval-e2e	❌ failure

Failed jobs: issue-4462-gateway-pinned-approval-characterization-e2e, issue-4462-scope-upgrade-approval-e2e. Check run artifacts for logs.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/e2e/test-cli-scope-upgrade-approval.sh`:
- Around line 1082-1086: The grep check for '[auto-pair] fast-reentry bumped'
using auto_pair_log_snapshot should be made informational: in the else branch
replace the fail call with a non-fatal log/warn (or call a function like warn or
echo) so the script doesn't hard-fail when the watcher didn't log the marker;
keep the pass branch unchanged. Update the block that currently uses grep -F ...
and calls fail to instead emit a warning/message while allowing the test to
continue.
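
A minimal sketch of the informational check described above. The `pass` and `warn` helpers are hypothetical, and `auto_pair_log_snapshot` is a stand-in that prints a captured log from a variable; the real script's helpers and snapshot source may differ.

```shell
# Hypothetical reporting helpers; the real test script defines its own.
pass() { printf 'PASS: %s\n' "$1"; }
warn() { printf 'WARN: %s\n' "$1" >&2; }

# Stand-in for the snapshot helper named in the comment (assumed to print the
# watcher log on stdout). Here it reads from a variable for illustration.
auto_pair_log_snapshot() { printf '%s\n' "${AUTO_PAIR_LOG:-}"; }

check_fast_reentry_marker() {
  snapshot="$(auto_pair_log_snapshot)"
  if printf '%s\n' "$snapshot" | grep -F '[auto-pair] fast-reentry bumped' >/dev/null; then
    pass "watcher logged the fast-reentry marker"
  else
    # Informational only: warn and keep going instead of calling fail.
    warn "fast-reentry marker not found in the watcher log"
  fi
  return 0
}
```

Because the function always returns 0, a missing marker shows up in the log as a warning without aborting the rest of the run.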

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 5a348f44-52fd-43f1-a690-23bae8b5c178

📥 Commits

Reviewing files that changed from the base of the PR and between 04069f7 and 2b040db.

📒 Files selected for processing (4)

.github/workflows/nightly-e2e.yaml
scripts/nemoclaw-start.sh
test/e2e/test-cli-scope-upgrade-approval.sh
test/nemoclaw-start.test.ts

🚧 Files skipped from review as they are similar to previous changes (1)

test/nemoclaw-start.test.ts

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

github-actions · 2026-06-13T10:57:58Z

Selective E2E Results — ❌ Some jobs failed

Run: 27464785119
Target ref: fix/scope-upgrade-late-approval-race
Requested jobs: cli-scope-upgrade-approval-e2e,cli-scope-upgrade-legacy-repro-e2e
Summary: 0 passed, 2 failed, 0 cancelled, 0 skipped

Job	Result
cli-scope-upgrade-approval-e2e	❌ failure
cli-scope-upgrade-legacy-repro-e2e	❌ failure

Failed jobs: cli-scope-upgrade-approval-e2e, cli-scope-upgrade-legacy-repro-e2e. Check run artifacts for logs.

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

github-actions · 2026-06-13T11:21:07Z

Selective E2E Results — ❌ Some jobs failed

Run: 27465290873
Target ref: fix/scope-upgrade-late-approval-race
Requested jobs: cli-scope-upgrade-approval-e2e,cli-scope-upgrade-legacy-repro-e2e
Summary: 0 passed, 2 failed, 0 cancelled, 0 skipped

Job	Result
cli-scope-upgrade-approval-e2e	❌ failure
cli-scope-upgrade-legacy-repro-e2e	❌ failure

Failed jobs: cli-scope-upgrade-approval-e2e, cli-scope-upgrade-legacy-repro-e2e. Check run artifacts for logs.

…failures Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

…approve failures" This reverts commit bba211b.

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/nemoclaw-start.test.ts`:
- Around line 2248-2250: The test's assertion currently only matches
"fast-reentry bumped polls=3 approved=0 mode=fast" and can miss alternate bump
lines; update the marker and expectation used in test variable markerRe (and the
subsequent expect using run.stdout.match(...).length) to match any "fast-reentry
bumped" line (ignore the particular field values) and assert the total number of
such occurrences is exactly 1 so any extra bump log (e.g., with approved=1) will
fail the test.
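
The counting rule can be illustrated outside the test harness. This is a shell sketch of the idea, not the TypeScript assertion itself; `count_bump_markers` and `assert_single_bump` are hypothetical names.

```shell
# Count every "fast-reentry bumped" line, ignoring the polls/approved/mode values.
count_bump_markers() {
  # grep -c prints the number of matching lines; it exits 1 when the count is 0,
  # so "|| true" keeps the function usable under `set -e`.
  printf '%s\n' "$1" | grep -cF 'fast-reentry bumped' || true
}

# Succeeds only when exactly one bump marker is present in the given output.
assert_single_bump() {
  count="$(count_bump_markers "$1")"
  [ "$count" -eq 1 ]
}
```

Matching on the fixed prefix means a second bump line with different field values (for example `approved=1`) raises the count to 2 and fails the check.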

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 97da2f1c-484a-4ce2-a416-670cedbb0123

📥 Commits

Reviewing files that changed from the base of the PR and between bba211b and cae6c20.

📒 Files selected for processing (1)

test/nemoclaw-start.test.ts

…ng-edge assertion Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

fix(start): drop slow-mode polling on late allowlisted scope upgrades

04069f7

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

Merge branch 'main' into fix/scope-upgrade-late-approval-race

1ff7eb0

coderabbitai Bot reviewed Jun 13, 2026

Comment thread scripts/nemoclaw-start.sh Outdated

Comment thread test/nemoclaw-start.test.ts Outdated

laitingsheng added the `integration: openclaw` (OpenClaw integration behavior) and `bug-fix` (PR fixes a bug or regression) labels Jun 13, 2026

fix(start): bound fast-reentry per request and tighten slow-mode default

2b040db

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

coderabbitai Bot reviewed Jun 13, 2026

Comment thread test/e2e/test-cli-scope-upgrade-approval.sh

test(e2e): keep fast-reentry marker check informational

53f7a30

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

laitingsheng added v0.0.65 Release target and removed v0.0.65 Release target labels Jun 13, 2026

test(e2e): strip issue-id from artifact, log, env, and sandbox names

759f69b

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

laitingsheng added v0.0.65 Release target and removed v0.0.65 Release target labels Jun 13, 2026

laitingsheng added 3 commits June 13, 2026 15:09

test(start): cover concurrent late scope upgrades and sticky approve …

bba211b

…failures Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

Revert "test(start): cover concurrent late scope upgrades and sticky …

21b00dc

…approve failures" This reverts commit bba211b.

test(start): cover concurrent late upgrades and rising-edge fast-reentry

cae6c20

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

coderabbitai Bot reviewed Jun 13, 2026

Comment thread test/nemoclaw-start.test.ts Outdated

test: validate two-sandbox concurrent gateway agents and tighten risi…

90026e7

…ng-edge assertion Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

laitingsheng added the v0.0.65 Release target label Jun 13, 2026
