fix(e2e): stabilize nightly recovery coverage#5401
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (3)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughThis PR adds failed device-approval recovery in policy and watcher flows, expands Kimi inference compatibility handling for routed/live runs and nightly secrets, and simplifies one crash-loop recovery E2E guard assertion. ChangesOpenClaw approval recovery
Kimi inference compatibility coverage
Crash-loop recovery E2E assertion
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related issues
Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
Code Coverage OverviewLanguages: TypeScript TypeScript / code-coverage/pluginThe overall coverage in the branch is 96%. Coverage data for the branch is not yet available. Show a code coverage summary of the most covered files.
TypeScript / code-coverage/cliThe overall coverage in the branch is 44%. Coverage data for the branch is not yet available. Show a code coverage summary of the most covered files.
Updated |
PR Review AdvisorFindings: 0 needs attention, 5 worth checking, 0 nice ideas Review findings🛠️ Needs attention
🔎 Worth checking
🌱 Nice ideas
Consider writing more tests for
Since last review detailsCurrent findings:
This is an automated advisory review. A human maintainer must make the final merge decision. |
E2E Advisor RecommendationRequired E2E: Dispatch hint: Auto-dispatched E2E: Full advisor summaryE2E Recommendation AdvisorBase: Required E2E
Optional E2E
New E2E recommendations
Dispatch hint
|
Vitest E2E Scenario RecommendationRequired Vitest E2E scenarios: Dispatch required Vitest E2E scenarios:
Full Vitest E2E advisor summaryVitest E2E Scenario AdvisorBase: Required Vitest E2E scenarios
Optional Vitest E2E scenarios
Relevant changed files
|
Selective E2E Results — ❌ Some jobs failedRun: 27481775942
|
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (1)
scripts/nemoclaw-start.sh (1)
1882-2029: Run the sandbox boot E2Es for these entrypoint changes.These recovery branches only execute inside the sandbox entrypoint/watchers, so unit coverage will not exercise Landlock, non-root, and process-lifecycle behavior. Please run
sandbox-survival-e2e,sandbox-operations-e2e,cloud-e2e, andopenclaw-slack-pairing-e2ebefore merge.As per coding guidelines,
scripts/nemoclaw-start.sh: "This file is a sandbox entrypoint script. Changes affect every sandbox boot and are invisible to unit tests."Also applies to: 2424-2486
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/nemoclaw-start.sh` around lines 1882 - 2029, The new recovery logic in scripts/nemoclaw-start.sh should be validated with the sandbox boot end-to-end suites, since this entrypoint/watcher behavior is not covered by unit tests. Please run sandbox-survival-e2e, sandbox-operations-e2e, cloud-e2e, and openclaw-slack-pairing-e2e against the changes around run() and the pending-approval recovery branch to verify Landlock, non-root, and process-lifecycle behavior before merge.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@scripts/lib/openclaw_device_approval_policy.py`:
- Around line 211-213: The current code deletes the original pending entry using
request_id, but the pending dict is keyed by arbitrary labels (like 'original',
'replacement') not by request_id, leaving stale entries behind. In
scripts/lib/openclaw_device_approval_policy.py at lines 211-213, replace the
pending.pop(request_id, None) call to instead remove the original_key when it
exists, then conditionally remove recovery_key only if it differs from
original_key. In scripts/nemoclaw-start.sh at lines 2481-2482, mirror the same
key-based deletion logic in the inline shell recovery path so the interactive
openclaw devices approve command clears both stale entries consistently.
In `@test/e2e/test-issue-2478-crash-loop-recovery.sh`:
- Around line 227-253: Remove the stale third argument from every
gateway_guards_active call and delete the now-unused gateway_log_marker setup.
The function now only accepts pid and timeout, so update the call sites in
test/e2e/test-issue-2478-crash-loop-recovery.sh that still pass guard_log_start,
negative_guard_log_start, and restore_guard_log_start, and keep the
gateway_guards_active helper aligned with its 2-parameter signature.
In `@test/e2e/test-kimi-inference-compat.sh`:
- Around line 112-123: The ensure_public_nvidia_key() function currently
validates only NVIDIA_INFERENCE_API_KEY against the nvapi-* pattern, but it
should validate that at least one of the two alias variables
(NVIDIA_INFERENCE_API_KEY or NVIDIA_API_KEY) matches the pattern. After the
synchronization logic copies values between the two variables, update the final
validation check to return success if either NVIDIA_INFERENCE_API_KEY or
NVIDIA_API_KEY matches the nvapi-* pattern, ensuring both keys are treated as
valid alternatives for the validation.
---
Nitpick comments:
In `@scripts/nemoclaw-start.sh`:
- Around line 1882-2029: The new recovery logic in scripts/nemoclaw-start.sh
should be validated with the sandbox boot end-to-end suites, since this
entrypoint/watcher behavior is not covered by unit tests. Please run
sandbox-survival-e2e, sandbox-operations-e2e, cloud-e2e, and
openclaw-slack-pairing-e2e against the changes around run() and the
pending-approval recovery branch to verify Landlock, non-root, and
process-lifecycle behavior before merge.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 05b49698-6094-4adc-8ec8-214535bfd4de
📒 Files selected for processing (12)
.github/workflows/nightly-e2e.yamlci/test-file-size-budget.jsonnemoclaw-blueprint/openclaw-plugins/kimi-inference-compat/index.jsscripts/lib/openclaw_device_approval_policy.pyscripts/nemoclaw-start.shsrc/lib/actions/sandbox/auto-pair-approval.test.tssrc/lib/actions/sandbox/auto-pair-approval.tstest/e2e/test-issue-2478-crash-loop-recovery.shtest/e2e/test-kimi-inference-compat.shtest/kimi-inference-compat-plugin.test.tstest/nemoclaw-start.test.tstest/openclaw-device-approval-policy.test.ts
💤 Files with no reviewable changes (1)
- test/nemoclaw-start.test.ts
| pending.pop(request_id, None) | ||
| if recovery_key: | ||
| pending.pop(recovery_key, None) |
There was a problem hiding this comment.
Delete the original pending entry by its map key, not by requestId.
Both recovery implementations mutate pending.json as a dict keyed by arbitrary labels (original, replacement, ...), but the success path removes the original request with pending.pop(request_id, None). When OpenClaw leaves both the original pending entry and a replacement entry behind, the replacement branch reports recovery success while the original entry stays pending.
scripts/lib/openclaw_device_approval_policy.py#L211-L213: removeoriginal_keywhen it exists, then removerecovery_keyonly if it is different from the original key.scripts/nemoclaw-start.sh#L2481-L2482: mirror the same key-based deletion in the inline shell recovery path so interactiveopenclaw devices approveclears both stale entries too.
📍 Affects 2 files
scripts/lib/openclaw_device_approval_policy.py#L211-L213(this comment)scripts/nemoclaw-start.sh#L2481-L2482
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scripts/lib/openclaw_device_approval_policy.py` around lines 211 - 213, The
current code deletes the original pending entry using request_id, but the
pending dict is keyed by arbitrary labels (like 'original', 'replacement') not
by request_id, leaving stale entries behind. In
scripts/lib/openclaw_device_approval_policy.py at lines 211-213, replace the
pending.pop(request_id, None) call to instead remove the original_key when it
exists, then conditionally remove recovery_key only if it differs from
original_key. In scripts/nemoclaw-start.sh at lines 2481-2482, mirror the same
key-based deletion logic in the inline shell recovery path so the interactive
openclaw devices approve command clears both stale entries consistently.
| gateway_guards_active() { | ||
| local pid="$1" | ||
| local timeout="${2:-30}" | ||
| local log_boundary="${3:-0}" | ||
| local elapsed=0 | ||
|
|
||
| if [ -z "$pid" ]; then | ||
| return 1 | ||
| fi | ||
|
|
||
| local env_contents | ||
| env_contents="$(proxy_env_contents)" | ||
| if ! echo "$env_contents" | grep -q 'nemoclaw-sandbox-safety-net'; then | ||
| echo " [guards] proxy-env.sh missing safety-net export" | ||
| return 1 | ||
| fi | ||
| if ! echo "$env_contents" | grep -q 'nemoclaw-ciao-network-guard'; then | ||
| echo " [guards] proxy-env.sh missing ciao-network-guard export" | ||
| return 1 | ||
| fi | ||
|
|
||
| while [ "$elapsed" -lt "$timeout" ]; do | ||
| if gateway_log_after_boundary "$log_boundary" | grep -Eq '\[sandbox-safety-net\] loaded \((openclaw-gateway|launcher)\)' \ | ||
| && gateway_log_after_boundary "$log_boundary" | grep -Eq '\[guard\] ciao-network-guard loaded \((openclaw-gateway|launcher)\)'; then | ||
| # Confirm gateway is still alive after guard activations. | ||
| if [ -n "$(gateway_pid)" ]; then | ||
| return 0 | ||
| fi | ||
| echo " [guards] guard fired but gateway no longer running" | ||
| return 1 | ||
| fi | ||
| # Backward-compatible proof for older images: this line is emitted by | ||
| # the ciao preload only when ciao calls os.networkInterfaces(). | ||
| if gateway_log_after_boundary "$log_boundary" | grep -Fq '[guard] os.networkInterfaces() failed:'; then | ||
| local env_contents | ||
| env_contents="$(proxy_env_contents)" | ||
| if echo "$env_contents" | grep -q 'nemoclaw-sandbox-safety-net' \ | ||
| && echo "$env_contents" | grep -q 'nemoclaw-ciao-network-guard'; then | ||
| if [ -n "$(gateway_pid)" ]; then | ||
| return 0 | ||
| fi | ||
| echo " [guards] guard fired but gateway no longer running" | ||
| echo " [guards] proxy-env.sh has guard exports but gateway no longer running" | ||
| return 1 | ||
| fi | ||
| sleep 3 | ||
| elapsed=$((elapsed + 3)) | ||
| done | ||
|
|
||
| echo " [guards] no fresh gateway-process guard activation signatures in gateway.log within ${timeout}s" | ||
| echo " [guards] proxy-env.sh missing safety-net or ciao guard exports within ${timeout}s" | ||
| return 1 | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Description: Find all gateway_guards_active call sites and gateway_log_marker assignments
# Find all calls to gateway_guards_active
echo "=== gateway_guards_active call sites ==="
rg -n 'gateway_guards_active\s+' test/e2e/test-issue-2478-crash-loop-recovery.sh
# Find all gateway_log_marker assignments that capture into *_guard_log_start variables
echo ""
echo "=== gateway_log_marker assignments to *guard_log_start variables ==="
rg -n '(guard_log_start|negative_guard_log_start|restore_guard_log_start)=.*gateway_log_marker' test/e2e/test-issue-2478-crash-loop-recovery.shRepository: NVIDIA/NemoClaw
Length of output: 729
🏁 Script executed:
# Read the function definition to confirm its signature
sed -n '227,253p' test/e2e/test-issue-2478-crash-loop-recovery.shRepository: NVIDIA/NemoClaw
Length of output: 771
🏁 Script executed:
# Check if the marker variables are used anywhere beyond their assignment and the call sites
echo "=== Checking for other uses of guard_log_start ==="
rg -n 'guard_log_start' test/e2e/test-issue-2478-crash-loop-recovery.sh
echo ""
echo "=== Checking for other uses of negative_guard_log_start ==="
rg -n 'negative_guard_log_start' test/e2e/test-issue-2478-crash-loop-recovery.sh
echo ""
echo "=== Checking for other uses of restore_guard_log_start ==="
rg -n 'restore_guard_log_start' test/e2e/test-issue-2478-crash-loop-recovery.shRepository: NVIDIA/NemoClaw
Length of output: 1116
Function signature change incomplete — four call sites still pass the removed log_boundary parameter.
The function signature was updated to accept only 2 parameters (pid and timeout), but the following call sites still pass three arguments:
- Line 556:
gateway_guards_active "$new_pid" 30 "$guard_log_start" - Line 639:
gateway_guards_active "$NEGATIVE_PID" 30 "$negative_guard_log_start" - Line 665:
gateway_guards_active "$NEGATIVE_PID" 5 "$negative_guard_log_start" - Line 685:
gateway_guards_active "$SOAK_START_PID" 30 "$restore_guard_log_start"
While bash silently ignores extra arguments, the variables guard_log_start (line 528), negative_guard_log_start (line 600), and restore_guard_log_start (line 674) are computed via gateway_log_marker() but never used—they only serve as dead 3rd arguments.
Remove the third argument from all four call sites and delete the unused gateway_log_marker() invocations.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@test/e2e/test-issue-2478-crash-loop-recovery.sh` around lines 227 - 253,
Remove the stale third argument from every gateway_guards_active call and delete
the now-unused gateway_log_marker setup. The function now only accepts pid and
timeout, so update the call sites in
test/e2e/test-issue-2478-crash-loop-recovery.sh that still pass guard_log_start,
negative_guard_log_start, and restore_guard_log_start, and keep the
gateway_guards_active helper aligned with its 2-parameter signature.
| ensure_public_nvidia_key() { | ||
| if [ -z "${NVIDIA_INFERENCE_API_KEY:-}" ] && [ -n "${NVIDIA_API_KEY:-}" ]; then | ||
| export NVIDIA_INFERENCE_API_KEY="$NVIDIA_API_KEY" | ||
| fi | ||
| if [ -z "${NVIDIA_API_KEY:-}" ] && [ -n "${NVIDIA_INFERENCE_API_KEY:-}" ]; then | ||
| export NVIDIA_API_KEY="$NVIDIA_INFERENCE_API_KEY" | ||
| fi | ||
| if [ -n "${NVIDIA_INFERENCE_API_KEY:-}" ] && [[ "${NVIDIA_INFERENCE_API_KEY}" == nvapi-* ]]; then | ||
| return 0 | ||
| fi | ||
| return 1 | ||
| } |
There was a problem hiding this comment.
Validate both NVIDIA key aliases against the same rule before returning success.
ensure_public_nvidia_key() currently succeeds/fails solely based on NVIDIA_INFERENCE_API_KEY after one-way fill. If both vars are set and only NVIDIA_API_KEY is valid, the function can fail despite the “either key” contract.
Suggested fix
ensure_public_nvidia_key() {
- if [ -z "${NVIDIA_INFERENCE_API_KEY:-}" ] && [ -n "${NVIDIA_API_KEY:-}" ]; then
- export NVIDIA_INFERENCE_API_KEY="$NVIDIA_API_KEY"
- fi
- if [ -z "${NVIDIA_API_KEY:-}" ] && [ -n "${NVIDIA_INFERENCE_API_KEY:-}" ]; then
- export NVIDIA_API_KEY="$NVIDIA_INFERENCE_API_KEY"
- fi
- if [ -n "${NVIDIA_INFERENCE_API_KEY:-}" ] && [[ "${NVIDIA_INFERENCE_API_KEY}" == nvapi-* ]]; then
- return 0
- fi
- return 1
+ local candidate=""
+ if [ -n "${NVIDIA_API_KEY:-}" ] && [[ "${NVIDIA_API_KEY}" == nvapi-* ]]; then
+ candidate="${NVIDIA_API_KEY}"
+ elif [ -n "${NVIDIA_INFERENCE_API_KEY:-}" ] && [[ "${NVIDIA_INFERENCE_API_KEY}" == nvapi-* ]]; then
+ candidate="${NVIDIA_INFERENCE_API_KEY}"
+ else
+ return 1
+ fi
+ export NVIDIA_API_KEY="${candidate}"
+ export NVIDIA_INFERENCE_API_KEY="${candidate}"
+ return 0
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| ensure_public_nvidia_key() { | |
| if [ -z "${NVIDIA_INFERENCE_API_KEY:-}" ] && [ -n "${NVIDIA_API_KEY:-}" ]; then | |
| export NVIDIA_INFERENCE_API_KEY="$NVIDIA_API_KEY" | |
| fi | |
| if [ -z "${NVIDIA_API_KEY:-}" ] && [ -n "${NVIDIA_INFERENCE_API_KEY:-}" ]; then | |
| export NVIDIA_API_KEY="$NVIDIA_INFERENCE_API_KEY" | |
| fi | |
| if [ -n "${NVIDIA_INFERENCE_API_KEY:-}" ] && [[ "${NVIDIA_INFERENCE_API_KEY}" == nvapi-* ]]; then | |
| return 0 | |
| fi | |
| return 1 | |
| } | |
| ensure_public_nvidia_key() { | |
| local candidate="" | |
| if [ -n "${NVIDIA_API_KEY:-}" ] && [[ "${NVIDIA_API_KEY}" == nvapi-* ]]; then | |
| candidate="${NVIDIA_API_KEY}" | |
| elif [ -n "${NVIDIA_INFERENCE_API_KEY:-}" ] && [[ "${NVIDIA_INFERENCE_API_KEY}" == nvapi-* ]]; then | |
| candidate="${NVIDIA_INFERENCE_API_KEY}" | |
| else | |
| return 1 | |
| fi | |
| export NVIDIA_API_KEY="${candidate}" | |
| export NVIDIA_INFERENCE_API_KEY="${candidate}" | |
| return 0 | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@test/e2e/test-kimi-inference-compat.sh` around lines 112 - 123, The
ensure_public_nvidia_key() function currently validates only
NVIDIA_INFERENCE_API_KEY against the nvapi-* pattern, but it should validate
that at least one of the two alias variables (NVIDIA_INFERENCE_API_KEY or
NVIDIA_API_KEY) matches the pattern. After the synchronization logic copies
values between the two variables, update the final validation check to return
success if either NVIDIA_INFERENCE_API_KEY or NVIDIA_API_KEY matches the nvapi-*
pattern, ensuring both keys are treated as valid alternatives for the
validation.
Signed-off-by: Carlos Villela <cvillela@nvidia.com>
Selective E2E Results — ❌ Some jobs failedRun: 27482229875
|
Signed-off-by: Carlos Villela <cvillela@nvidia.com>
Selective E2E Results — ❌ Some jobs failedRun: 27482361445
|
## Summary Declares `supportsStore: false` in the OpenClaw Kimi K2.6 model-specific setup manifest so managed NVIDIA/Kimi routes get the same compatibility shape as the previous custom mock path. This fixes the live `kimi-inference-compat-e2e` K2 config assertion that failed after #5401 moved the scenario to public NVIDIA Endpoints. ## Related Issue Related to #5401. ## Changes - Add `supportsStore: false` to `nemoclaw-blueprint/model-specific-setup/openclaw/kimi-k2.6-managed-inference.json`. - Update `test/generate-openclaw-config.test.ts` so registry-only Kimi compat includes the store flag. ## Type of Change - [x] Code change (feature, bug fix, or refactor) - [ ] Code change with doc updates - [ ] Doc only (prose changes, no code sample modifications) - [ ] Doc only (includes code sample changes) ## Verification - [ ] Git hooks passed during commit and push, or `npx prek run --from-ref main --to-ref HEAD` passes - [x] Targeted tests pass for changed behavior - [ ] Full `npm test` passes (broad runtime changes only) - [x] Tests added or updated for new or changed behavior - [x] No secrets, API keys, or credentials committed - [ ] Docs updated for user-facing behavior changes - [ ] `npm run docs` builds without warnings (doc changes only) - [ ] Doc pages follow the [style guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md) (doc changes only) - [ ] New doc pages include SPDX header and frontmatter (new pages only) Targeted verification: - `npx biome check --write nemoclaw-blueprint/model-specific-setup/openclaw/kimi-k2.6-managed-inference.json test/generate-openclaw-config.test.ts` - `npx vitest run --project cli test/generate-openclaw-config.test.ts test/validate-config-schemas.test.ts --testNamePattern "Kimi|model-specific|registry compat"` Docs review: no user-facing docs changes needed; this only completes an existing model-specific OpenClaw compatibility manifest. Note: local broad pre-commit/pre-push hooks currently fail in unrelated runtime recovery preload tests because temp preload files are seen as group-writable (`mode=664`), matching the existing local hook failure observed on #5401 follow-up work. Targeted changed tests passed. --- Signed-off-by: Carlos Villela <cvillela@nvidia.com> <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Chores** * Updated model-specific configuration to disable store support * **Tests** * Updated test case formatting <!-- end of auto-generated comment: release notes by coderabbit.ai --> Signed-off-by: Carlos Villela <cvillela@nvidia.com>
## Summary Stabilizes the live Kimi inference compatibility E2E by accepting Kimi tool calls emitted across multiple assistant turns. The full nightly run showed Kimi successfully executing all three expected tools and producing the final answer, but the trajectory checker only inspected the last assistant tool-call message and failed when the model emitted one command per assistant turn. ## Related Issue Related to #5401. ## Changes - Flatten tool-call content from all assistant tool-call messages in `test/e2e/test-kimi-inference-compat.sh` before checking the expected `hostname`, `date`, `uptime` order. - Preserve the existing checks that the trace artifacts contain exactly three `exec` tool metas, no combined semicolon command remains, final status is success, and the final assistant response follows all tool results. ## Type of Change - [x] Code change (feature, bug fix, or refactor) - [ ] Code change with doc updates - [ ] Doc only (prose changes, no code sample modifications) - [ ] Doc only (includes code sample changes) ## Verification - [ ] Git hooks passed during commit and push, or `npx prek run --from-ref main --to-ref HEAD` passes - [x] Targeted tests pass for changed behavior - [ ] Full `npm test` passes (broad runtime changes only) - [ ] Tests added or updated for new or changed behavior - [x] No secrets, API keys, or credentials committed - [ ] Docs updated for user-facing behavior changes - [ ] `npm run docs` builds without warnings (doc changes only) - [ ] Doc pages follow the [style guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md) (doc changes only) - [ ] New doc pages include SPDX header and frontmatter (new pages only) Targeted verification: - `bash -n test/e2e/test-kimi-inference-compat.sh` Nightly evidence: - Main full nightly run 27487294753 failed `kimi-inference-compat-e2e` only because `sourceAssistantCommands` was `['uptime']` while the same trajectory showed `toolMetasCount: 3`, command set `date/hostname/uptime`, final status `success`, and final assistant text was correct. Docs review: no user-facing docs changes needed; this is E2E harness stabilization only. --- Signed-off-by: Carlos Villela <cvillela@nvidia.com> <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Tests** * Enhanced end-to-end test validation to comprehensively examine tool call handling across all assistant messages with tool call blocks, ensuring more thorough assertion checks for command behavior. <!-- end of auto-generated comment: release notes by coderabbit.ai --> Signed-off-by: Carlos Villela <cvillela@nvidia.com>
Summary
Stabilizes the nightly E2E follow-up failures by keeping the Kimi scenario on the public NVIDIA Kimi endpoint, recognizing the generated routed Kimi model ref, and repairing narrow OpenClaw scope-approval failures that report nonzero after local state changes. It also aligns the crash-loop recovery assertion with the current gateway guard markers and keeps the test-size budget ratcheted after moving coverage into a focused policy test.
Related Issue
Related to #2478, #4462, #2620, #3046.
Changes
nvidia-prodwithmoonshotai/kimi-k2.6; retain the local mock only behindNEMOCLAW_KIMI_USE_MOCK=1and sanitize Kimi failure logs.inference/moonshotai/kimi-k2.6model ref in the Kimi compatibility plugin.operator.admin.test/nemoclaw-start.test.tssize budget downward.Type of Change
Verification
npx prek run --from-ref main --to-ref HEADpassesnpm testpasses (broad runtime changes only)npm run docsbuilds without warnings (doc changes only)Docs review found no user-facing docs changes were warranted;
docs/and generated.agents/skills/remained clean.Signed-off-by: Carlos Villela cvillela@nvidia.com
Summary by CodeRabbit
New Features
Bug Fixes
Security & CI
Tests / Chores