fix(ci): align hosted nightly inference defaults by cv · Pull Request #5399 · NVIDIA/NemoClaw

cv · 2026-06-13T19:38:03Z

Summary

Fixes the hosted nightly E2E inference defaults that were still pointing at a stale double-prefixed Nemotron model ID, and keeps rebuild fixture registry metadata aligned with the provider/model selected during onboarding. This should unblock the hosted validation failures plus the rebuild/upgrade resume failures where CI onboarded as compatible-endpoint but the fixture registry forced nvidia-prod.

Changes

Updated hosted CI/custom compatible inference defaults to nvidia/nemotron-3-super-120b-a12b across workflows, onboarding defaults, E2E fixtures, and docs.
Changed rebuild/upgrade E2E fixture registry seeding to preserve the current onboard session provider/model before falling back to env/default values.
Added workflow/script contract coverage so hosted model IDs and rebuild fixture provider/model alignment do not regress.

Type of Change

Code change (feature, bug fix, or refactor)
Code change with doc updates
Doc only (prose changes, no code sample modifications)
Doc only (includes code sample changes)

Verification

Git hooks passed during commit and push, or npx prek run --from-ref main --to-ref HEAD passes
Targeted tests pass for changed behavior
Full npm test passes (broad runtime changes only)
Tests added or updated for new or changed behavior
No secrets, API keys, or credentials committed
Docs updated for user-facing behavior changes
npm run docs builds without warnings (doc changes only)
Doc pages follow the style guide (doc changes only)
New doc pages include SPDX header and frontmatter (new pages only)

Notes:

npm run docs passed with 0 errors; Fern reported 2 hidden warnings.
npx prek run --all-files passed.
npm test -- test/e2e-script-workflow.test.ts test/e2e-scenario/support-tests/hosted-inference.test.ts src/lib/onboard/providers.test.ts passed.
bash -n passed for the changed rebuild/upgrade shell fixtures.
rg -n "nvidia/nvidia/nemotron-3-super-v3" .github src test tools docs || true has no product/workflow/doc hits; the only remaining nemotron-3-super-v3 reference is a regression assertion that rejects the old ID.

Signed-off-by: Carlos Villela cvillela@nvidia.com

Summary by CodeRabbit

Release Notes

Chores
- Updated hosted inference model configuration across CI/CD workflows
Documentation
- Updated Model Router configuration example
Refactor
- Refactored test scripts to dynamically derive model and provider configuration instead of using hardcoded values
Tests
- Updated E2E test expectations and hosted inference configuration to align with model changes

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

coderabbitai · 2026-06-13T19:38:18Z

Caution

Review failed

Pull request was closed or merged during review

📝 Walkthrough

Walkthrough

This PR updates the hosted inference model identifier to nvidia/nemotron-3-super-120b-a12b across CI workflows, source constants, and test infrastructure. Additionally, three E2E shell scripts are refactored to derive provider and model dynamically from session state and environment variables rather than using hardcoded values.

Changes

Nemotron Model Update and E2E Script Refactoring

Layer / File(s)	Summary
CI Workflow Model Configuration `.github/workflows/e2e-script.yaml`, `.github/workflows/e2e-vitest-scenarios.yaml`, `.github/workflows/nightly-e2e.yaml`	`NEMOCLAW_MODEL` and `NEMOCLAW_COMPAT_MODEL` are updated from `nvidia/nvidia/nemotron-3-super-v3` to `nvidia/nemotron-3-super-120b-a12b` across all E2E workflow jobs (e2e-script, credential-migration-vitest, openclaw-tui, issue-4434, token-rotation, sandbox-operations, credential-migration, onboard-repair/resume/negative-paths, runtime-overrides, credential-sanitization, telegram-injection, and launchable-smoke).
Source and Fixture Model Constants `src/lib/onboard/providers.ts`, `test/e2e-scenario/fixtures/hosted-inference.ts`, `test/e2e/lib/ci-compatible-inference.sh`, `docs/inference/inference-options.mdx`	Constants are updated: `HOSTED_INFERENCE_MODEL` in providers.ts, `DEFAULT_HOSTED_INFERENCE_MODEL` in test fixtures, and `NEMOCLAW_E2E_COMPATIBLE_INFERENCE_MODEL_DEFAULT` in CI helper all switch to the new model identifier; documentation example for Model Router configuration is updated.
E2E Shell Script Dynamic Provider/Model Derivation `test/e2e/test-rebuild-hermes.sh`, `test/e2e/test-rebuild-openclaw.sh`, `test/e2e/test-upgrade-stale-sandbox.sh`	Python registry update logic now loads session JSON early and derives `provider` and `model` from session fields or environment variables (`NEMOCLAW_ENDPOINT_URL`/`COMPATIBLE_API_KEY` for provider, `NEMOCLAW_MODEL`/`NEMOCLAW_COMPAT_MODEL` for model) with fallbacks, replacing hardcoded constants; redundant session-loading blocks are consolidated.
Test Contract and Assertion Updates `test/e2e-script-workflow.test.ts`	Assertions are updated to expect the new model identifier in credential-migration Vitest, hosted CI inference routing, and direct custom inference jobs; a new test validates that rebuild fixture scripts derive `provider` and `model` from session state and do not hardcode the production provider or old model values.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

NVIDIA/NemoClaw#5155: Removes legacy test-onboard-inference-smoke.sh from allowlist in test contracts, aligning with E2E infrastructure cleanup referenced in this PR's test updates.
NVIDIA/NemoClaw#5385: Routes nightly hosted inference through custom provider endpoint with related model configuration changes in the same CI workflows updated here.
NVIDIA/NemoClaw#5380: Updates credential-migration and hosted-inference E2E wiring by consuming NEMOCLAW_MODEL and NEMOCLAW_COMPAT_MODEL in the same test fixtures and workflows modified in this PR.

Suggested labels

bug-fix, area: ci, area: inference, area: onboarding, area: e2e, documentation

Suggested reviewers

prekshivyas

Poem

🐰 A model update hops through the CI,
From super-v3 to a newer spree,
Scripts now dance with session-derived delight,
Dynamic paths replace hardcoded blight,
Nemotron-120b lights the E2E night! 🚀

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly summarizes the main change: updating hosted inference model defaults in CI workflows and related files to resolve a stale model ID issue.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch codex/fix-nightly-e2e-regressions

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-06-13T19:38:43Z

🌿 Preview your docs: https://nvidia-preview-pr-5399.docs.buildwithfern.com/nemoclaw

github-code-quality · 2026-06-13T19:38:47Z

Code Coverage Overview

Languages: TypeScript

TypeScript / code-coverage/plugin

The overall coverage in the codex/fix-nightly-e2... branch is 96%. Coverage data for the main branch is not yet available.

Show a code coverage summary of the most covered files.

File	main	codex/fix-nightly-e2... `a0f260f`	+/-
`nemoclaw/src/se...cret-scanner.ts`	—	100%	—
`nemoclaw/src/commands/slash.ts`	—	100%	—
`nemoclaw/src/li...bprocess-env.ts`	—	100%	—
`nemoclaw/src/bl...eprint/state.ts`	—	98%	—
`nemoclaw/src/onboard/config.ts`	—	98%	—
`nemoclaw/src/bl...int/snapshot.ts`	—	97%	—
`nemoclaw/src/bl...print/runner.ts`	—	95%	—
`nemoclaw/src/co...ration-state.ts`	—	94%	—
`nemoclaw/src/bl...ate-networks.ts`	—	94%	—
`nemoclaw/src/index.ts`	—	94%	—

TypeScript / code-coverage/cli

The overall coverage in the codex/fix-nightly-e2... branch is 44%. Coverage data for the main branch is not yet available.

Show a code coverage summary of the most covered files.

File	main	codex/fix-nightly-e2... `a0f260f`	+/-
`src/lib/state/o...oard-session.ts`	—	90%	—
`src/lib/inference/local.ts`	—	77%	—
`src/lib/sandbox/config.ts`	—	72%	—
`src/lib/inference/nim.ts`	—	72%	—
`src/lib/onboard/preflight.ts`	—	64%	—
`src/lib/state/sandbox.ts`	—	55%	—
`src/lib/onboard...er-gpu-patch.ts`	—	50%	—
`src/lib/actions...licy-channel.ts`	—	49%	—
`src/lib/policy/index.ts`	—	48%	—
`src/lib/onboard.ts`	—	17%	—

_{Updated June 13, 2026 19:41 UTC
Code Coverage is in Public Preview. Learn more and provide us with your feedback.}

github-actions · 2026-06-13T19:40:03Z

E2E Advisor Recommendation

Required E2E: cloud-inference-vitest, credential-migration-vitest, rebuild-openclaw-e2e, upgrade-stale-sandbox-e2e, rebuild-hermes-e2e
Optional E2E: inference-routing-vitest, launchable-smoke-e2e, rebuild-hermes-stale-base-e2e

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

cloud-inference-vitest (medium): Primary live confidence check for hosted/custom OpenAI-compatible inference after changing the hosted model default and fixture wiring.
credential-migration-vitest (medium): Validates NVIDIA_INFERENCE_API_KEY is staged through the compatible-provider credential path using the new hosted model in the live credential migration flow.
rebuild-openclaw-e2e (high): Directly exercises the changed OpenClaw rebuild fixture that now aligns registry provider/model with the onboard session before running a real sandbox rebuild.
upgrade-stale-sandbox-e2e (high): Directly exercises the changed stale-sandbox upgrade fixture and verifies stale detection plus rebuild using the session-aligned provider/model registry values.
rebuild-hermes-e2e (high): Directly exercises the changed Hermes rebuild fixture, including registry/session inference metadata and real Hermes sandbox rebuild behavior.

Optional E2E

inference-routing-vitest (medium): Adjacent route-selection coverage for custom/OpenAI-compatible inference after the hosted model ID and compatible endpoint defaults changed.
launchable-smoke-e2e (medium): Useful end-to-end install/onboard smoke for a direct nightly job that consumes the changed hosted inference environment values.
rebuild-hermes-stale-base-e2e (high): Additional Hermes rebuild confidence for the stale-base variant of the same changed script; useful if time permits, but the standard Hermes rebuild covers the modified registry/session path.

New E2E recommendations

None.

github-actions · 2026-06-13T19:40:04Z

Vitest E2E Scenario Recommendation

Required Vitest E2E scenarios: e2e-scenarios-all
Optional Vitest E2E scenarios: None

Dispatch required Vitest E2E scenarios:

gh workflow run e2e-vitest-scenarios.yaml --ref <pr-head-ref>

Workflow run

Full Vitest E2E advisor summary

Vitest E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required Vitest E2E scenarios

e2e-scenarios-all: The PR changes the shared Vitest scenario workflow and a shared hosted-inference fixture used by live Vitest scenarios, so run the full Vitest scenario fan-out to cover registry-driven scenarios and free-standing Vitest jobs under the canonical workflow.
- Dispatch: gh workflow run e2e-vitest-scenarios.yaml --ref <pr-head-ref>

Optional Vitest E2E scenarios

None.

Relevant changed files

.github/workflows/e2e-vitest-scenarios.yaml
src/lib/onboard/providers.ts
test/e2e-scenario/fixtures/hosted-inference.ts

github-actions · 2026-06-13T19:41:27Z

PR Review Advisor

Findings: 0 needs attention, 0 worth checking, 0 nice ideas
Top item: No blocking code findings

Consider writing more tests for

**Runtime validation** — rebuild fixtures seed registry provider/model from onboard-session when session provider/model differ from env defaults. The patch touches live workflow, hosted inference, and sandbox rebuild/upgrade infrastructure paths. Static and unit contract coverage is present, but runtime validation is still useful for the environment-dependent hosted-compatible and rebuild paths.
**Runtime validation** — rebuild fixtures fall back to hosted compatible provider/model when onboard-session is missing and COMPATIBLE_API_KEY is staged. The patch touches live workflow, hosted inference, and sandbox rebuild/upgrade infrastructure paths. Static and unit contract coverage is present, but runtime validation is still useful for the environment-dependent hosted-compatible and rebuild paths.
**Runtime validation** — hosted inference fixture default model is nvidia/nemotron-3-super-120b-a12b and never the old double-prefixed ID. The patch touches live workflow, hosted inference, and sandbox rebuild/upgrade infrastructure paths. Static and unit contract coverage is present, but runtime validation is still useful for the environment-dependent hosted-compatible and rebuild paths.

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

## Summary Reverts the hosted custom inference model-ID changes from #5399 so CI continues using the model ID actually served by `https://inference-api.nvidia.com/v1/chat/completions`. Bounds the ordinary OpenAI-compatible chat-completions onboarding validation probe with `max_tokens: 8`, and keeps rebuild/upgrade E2E registry metadata aligned with the hosted-compatible onboarding session. ## Changes - Restore hosted custom inference defaults and workflow/test expectations to `nvidia/nvidia/nemotron-3-super-v3`. - Preserve provider/model in rebuild and upgrade E2E fixture registry seeding from the onboard session, falling back to hosted-compatible env values. - Use the hosted-compatible model env for post-rebuild inference smoke calls instead of hardcoding the public `nvidia-prod` model ID. - Add `max_tokens: 8` to the non-strict chat-completions validation probe payload. - Add regression coverage for the bounded probe payload and rebuild fixture provider/model alignment. ## Type of Change - [x] Code change (feature, bug fix, or refactor) - [ ] Code change with doc updates - [ ] Doc only (prose changes, no code sample modifications) - [ ] Doc only (includes code sample changes) ## Verification - [x] Git hooks passed during commit and push, or `npx prek run --from-ref main --to-ref HEAD` passes - [x] Targeted tests pass for changed behavior - [ ] Full `npm test` passes (broad runtime changes only) - [x] Tests added or updated for new or changed behavior - [x] No secrets, API keys, or credentials committed - [ ] Docs updated for user-facing behavior changes - [ ] `npm run docs` builds without warnings (doc changes only) - [ ] Doc pages follow the [style guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md) (doc changes only) - [ ] New doc pages include SPDX header and frontmatter (new pages only) Notes: - `npx prek run --from-ref main --to-ref HEAD` passed before the latest fixture update; commit and push hooks passed for the latest update. - `bash -n` passed for the changed rebuild/upgrade shell fixtures. - `npm test -- src/lib/inference/onboard-probes.test.ts test/e2e-script-workflow.test.ts src/lib/onboard/providers.test.ts` passed. - `npm test -- test/onboard-selection.test.ts test/stale-dist-check.test.ts src/lib/inference/onboard-probes.test.ts` passed. - Isolated rerun of `npm test -- test/onboard-model-router.test.ts -t "prefers the managed Model Router command over PATH"` passed after one transient commit-hook failure in that unrelated test. - `npm run docs` passed with 0 errors; Fern reported 2 hidden warnings, so the docs-without-warnings checkbox is left unchecked. --- Signed-off-by: Carlos Villela <cvillela@nvidia.com>  ## Summary by CodeRabbit * **Chores** * Updated default hosted inference model identifier across CI/workflows, test fixtures, and configs to a new model. * Updated scripts to compute provider/model with new default/mapping logic (including a compatible-endpoint mapping). * **Bug Fixes** * Set a baseline max_tokens cap for certain hosted probe requests. * **Tests** * Adjusted E2E/contract tests and fixtures to expect the new model; improved defensive fallback handling and removed a deprecated alignment test.  --------- Signed-off-by: Carlos Villela <cvillela@nvidia.com>

fix(ci): align hosted nightly inference defaults

a0f260f

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

cv self-assigned this Jun 13, 2026

cv merged commit 3f003c0 into main Jun 13, 2026
47 of 48 checks passed

cv deleted the codex/fix-nightly-e2e-regressions branch June 13, 2026 19:43

cv mentioned this pull request Jun 13, 2026

fix(onboard): bound compatible endpoint probe #5400

Merged

13 tasks

Conversation

cv commented Jun 13, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Type of Change

Verification

Summary by CodeRabbit

Release Notes

Uh oh!

coderabbitai Bot commented Jun 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review failed

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

Uh oh!

github-actions Bot commented Jun 13, 2026

Uh oh!

github-code-quality Bot commented Jun 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Coverage Overview

TypeScript / code-coverage/plugin

TypeScript / code-coverage/cli

Uh oh!

github-actions Bot commented Jun 13, 2026

E2E Advisor Recommendation

E2E Recommendation Advisor

Required E2E

Optional E2E

New E2E recommendations

Uh oh!

github-actions Bot commented Jun 13, 2026

Vitest E2E Scenario Recommendation

Vitest E2E Scenario Advisor

Required Vitest E2E scenarios

Optional Vitest E2E scenarios

Relevant changed files

Uh oh!

github-actions Bot commented Jun 13, 2026

PR Review Advisor

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

cv commented Jun 13, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 13, 2026 •

edited

Loading

github-code-quality Bot commented Jun 13, 2026 •

edited

Loading