Skip to content

fix(ci): align hosted nightly inference defaults#5399

Merged
cv merged 1 commit into
mainfrom
codex/fix-nightly-e2e-regressions
Jun 13, 2026
Merged

fix(ci): align hosted nightly inference defaults#5399
cv merged 1 commit into
mainfrom
codex/fix-nightly-e2e-regressions

Conversation

@cv

@cv cv commented Jun 13, 2026

Copy link
Copy Markdown
Collaborator

Summary

Fixes the hosted nightly E2E inference defaults that were still pointing at a stale double-prefixed Nemotron model ID, and keeps rebuild fixture registry metadata aligned with the provider/model selected during onboarding. This should unblock the hosted validation failures plus the rebuild/upgrade resume failures where CI onboarded as compatible-endpoint but the fixture registry forced nvidia-prod.

Changes

  • Updated hosted CI/custom compatible inference defaults to nvidia/nemotron-3-super-120b-a12b across workflows, onboarding defaults, E2E fixtures, and docs.
  • Changed rebuild/upgrade E2E fixture registry seeding to preserve the current onboard session provider/model before falling back to env/default values.
  • Added workflow/script contract coverage so hosted model IDs and rebuild fixture provider/model alignment do not regress.

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • Git hooks passed during commit and push, or npx prek run --from-ref main --to-ref HEAD passes
  • Targeted tests pass for changed behavior
  • Full npm test passes (broad runtime changes only)
  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed
  • Docs updated for user-facing behavior changes
  • npm run docs builds without warnings (doc changes only)
  • Doc pages follow the style guide (doc changes only)
  • New doc pages include SPDX header and frontmatter (new pages only)

Notes:

  • npm run docs passed with 0 errors; Fern reported 2 hidden warnings.
  • npx prek run --all-files passed.
  • npm test -- test/e2e-script-workflow.test.ts test/e2e-scenario/support-tests/hosted-inference.test.ts src/lib/onboard/providers.test.ts passed.
  • bash -n passed for the changed rebuild/upgrade shell fixtures.
  • rg -n "nvidia/nvidia/nemotron-3-super-v3" .github src test tools docs || true has no product/workflow/doc hits; the only remaining nemotron-3-super-v3 reference is a regression assertion that rejects the old ID.

Signed-off-by: Carlos Villela cvillela@nvidia.com

Summary by CodeRabbit

Release Notes

  • Chores

    • Updated hosted inference model configuration across CI/CD workflows
  • Documentation

    • Updated Model Router configuration example
  • Refactor

    • Refactored test scripts to dynamically derive model and provider configuration instead of using hardcoded values
  • Tests

    • Updated E2E test expectations and hosted inference configuration to align with model changes

Signed-off-by: Carlos Villela <cvillela@nvidia.com>
@cv cv self-assigned this Jun 13, 2026
@coderabbitai

coderabbitai Bot commented Jun 13, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Caution

Review failed

Pull request was closed or merged during review

📝 Walkthrough

Walkthrough

This PR updates the hosted inference model identifier to nvidia/nemotron-3-super-120b-a12b across CI workflows, source constants, and test infrastructure. Additionally, three E2E shell scripts are refactored to derive provider and model dynamically from session state and environment variables rather than using hardcoded values.

Changes

Nemotron Model Update and E2E Script Refactoring

Layer / File(s) Summary
CI Workflow Model Configuration
.github/workflows/e2e-script.yaml, .github/workflows/e2e-vitest-scenarios.yaml, .github/workflows/nightly-e2e.yaml
NEMOCLAW_MODEL and NEMOCLAW_COMPAT_MODEL are updated from nvidia/nvidia/nemotron-3-super-v3 to nvidia/nemotron-3-super-120b-a12b across all E2E workflow jobs (e2e-script, credential-migration-vitest, openclaw-tui, issue-4434, token-rotation, sandbox-operations, credential-migration, onboard-repair/resume/negative-paths, runtime-overrides, credential-sanitization, telegram-injection, and launchable-smoke).
Source and Fixture Model Constants
src/lib/onboard/providers.ts, test/e2e-scenario/fixtures/hosted-inference.ts, test/e2e/lib/ci-compatible-inference.sh, docs/inference/inference-options.mdx
Constants are updated: HOSTED_INFERENCE_MODEL in providers.ts, DEFAULT_HOSTED_INFERENCE_MODEL in test fixtures, and NEMOCLAW_E2E_COMPATIBLE_INFERENCE_MODEL_DEFAULT in CI helper all switch to the new model identifier; documentation example for Model Router configuration is updated.
E2E Shell Script Dynamic Provider/Model Derivation
test/e2e/test-rebuild-hermes.sh, test/e2e/test-rebuild-openclaw.sh, test/e2e/test-upgrade-stale-sandbox.sh
Python registry update logic now loads session JSON early and derives provider and model from session fields or environment variables (NEMOCLAW_ENDPOINT_URL/COMPATIBLE_API_KEY for provider, NEMOCLAW_MODEL/NEMOCLAW_COMPAT_MODEL for model) with fallbacks, replacing hardcoded constants; redundant session-loading blocks are consolidated.
Test Contract and Assertion Updates
test/e2e-script-workflow.test.ts
Assertions are updated to expect the new model identifier in credential-migration Vitest, hosted CI inference routing, and direct custom inference jobs; a new test validates that rebuild fixture scripts derive provider and model from session state and do not hardcode the production provider or old model values.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

  • NVIDIA/NemoClaw#5155: Removes legacy test-onboard-inference-smoke.sh from allowlist in test contracts, aligning with E2E infrastructure cleanup referenced in this PR's test updates.
  • NVIDIA/NemoClaw#5385: Routes nightly hosted inference through custom provider endpoint with related model configuration changes in the same CI workflows updated here.
  • NVIDIA/NemoClaw#5380: Updates credential-migration and hosted-inference E2E wiring by consuming NEMOCLAW_MODEL and NEMOCLAW_COMPAT_MODEL in the same test fixtures and workflows modified in this PR.

Suggested labels

bug-fix, area: ci, area: inference, area: onboarding, area: e2e, documentation

Suggested reviewers

  • prekshivyas

Poem

🐰 A model update hops through the CI,
From super-v3 to a newer spree,
Scripts now dance with session-derived delight,
Dynamic paths replace hardcoded blight,
Nemotron-120b lights the E2E night! 🚀

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the main change: updating hosted inference model defaults in CI workflows and related files to resolve a stale model ID issue.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch codex/fix-nightly-e2e-regressions

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

Copy link
Copy Markdown
Contributor

@github-code-quality

github-code-quality Bot commented Jun 13, 2026

Copy link
Copy Markdown

Code Coverage Overview

Languages: TypeScript

TypeScript / code-coverage/plugin

The overall coverage in the branch is 96%. Coverage data for the branch is not yet available.

Show a code coverage summary of the most covered files.
File a0f260f +/-
nemoclaw/src/se...cret-scanner.ts 100%
nemoclaw/src/commands/slash.ts 100%
nemoclaw/src/li...bprocess-env.ts 100%
nemoclaw/src/bl...eprint/state.ts 98%
nemoclaw/src/onboard/config.ts 98%
nemoclaw/src/bl...int/snapshot.ts 97%
nemoclaw/src/bl...print/runner.ts 95%
nemoclaw/src/co...ration-state.ts 94%
nemoclaw/src/bl...ate-networks.ts 94%
nemoclaw/src/index.ts 94%

TypeScript / code-coverage/cli

The overall coverage in the branch is 44%. Coverage data for the branch is not yet available.

Show a code coverage summary of the most covered files.
File a0f260f +/-
src/lib/state/o...oard-session.ts 90%
src/lib/inference/local.ts 77%
src/lib/sandbox/config.ts 72%
src/lib/inference/nim.ts 72%
src/lib/onboard/preflight.ts 64%
src/lib/state/sandbox.ts 55%
src/lib/onboard...er-gpu-patch.ts 50%
src/lib/actions...licy-channel.ts 49%
src/lib/policy/index.ts 48%
src/lib/onboard.ts 17%

Updated June 13, 2026 19:41 UTC
Code Coverage is in Public Preview. Learn more and provide us with your feedback.

@github-actions

Copy link
Copy Markdown
Contributor

E2E Advisor Recommendation

Required E2E: cloud-inference-vitest, credential-migration-vitest, rebuild-openclaw-e2e, upgrade-stale-sandbox-e2e, rebuild-hermes-e2e
Optional E2E: inference-routing-vitest, launchable-smoke-e2e, rebuild-hermes-stale-base-e2e

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

  • cloud-inference-vitest (medium): Primary live confidence check for hosted/custom OpenAI-compatible inference after changing the hosted model default and fixture wiring.
  • credential-migration-vitest (medium): Validates NVIDIA_INFERENCE_API_KEY is staged through the compatible-provider credential path using the new hosted model in the live credential migration flow.
  • rebuild-openclaw-e2e (high): Directly exercises the changed OpenClaw rebuild fixture that now aligns registry provider/model with the onboard session before running a real sandbox rebuild.
  • upgrade-stale-sandbox-e2e (high): Directly exercises the changed stale-sandbox upgrade fixture and verifies stale detection plus rebuild using the session-aligned provider/model registry values.
  • rebuild-hermes-e2e (high): Directly exercises the changed Hermes rebuild fixture, including registry/session inference metadata and real Hermes sandbox rebuild behavior.

Optional E2E

  • inference-routing-vitest (medium): Adjacent route-selection coverage for custom/OpenAI-compatible inference after the hosted model ID and compatible endpoint defaults changed.
  • launchable-smoke-e2e (medium): Useful end-to-end install/onboard smoke for a direct nightly job that consumes the changed hosted inference environment values.
  • rebuild-hermes-stale-base-e2e (high): Additional Hermes rebuild confidence for the stale-base variant of the same changed script; useful if time permits, but the standard Hermes rebuild covers the modified registry/session path.

New E2E recommendations

  • None.

@github-actions

Copy link
Copy Markdown
Contributor

Vitest E2E Scenario Recommendation

Required Vitest E2E scenarios: e2e-scenarios-all
Optional Vitest E2E scenarios: None

Dispatch required Vitest E2E scenarios:

  • gh workflow run e2e-vitest-scenarios.yaml --ref <pr-head-ref>

Workflow run

Full Vitest E2E advisor summary

Vitest E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required Vitest E2E scenarios

  • e2e-scenarios-all: The PR changes the shared Vitest scenario workflow and a shared hosted-inference fixture used by live Vitest scenarios, so run the full Vitest scenario fan-out to cover registry-driven scenarios and free-standing Vitest jobs under the canonical workflow.
    • Dispatch: gh workflow run e2e-vitest-scenarios.yaml --ref <pr-head-ref>

Optional Vitest E2E scenarios

  • None.

Relevant changed files

  • .github/workflows/e2e-vitest-scenarios.yaml
  • src/lib/onboard/providers.ts
  • test/e2e-scenario/fixtures/hosted-inference.ts

@github-actions

Copy link
Copy Markdown
Contributor

PR Review Advisor

Findings: 0 needs attention, 0 worth checking, 0 nice ideas
Top item: No blocking code findings

Consider writing more tests for
  • **Runtime validation** — rebuild fixtures seed registry provider/model from onboard-session when session provider/model differ from env defaults. The patch touches live workflow, hosted inference, and sandbox rebuild/upgrade infrastructure paths. Static and unit contract coverage is present, but runtime validation is still useful for the environment-dependent hosted-compatible and rebuild paths.
  • **Runtime validation** — rebuild fixtures fall back to hosted compatible provider/model when onboard-session is missing and COMPATIBLE_API_KEY is staged. The patch touches live workflow, hosted inference, and sandbox rebuild/upgrade infrastructure paths. Static and unit contract coverage is present, but runtime validation is still useful for the environment-dependent hosted-compatible and rebuild paths.
  • **Runtime validation** — hosted inference fixture default model is nvidia/nemotron-3-super-120b-a12b and never the old double-prefixed ID. The patch touches live workflow, hosted inference, and sandbox rebuild/upgrade infrastructure paths. Static and unit contract coverage is present, but runtime validation is still useful for the environment-dependent hosted-compatible and rebuild paths.

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

@cv cv merged commit 3f003c0 into main Jun 13, 2026
47 of 48 checks passed
@cv cv deleted the codex/fix-nightly-e2e-regressions branch June 13, 2026 19:43
cv added a commit that referenced this pull request Jun 13, 2026
## Summary
Reverts the hosted custom inference model-ID changes from #5399 so CI
continues using the model ID actually served by
`https://inference-api.nvidia.com/v1/chat/completions`. Bounds the
ordinary OpenAI-compatible chat-completions onboarding validation probe
with `max_tokens: 8`, and keeps rebuild/upgrade E2E registry metadata
aligned with the hosted-compatible onboarding session.

## Changes
- Restore hosted custom inference defaults and workflow/test
expectations to `nvidia/nvidia/nemotron-3-super-v3`.
- Preserve provider/model in rebuild and upgrade E2E fixture registry
seeding from the onboard session, falling back to hosted-compatible env
values.
- Use the hosted-compatible model env for post-rebuild inference smoke
calls instead of hardcoding the public `nvidia-prod` model ID.
- Add `max_tokens: 8` to the non-strict chat-completions validation
probe payload.
- Add regression coverage for the bounded probe payload and rebuild
fixture provider/model alignment.

## Type of Change
- [x] Code change (feature, bug fix, or refactor)
- [ ] Code change with doc updates
- [ ] Doc only (prose changes, no code sample modifications)
- [ ] Doc only (includes code sample changes)

## Verification
- [x] Git hooks passed during commit and push, or `npx prek run
--from-ref main --to-ref HEAD` passes
- [x] Targeted tests pass for changed behavior
- [ ] Full `npm test` passes (broad runtime changes only)
- [x] Tests added or updated for new or changed behavior
- [x] No secrets, API keys, or credentials committed
- [ ] Docs updated for user-facing behavior changes
- [ ] `npm run docs` builds without warnings (doc changes only)
- [ ] Doc pages follow the [style
guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md)
(doc changes only)
- [ ] New doc pages include SPDX header and frontmatter (new pages only)

Notes:
- `npx prek run --from-ref main --to-ref HEAD` passed before the latest
fixture update; commit and push hooks passed for the latest update.
- `bash -n` passed for the changed rebuild/upgrade shell fixtures.
- `npm test -- src/lib/inference/onboard-probes.test.ts
test/e2e-script-workflow.test.ts src/lib/onboard/providers.test.ts`
passed.
- `npm test -- test/onboard-selection.test.ts
test/stale-dist-check.test.ts src/lib/inference/onboard-probes.test.ts`
passed.
- Isolated rerun of `npm test -- test/onboard-model-router.test.ts -t
"prefers the managed Model Router command over PATH"` passed after one
transient commit-hook failure in that unrelated test.
- `npm run docs` passed with 0 errors; Fern reported 2 hidden warnings,
so the docs-without-warnings checkbox is left unchecked.

---
Signed-off-by: Carlos Villela <cvillela@nvidia.com>


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Chores**
* Updated default hosted inference model identifier across CI/workflows,
test fixtures, and configs to a new model.
* Updated scripts to compute provider/model with new default/mapping
logic (including a compatible-endpoint mapping).

* **Bug Fixes**
  * Set a baseline max_tokens cap for certain hosted probe requests.

* **Tests**
* Adjusted E2E/contract tests and fixtures to expect the new model;
improved defensive fallback handling and removed a deprecated alignment
test.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Carlos Villela <cvillela@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant