fix(ci): align hosted nightly inference defaults #5399
Signed-off-by: Carlos Villela <cvillela@nvidia.com>
Caution: Review failed. Pull request was closed or merged during review.
Walkthrough: This PR updates the hosted inference model identifier to …
Changes: Nemotron Model Update and E2E Script Refactoring
Estimated code review effort: 2 (Simple), ~12 minutes
Pre-merge checks: 5 passed
🌿 Preview your docs: https://nvidia-preview-pr-5399.docs.buildwithfern.com/nemoclaw
Code Coverage Overview
Languages: TypeScript
- TypeScript / code-coverage/plugin: The overall coverage in the branch is 96%. Coverage data for the branch is not yet available.
- TypeScript / code-coverage/cli: The overall coverage in the branch is 44%. Coverage data for the branch is not yet available.
PR Review Advisor
Findings: 0 needs attention, 0 worth checking, 0 nice ideas.
Consider writing more tests for …
This is an automated advisory review. A human maintainer must make the final merge decision.
## Summary

Reverts the hosted custom inference model-ID changes from #5399 so CI continues using the model ID actually served by `https://inference-api.nvidia.com/v1/chat/completions`. Bounds the ordinary OpenAI-compatible chat-completions onboarding validation probe with `max_tokens: 8`, and keeps rebuild/upgrade E2E registry metadata aligned with the hosted-compatible onboarding session.

## Changes

- Restore hosted custom inference defaults and workflow/test expectations to `nvidia/nvidia/nemotron-3-super-v3`.
- Preserve provider/model in rebuild and upgrade E2E fixture registry seeding from the onboard session, falling back to hosted-compatible env values.
- Use the hosted-compatible model env for post-rebuild inference smoke calls instead of hardcoding the public `nvidia-prod` model ID.
- Add `max_tokens: 8` to the non-strict chat-completions validation probe payload.
- Add regression coverage for the bounded probe payload and rebuild fixture provider/model alignment.

## Type of Change

- [x] Code change (feature, bug fix, or refactor)
- [ ] Code change with doc updates
- [ ] Doc only (prose changes, no code sample modifications)
- [ ] Doc only (includes code sample changes)

## Verification

- [x] Git hooks passed during commit and push, or `npx prek run --from-ref main --to-ref HEAD` passes
- [x] Targeted tests pass for changed behavior
- [ ] Full `npm test` passes (broad runtime changes only)
- [x] Tests added or updated for new or changed behavior
- [x] No secrets, API keys, or credentials committed
- [ ] Docs updated for user-facing behavior changes
- [ ] `npm run docs` builds without warnings (doc changes only)
- [ ] Doc pages follow the [style guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md) (doc changes only)
- [ ] New doc pages include SPDX header and frontmatter (new pages only)

Notes:

- `npx prek run --from-ref main --to-ref HEAD` passed before the latest fixture update; commit and push hooks passed for the latest update.
- `bash -n` passed for the changed rebuild/upgrade shell fixtures.
- `npm test -- src/lib/inference/onboard-probes.test.ts test/e2e-script-workflow.test.ts src/lib/onboard/providers.test.ts` passed.
- `npm test -- test/onboard-selection.test.ts test/stale-dist-check.test.ts src/lib/inference/onboard-probes.test.ts` passed.
- Isolated rerun of `npm test -- test/onboard-model-router.test.ts -t "prefers the managed Model Router command over PATH"` passed after one transient commit-hook failure in that unrelated test.
- `npm run docs` passed with 0 errors; Fern reported 2 hidden warnings, so the docs-without-warnings checkbox is left unchecked.

---

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->

## Summary by CodeRabbit

* **Chores**
  * Updated default hosted inference model identifier across CI/workflows, test fixtures, and configs to a new model.
  * Updated scripts to compute provider/model with new default/mapping logic (including a compatible-endpoint mapping).
* **Bug Fixes**
  * Set a baseline max_tokens cap for certain hosted probe requests.
* **Tests**
  * Adjusted E2E/contract tests and fixtures to expect the new model; improved defensive fallback handling and removed a deprecated alignment test.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Carlos Villela <cvillela@nvidia.com>
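A minimal sketch of the bounded validation probe mentioned in the commit message above, assuming an OpenAI-compatible chat-completions request body. Only the `max_tokens: 8` cap and its restriction to the non-strict probe come from the commit message; the function name `buildProbePayload`, the `strict` flag, and the `"ping"` message are hypothetical and are not taken from the repository.

```typescript
// Hypothetical sketch: names here are illustrative, not the repository's actual API.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ProbePayload {
  model: string;
  messages: ChatMessage[];
  max_tokens?: number;
}

// Cap taken from the commit message. The probe only needs to confirm that the
// endpoint answers, so a few tokens are enough.
const PROBE_MAX_TOKENS = 8;

// Build the request body for an onboarding validation probe.
// Assumption: a "strict" probe leaves the payload unbounded, mirroring the
// commit's wording that only the non-strict probe gets the cap.
function buildProbePayload(model: string, strict = false): ProbePayload {
  const payload: ProbePayload = {
    model,
    messages: [{ role: "user", content: "ping" }],
  };
  if (!strict) {
    payload.max_tokens = PROBE_MAX_TOKENS;
  }
  return payload;
}

const probe = buildProbePayload("nvidia/nvidia/nemotron-3-super-v3");
console.log(JSON.stringify(probe));
```

Bounding the probe keeps the validation request cheap and fast even when the hosted model would otherwise generate a long completion.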
Summary

Fixes the hosted nightly E2E inference defaults that were still pointing at a stale double-prefixed Nemotron model ID, and keeps rebuild fixture registry metadata aligned with the provider/model selected during onboarding. This should unblock the hosted validation failures plus the rebuild/upgrade resume failures where CI onboarded as `compatible-endpoint` but the fixture registry forced `nvidia-prod`.

Changes

- Updates the hosted inference model identifier to `nvidia/nemotron-3-super-120b-a12b` across workflows, onboarding defaults, E2E fixtures, and docs.

Type of Change

Verification

- `npx prek run --from-ref main --to-ref HEAD` passes
- `npm test` passes (broad runtime changes only)
- `npm run docs` builds without warnings (doc changes only)

Notes:

- `npm run docs` passed with 0 errors; Fern reported 2 hidden warnings.
- `npx prek run --all-files` passed.
- `npm test -- test/e2e-script-workflow.test.ts test/e2e-scenario/support-tests/hosted-inference.test.ts src/lib/onboard/providers.test.ts` passed.
- `bash -n` passed for the changed rebuild/upgrade shell fixtures.
- `rg -n "nvidia/nvidia/nemotron-3-super-v3" .github src test tools docs || true` has no product/workflow/doc hits; the only remaining `nemotron-3-super-v3` reference is a regression assertion that rejects the old ID.

Signed-off-by: Carlos Villela <cvillela@nvidia.com>
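The registry alignment described in the summary above can be sketched roughly as follows. This is an illustration under stated assumptions, not the fixture code from the PR: the `InferenceSelection` type, the `resolveFixtureSelection` function, and the `E2E_PROVIDER` / `E2E_MODEL` environment variable names are all hypothetical.

```typescript
// Hypothetical sketch: type, function, and env variable names are illustrative.
interface InferenceSelection {
  provider: string;
  model: string;
}

// Pick the provider/model to record in the fixture registry entry.
// Values recorded by the onboarding session win; otherwise fall back to
// environment defaults rather than forcing a fixed provider such as "nvidia-prod".
function resolveFixtureSelection(
  session: Partial<InferenceSelection> | undefined,
  env: Record<string, string | undefined>,
): InferenceSelection {
  const provider = session?.provider?.trim() || env["E2E_PROVIDER"]?.trim();
  const model = session?.model?.trim() || env["E2E_MODEL"]?.trim();
  if (!provider || !model) {
    throw new Error("no provider/model in onboard session or environment");
  }
  return { provider, model };
}

// The session onboarded as "compatible-endpoint", so the registry keeps that
// provider even though the environment default names a different one.
const selection = resolveFixtureSelection(
  { provider: "compatible-endpoint", model: "nvidia/nemotron-3-super-120b-a12b" },
  { E2E_PROVIDER: "nvidia-prod", E2E_MODEL: "nvidia/nemotron-3-super-120b-a12b" },
);
console.log(selection.provider, selection.model);
```

Resolving each field from the session first means a resume after rebuild or upgrade sees the same provider the sandbox was onboarded with, which is the mismatch the summary attributes the resume failures to.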