fix(setup): increase nico-core helm install timeout to 600s #2995

Merged
shayan1995 merged 1 commit into NVIDIA:main from shayan1995:fix/nico-core-install-timeout
Jun 29, 2026

Conversation

@shayan1995

nico-pxe boot-artifact init containers can take ~6 minutes to pull, exceeding the previous 300s timeout and causing the install to fail. Increased timeout to 600s.

Fixes: nvbug 6391901

Related issues

  • nvbug 6391901

Type of Change

  • Fix - Bug fixes

Breaking Changes

  • This PR contains breaking changes

Testing

  • Manual testing performed

Additional Notes

nico-pxe was observed taking ~5m50s to complete its init containers in production. The previous 300s (5 min) hard timeout caused helm upgrade --install nico to fail before nico-pxe finished pulling the boot-artifact images. 600s leaves ~4 minutes of headroom.
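
The headroom figure can be checked with a little arithmetic. The snippet below is only an illustrative sketch: `headroom_seconds` is a hypothetical helper that is not part of setup.sh, and the observed duration is the ~5m50s quoted above, written out in seconds.

```shell
#!/bin/sh
# Hypothetical helper for illustration; not part of setup.sh.
# Prints (timeout - observed), both given in whole seconds.
headroom_seconds() {
  echo $(( $1 - $2 ))
}

# Observed init-container time from the notes above: 5m50s.
observed=$(( 5 * 60 + 50 ))

# A negative result means the install would time out before the pod is ready.
echo "old timeout (300s): $(headroom_seconds 300 "$observed")s"
echo "new timeout (600s): $(headroom_seconds 600 "$observed")s"
```

With these inputs the old timeout falls 50 seconds short, while the new one leaves 250 seconds (a little over 4 minutes) of slack.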

@shayan1995 shayan1995 requested a review from a team as a code owner June 29, 2026 22:18
@coderabbitai

coderabbitai Bot commented Jun 29, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 01c2535c-e62e-4022-b8f0-cb3a45fe6533

📥 Commits

Reviewing files that changed from the base of the PR and between 7b3a00f and 2e5d458.

📒 Files selected for processing (1)
  • helm-prereqs/setup.sh
🚧 Files skipped from review as they are similar to previous changes (1)
  • helm-prereqs/setup.sh

Summary by CodeRabbit

  • Bug Fixes
    • Increased the Helm install/upgrade wait timeout from 300s to 600s, giving deployments more time to complete successfully before timing out.

Walkthrough

The helm upgrade --install command for NICo Core in helm-prereqs/setup.sh has its --timeout parameter increased from 300s to 600s. No other parameters, control flow, or logic in the script are modified.

Changes

NICo Core Helm Timeout Adjustment

| Layer / File(s) | Summary |
| --- | --- |
| Increase Helm install timeout: helm-prereqs/setup.sh | The --timeout flag on the NICo Core helm upgrade --install command is changed from 300s to 600s; --wait remains in place. |
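
To show where the changed flag sits, here is a minimal sketch of the invocation with the timeout pulled into a variable. This is an assumption-laden illustration, not the code in setup.sh: `NICO_CORE_HELM_TIMEOUT` and `nico_core_install_cmd` are hypothetical names, the other flags the real command passes (such as the `--set-string` image settings) are omitted, and the function only prints the command rather than running it.

```shell
#!/bin/sh
# Sketch only: NICO_CORE_HELM_TIMEOUT is a hypothetical override, not a
# variable defined by setup.sh. The default mirrors the value set in this PR.
NICO_CORE_HELM_TIMEOUT="${NICO_CORE_HELM_TIMEOUT:-600s}"

# Print the install command so it can be inspected before anything is run.
# Flags other than --timeout/--wait are intentionally left out of this sketch.
nico_core_install_cmd() {
  printf 'helm upgrade --install nico ./helm --timeout %s --wait\n' \
    "$NICO_CORE_HELM_TIMEOUT"
}

nico_core_install_cmd
```

With `--wait`, Helm blocks until the release's resources report ready or the `--timeout` duration elapses, which is why slow init containers turn into an install failure once the limit is too low.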

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately summarizes the main change: increasing the Helm install timeout to 600s. |
| Description check | ✅ Passed | The description is directly related to the timeout increase and the Helm install failure it addresses. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@helm-prereqs/setup.sh`:
- Line 438: The Helm install note is stale: setup.sh now uses a 600s timeout,
but NOTES.txt still tells operators to run the same helm upgrade --install nico
./helm command with --timeout 300s. Update the referenced note in
helm-prereqs/templates/NOTES.txt so it matches the new timeout used by setup.sh,
keeping the manual install instructions in sync with the behavior described by
helm install/upgrade.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: e2aae19f-1965-4615-9db1-094112862333

📥 Commits

Reviewing files that changed from the base of the PR and between f097c36 and 7b3a00f.

📒 Files selected for processing (1)
  • helm-prereqs/setup.sh

Comment thread helm-prereqs/setup.sh
--set-string "global.image.repository=${NICO_IMAGE_REGISTRY}/nvmetal-carbide"
--set-string "global.image.tag=${NICO_CORE_IMAGE_TAG}"
-      --timeout 300s --wait
+      --timeout 600s --wait

📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Update the matching Helm install note as well.

Line 438 now waits 600s, but helm-prereqs/templates/NOTES.txt:58-67 still tells operators to run the same helm upgrade --install nico ./helm command with --timeout 300s. That leaves the manual follow-up path stale and can reintroduce the original timeout failure outside setup.sh.

Suggested follow-up
-      --timeout 300s --wait
+      --timeout 600s --wait

As per coding guidelines, "Keep OpenAPI specs, protobufs, database migrations, Helm manifests, generated code, and documentation in sync with the behavior they describe."
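
One way to catch this kind of drift is a small check that compares the `--timeout` values mentioned in both files. The sketch below is hypothetical and not part of the repository: `helm_timeouts` and `timeouts_in_sync` are made-up helper names, and it assumes a `grep` that supports `-o` (GNU and BSD grep both do).

```shell
#!/bin/sh
# Sketch only: helm_timeouts and timeouts_in_sync are hypothetical helpers.

# Print each distinct "--timeout <N>s" value found in a file, one per line.
helm_timeouts() {
  grep -oE -e '--timeout [0-9]+s' "$1" | awk '{print $2}' | sort -u
}

# Succeed only when both files mention exactly the same set of timeout values.
# Note: two files with no --timeout flag at all also count as "in sync".
timeouts_in_sync() {
  [ "$(helm_timeouts "$1")" = "$(helm_timeouts "$2")" ]
}

# Example use (paths taken from the review comment above):
#   timeouts_in_sync helm-prereqs/setup.sh helm-prereqs/templates/NOTES.txt \
#     || echo "timeout drift between setup.sh and NOTES.txt" >&2
```

A plain text comparison like this is deliberately crude; it would flag any file pair whose timeouts differ, including cases where a difference is intentional.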

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@helm-prereqs/setup.sh` at line 438, The Helm install note is stale: setup.sh
now uses a 600s timeout, but NOTES.txt still tells operators to run the same
helm upgrade --install nico ./helm command with --timeout 300s. Update the
referenced note in helm-prereqs/templates/NOTES.txt so it matches the new
timeout used by setup.sh, keeping the manual install instructions in sync with
the behavior described by helm install/upgrade.

Source: Coding guidelines

nico-pxe boot-artifact init containers can take ~6 minutes to pull,
exceeding the previous 300s timeout and causing the install to fail.

Fixes: nvbug 6391901
@shayan1995 shayan1995 force-pushed the fix/nico-core-install-timeout branch from 7b3a00f to 2e5d458 Compare June 29, 2026 22:27
@shayan1995 shayan1995 enabled auto-merge (squash) June 29, 2026 22:29
@shayan1995 shayan1995 merged commit 278b144 into NVIDIA:main Jun 29, 2026
56 checks passed
@github-actions

🔍 Container Scan Summary

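| Service | Total | Critical | High | Medium | Low | Other |
| --- | --- | --- | --- | --- | --- | --- |
| boot-artifacts-aarch64 | 3 | 0 | 0 | 3 | 0 | 0 |
| boot-artifacts-x86_64 | 3 | 0 | 0 | 3 | 0 | 0 |
| forge-admin-cli-x86_64 | 288 | 6 | 26 | 105 | 7 | 144 |
| machine-validation-runner | 751 | 30 | 190 | 274 | 36 | 221 |
| machine_validation | 751 | 30 | 190 | 274 | 36 | 221 |
| machine_validation-aarch64 | 751 | 30 | 190 | 274 | 36 | 221 |
| nvmetal-carbide | 751 | 30 | 190 | 274 | 36 | 221 |
| TOTAL | 3298 | 126 | 786 | 1207 | 151 | 1028 |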

Per-CVE detail lives in the per-service grype-* artifacts (JSON + SARIF). Severity counts only — no CVE IDs published here.
