feat(rest): add site prerequisite bootstrap workflow#2964
Conversation
Signed-off-by: Hasan Khan <hasank@nvidia.com>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdds a ChangesSite Bootstrap Command
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
🔍 Container Scan Summary
Per-CVE detail lives in the per-service |
🔐 TruffleHog Secret Scan✅ No secrets or credentials found! Your code has been scanned for 700+ types of secrets and credentials. All clear! 🎉 🕐 Last updated: 2026-06-29 07:31:00 UTC | Commit: 2a2c79a |
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@rest-api/cli/pkg/site_bootstrap.go`:
- Around line 609-614: The drift comparison in bootstrapScalarEqual is too
permissive because fmt.Sprint collapses different scalar types into the same
string, so type mismatches can be hidden. Update bootstrapScalarEqual to
preserve scalar types and only normalize compatible numeric representations
before comparing; for non-numeric values, compare both type and value
explicitly. Keep the nil handling as-is, and use the bootstrapScalarEqual helper
as the single place to enforce the stricter comparison behavior.
- Around line 414-424: The resolved request in ensureBootstrapResource is only
type-checked as a map, but request["name"] can become non-string after
resolveBootstrapValue and is then silently coerced to an empty name. Re-validate
the resolved request fields before lookup/create, especially the name extracted
in ensureBootstrapResource, and return an error if name is missing or not a
string so the CLI never searches or posts an invalid request.
- Around line 470-485: The 409 recovery path in create/bootstrap handling is
swallowing drift verification failures by only returning the original APIError
from client.Do. Update the conflict branch in site_bootstrap.go (the logic
around findBootstrapResource, verifyBootstrapResource, and bootstrapResponseID)
so that if verifyBootstrapResource reports a mismatch, that verifyErr is
propagated instead of discarded, while still keeping the successful reuse path
unchanged. Keep the existing conflict lookup flow, but make the returned error
reflect the actionable configuration drift rather than the generic conflict.
- Around line 630-677: The resolveBootstrapValue function currently passes
through malformed template strings when bootstrapRefPattern finds no matches,
allowing unresolved ${...} text to reach the REST API. Update the
string-handling branch in resolveBootstrapValue to detect leftover "${" or
unmatched "}" syntax and return errBootstrapReference instead of returning the
original value; keep the existing lookupBootstrapReference flow for valid full
and embedded references.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 699cac4b-943c-4b26-ae4a-64938f8d50c1
📒 Files selected for processing (5)
rest-api/cli/README.mdrest-api/cli/examples/site-prerequisites.yamlrest-api/cli/pkg/app.gorest-api/cli/pkg/site_bootstrap.gorest-api/cli/pkg/site_bootstrap_test.go
Signed-off-by: Hasan Khan <hasank@nvidia.com>
Signed-off-by: Hasan Khan <hasank@nvidia.com>
There was a problem hiding this comment.
🧹 Nitpick comments (2)
rest-api/cli/pkg/site_bootstrap_test.go (2)
91-106: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winAssert the resolver error class, not just its text.
These cases only match substrings today, so a future regression could return the wrong wrapped error type and still pass as long as the wording stays similar. Add
require.ErrorIs(t, err, errBootstrapReference)for the missing/malformed-reference rows to pin the resolver contract. As per path instructions, review Go code for correctness, clean control flow, error handling, context propagation, test coverage, performance, and cohesive organization around well-defined, well-named structs with receiver functions when behavior belongs to a domain type.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@rest-api/cli/pkg/site_bootstrap_test.go` around lines 91 - 106, The resolver tests in resolveBootstrapValue currently only assert on error text, which can miss regressions in the actual error type. Update the table-driven cases in site_bootstrap_test.go for the missing reference and malformed reference scenarios to also assert the returned error matches errBootstrapReference using require.ErrorIs, while keeping the existing substring checks for message detail. Use the resolveBootstrapValue test block and errBootstrapReference as the key symbols to locate the assertions.Source: Path instructions
123-154: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winProve the 409 recovery path actually executes.
The mock switches to the drifting resource on the second GET, so this test can pass even if the implementation never reaches the POST-conflict branch it is meant to cover. Record POST count as well, then assert the expected
GET -> POST(409) -> GETsequence so the test really locks in conflict recovery behavior. As per path instructions, review Go code for correctness, clean control flow, error handling, context propagation, test coverage, performance, and cohesive organization around well-defined, well-named structs with receiver functions when behavior belongs to a domain type.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@rest-api/cli/pkg/site_bootstrap_test.go` around lines 123 - 154, The test for ensureBootstrapResource is not proving the 409 recovery path because the second GET can satisfy the assertion without ever hitting the POST conflict branch. Update the test around ensureBootstrapResource, lookupCount, and the httptest.NewServer handler to track POST attempts too, then assert the exact GET -> POST(409) -> GET flow so the conflict-retry behavior is actually exercised and locked in.Source: Path instructions
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@rest-api/cli/pkg/site_bootstrap_test.go`:
- Around line 91-106: The resolver tests in resolveBootstrapValue currently only
assert on error text, which can miss regressions in the actual error type.
Update the table-driven cases in site_bootstrap_test.go for the missing
reference and malformed reference scenarios to also assert the returned error
matches errBootstrapReference using require.ErrorIs, while keeping the existing
substring checks for message detail. Use the resolveBootstrapValue test block
and errBootstrapReference as the key symbols to locate the assertions.
- Around line 123-154: The test for ensureBootstrapResource is not proving the
409 recovery path because the second GET can satisfy the assertion without ever
hitting the POST conflict branch. Update the test around
ensureBootstrapResource, lookupCount, and the httptest.NewServer handler to
track POST attempts too, then assert the exact GET -> POST(409) -> GET flow so
the conflict-retry behavior is actually exercised and locked in.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: e9fd8f1d-c546-45a3-b7ba-7b622f7903ce
📒 Files selected for processing (2)
rest-api/cli/pkg/site_bootstrap.gorest-api/cli/pkg/site_bootstrap_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
- rest-api/cli/pkg/site_bootstrap.go
|
Implementation second pass notes:
|
Signed-off-by: Hasan Khan <hasank@nvidia.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@rest-api/cli/pkg/site_bootstrap.go`:
- Around line 677-713: The logic in discoverExistingResource currently rejects
ID-only selectors because it resolves and type-checks resource.Match before
honoring resource.ID, and it also stops at a stale ID instead of retrying via
the selector. Update discoverExistingResource so bootstrapExistingResource can
be resolved from ID alone when present, and when api.getResource in the
resource.ID path fails due to a missing/stale object, fall back to
api.findMatching using resource.Match before returning an error. Keep the
existing error wrapping and update resource.ID from bootstrapResponseID after a
successful fallback.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: d4559e46-ec54-469c-963a-4484e6cf4733
📒 Files selected for processing (6)
rest-api/cli/README.mdrest-api/cli/examples/site-prerequisites.yamlrest-api/cli/pkg/app.gorest-api/cli/pkg/commands.gorest-api/cli/pkg/site_bootstrap.gorest-api/cli/pkg/site_bootstrap_test.go
✅ Files skipped from review due to trivial changes (2)
- rest-api/cli/examples/site-prerequisites.yaml
- rest-api/cli/README.md
🚧 Files skipped from review as they are similar to previous changes (1)
- rest-api/cli/pkg/app.go
Signed-off-by: Hasan Khan <hasank@nvidia.com>
Signed-off-by: Hasan Khan <hasank@nvidia.com>
Signed-off-by: Hasan Khan <hasank@nvidia.com>
thossain-nv
left a comment
There was a problem hiding this comment.
Looks good @osu, but before merging please make sure that operator is not creating Sites on the fly and then expecting resources to be created on Site.
Signed-off-by: Hasan Khan <hasank@nvidia.com>
thossain-nv
left a comment
There was a problem hiding this comment.
Thanks for the changes @osu
Description
Add a declarative
nicocli site bootstrapworkflow for creating or verifying the REST resources required to use an existing Site.This change:
${...}references so later requests can consume IDs and derived resource IDs returned by earlier operations;provider.orgis omitted;Related issues
Closes #1889
Type of Change
Breaking Changes
No REST endpoint, OpenAPI schema, or existing CLI command changes are included. The new
site bootstrapcommand and manifest format are additive.Testing
Passed locally:
go test -race ./cli/... -count=1go test ./flow/... ./workflow-schema/...go vet ./...make buildfor every Linux/amd64 REST binary, includingnicocliDOCKER_HOST="unix://$HOME/.colima/default/docker.sock" TESTCONTAINERS_DOCKER_SOCKET_OVERRIDE=/var/run/docker.sock make testacross IPAM, DB, API, auth, common, cert-manager, site-workflow, race-enabled site-manager and site-agent, workflow, Flow, power-shelf manager, and NVSwitch managersiteandsite bootstrap --helpchecksbuf1.70.0,protoc-gen-go1.36.11, andprotoc-gen-go-grpc1.6.1) followed by the repository cleanliness gateAdditional Notes
The workflow intentionally composes existing REST operations instead of adding a second server-side resource implementation. Methods and paths come from the embedded OpenAPI model. The Site request is used only for lookup and drift validation; managed-resource requests are passed to their corresponding operations so endpoint validation and authorization remain the source of truth.
Site IP Blocks are auto-created by the Site fabric-prefix inventory workflow added in #2589. Bootstrap only discovers them through
siteIpBlocksselectors and asks the operator to rerun when that asynchronous inventory is not ready.Site registration, fabric-prefix inventory, and machine readiness remain asynchronous external steps. If a dependent operation is not ready, operators can complete that step and rerun the same manifest; already-created resources are discovered and reused.
The REST copies and Go bindings for the current Core protobuf definitions are refreshed in this PR because the protobuf-generation CI gate exposed pre-existing generated-file drift. No Core protobuf source schema was authored as part of the bootstrap workflow.