feat(rest): add site prerequisite bootstrap workflow #2964

Merged
osu merged 8 commits into NVIDIA:main from osu:feat/1889-site-prerequisite-bootstrap
Jun 30, 2026

@osu osu commented Jun 29, 2026

Member

Description

Add a declarative nicocli site bootstrap workflow for creating or verifying the REST resources required to use an existing Site.

This change:

  • calls the current Service Account operation first, using its Provider and Tenant IDs in Service Account mode and falling back to the current Provider and Tenant operations otherwise;
  • accepts a strict YAML manifest with ${...} references so later requests can consume IDs and derived resource IDs returned by earlier operations;
  • requires the specified Site to already exist, discovers Site IP Blocks that are auto-created from its fabric-prefix inventory, then creates or verifies Instance Types, Allocations, VPCs, VPC Prefixes, and optional Instances in dependency order;
  • reuses resources by recorded ID or exact name and scope, rejects configuration drift, handles concurrent-create conflicts, and recovers when replayed IDs do not exist in a replacement installation;
  • emits a replayable resolved manifest containing every Provider, Tenant, selector, and resource ID while preserving the original REST request bodies;
  • resolves methods, paths, and path parameters from the same embedded OpenAPI operation descriptors used by regular CLI commands and organizes bootstrap behavior around cohesive receiver-owned types;
  • supports separate Provider and Tenant organizations with the same authenticated client and falls back to the configured CLI organization when provider.org is omitted;
  • adds Service Account, OpenAPI indirection, read-only IP-block discovery, parser, reference-resolution, create/reuse, stale-ID recovery, drift-detection, and command coverage; and
  • documents the workflow with a complete site-prerequisites example covering network and compute allocations plus optional instance creation.

Related issues

Closes #1889

Type of Change

  • Add - New feature or capability
  • Change - Changes in existing functionality
  • Fix - Bug fixes
  • Remove - Removed features or deprecated functionality
  • Internal - Internal changes (refactoring, tests, docs, etc.)

Breaking Changes

  • This PR contains breaking changes

No REST endpoint, OpenAPI schema, or existing CLI command changes are included. The new site bootstrap command and manifest format are additive.

Testing

  • Unit tests added/updated
  • Integration tests added/updated
  • Manual testing performed
  • No testing required (docs, internal refactor, etc.)

Passed locally:

  • go test -race ./cli/... -count=1
  • go test ./flow/... ./workflow-schema/...
  • go vet ./...
  • CI-pinned Revive 1.3.9 across every REST Go package
  • CI-pinned GolangCI-Lint 2.7.2 with the repository's configured full-module invocation
  • make build for every Linux/amd64 REST binary, including nicocli
  • DOCKER_HOST="unix://$HOME/.colima/default/docker.sock" TESTCONTAINERS_DOCKER_SOCKET_OVERRIDE=/var/run/docker.sock make test across IPAM, DB, API, auth, common, cert-manager, site-workflow, race-enabled site-manager and site-agent, workflow, Flow, power-shelf manager, and NVSwitch manager
  • built-binary site and site bootstrap --help checks
  • end-to-end CLI command testing against an HTTP test server, including Service Account initialization and fallback, dependency-ordered request construction, resolved-manifest output, and a no-write replay
  • required-Site lookup plus auto-created Site IP-block discovery and not-ready checks proving bootstrap never posts a Site or Provider-owned IP Block
  • example-manifest parsing plus embedded OpenAPI operation/path indirection verification for every orchestrated resource
  • CI-pinned Core protobuf generation (buf 1.70.0, protoc-gen-go 1.36.11, and protoc-gen-go-grpc 1.6.1) followed by the repository cleanliness gate

Additional Notes

The workflow intentionally composes existing REST operations instead of adding a second server-side resource implementation. Methods and paths come from the embedded OpenAPI model. The Site request is used only for lookup and drift validation; managed-resource requests are passed to their corresponding operations so endpoint validation and authorization remain the source of truth.

Site IP Blocks are auto-created by the Site fabric-prefix inventory workflow added in #2589. Bootstrap only discovers them through siteIpBlocks selectors and asks the operator to rerun when that asynchronous inventory is not ready.

Site registration, fabric-prefix inventory, and machine readiness remain asynchronous external steps. If a dependent operation is not ready, operators can complete that step and rerun the same manifest; already-created resources are discovered and reused.

The REST copies and Go bindings for the current Core protobuf definitions are refreshed in this PR because the protobuf-generation CI gate exposed pre-existing generated-file drift. No Core protobuf source schema was authored as part of the bootstrap workflow.

Signed-off-by: Hasan Khan <hasank@nvidia.com>
@osu osu requested a review from a team as a code owner June 29, 2026 07:27
@coderabbitai

coderabbitai Bot commented Jun 29, 2026

Contributor

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Adds a site bootstrap CLI command that reads a SitePrerequisites YAML manifest and reconciles provider/tenant organization identity, site, IP blocks, instance types, allocations, VPCs, VPC prefixes, and optional instances in dependency order. Includes ${...} reference interpolation, drift detection, replay output, and an in-process test suite with a mock REST server.

Changes

Site Bootstrap Command

| Layer | File(s) | Summary |
| --- | --- | --- |
| Manifest types, endpoint definitions, and OpenAPI command helpers | rest-api/cli/pkg/site_bootstrap.go, rest-api/cli/pkg/commands.go | Defines bootstrap constants, sentinel errors, regex patterns, manifest data structures, endpoint descriptors, and an operationIndex abstraction with parameters() and execute() helpers used by both the generic command builder and the bootstrap workflow. |
| CLI wiring, manifest I/O, and command registration | rest-api/cli/pkg/site_bootstrap.go, rest-api/cli/pkg/app.go | Wires the site bootstrap subcommand with --file/--output flags, reads and writes strict single-document YAML manifests, validates manifest shape and required fields, and registers the command via addSiteBootstrapCommand in NewApp. |
| Bootstrap construction, organization initialization, and apply flow | rest-api/cli/pkg/site_bootstrap.go | Builds typed REST operation bindings from the OpenAPI spec, resolves provider/tenant identity via "current" endpoints or service-account mode, and drives the top-level apply sequence across all prerequisite resource groups in dependency order. |
| Resource reuse, selector lookup, and drift verification | rest-api/cli/pkg/site_bootstrap.go | Implements managed resource reuse by ID or name with paginated lookup, selector-based site IP block discovery, HTTP 409 conflict recovery, identity matching, and subset-based drift verification with big.Rat scalar normalization. |
| Reference interpolation and response decoding | rest-api/cli/pkg/site_bootstrap.go | Resolves ${a.b} references across nested manifest maps/slices with stray-brace detection, decodes JSON responses with UseNumber precision, extracts resource IDs, and detects HTTP 404 via APIError. |
| Example manifest and README documentation | rest-api/cli/examples/site-prerequisites.yaml, rest-api/cli/README.md | Adds a full SitePrerequisites YAML example wired with ${...} references, and documents workflow order, placeholder resolution, drift semantics, replay behavior, and provider/tenant initialization. |
| Bootstrap test suite and mock REST server | rest-api/cli/pkg/site_bootstrap_test.go | Covers manifest validation, reference resolution, reuse/drift/conflict recovery, scalar equality, service-account mode, IPBlock inventory wait, stale-ID fallback, CLI replay output, and OpenAPI binding via a fully in-process mock REST server. |
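
As a rough illustration of the subset-based drift verification mentioned in the summary, the sketch below checks that every field in the desired request appears with the same value in the server's resource, while ignoring server-only fields. `verifySubset` is a hypothetical name, scalars are compared with `reflect.DeepEqual` for brevity, and the rule that lists must match element by element is an assumption.

```go
package main

import (
	"fmt"
	"reflect"
)

// verifySubset reports drift when a field in desired is missing from, or
// differs in, actual. Fields present only in actual (IDs, status, server
// defaults) are ignored.
func verifySubset(path string, desired, actual any) error {
	switch want := desired.(type) {
	case map[string]any:
		got, ok := actual.(map[string]any)
		if !ok {
			return fmt.Errorf("drift at %s: expected object, got %T", path, actual)
		}
		for key, wantVal := range want {
			gotVal, present := got[key]
			if !present {
				return fmt.Errorf("drift at %s.%s: field missing", path, key)
			}
			if err := verifySubset(path+"."+key, wantVal, gotVal); err != nil {
				return err
			}
		}
		return nil
	case []any:
		// Assumption: lists are compared positionally and must have equal length.
		got, ok := actual.([]any)
		if !ok || len(got) != len(want) {
			return fmt.Errorf("drift at %s: list mismatch", path)
		}
		for i := range want {
			if err := verifySubset(fmt.Sprintf("%s[%d]", path, i), want[i], got[i]); err != nil {
				return err
			}
		}
		return nil
	default:
		if !reflect.DeepEqual(desired, actual) {
			return fmt.Errorf("drift at %s: want %v, got %v", path, desired, actual)
		}
		return nil
	}
}

func main() {
	desired := map[string]any{"name": "vpc-a", "labels": map[string]any{"env": "prod"}}
	actual := map[string]any{
		"id":     "123",
		"name":   "vpc-a",
		"status": "Ready",
		"labels": map[string]any{"env": "prod", "owner": "ops"},
	}
	fmt.Println(verifySubset("vpc", desired, actual))

	actual["name"] = "vpc-b"
	fmt.Println(verifySubset("vpc", desired, actual))
}
```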

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 4.55% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and concisely describes the new site prerequisite bootstrap workflow. |
| Linked Issues check | ✅ Passed | The PR satisfies #1889 with provider/tenant init, site and prerequisite resource orchestration, idempotent reuse, and replayable manifests. |
| Out of Scope Changes check | ✅ Passed | The changes stay focused on the bootstrap workflow, its docs, command wiring, and supporting tests without unrelated additions. |
| Description check | ✅ Passed | The description accurately matches the new site bootstrap workflow, manifest example, and related CLI/OpenAPI changes. |

@github-actions

github-actions Bot commented Jun 29, 2026


🔍 Container Scan Summary

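| Service | Total | Critical | High | Medium | Low | Other |
| --- | --- | --- | --- | --- | --- | --- |
| nico-flow | 43 | 3 | 21 | 9 | 2 | 8 |
| nico-nsm | 49 | 0 | 10 | 31 | 8 | 0 |
| nico-psm | 43 | 3 | 21 | 9 | 2 | 8 |
| nico-rest-api | 43 | 3 | 21 | 9 | 2 | 8 |
| nico-rest-cert-manager | 42 | 3 | 21 | 9 | 1 | 8 |
| nico-rest-db | 43 | 3 | 21 | 9 | 2 | 8 |
| nico-rest-site-agent | 42 | 3 | 21 | 9 | 1 | 8 |
| nico-rest-site-manager | 42 | 3 | 21 | 9 | 1 | 8 |
| nico-rest-workflow | 43 | 3 | 21 | 9 | 2 | 8 |
| TOTAL | 390 | 24 | 178 | 103 | 21 | 64 |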

Per-CVE detail lives in the per-service grype-* artifacts (JSON + SARIF). Severity counts only — no CVE IDs published here.

@github-actions


🔐 TruffleHog Secret Scan

No secrets or credentials found!

Your code has been scanned for 700+ types of secrets and credentials. All clear! 🎉

🔗 View scan details

🕐 Last updated: 2026-06-29 07:31:00 UTC | Commit: 2a2c79a

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@rest-api/cli/pkg/site_bootstrap.go`:
- Around line 609-614: The drift comparison in bootstrapScalarEqual is too
permissive because fmt.Sprint collapses different scalar types into the same
string, so type mismatches can be hidden. Update bootstrapScalarEqual to
preserve scalar types and only normalize compatible numeric representations
before comparing; for non-numeric values, compare both type and value
explicitly. Keep the nil handling as-is, and use the bootstrapScalarEqual helper
as the single place to enforce the stricter comparison behavior.
- Around line 414-424: The resolved request in ensureBootstrapResource is only
type-checked as a map, but request["name"] can become non-string after
resolveBootstrapValue and is then silently coerced to an empty name. Re-validate
the resolved request fields before lookup/create, especially the name extracted
in ensureBootstrapResource, and return an error if name is missing or not a
string so the CLI never searches or posts an invalid request.
- Around line 470-485: The 409 recovery path in create/bootstrap handling is
swallowing drift verification failures by only returning the original APIError
from client.Do. Update the conflict branch in site_bootstrap.go (the logic
around findBootstrapResource, verifyBootstrapResource, and bootstrapResponseID)
so that if verifyBootstrapResource reports a mismatch, that verifyErr is
propagated instead of discarded, while still keeping the successful reuse path
unchanged. Keep the existing conflict lookup flow, but make the returned error
reflect the actionable configuration drift rather than the generic conflict.
- Around line 630-677: The resolveBootstrapValue function currently passes
through malformed template strings when bootstrapRefPattern finds no matches,
allowing unresolved ${...} text to reach the REST API. Update the
string-handling branch in resolveBootstrapValue to detect leftover "${" or
unmatched "}" syntax and return errBootstrapReference instead of returning the
original value; keep the existing lookupBootstrapReference flow for valid full
and embedded references.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 699cac4b-943c-4b26-ae4a-64938f8d50c1

📥 Commits

Reviewing files that changed from the base of the PR and between 0082abd and 2a2c79a.

📒 Files selected for processing (5)
  • rest-api/cli/README.md
  • rest-api/cli/examples/site-prerequisites.yaml
  • rest-api/cli/pkg/app.go
  • rest-api/cli/pkg/site_bootstrap.go
  • rest-api/cli/pkg/site_bootstrap_test.go

Comment thread rest-api/cli/pkg/site_bootstrap.go Outdated
Comment thread rest-api/cli/pkg/site_bootstrap.go Outdated
Comment thread rest-api/cli/pkg/site_bootstrap.go
Comment thread rest-api/cli/pkg/site_bootstrap.go Outdated
osu added 2 commits June 29, 2026 00:37
Signed-off-by: Hasan Khan <hasank@nvidia.com>
Signed-off-by: Hasan Khan <hasank@nvidia.com>

@coderabbitai coderabbitai Bot left a comment

Contributor

🧹 Nitpick comments (2)
rest-api/cli/pkg/site_bootstrap_test.go (2)

91-106: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Assert the resolver error class, not just its text.

These cases only match substrings today, so a future regression could return the wrong wrapped error type and still pass as long as the wording stays similar. Add require.ErrorIs(t, err, errBootstrapReference) for the missing/malformed-reference rows to pin the resolver contract. As per path instructions, review Go code for correctness, clean control flow, error handling, context propagation, test coverage, performance, and cohesive organization around well-defined, well-named structs with receiver functions when behavior belongs to a domain type.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@rest-api/cli/pkg/site_bootstrap_test.go` around lines 91 - 106, The resolver
tests in resolveBootstrapValue currently only assert on error text, which can
miss regressions in the actual error type. Update the table-driven cases in
site_bootstrap_test.go for the missing reference and malformed reference
scenarios to also assert the returned error matches errBootstrapReference using
require.ErrorIs, while keeping the existing substring checks for message detail.
Use the resolveBootstrapValue test block and errBootstrapReference as the key
symbols to locate the assertions.

Source: Path instructions


123-154: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Prove the 409 recovery path actually executes.

The mock switches to the drifting resource on the second GET, so this test can pass even if the implementation never reaches the POST-conflict branch it is meant to cover. Record POST count as well, then assert the expected GET -> POST(409) -> GET sequence so the test really locks in conflict recovery behavior. As per path instructions, review Go code for correctness, clean control flow, error handling, context propagation, test coverage, performance, and cohesive organization around well-defined, well-named structs with receiver functions when behavior belongs to a domain type.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@rest-api/cli/pkg/site_bootstrap_test.go` around lines 123 - 154, The test for
ensureBootstrapResource is not proving the 409 recovery path because the second
GET can satisfy the assertion without ever hitting the POST conflict branch.
Update the test around ensureBootstrapResource, lookupCount, and the
httptest.NewServer handler to track POST attempts too, then assert the exact GET
-> POST(409) -> GET flow so the conflict-retry behavior is actually exercised
and locked in.

Source: Path instructions

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: e9fd8f1d-c546-45a3-b7ba-7b622f7903ce

📥 Commits

Reviewing files that changed from the base of the PR and between 362278d and ad3c90d.

📒 Files selected for processing (2)
  • rest-api/cli/pkg/site_bootstrap.go
  • rest-api/cli/pkg/site_bootstrap_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • rest-api/cli/pkg/site_bootstrap.go

@osu osu requested a review from mmou-nv June 29, 2026 08:06
@osu osu self-assigned this Jun 29, 2026
@thossain-nv

Contributor

Implementation second pass notes:

  • Command needs to call the GET Service Account endpoint when applicable; see https://docs.nvidia.com/infra-controller/rest-api-reference/getting-started
  • IP Block creation by Provider should probably not be part of this flow since they should be auto-created based on recent work
  • We are hardcoding the API endpoint paths again instead of using existing CLI command/structs
  • Lots of arbitrarily named functions are being introduced without properly anchoring them as receiver functions of structs that should own them

Signed-off-by: Hasan Khan <hasank@nvidia.com>

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@rest-api/cli/pkg/site_bootstrap.go`:
- Around line 677-713: The logic in discoverExistingResource currently rejects
ID-only selectors because it resolves and type-checks resource.Match before
honoring resource.ID, and it also stops at a stale ID instead of retrying via
the selector. Update discoverExistingResource so bootstrapExistingResource can
be resolved from ID alone when present, and when api.getResource in the
resource.ID path fails due to a missing/stale object, fall back to
api.findMatching using resource.Match before returning an error. Keep the
existing error wrapping and update resource.ID from bootstrapResponseID after a
successful fallback.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: d4559e46-ec54-469c-963a-4484e6cf4733

📥 Commits

Reviewing files that changed from the base of the PR and between ad3c90d and bce84ac.

📒 Files selected for processing (6)
  • rest-api/cli/README.md
  • rest-api/cli/examples/site-prerequisites.yaml
  • rest-api/cli/pkg/app.go
  • rest-api/cli/pkg/commands.go
  • rest-api/cli/pkg/site_bootstrap.go
  • rest-api/cli/pkg/site_bootstrap_test.go
✅ Files skipped from review due to trivial changes (2)
  • rest-api/cli/examples/site-prerequisites.yaml
  • rest-api/cli/README.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • rest-api/cli/pkg/app.go

Comment thread rest-api/cli/pkg/site_bootstrap.go
osu added 2 commits June 29, 2026 11:53
Signed-off-by: Hasan Khan <hasank@nvidia.com>
Signed-off-by: Hasan Khan <hasank@nvidia.com>
Comment thread rest-api/cli/examples/site-prerequisites.yaml Outdated
Comment thread rest-api/cli/examples/site-prerequisites.yaml Outdated
Signed-off-by: Hasan Khan <hasank@nvidia.com>
Comment thread rest-api/cli/examples/site-prerequisites.yaml

@thossain-nv thossain-nv left a comment

Contributor

Looks good @osu, but before merging please make sure that the operator is not creating Sites on the fly and then expecting resources to be created on the Site.

Signed-off-by: Hasan Khan <hasank@nvidia.com>
@osu osu requested review from mmou-nv and thossain-nv June 30, 2026 01:37
@osu osu enabled auto-merge (squash) June 30, 2026 01:40

@thossain-nv thossain-nv left a comment

Contributor

Thanks for the changes @osu

@osu osu merged commit c89a2f3 into NVIDIA:main Jun 30, 2026
117 checks passed
@osu osu deleted the feat/1889-site-prerequisite-bootstrap branch June 30, 2026 03:15

Development

Successfully merging this pull request may close these issues.

Provide an orchestrated workflow for site prerequisite REST resources

3 participants