Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the repository settings.
No actionable comments were generated in the recent review. 🎉
ℹ️ Recent review info
⚙️ Run configuration: Configuration used: defaults | Review profile: CHILL | Plan: Pro | Run ID:
⛔ Files ignored due to path filters (1)
📒 Files selected for processing (3)
✅ Files skipped from review due to trivial changes (1)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 Walkthrough
This PR adds comprehensive Azure cloud provider support to the Defang CLI. It introduces an Azure BYOC (Bring Your Own Cloud) implementation with container orchestration via Azure Container Instances, configuration management via App Configuration, container registry integration via Azure Container Registry, and log streaming. Supporting changes include Azure SDK dependencies, provider infrastructure updates, HTTP utilities for blob uploads, and nix flake tooling additions.
Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant CLI as Defang CLI
    participant ACI as Azure Container<br/>Instances
    participant ACR as Azure Container<br/>Registry
    participant AppCfg as Azure App<br/>Configuration
    participant Blob as Azure Blob<br/>Storage
    participant Pulumi as Pulumi Backend<br/>(S3/Blob)
    User->>CLI: deploy --provider=azure
    CLI->>CLI: NewByocProvider(Azure)
    CLI->>AppCfg: SetUp() - discover/create store
    CLI->>ACI: SetUpResourceGroup()
    CLI->>ACI: SetUpStorageAccount()
    CLI->>ACI: SetUpManagedIdentity()
    CLI->>ACR: SetUpRegistry()
    User->>CLI: build/push container
    CLI->>ACR: RunTask(image, command)
    ACR->>ACR: schedule task run
    CLI->>ACR: TailRunLogs() - poll completion
    CLI->>Blob: CreateUploadURL() - SAS URL
    CLI->>Blob: PUT ProjectUpdate protobuf
    CLI->>ACI: CdCommand(env, args)
    ACI->>ACI: Run container group with CD
    ACI->>Pulumi: store state via Pulumi backend
    CLI->>ACI: QueryLogs() - poll for output
    ACI->>CLI: return log entries
    CLI->>User: display deployment logs
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
✨ Finishing Touches
🧪 Generate unit tests (beta)
Warning: There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.
🔧 golangci-lint (2.11.4)
level=warning msg="[linters_context] running gomodguard failed: unable to read module file go.mod: current working directory must have a go.mod file: if you are not using go modules it is suggested to disable this linter"
Actionable comments posted: 17
Note
Due to the large number of review comments, Critical and Major severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/cmd/cli/command/estimate.go (1)
70-75: ⚠️ Potential issue | 🟡 Minor
Add Azure and DigitalOcean to the `providerDescription` map.
The auto-selection logic now supports Azure and DigitalOcean (lines 91-94), but the `providerDescription` map used by the survey prompt (line 106) is missing entries for these providers. This results in empty descriptions when users select them.
🔧 Proposed fix
```diff
 var providerDescription = map[client.ProviderID]string{
 	client.ProviderDefang: "The Defang Playground is a free platform intended for testing purposes only.",
 	client.ProviderAWS:    "Deploy to AWS using the AWS_* environment variables or the AWS CLI configuration.",
 	client.ProviderDO:     "Deploy to DigitalOcean using the DIGITALOCEAN_TOKEN, SPACES_ACCESS_KEY_ID, and SPACES_SECRET_ACCESS_KEY environment variables.",
 	client.ProviderGCP:    "Deploy to Google Cloud Platform using gcloud Application Default Credentials.",
+	client.ProviderAzure:  "Deploy to Azure using the AZURE_* environment variables or Azure CLI authentication.",
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/cmd/cli/command/estimate.go` around lines 70 - 75, The providerDescription map is missing entries for Azure (and ensure DigitalOcean is present) so the survey shows empty descriptions; update the providerDescription variable to include a descriptive string for client.ProviderAzure (e.g., mention Azure CLI/ARM/ADC or environment variables) and verify/adjust the client.ProviderDO entry text for DigitalOcean so both client.ProviderAzure and client.ProviderDO have clear descriptions used by the survey prompt.
🟡 Minor comments (7)
src/pkg/http/post.go-48-50 (1)
48-50: ⚠️ Potential issue | 🟡 Minor
Avoid forcing an empty `Content-Type` header.
At line 49, `Content-Type` is always sent, even when `contentType == ""`. That changes previous behavior (header omitted) and can break strict endpoints.
Proposed fix
```diff
 func PostWithContext(ctx context.Context, url, contentType string, body io.Reader) (*http.Response, error) {
-	return PostWithHeader(ctx, url, http.Header{"Content-Type": []string{contentType}}, body)
+	if contentType == "" {
+		return PostWithHeader(ctx, url, nil, body)
+	}
+	return PostWithHeader(ctx, url, http.Header{"Content-Type": []string{contentType}}, body)
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/pkg/http/post.go` around lines 48 - 50, PostWithContext currently always adds a "Content-Type" header (even when contentType == ""), which forces an empty header value and changes previous behavior; update PostWithContext so it only passes a header to PostWithHeader when contentType is non-empty—i.e., if contentType == "" call PostWithHeader with a nil/empty header (or omit the header argument) and otherwise call PostWithHeader with http.Header{"Content-Type": []string{contentType}} to preserve prior behavior; refer to PostWithContext and PostWithHeader to locate and implement this conditional header construction.
src/pkg/clouds/azure/aci/setup_test.go-17-22 (1)
17-22: ⚠️ Potential issue | 🟡 Minor
Use `t.Fatalf` to fail fast if SetUpResourceGroup fails.
Using `t.Errorf` allows the test to continue to TearDown even if resource group creation failed, which will likely cause a confusing cascading failure. Since TearDown depends on a successfully created resource group, use `t.Fatalf` or check for prior failures.
💚 Proposed fix
```diff
 t.Run("SetUpResourceGroup", func(t *testing.T) {
 	err := c.SetUpResourceGroup(context.Background())
 	if err != nil {
-		t.Errorf("Failed to set up resource group: %v", err)
+		t.Fatalf("Failed to set up resource group: %v", err)
 	}
 })
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/pkg/clouds/azure/aci/setup_test.go` around lines 17 - 22, Replace the non-fatal assertion in the test's "SetUpResourceGroup" subtest so the test fails immediately if resource group creation fails: change the call that currently uses t.Errorf when c.SetUpResourceGroup(ctx) returns an error to t.Fatalf (or otherwise abort the test) to prevent subsequent TearDown from running against an uncreated resource; update the subtest around SetUpResourceGroup and references to c.SetUpResourceGroup/context.Background() accordingly.
src/pkg/clouds/azure/aca/common.go-152-175 (1)
152-175: ⚠️ Potential issue | 🟡 Minor
Potential nil pointer dereferences when accessing Azure SDK response fields.
The code assumes that pointer fields like `Containers[0].Name`, `Value[0].Name`, and `Value[0].Properties.Containers[0].Name` are non-nil after checking slice length. Azure SDK types often have nil pointer fields even when elements exist.
🛡️ Suggested nil checks
```diff
 	// Opportunistically pick the container name from the app template.
 	if container == "" && app.Properties.Template != nil && len(app.Properties.Template.Containers) > 0 {
-		container = *app.Properties.Template.Containers[0].Name
+		if name := app.Properties.Template.Containers[0].Name; name != nil {
+			container = *name
+		}
 	}
 }
 if replica == "" {
 	// ...
 	if len(list.Value) == 0 {
 		return "", "", "", fmt.Errorf("no replicas found for revision %q", revision)
 	}
-	replica = *list.Value[0].Name
+	if list.Value[0].Name == nil {
+		return "", "", "", fmt.Errorf("replica name is nil for revision %q", revision)
+	}
+	replica = *list.Value[0].Name
 	// Opportunistically pick the container from the replica if still unset.
-	if container == "" && len(list.Value[0].Properties.Containers) > 0 {
-		container = *list.Value[0].Properties.Containers[0].Name
+	if container == "" && list.Value[0].Properties != nil && len(list.Value[0].Properties.Containers) > 0 {
+		if name := list.Value[0].Properties.Containers[0].Name; name != nil {
+			container = *name
+		}
 	}
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/pkg/clouds/azure/aca/common.go` around lines 152 - 175, The code dereferences Azure SDK pointer fields without nil checks (e.g., app.Properties.Template.Containers[0].Name, list.Value[0].Name, list.Value[0].Properties.Containers[0].Name) which can panic; update the logic in the block that calls c.newReplicasClient() and processes replicas (and the earlier template-based container pick) to guard every pointer access: verify app.Properties, app.Properties.Template, the container slice length, and that Containers[0] and its Name are non-nil before dereferencing; after ListReplicas also check list.Value is non-nil, that list.Value[0] is non-nil, that list.Value[0].Name is non-nil, that list.Value[0].Properties is non-nil, and that its Containers[0] and Name are non-nil, and return a clear error if any expected field is nil rather than dereferencing.
src/pkg/clouds/azure/aci/tail_test.go-23-28 (1)
23-28: ⚠️ Potential issue | 🟡 Minor
Uncomment or remove the resource group cleanup.
The commented-out `TearDown` call will leave Azure resource groups lingering when this test is eventually enabled. Either uncomment the cleanup or remove the empty `t.Cleanup` block entirely.
🧹 Proposed fix
```diff
 t.Cleanup(func() {
-	// err := containerInstance.TearDown(ctx)
-	// if err != nil {
-	// 	t.Fatalf("Failed to tear down container instance: %v", err)
-	// }
+	err := containerInstance.TearDown(ctx)
+	if err != nil {
+		t.Logf("Failed to tear down container instance: %v", err)
+	}
 })
```
Note: Changed `t.Fatalf` to `t.Logf` in cleanup since fatals in cleanup can mask the original test result.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/pkg/clouds/azure/aci/tail_test.go` around lines 23 - 28, Uncomment and run the resource group cleanup by calling containerInstance.TearDown(ctx) inside the existing t.Cleanup closure (or remove the empty t.Cleanup if you intentionally want no cleanup); if you restore the call, handle errors by logging rather than fatalling in cleanup (use t.Logf to report any TearDown error) and ensure ctx and containerInstance are in scope for the cleanup closure so the test won't leave Azure resource groups lingering.
src/pkg/clouds/azure/aca/tail.go-54-74 (1)
54-74: ⚠️ Potential issue | 🟡 Minor
Silent error swallowing may hide configuration or authentication issues.
The `poll()` function silently returns on client creation errors (line 57) and paging errors (line 63). This could mask authentication failures or API issues, making debugging difficult. Consider logging these errors at debug level.
🔧 Proposed fix to add debug logging
```diff
 poll := func() {
 	client, err := c.newContainerAppsClient()
 	if err != nil {
+		// Log at debug level to aid troubleshooting
 		return
 	}
 	pager := client.NewListByResourceGroupPager(c.ResourceGroup, nil)
 	for pager.More() {
 		page, err := pager.NextPage(ctx)
 		if err != nil {
+			// Log at debug level to aid troubleshooting
 			return
 		}
```
Note: You'll need to import a logging package or pass a logger to surface these errors.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/pkg/clouds/azure/aca/tail.go` around lines 54 - 74, poll currently swallows errors from c.newContainerAppsClient() and pager.NextPage(), which hides auth/config issues; update the poll function to log those errors at debug level before returning (use the logger available on the receiver struct c or add/pass one if missing), e.g., log the error and contextual info (resource group, operation) when c.newContainerAppsClient() fails and when pager.NextPage(ctx) returns an error, then preserve the original control flow (return after logging) so failures are visible for debugging; update references in the poll function around c.newContainerAppsClient, client.NewListByResourceGroupPager and pager.NextPage accordingly.
src/pkg/clouds/azure/acr/acr.go-207-214 (1)
207-214: ⚠️ Potential issue | 🟡 Minor
Potential nil pointer dereference on `props.Status`.
`props.Status` is dereferenced at line 209 without a nil check. If the API returns a run with no status, this will panic.
🔧 Proposed fix
```diff
+if props.Status == nil {
+	return nil, fmt.Errorf("run %s has no status", runID)
+}
+
 status := &RunStatus{
 	RunID:  runID,
 	Status: *props.Status,
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/pkg/clouds/azure/acr/acr.go` around lines 207 - 214, The code dereferences props.Status when constructing a RunStatus, which can panic if props.Status is nil; update the construction (in the block that returns *RunStatus) to check props.Status for nil before using *props.Status (e.g., set statusStr := "unknown" or empty string and overwrite it only if props.Status != nil), assign that statusStr to RunStatus.Status, and keep the existing nil check for props.RunErrorMessage; references: RunStatus, props.Status, props.RunErrorMessage.
src/pkg/clouds/azure/acr/acr.go-90-105 (1)
90-105: ⚠️ Potential issue | 🟡 Minor
`RegistryName` is set before creation is confirmed.
`a.RegistryName` is assigned at line 90 before `BeginCreate` or `PollUntilDone` succeeds. If registry creation fails, the struct is left in an inconsistent state with a name that doesn't exist.
🔧 Proposed fix to set name after successful creation
```diff
-a.RegistryName = registryName
-
 poller, err := client.BeginCreate(ctx, a.resourceGroupName, registryName, armcontainerregistry.Registry{
 	Location: a.Location.Ptr(),
 	SKU: &armcontainerregistry.SKU{
 		Name: to.Ptr(armcontainerregistry.SKUNameBasic),
 	},
 }, nil)
 if err != nil {
 	return fmt.Errorf("failed to create container registry: %w", err)
 }
 _, err = poller.PollUntilDone(ctx, nil)
 if err != nil {
 	return fmt.Errorf("failed to poll registry creation: %w", err)
 }
+a.RegistryName = registryName
 return nil
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/pkg/clouds/azure/acr/acr.go` around lines 90 - 105, Do not assign a.RegistryName until after successful creation: move the a.RegistryName = registryName assignment to after client.BeginCreate and poller.PollUntilDone complete without error so the struct isn't left in an inconsistent state; specifically, keep the calls to client.BeginCreate(...) and poller.PollUntilDone(ctx, nil) as-is, and only set a.RegistryName = registryName after the second call returns nil error (use the existing variables poller and err to gate the assignment).
🧹 Nitpick comments (15)
src/pkg/cli/compose/context.go (1)
262-264: Azure-specific header in shared upload code.
The `X-Ms-Blob-Type: BlockBlob` header is Azure-specific. As noted by the HACK comment, this should ideally be handled in the Azure provider rather than in shared compose code. For now, this works but may cause issues if other storage backends reject unknown headers.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/pkg/cli/compose/context.go` around lines 262 - 264, Shared compose upload code currently sets the Azure-specific header "X-Ms-Blob-Type" on the common header variable before calling http.PutWithHeader; remove that Azure-specific header from the shared function (leave Content-Type intact) and instead add logic in the Azure provider to inject "X-Ms-Blob-Type: BlockBlob" for uploads to Azure blobs. Locate the header construction and http.PutWithHeader call (the header variable and the call to http.PutWithHeader) to remove header.Set("X-Ms-Blob-Type", "BlockBlob") here, and update the Azure provider upload implementation to set that header when targeting BlockBlob storage. Ensure other providers are unaffected and tests cover that Azure uploads include the header while other backends do not.
src/pkg/http/put.go (1)
24-30: Consider preserving any default request headers.
The current implementation replaces the entire `Header` map with the caller-provided header. While `http.NewRequestWithContext` currently initializes an empty `Header` map, a more defensive approach would be to merge headers:
♻️ Optional: Merge headers instead of replacing
```diff
 func PutWithHeader(ctx context.Context, url string, header http.Header, body io.Reader) (*http.Response, error) {
 	req, err := http.NewRequestWithContext(ctx, http.MethodPut, url, body)
 	if err != nil {
 		return nil, err
 	}
-	req.Header = header
+	for key, values := range header {
+		for _, value := range values {
+			req.Header.Add(key, value)
+		}
+	}
 	return DefaultClient.Do(req)
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/pkg/http/put.go` around lines 24 - 30, PutWithHeader currently overwrites the request's Header map instead of merging with any existing/default headers; change the logic so after creating the request with http.NewRequestWithContext you iterate over the provided header (if non-nil) and copy each key/value into req.Header using Add/Set rather than replacing the map, preserving any defaults; keep the current return path via DefaultClient.Do(req) and ensure nil-safety when header is nil.
src/pkg/clouds/azure/common.go (2)
53-79: Deduplicate ARM storage client-factory creation.
Lines 53-79 repeat the same `NewCreds()` + `armstorage.NewClientFactory(...)` flow. A small helper would reduce repetition and keep future client additions consistent.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/pkg/clouds/azure/common.go` around lines 53 - 79, Create a small helper on the Azure type to encapsulate the duplicated NewCreds() + armstorage.NewClientFactory(...) logic (e.g., a method named newStorageClientFactory or storageClientFactory that returns (*armstorage.ClientFactory, error}); have Azure.NewStorageAccountsClient and Azure.NewBlobContainersClient call that helper and return clientFactory.NewAccountsClient() and clientFactory.NewBlobContainersClient() respectively, propagating errors unchanged (use the same error wrapping used today when creating the factory).
81-93: Remove commented-out constructor stub from production code.
The commented block is stale and adds noise. Prefer deleting it or tracking it in an issue/PR note.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/pkg/clouds/azure/common.go` around lines 81 - 93, Remove the stale commented-out constructor stub for NewRoleAssignmentsClient in the Azure type: delete the entire commented block that references func (a Azure) NewRoleAssignmentsClient(), armauthorization.RoleAssignmentsClient, and related NewRoleAssignmentsClient creation logic so production code contains no dead/commented code; if this logic needs tracking, add a separate ticket/PR note instead of leaving commented code in src/pkg/clouds/azure/common.go.
src/pkg/cli/tailAndMonitor.go (1)
65-66: Consider documenting or making the delay configurable.
The delay before cancelling tail was tripled from 2s to 6s. While this ensures log capture, it also increases wait time for all providers. Consider:
- Adding a comment explaining why 6s was chosen (Azure log propagation latency?).
- Making this provider-specific if Azure needs more time but AWS/GCP don't.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/pkg/cli/tailAndMonitor.go` around lines 65 - 66, The hardcoded 6s sleep before cancelling the tail (pkg.SleepWithContext(ctx, 6*time.Second) followed by cancelTail(errMonitoringDone)) should be documented and made configurable; add a short comment near that call explaining why 6s was chosen (e.g., Azure log propagation) and replace the literal with a configurable value or provider-specific constant (e.g., tailCancelDelay or providerTailDelays map used by pkg.SleepWithContext), wiring a flag/env/config option so Azure can use 6s while AWS/GCP use a shorter delay, and keep cancelTail(errMonitoringDone) behavior unchanged.
src/pkg/clouds/azure/aci/setup.go (2)
51-53: Consider resolving the TODO for storage account cleanup.
The TODO suggests uncertainty about storage account deletion. If the storage account is created within the same resource group (as `SetUpStorageAccount` does), it will be deleted with the resource group. However, if `AZURE_STORAGE_ACCOUNT` is used, the external account won't be affected, which is likely the correct behavior.
Would you like me to open an issue to track clarifying this behavior?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/pkg/clouds/azure/aci/setup.go` around lines 51 - 53, Replace the TODO about deleting the storage account with a clear comment documenting the intended behavior: explain that when SetUpStorageAccount creates a storage account in the same resource group it will be removed when the resource group is deleted, while an externally provided AZURE_STORAGE_ACCOUNT must not and will not be deleted by this code; remove or avoid adding any deletion logic here and, if desired, add an explicit TODO to open an issue for lifecycle policy decisions rather than leaving ambiguous text.
108-134: Remove commented-out debug code.
This large block of commented-out code appears to be debug/experimental code. Consider removing it to improve readability, or if it's needed for future work, track it in an issue.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/pkg/clouds/azure/aci/setup.go` around lines 108 - 134, Remove the large commented-out debug block containing getCurrentUserObjectID, assignBlobDataContributor, NewStorageAccountsClient, ListKeys, and RegenerateKey usage (and related prints and error handling around storageAccount and c.resourceGroupName); either delete these commented lines to improve readability or extract them into a tracked issue/PR with a short TODO comment referencing that issue if you want to preserve the intent for future work.
src/pkg/clouds/azure/aca/common.go (1)
71-74: Consider adding timeouts to HTTP requests.
Using `http.DefaultClient` for REST API calls has no timeout configured, which could cause the caller to hang indefinitely if the Azure management endpoint is slow or unresponsive. Consider using a client with a timeout or leveraging the context's deadline.
♻️ Suggested improvement
```diff
+var httpClient = &http.Client{Timeout: 30 * time.Second}
+
 func (c *ContainerApp) getAuthToken(ctx context.Context, appName string) (string, error) {
 	// ...
-	resp, err := http.DefaultClient.Do(req)
+	resp, err := httpClient.Do(req)
```
Alternatively, create a client per-request that respects the context deadline.
Also applies to: 109-112
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/pkg/clouds/azure/aca/common.go` around lines 71 - 74, The code uses http.DefaultClient.Do(req) (see the resp, err := http.DefaultClient.Do(req) call) which has no timeout; change to use a context-aware HTTP client or honor the request context: ensure the request is created/associated with a context (req = req.WithContext(ctx) or accept a context param) and replace http.DefaultClient with an http.Client configured with a timeout or with Transport that respects context deadlines; update the other occurrences noted (around the second call at lines 109-112) to the same pattern so all Azure management calls won't hang indefinitely.
src/pkg/cli/client/byoc/azure/login.go (1)
5-7: Authentication stub needs implementation.
This method silently succeeds without performing any authentication. The `ctx` and `interactive` parameters are unused. Before production use, this should integrate with Azure's authentication SDK (e.g., `azidentity.NewDefaultAzureCredential` or interactive browser flow).
Do you want me to open an issue to track the Azure authentication implementation?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/pkg/cli/client/byoc/azure/login.go` around lines 5 - 7, The Authenticate method on ByocAzure currently returns nil and ignores ctx and interactive; implement real Azure authentication by using Azure Identity (e.g., call azidentity.NewDefaultAzureCredential or the interactive browser credential when interactive==true) and store the resulting credentials on the ByocAzure instance so subsequent calls can use them; ensure you propagate ctx into credential creation and return any errors from credential initialization (update ByocAzure.Authenticate to perform the credential creation and assignment, and handle both default and interactive flows).
src/pkg/clouds/azure/aci/tail_test.go (1)
11-12: Test is permanently skipped.
Consider adding a comment explaining when/how this test should be run manually, or use `t.Skip("reason")` with an explanation instead of `t.SkipNow()`.
📝 Suggested improvement
```diff
 func TestTail(t *testing.T) {
-	t.SkipNow() // too slow for CI
+	t.Skip("Integration test: run manually with 'go test -tags=integration' against a real Azure subscription")
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/pkg/clouds/azure/aci/tail_test.go` around lines 11 - 12, The TestTail unit test is unconditionally skipped via t.SkipNow(), remove the unconditional skip and replace it with t.Skip("reason") inside TestTail so the skip message documents why it's omitted from CI and how/when to run it manually (e.g., "too slow for CI; run with -run TestTail-local or set env TEST_AZURE_INTEGRATION=true"). Also add a short comment above TestTail explaining manual invocation steps and any required environment/setup so reviewers and devs know how to execute the test outside CI.
src/pkg/clouds/azure/aci/run_test.go (1)
22-27: Re-enable teardown in cleanup to avoid cloud resource leaks.
Lines 23-26 are commented, so this test leaves resources behind once enabled.
Proposed fix
```diff
 t.Cleanup(func() {
-	// err := containerInstance.TearDown(ctx)
-	// if err != nil {
-	// 	t.Fatalf("Failed to tear down container instance: %v", err)
-	// }
+	if err := containerInstance.TearDown(ctx); err != nil {
+		t.Logf("cleanup: failed to tear down container instance: %v", err)
+	}
 })
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/pkg/clouds/azure/aci/run_test.go` around lines 22 - 27, Re-enable the teardown inside the test cleanup so the created Azure container is always removed: inside the existing t.Cleanup(func() { ... }) call containerInstance.TearDown(ctx) and check its returned error, invoking t.Fatalf("Failed to tear down container instance: %v", err) (or t.Errorf if you prefer non-fatal) when non-nil; ensure ctx and containerInstance used in the test are in scope for the cleanup closure so resources are not leaked.
src/pkg/clouds/azure/acr/acr_test.go (1)
66-75: Avoid silent cleanup failures in integration tests.
Lines 68-73 ignore cleanup errors, making leaked ACR registries hard to detect.
Proposed fix
```diff
 t.Cleanup(func() {
 	ctx := context.Background()
 	client, err := acr.newRegistriesClient()
 	if err != nil {
+		t.Logf("cleanup: failed to create client: %v", err)
 		return
 	}
-	poller, _ := client.BeginDelete(ctx, acr.resourceGroupName, registryName, nil)
+	poller, err := client.BeginDelete(ctx, acr.resourceGroupName, registryName, nil)
+	if err != nil {
+		t.Logf("cleanup: failed to delete registry: %v", err)
+		return
+	}
 	if poller != nil {
 		_, _ = poller.PollUntilDone(ctx, nil)
 	}
 })
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/pkg/clouds/azure/acr/acr_test.go` around lines 66 - 75, The test's t.Cleanup closure silently ignores errors from acr.newRegistriesClient(), client.BeginDelete and poller.PollUntilDone which can leak ACR registries; update the cleanup to capture and surface errors (use t.Logf or t.Errorf/t.Fatalf as appropriate): call acr.newRegistriesClient() and if it returns an error log/fail the test, check the error from client.BeginDelete and log/fail on non-nil error, and check the error returned by poller.PollUntilDone and log/fail there too, referencing the acr.newRegistriesClient, client.BeginDelete and poller.PollUntilDone calls so failures are not dropped.
src/pkg/clouds/azure/aca/tail.go (1)
128-131: Consider adding a timeout or using a client with transport-level timeouts.
Using `http.DefaultClient` for a streaming endpoint without timeouts could cause the goroutine to hang indefinitely if the server stops responding without closing the connection. For a streaming/follow scenario, consider at least a response header timeout.
♻️ Example: use a client with response header timeout
```diff
+var streamClient = &http.Client{
+	Transport: &http.Transport{
+		ResponseHeaderTimeout: 30 * time.Second,
+	},
+}
+
 func (c *ContainerApp) StreamLogs(...) (<-chan LogEntry, error) {
 	// ...
-	resp, err := http.DefaultClient.Do(req)
+	resp, err := streamClient.Do(req)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/pkg/clouds/azure/aca/tail.go` around lines 128 - 131, The code uses http.DefaultClient.Do(req) for a long-running streaming request (resp, req) which can hang; replace it with a dedicated http.Client that sets timeouts (e.g., an overall Timeout or a Transport with ResponseHeaderTimeout/IdleConnTimeout) and call client.Do(req) instead of http.DefaultClient.Do(req). Locate the request site (the resp, err := http.DefaultClient.Do(req) call in the tail/streaming function) and create or inject a client (e.g., a package-level or function-scoped client configured with timeouts) so the streaming goroutine still closes resp.Body but the connection cannot hang indefinitely.
src/pkg/cli/client/byoc/azure/byoc.go (1)
229-230: Security: Sensitive values exposed as plain environment variables.
The TODO comments acknowledge this, but `pulumiBackendValue` and `PULUMI_CONFIG_PASSPHRASE` contain sensitive credentials. Consider tracking this as a follow-up to use Azure Key Vault or secure environment variables.
Would you like me to open an issue to track converting these to secrets?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/pkg/cli/client/byoc/azure/byoc.go` around lines 229 - 230, The code is exposing sensitive values pulumiBackendValue and byoc.PulumiConfigPassphrase via plain environment variables (pulumiBackendKey and "PULUMI_CONFIG_PASSPHRASE"); update the flow to fetch these secrets from a secure store (e.g., Azure Key Vault or your secret manager) instead of hardcoding/setting them directly, mark them as secret when passing into Pulumi (so they are redacted), avoid logging their values, and add a clear TODO or create a tracked issue to migrate pulumiBackendKey/pulumiBackendValue and byoc.PulumiConfigPassphrase to the secret store with CI/runner fallback for development credentials.
src/pkg/clouds/azure/acr/acr.go (1)
133-140: YAML construction via string concatenation is fragile.
Building YAML by concatenating strings without proper escaping could produce malformed YAML if `req.Image` or `req.Command` contains special characters (colons, quotes, newlines, etc.). Consider using a YAML library or at minimum quoting the values.
♻️ Safer approach using proper quoting
```diff
 var yaml strings.Builder
 yaml.WriteString("steps:\n")
-yaml.WriteString("  - cmd: " + req.Image)
+// Quote the image to handle special characters
+yaml.WriteString("  - cmd: \"" + strings.ReplaceAll(req.Image, "\"", "\\\"") + "\"")
 if len(req.Command) > 0 {
-	// Arguments after the image are passed as the command
-	yaml.WriteString(" " + strings.Join(req.Command, " "))
+	// Properly quote each argument
+	for _, arg := range req.Command {
+		yaml.WriteString(" \"" + strings.ReplaceAll(arg, "\"", "\\\"") + "\"")
+	}
 }
 yaml.WriteString("\n")
```

Alternatively, use `gopkg.in/yaml.v3` (or `go.yaml.in/yaml/v3`) to marshal a struct.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/pkg/clouds/azure/acr/acr.go` around lines 133 - 140, The current YAML is built by concatenating into the strings.Builder (see variables yaml, and the use of req.Image and req.Command) which risks producing invalid YAML when values contain special characters; replace this with structured marshaling: create a small Go struct representing the pipeline (e.g., type step struct { Cmd string `yaml:"cmd"` } and wrapper { Steps []step `yaml:"steps"` }), populate Cmd by joining req.Image and req.Command, then use gopkg.in/yaml.v3 (yaml.Marshal) to produce the YAML bytes and write those instead of manual string concatenation; ensure you remove the existing yaml.WriteString concatenation logic and handle any marshal errors returned by yaml.Marshal.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/go.mod`:
- Around line 20-31: Replace the beta Azure SDK module reference by updating the
module entry
"github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/authorization/armauthorization/v3
v3.0.0-beta.2" to the stable release "v2.2.0"; locate the dependency line for
armauthorization (the module name contains armauthorization/v3) in the go.mod
and change the version string to v2.2.0, then run go mod tidy to ensure the
lockfile and module graph are updated.
In `@src/pkg/clouds/azure/aci/blob.go`:
- Around line 81-91: The DownloadBlob method on ContainerInstance uses
io.ReadAll on resp.Body which can OOM for large blobs; change it to enforce a
max read bound by wrapping resp.Body with io.LimitReader (or use io.CopyN) using
a defined constant like maxBlobDownloadBytes and return an explicit error if the
blob exceeds that limit (or change the signature to stream by returning
io.ReadCloser). Locate ContainerInstance.DownloadBlob and the call to
client.NewBlobClient(blobName).DownloadStream(ctx, nil) / resp.Body, replace
io.ReadAll(resp.Body) with a bounded read approach (or streaming return) and
ensure the function documents/handles the max size error path.
- Around line 30-35: When handling the result of accountsClient.ListKeys in the
blob storage logic, guard against empty or malformed responses before
dereferencing keys.Keys[0].Value: check that keys is non-nil, keys.Keys has
length > 0, and keys.Keys[0].Value is non-nil; if any check fails return a clear
error instead of dereferencing. Update the code around the ListKeys call (the
variables keys, err and storageKey assignment) to validate the response and
propagate a meaningful error when keys are missing or malformed.
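The guard described above can be reduced to a small helper; this sketch uses a plain `[]*string` in place of the SDK's key slice (the real `armstorage` response nests `*string` values the same way, but the shape here is simplified):

```go
package main

import (
	"errors"
	"fmt"
)

// firstKeyValue returns the first usable key value, erroring out instead of
// dereferencing a missing or nil entry from a ListKeys-style response.
func firstKeyValue(values []*string) (string, error) {
	if len(values) == 0 || values[0] == nil {
		return "", errors.New("storage account ListKeys returned no usable key")
	}
	return *values[0], nil
}

func main() {
	key := "k1"
	got, err := firstKeyValue([]*string{&key})
	fmt.Println(got, err)
}
```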
In `@src/pkg/clouds/azure/aci/common.go`:
- Around line 22-41: NewContainerInstance currently silently allows an empty
Azure location and produces an invalid resource group name; change
NewContainerInstance to return (*ContainerInstance, error) and validate the
effective location (argument or AZURE_LOCATION) immediately—if empty return a
descriptive error (e.g., "azure location not set") instead of constructing the
object; if valid, construct the ContainerInstance, set SubscriptionID and
StorageAccount as before, call c.SetLocation(loc) to compute resourceGroupName,
and return the instance with nil error; update any callers of
NewContainerInstance to handle the error.
In `@src/pkg/clouds/azure/aci/run_test.go`:
- Around line 10-12: The test TestRun currently calls t.SkipNow()
unconditionally which prevents it from ever exercising the ACI run path; change
the unconditional skip to a conditional skip (e.g., check testing.Short() or an
integration-specific env var/flag) so the test runs in integration contexts but
is skipped during fast/unit runs/CI by default, and ensure the skip call
references the same TestRun function/t variable and includes a descriptive
message.
In `@src/pkg/clouds/azure/aci/run.go`:
- Around line 49-57: The code currently builds clone.Containers based only on
the supplied containers slice, causing an empty container list when containers
is nil/empty; modify the Run logic to fall back to the default container
definitions held on the receiver (c.ContainerGroupProps.Containers or a
predefined default) when the incoming containers slice is nil or has length 0,
then iterate over that effectiveContainers slice to populate clone.Containers;
ensure you still apply safeAppend(newProps.Command, commandArgs...) for the
first container and safeAppend(newProps.EnvironmentVariables, envVars...) for
each container so the behavior of newProps.Command and
newProps.EnvironmentVariables and creation of armcontainerinstance.Container
entries remains correct.
- Around line 70-76: The BeginCreateOrUpdate and BeginStart calls on
containerGroupClient (BeginCreateOrUpdate and BeginStart) are starting
long-running operations but not waiting for them to finish; change the code to
capture the poller returned by BeginCreateOrUpdate and call PollUntilDone (or
the SDK's equivalent) to wait for creation to complete before calling
BeginStart, and likewise capture the poller from BeginStart and PollUntilDone to
wait for the start operation to finish (follow the pattern used in setup.go for
PollUntilDone).
In `@src/pkg/clouds/azure/aci/setup.go`:
- Around line 208-216: The current code assigns Contributor and User Access
Administrator at subscription scope (subscriptionScope built from
c.SubscriptionID) via assignRole using contributorRoleID and
userAccessAdministratorRoleID; update this by documenting the security
implications in user-facing docs and in-code comments, and add an option/flag
(e.g., in the configuration struct where c.SubscriptionID is defined) to allow
resource-group-scoped role assignment instead of subscription-scoped; implement
the alternate flow by constructing a resource-group scope
("/subscriptions/"+c.SubscriptionID+"/resourceGroups/"+c.ResourceGroupName) and
calling assignRole with that scope when the flag indicates least-privilege is
desired, and keep the existing subscriptionScope path as the opt-in behavior
while updating comments near assignment calls and function assignRole usage to
reflect this change.
In `@src/pkg/clouds/azure/aci/stop.go`:
- Around line 7-14: The Stop method on ContainerInstance dereferences groupName
unconditionally which can panic; add a nil guard at the start of
ContainerInstance.Stop to return a clear error if groupName is nil, then only
dereference *groupName after that check before calling
containerGroupClient.Stop; update any error message to indicate "nil groupName"
and keep the rest of the logic unchanged.
In `@src/pkg/clouds/azure/aci/tail.go`:
- Around line 131-142: The retry loop around client.Attach in tail.go currently
uses time.Sleep which ignores context cancellation; change the retry sleep to a
cancellable wait using select with time.After and ctx.Done() so that when
errors.As detects ContainerNotFound you do: select { case
<-time.After(time.Second): continue; case <-ctx.Done(): return nil, ctx.Err() },
keeping the rest of error handling the same; update the loop around
attachResponse, err, client.Attach, resourceGroupName, groupName, and
containerName accordingly.
- Around line 50-57: The retry loop in tail.go that calls client.ListLogs
currently uses time.Sleep and ignores context cancellation; modify the loop (the
for { ... if errors.As(err, &respErr) && respErr.ErrorCode ==
"ContainerGroupDeploymentNotReady" { ... } } block around client.ListLogs) to
wait in a context-aware way (e.g., replace time.Sleep(time.Second) with a select
on ctx.Done() and time.After(time.Second)) so that if ctx is cancelled you
break/return promptly instead of continuing to retry; ensure you return the
original error or ctx.Err() as appropriate when ctx.Done() is selected.
- Around line 175-177: The code indexes logLine[0] without checking length,
which panics on empty websocket messages; before extracting stdioFd and slicing
logLine[1:], guard with a length check (e.g., if len(logLine) == 0 { continue }
or treat as stdout) so you only access logLine[0] when present; update the logic
around variables logLine, stdioFd, and the channel send ch <- LogEntry{Message:
string(logLine[1:]), Stderr: stdioFd == 2} to safely handle empty messages.
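The length guard can be captured in a small parser; this is a sketch (the `splitStdio` name and the `ok` return are illustrative — the frame layout of fd byte followed by payload matches the description above):

```go
package main

import "fmt"

// splitStdio parses an attach frame whose first byte is the stdio fd (1 or 2);
// ok is false for empty frames so callers can skip them instead of panicking.
func splitStdio(frame []byte) (msg string, stderr bool, ok bool) {
	if len(frame) == 0 {
		return "", false, false
	}
	return string(frame[1:]), frame[0] == 2, true
}

func main() {
	msg, stderr, ok := splitStdio([]byte{2, 'o', 'o', 'p', 's'})
	fmt.Println(msg, stderr, ok)
}
```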
- Around line 146-150: The defer resp.Body.Close() is called before checking the
error from websocket.DefaultDialer.DialContext, which can panic if resp is nil;
move the defer so it runs only after a successful DialContext and resp is
non-nil, and when returning the error from DialContext, guard the error message
to avoid dereferencing resp (e.g., include resp.Status only when resp != nil or
include a fallback like "no response"). Locate the call to
websocket.DefaultDialer.DialContext and the surrounding error handling near
attachResponse.WebSocketURI and adjust ordering and nil checks accordingly.
In `@src/pkg/clouds/azure/aci/upload_test.go`:
- Line 29: Test code prints a full pre-signed SAS upload URL (t.Logf("Upload
URL: %s", url)) which leaks credentials; instead redact the query string before
logging. Update the test to parse the URL value (use url.Parse or equivalent),
remove or replace RawQuery (e.g., set to empty or "<redacted>") and log only the
scheme://host/path or a redacted string; locate the t.Logf call in
upload_test.go and replace it with code that sanitizes the url variable before
logging.
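The redaction step can be sketched with `net/url` (the `redactQuery` helper is hypothetical; dropping `RawQuery` removes the SAS signature while keeping the scheme, host, and path loggable):

```go
package main

import (
	"fmt"
	"net/url"
)

// redactQuery strips the query string (where the SAS signature lives) so a
// logged upload URL no longer grants access.
func redactQuery(raw string) string {
	u, err := url.Parse(raw)
	if err != nil {
		return "<unparseable url>"
	}
	u.RawQuery = ""
	return u.String()
}

func main() {
	fmt.Println(redactQuery("https://acct.blob.core.windows.net/cont/blob?sv=2022&sig=secret"))
}
```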
In `@src/pkg/clouds/azure/aci/upload.go`:
- Around line 51-56: The code dereferences keys.Keys[0].Value without checking
the result from accountsClient.ListKeys (in the upload flow that sets
storageKey), which can panic if keys.Keys is empty or the Value is nil; update
the block that calls accountsClient.ListKeys(ctx, c.resourceGroupName,
c.StorageAccount, nil) to validate that keys != nil, len(keys.Keys) > 0 and
keys.Keys[0].Value != nil before assigning storageKey, and return a clear error
if the checks fail so storageKey is only set from a safe, non-nil value.
In `@src/pkg/clouds/azure/acr/tail.go`:
- Around line 107-113: The code in tail.go reads body into text and indexes
text[len(text)-1] which panics on empty chunks; update the logic around
variables text, lines and complete to guard for empty text (e.g., if len(text)
== 0) before accessing the last byte, and treat an empty text as producing no
complete lines (skip slicing or set complete to nil/empty) so the subsequent
processing of complete and lines is safe; locate the block that declares text :=
string(body), lines := strings.Split(...), and the conditional using
text[len(text)-1] to add the guard.
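The guarded line-splitting can be sketched as a pure function (the `completeLines` name and remainder handling are illustrative of the fix, not the repository's code):

```go
package main

import (
	"fmt"
	"strings"
)

// completeLines splits a log chunk into finished lines plus a trailing partial
// line, guarding the empty-chunk case that would panic on text[len(text)-1].
func completeLines(text string) (complete []string, remainder string) {
	if len(text) == 0 {
		return nil, ""
	}
	lines := strings.Split(text, "\n")
	if text[len(text)-1] == '\n' {
		return lines[:len(lines)-1], ""
	}
	return lines[:len(lines)-1], lines[len(lines)-1]
}

func main() {
	c, r := completeLines("a\nb\npartial")
	fmt.Println(c, r)
}
```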
- Around line 92-94: The current check that returns (offset, nil) when
resp.StatusCode is neither 200 nor 206 silently swallows server errors; update
the code in the tail logic that inspects resp.StatusCode (the block that
currently returns offset, nil for non-StatusOK/StatusPartialContent) to return a
descriptive error instead — include the resp.StatusCode and any short response
body or status text in the error (and/or log it via the existing logger) so
callers can detect and surface failed log endpoint responses rather than
treating them as a successful no-op.
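A minimal sketch of the descriptive-error check (the `checkLogStatus` helper and the 200-byte snippet cap are assumptions for illustration):

```go
package main

import (
	"fmt"
	"net/http"
)

// checkLogStatus surfaces unexpected log endpoint responses instead of
// silently treating them as "no new logs"; the body snippet is truncated.
func checkLogStatus(status int, body []byte) error {
	if status == http.StatusOK || status == http.StatusPartialContent {
		return nil
	}
	snippet := string(body)
	if len(snippet) > 200 {
		snippet = snippet[:200]
	}
	return fmt.Errorf("log endpoint returned HTTP %d: %s", status, snippet)
}

func main() {
	fmt.Println(checkLogStatus(503, []byte("upstream unavailable")))
}
```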
---
Outside diff comments:
In `@src/cmd/cli/command/estimate.go`:
- Around line 70-75: The providerDescription map is missing entries for Azure
(and ensure DigitalOcean is present) so the survey shows empty descriptions;
update the providerDescription variable to include a descriptive string for
client.ProviderAzure (e.g., mention Azure CLI/ARM/ADC or environment variables)
and verify/adjust the client.ProviderDO entry text for DigitalOcean so both
client.ProviderAzure and client.ProviderDO have clear descriptions used by the
survey prompt.
---
Minor comments:
In `@src/pkg/clouds/azure/aca/common.go`:
- Around line 152-175: The code dereferences Azure SDK pointer fields without
nil checks (e.g., app.Properties.Template.Containers[0].Name,
list.Value[0].Name, list.Value[0].Properties.Containers[0].Name) which can
panic; update the logic in the block that calls c.newReplicasClient() and
processes replicas (and the earlier template-based container pick) to guard
every pointer access: verify app.Properties, app.Properties.Template, the
container slice length, and that Containers[0] and its Name are non-nil before
dereferencing; after ListReplicas also check list.Value is non-nil, that
list.Value[0] is non-nil, that list.Value[0].Name is non-nil, that
list.Value[0].Properties is non-nil, and that its Containers[0] and Name are
non-nil, and return a clear error if any expected field is nil rather than
dereferencing.
In `@src/pkg/clouds/azure/aca/tail.go`:
- Around line 54-74: poll currently swallows errors from
c.newContainerAppsClient() and pager.NextPage(), which hides auth/config issues;
update the poll function to log those errors at debug level before returning
(use the logger available on the receiver struct c or add/pass one if missing),
e.g., log the error and contextual info (resource group, operation) when
c.newContainerAppsClient() fails and when pager.NextPage(ctx) returns an error,
then preserve the original control flow (return after logging) so failures are
visible for debugging; update references in the poll function around
c.newContainerAppsClient, client.NewListByResourceGroupPager and pager.NextPage
accordingly.
In `@src/pkg/clouds/azure/aci/setup_test.go`:
- Around line 17-22: Replace the non-fatal assertion in the test's
"SetUpResourceGroup" subtest so the test fails immediately if resource group
creation fails: change the call that currently uses t.Errorf when
c.SetUpResourceGroup(ctx) returns an error to t.Fatalf (or otherwise abort the
test) to prevent subsequent TearDown from running against an uncreated resource;
update the subtest around SetUpResourceGroup and references to
c.SetUpResourceGroup/context.Background() accordingly.
In `@src/pkg/clouds/azure/aci/tail_test.go`:
- Around line 23-28: Uncomment and run the resource group cleanup by calling
containerInstance.TearDown(ctx) inside the existing t.Cleanup closure (or remove
the empty t.Cleanup if you intentionally want no cleanup); if you restore the
call, handle errors by logging rather than fatalling in cleanup (use t.Logf to
report any TearDown error) and ensure ctx and containerInstance are in scope for
the cleanup closure so the test won't leave Azure resource groups lingering.
In `@src/pkg/clouds/azure/acr/acr.go`:
- Around line 207-214: The code dereferences props.Status when constructing a
RunStatus, which can panic if props.Status is nil; update the construction (in
the block that returns *RunStatus) to check props.Status for nil before using
*props.Status (e.g., set statusStr := "unknown" or empty string and overwrite it
only if props.Status != nil), assign that statusStr to RunStatus.Status, and
keep the existing nil check for props.RunErrorMessage; references: RunStatus,
props.Status, props.RunErrorMessage.
- Around line 90-105: Do not assign a.RegistryName until after successful
creation: move the a.RegistryName = registryName assignment to after
client.BeginCreate and poller.PollUntilDone complete without error so the struct
isn't left in an inconsistent state; specifically, keep the calls to
client.BeginCreate(...) and poller.PollUntilDone(ctx, nil) as-is, and only set
a.RegistryName = registryName after the second call returns nil error (use the
existing variables poller and err to gate the assignment).
In `@src/pkg/http/post.go`:
- Around line 48-50: PostWithContext currently always adds a "Content-Type"
header (even when contentType == ""), which forces an empty header value and
changes previous behavior; update PostWithContext so it only passes a header to
PostWithHeader when contentType is non-empty—i.e., if contentType == "" call
PostWithHeader with a nil/empty header (or omit the header argument) and
otherwise call PostWithHeader with http.Header{"Content-Type":
[]string{contentType}} to preserve prior behavior; refer to PostWithContext and
PostWithHeader to locate and implement this conditional header construction.
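The conditional header construction can be sketched as (the `headerFor` helper is hypothetical; returning nil preserves the "no Content-Type at all" behavior):

```go
package main

import (
	"fmt"
	"net/http"
)

// headerFor returns nil when contentType is empty so no empty-valued
// Content-Type header is ever sent, preserving the previous Post behavior.
func headerFor(contentType string) http.Header {
	if contentType == "" {
		return nil
	}
	return http.Header{"Content-Type": []string{contentType}}
}

func main() {
	fmt.Println(headerFor(""), headerFor("application/json"))
}
```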
---
Nitpick comments:
In `@src/pkg/cli/client/byoc/azure/byoc.go`:
- Around line 229-230: The code is exposing sensitive values pulumiBackendValue
and byoc.PulumiConfigPassphrase via plain environment variables
(pulumiBackendKey and "PULUMI_CONFIG_PASSPHRASE"); update the flow to fetch
these secrets from a secure store (e.g., Azure Key Vault or your secret manager)
instead of hardcoding/setting them directly, mark them as secret when passing
into Pulumi (so they are redacted), avoid logging their values, and add a clear
TODO or create a tracked issue to migrate pulumiBackendKey/pulumiBackendValue
and byoc.PulumiConfigPassphrase to the secret store with CI/runner fallback for
development credentials.
In `@src/pkg/cli/client/byoc/azure/login.go`:
- Around line 5-7: The Authenticate method on ByocAzure currently returns nil
and ignores ctx and interactive; implement real Azure authentication by using
Azure Identity (e.g., call azidentity.NewDefaultAzureCredential or the
interactive browser credential when interactive==true) and store the resulting
credentials on the ByocAzure instance so subsequent calls can use them; ensure
you propagate ctx into credential creation and return any errors from credential
initialization (update ByocAzure.Authenticate to perform the credential creation
and assignment, and handle both default and interactive flows).
In `@src/pkg/cli/compose/context.go`:
- Around line 262-264: Shared compose upload code currently sets the
Azure-specific header "X-Ms-Blob-Type" on the common header variable before
calling http.PutWithHeader; remove that Azure-specific header from the shared
function (leave Content-Type intact) and instead add logic in the Azure provider
to inject "X-Ms-Blob-Type: BlockBlob" for uploads to Azure blobs. Locate the
header construction and http.PutWithHeader call (the header variable and the
call to http.PutWithHeader) to remove header.Set("X-Ms-Blob-Type", "BlockBlob")
here, and update the Azure provider upload implementation to set that header
when targeting BlockBlob storage. Ensure other providers are unaffected and
tests cover that Azure uploads include the header while other backends do not.
In `@src/pkg/cli/tailAndMonitor.go`:
- Around line 65-66: The hardcoded 6s sleep before cancelling the tail
(pkg.SleepWithContext(ctx, 6*time.Second) followed by
cancelTail(errMonitoringDone)) should be documented and made configurable; add a
short comment near that call explaining why 6s was chosen (e.g., Azure log
propagation) and replace the literal with a configurable value or
provider-specific constant (e.g., tailCancelDelay or providerTailDelays map used
by pkg.SleepWithContext), wiring a flag/env/config option so Azure can use 6s
while AWS/GCP use a shorter delay, and keep cancelTail(errMonitoringDone)
behavior unchanged.
In `@src/pkg/clouds/azure/aca/common.go`:
- Around line 71-74: The code uses http.DefaultClient.Do(req) (see the resp, err
:= http.DefaultClient.Do(req) call) which has no timeout; change to use a
context-aware HTTP client or honor the request context: ensure the request is
created/associated with a context (req = req.WithContext(ctx) or accept a
context param) and replace http.DefaultClient with an http.Client configured
with a timeout or with Transport that respects context deadlines; update the
other occurrences noted (around the second call at lines 109-112) to the same
pattern so all Azure management calls won't hang indefinitely.
In `@src/pkg/clouds/azure/aca/tail.go`:
- Around line 128-131: The code uses http.DefaultClient.Do(req) for a
long-running streaming request (resp, req) which can hang; replace it with a
dedicated http.Client that sets timeouts (e.g., an overall Timeout or a
Transport with ResponseHeaderTimeout/IdleConnTimeout) and call client.Do(req)
instead of http.DefaultClient.Do(req). Locate the request site (the resp, err :=
http.DefaultClient.Do(req) call in the tail/streaming function) and create or
inject a client (e.g., a package-level or function-scoped client configured with
timeouts) so the streaming goroutine still closes resp.Body but the connection
cannot hang indefinitely.
In `@src/pkg/clouds/azure/aci/run_test.go`:
- Around line 22-27: Re-enable the teardown inside the test cleanup so the
created Azure container is always removed: inside the existing t.Cleanup(func()
{ ... }) call call containerInstance.TearDown(ctx) and check its returned error,
invoking t.Fatalf("Failed to tear down container instance: %v", err) (or
t.Errorf if you prefer non-fatal) when non-nil; ensure ctx and containerInstance
used in the test are in scope for the cleanup closure so resources are not
leaked.
In `@src/pkg/clouds/azure/aci/setup.go`:
- Around line 51-53: Replace the TODO about deleting the storage account with a
clear comment documenting the intended behavior: explain that when
SetUpStorageAccount creates a storage account in the same resource group it will
be removed when the resource group is deleted, while an externally provided
AZURE_STORAGE_ACCOUNT must not and will not be deleted by this code; remove or
avoid adding any deletion logic here and, if desired, add an explicit TODO to
open an issue for lifecycle policy decisions rather than leaving ambiguous text.
- Around line 108-134: Remove the large commented-out debug block containing
getCurrentUserObjectID, assignBlobDataContributor, NewStorageAccountsClient,
ListKeys, and RegenerateKey usage (and related prints and error handling around
storageAccount and c.resourceGroupName); either delete these commented lines to
improve readability or extract them into a tracked issue/PR with a short TODO
comment referencing that issue if you want to preserve the intent for future
work.
In `@src/pkg/clouds/azure/aci/tail_test.go`:
- Around line 11-12: The TestTail unit test is unconditionally skipped via
t.SkipNow(), remove the unconditional skip and replace it with t.Skip("reason")
inside TestTail so the skip message documents why it's omitted from CI and
how/when to run it manually (e.g., "too slow for CI; run with -run
TestTail-local or set env TEST_AZURE_INTEGRATION=true"). Also add a short
comment above TestTail explaining manual invocation steps and any required
environment/setup so reviewers and devs know how to execute the test outside CI.
In `@src/pkg/clouds/azure/acr/acr_test.go`:
- Around line 66-75: The test's t.Cleanup closure silently ignores errors from
acr.newRegistriesClient(), client.BeginDelete and poller.PollUntilDone which can
leak ACR registries; update the cleanup to capture and surface errors (use
t.Logf or t.Errorf/t.Fatalf as appropriate): call acr.newRegistriesClient() and
if it returns an error log/fail the test, check the error from
client.BeginDelete and log/fail on non-nil error, and check the error returned
by poller.PollUntilDone and log/fail there too, referencing the
acr.newRegistriesClient, client.BeginDelete and poller.PollUntilDone calls so
failures are not dropped.
In `@src/pkg/clouds/azure/acr/acr.go`:
- Around line 133-140: The current YAML is built by concatenating into the
strings.Builder (see variables yaml, and the use of req.Image and req.Command)
which risks producing invalid YAML when values contain special characters;
replace this with structured marshaling: create a small Go struct representing
the pipeline (e.g., type step struct { Cmd string `yaml:"cmd"` } and wrapper {
Steps []step `yaml:"steps"` }), populate Cmd by joining req.Image and
req.Command, then use gopkg.in/yaml.v3 (yaml.Marshal) to produce the YAML bytes
and write those instead of manual string concatenation; ensure you remove the
existing yaml.WriteString concatenation logic and handle any marshal errors
returned by yaml.Marshal.
In `@src/pkg/clouds/azure/common.go`:
- Around line 53-79: Create a small helper on the Azure type to encapsulate the
duplicated NewCreds() + armstorage.NewClientFactory(...) logic (e.g., a method
named newStorageClientFactory or storageClientFactory that returns
(*armstorage.ClientFactory, error}); have Azure.NewStorageAccountsClient and
Azure.NewBlobContainersClient call that helper and return
clientFactory.NewAccountsClient() and clientFactory.NewBlobContainersClient()
respectively, propagating errors unchanged (use the same error wrapping used
today when creating the factory).
- Around line 81-93: Remove the stale commented-out constructor stub for
NewRoleAssignmentsClient in the Azure type: delete the entire commented block
that references func (a Azure) NewRoleAssignmentsClient(),
armauthorization.RoleAssignmentsClient, and related NewRoleAssignmentsClient
creation logic so production code contains no dead/commented code; if this logic
needs tracking, add a separate ticket/PR note instead of leaving commented code
in src/pkg/clouds/azure/common.go.
In `@src/pkg/http/put.go`:
- Around line 24-30: PutWithHeader currently overwrites the request's Header map
instead of merging with any existing/default headers; change the logic so after
creating the request with http.NewRequestWithContext you iterate over the
provided header (if non-nil) and copy each key/value into req.Header using
Add/Set rather than replacing the map, preserving any defaults; keep the current
return path via DefaultClient.Do(req) and ensure nil-safety when header is nil.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 9af409ca-3a77-4d21-9294-b56a7a0f273e
⛔ Files ignored due to path filters (2)
- `src/go.sum` is excluded by `!**/*.sum`
- `src/protos/io/defang/v1/fabric.pb.go` is excluded by `!**/*.pb.go`
📒 Files selected for processing (44)
- `flake.nix`
- `pkgs/defang/cli.nix`
- `src/cmd/cli/command/estimate.go`
- `src/go.mod`
- `src/pkg/agent/tools/estimate_test.go`
- `src/pkg/cli/client/byoc/aws/byoc.go`
- `src/pkg/cli/client/byoc/azure/byoc.go`
- `src/pkg/cli/client/byoc/azure/login.go`
- `src/pkg/cli/client/byoc/gcp/byoc.go`
- `src/pkg/cli/client/provider_id.go`
- `src/pkg/cli/client/region.go`
- `src/pkg/cli/compose/context.go`
- `src/pkg/cli/connect.go`
- `src/pkg/cli/tailAndMonitor.go`
- `src/pkg/clouds/azure/aca/common.go`
- `src/pkg/clouds/azure/aca/tail.go`
- `src/pkg/clouds/azure/aci/blob.go`
- `src/pkg/clouds/azure/aci/common.go`
- `src/pkg/clouds/azure/aci/common_test.go`
- `src/pkg/clouds/azure/aci/run.go`
- `src/pkg/clouds/azure/aci/run_test.go`
- `src/pkg/clouds/azure/aci/setup.go`
- `src/pkg/clouds/azure/aci/setup_test.go`
- `src/pkg/clouds/azure/aci/status.go`
- `src/pkg/clouds/azure/aci/stop.go`
- `src/pkg/clouds/azure/aci/tail.go`
- `src/pkg/clouds/azure/aci/tail_test.go`
- `src/pkg/clouds/azure/aci/upload.go`
- `src/pkg/clouds/azure/aci/upload_test.go`
- `src/pkg/clouds/azure/acr/acr.go`
- `src/pkg/clouds/azure/acr/acr_test.go`
- `src/pkg/clouds/azure/acr/tail.go`
- `src/pkg/clouds/azure/acr/types.go`
- `src/pkg/clouds/azure/acr/types_test.go`
- `src/pkg/clouds/azure/appcfg/appcfg.go`
- `src/pkg/clouds/azure/common.go`
- `src/pkg/clouds/azure/location.go`
- `src/pkg/http/post.go`
- `src/pkg/http/put.go`
- `src/pkg/session/session.go`
- `src/pkg/stacks/selector_test.go`
- `src/pkg/stacks/wizard.go`
- `src/pkg/utils.go`
- `src/protos/io/defang/v1/fabric.proto`
```
github.com/Azure/azure-sdk-for-go/sdk/azcore v1.20.0
github.com/Azure/azure-sdk-for-go/sdk/azidentity v1.13.1
github.com/Azure/azure-sdk-for-go/sdk/data/azappconfig v1.2.0
github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/appconfiguration/armappconfiguration v1.1.1
github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/appcontainers/armappcontainers v1.1.0
github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/authorization/armauthorization/v3 v3.0.0-beta.2
github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/containerinstance/armcontainerinstance/v2 v2.4.0
github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/containerregistry/armcontainerregistry/v2 v2.0.0
github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/msi/armmsi v1.3.0
github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/resources/armresources/v2 v2.1.0
github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/storage/armstorage/v2 v2.0.0
github.com/Azure/azure-sdk-for-go/sdk/storage/azblob v1.6.4
```
🧩 Analysis chain
🌐 Web query:
What is the latest stable version of github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/authorization/armauthorization?
💡 Result:
The latest stable version is v2.2.0 for the Go module github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/authorization/armauthorization/v2 (published Nov 27, 2023). [1]
Note: there is a newer v3 line, but it’s currently tagged as beta (e.g. v3.0.0-beta.2), so it’s not stable. [2]
You’d typically install the latest stable with:
```
go get github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/authorization/armauthorization/v2@v2.2.0
```

Sources: [1], [2]
Replace armauthorization/v3 v3.0.0-beta.2 with the stable version v2.2.0.
The Azure SDK dependencies are appropriate for BYOC implementation, but armauthorization/v3 v3.0.0-beta.2 (line 25) is a beta version that may introduce breaking changes in future updates. A stable alternative exists at v2.2.0. Update the dependency to use the stable release instead.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/go.mod` around lines 20 - 31, Replace the beta Azure SDK module reference
by updating the module entry
"github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/authorization/armauthorization/v3
v3.0.0-beta.2" to the stable release "v2.2.0"; locate the dependency line for
armauthorization (the module name contains armauthorization/v3) in the go.mod
and change the version string to v2.2.0, then run go mod tidy to ensure the
lockfile and module graph are updated.
```go
func NewContainerInstance(resourceGroupPrefix string, location azure.Location) *ContainerInstance {
	if location == "" {
		location = azure.Location(os.Getenv("AZURE_LOCATION"))
	}
	ci := &ContainerInstance{
		Azure: azure.Azure{
			SubscriptionID: os.Getenv("AZURE_SUBSCRIPTION_ID"),
		},
		resourceGroupPrefix: resourceGroupPrefix,
		StorageAccount:      os.Getenv("DEFANG_CD_BUCKET"),
	}
	ci.SetLocation(location)
	return ci
}

// SetLocation updates the location and recomputes the resource group name.
func (c *ContainerInstance) SetLocation(loc azure.Location) {
	c.Location = loc
	c.resourceGroupName = c.resourceGroupPrefix + loc.String()
}
```
Fail fast when Azure location is unset.
If both the argument and AZURE_LOCATION are empty, the instance is created with an invalid effective location and a degraded resource-group name. Validate at construction time and return an explicit error.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/pkg/clouds/azure/aci/common.go` around lines 22 - 41,
NewContainerInstance currently silently allows an empty Azure location and
produces an invalid resource group name; change NewContainerInstance to return
(*ContainerInstance, error) and validate the effective location (argument or
AZURE_LOCATION) immediately—if empty return a descriptive error (e.g., "azure
location not set") instead of constructing the object; if valid, construct the
ContainerInstance, set SubscriptionID and StorageAccount as before, call
c.SetLocation(loc) to compute resourceGroupName, and return the instance with
nil error; update any callers of NewContainerInstance to handle the error.
🧹 Nitpick comments (1)
src/pkg/session/session.go (1)
135-138: Improve Azure warning to catch partial configuration. Line 136 only checks whether any Azure env var is set. If one key exists but others are missing, this warning won’t trigger, and failures get deferred. Consider warning on incomplete Azure env setup too.
Suggested patch

```diff
 import (
 	"context"
 	"fmt"
+	"os"

 	"github.com/DefangLabs/defang/src/pkg"
@@
 	case client.ProviderAzure:
-		if env := pkg.AzureInEnv(); env == "" {
+		required := []string{
+			"AZURE_SUBSCRIPTION_ID",
+			"AZURE_TENANT_ID",
+			"AZURE_CLIENT_ID",
+			"AZURE_CLIENT_SECRET",
+		}
+		missing := make([]string, 0, len(required))
+		for _, key := range required {
+			if os.Getenv(key) == "" {
+				missing = append(missing, key)
+			}
+		}
+		if len(missing) == len(required) {
 			term.Warn("Azure provider was selected, but no Azure environment variables are set")
+		} else if len(missing) > 0 {
+			term.Warnf("Azure provider was selected, but Azure configuration is incomplete; missing: %v", missing)
 		}
 	}
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/pkg/session/session.go` around lines 135 - 138, The current Azure branch only checks pkg.AzureInEnv() == "" which misses partial configurations; update the client.ProviderAzure case to validate that all required Azure env vars (e.g., AZURE_CLIENT_ID, AZURE_TENANT_ID, AZURE_CLIENT_SECRET, and any other keys your code expects) are present and call term.Warn if any are missing or only a subset is set. You can either change pkg.AzureInEnv() to return the list/status of missing keys (or a boolean plus missing-key info) or add a new helper like pkg.AzureMissingEnvKeys() and use it here; ensure the warning via term.Warn mentions which keys are missing so partial configurations are caught early.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/pkg/session/session.go`:
- Around line 135-138: The current Azure branch only checks pkg.AzureInEnv() ==
"" which misses partial configurations; update the client.ProviderAzure case to
validate that all required Azure env vars (e.g., AZURE_CLIENT_ID,
AZURE_TENANT_ID, AZURE_CLIENT_SECRET, and any other keys your code expects) are
present and call term.Warn if any are missing or only a subset is set. You can
either change pkg.AzureInEnv() to return the list/status of missing keys (or a
boolean plus missing-key info) or add a new helper like
pkg.AzureMissingEnvKeys() and use it here; ensure the warning via term.Warn
mentions which keys are missing so partial configurations are caught early.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: ddb2d57c-037b-48a8-b3ef-fcac29efdcfc
⛔ Files ignored due to path filters (2)
- `src/go.sum` is excluded by `!**/*.sum`
- `src/protos/io/defang/v1/fabric.pb.go` is excluded by `!**/*.pb.go`
📒 Files selected for processing (6)
- src/cmd/cli/command/estimate.go
- src/go.mod
- src/pkg/cli/client/byoc/aws/byoc.go
- src/pkg/cli/client/byoc/gcp/byoc.go
- src/pkg/session/session.go
- src/protos/io/defang/v1/fabric.proto
✅ Files skipped from review due to trivial changes (2)
- src/pkg/cli/client/byoc/gcp/byoc.go
- src/pkg/cli/client/byoc/aws/byoc.go
🚧 Files skipped from review as they are similar to previous changes (3)
- src/cmd/cli/command/estimate.go
- src/go.mod
- src/protos/io/defang/v1/fabric.proto
Actionable comments posted: 6
♻️ Duplicate comments (6)
src/pkg/clouds/azure/acr/tail.go (2)
92-94: ⚠️ Potential issue | 🟠 Major: Surface unexpected log endpoint statuses instead of silently succeeding.
Line 92 currently treats non-200/206 responses as a no-op, which can hide real failures (e.g., expired SAS URL returning 403) and drop logs silently.
Proposed fix
```diff
 if resp.StatusCode == http.StatusRequestedRangeNotSatisfiable {
 	return offset, nil
 }
 if resp.StatusCode != http.StatusPartialContent && resp.StatusCode != http.StatusOK {
-	return offset, nil
+	bodyPreview, _ := io.ReadAll(io.LimitReader(resp.Body, 1024))
+	err := fmt.Errorf("unexpected log response status: %s: %s", resp.Status, strings.TrimSpace(string(bodyPreview)))
+	if !yield("", err) {
+		return offset, errStopped
+	}
+	return offset, nil
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/pkg/clouds/azure/acr/tail.go` around lines 92 - 94, the current check in the function handling the blob log fetch (look for the resp.StatusCode check around resp, offset in src/pkg/clouds/azure/acr/tail.go) silently returns offset when the response is not 200 or 206; change this so unexpected statuses are surfaced: when resp.StatusCode is not http.StatusPartialContent and not http.StatusOK, read any response body for context and either log the status+body via the existing logger or return a descriptive error (including the status code and body) instead of returning (offset, nil), so callers can detect and handle failures (e.g., expired SAS/403).
107-113: ⚠️ Potential issue | 🔴 Critical: Guard empty chunks before indexing the last byte.
Line 112 can panic when `text` is empty (index out of range).
Proposed fix

```diff
 text := string(body)
+if len(text) == 0 {
+	return offset, nil
+}
 lines := strings.Split(text, "\n")
```
Verify each finding against the current code and only fix it if needed. In `@src/pkg/clouds/azure/acr/tail.go` around lines 107 - 113, guard against empty body before indexing the last byte: change the logic around variables text, lines and complete so you only access text[len(text)-1] when len(text) > 0 (e.g., if len(text) == 0 then set complete to an empty slice or lines and skip the byte check). Update the condition that determines complete (currently using text[len(text)-1] != '\n') to first check len(text) > 0, keeping variable names text, lines and complete to locate and modify the code in tail.go.
src/pkg/clouds/azure/aci/blob.go (2)
23-36: ⚠️ Potential issue | 🔴 Critical: Pick a usable storage key instead of blindly dereferencing the first one.
Line 34 can panic when `ListKeys` returns no keys or the first entry has a nil/empty value. Iterate the returned keys and fail with a clear error when none are usable.
🛡️ Suggested change

```diff
 	keys, err := accountsClient.ListKeys(ctx, c.resourceGroupName, c.StorageAccount, nil)
 	if err != nil {
 		return nil, err
 	}
-	storageKey = *keys.Keys[0].Value
+	for _, key := range keys.Keys {
+		if key.Value != nil && *key.Value != "" {
+			storageKey = *key.Value
+			break
+		}
+	}
+	if storageKey == "" {
+		return nil, fmt.Errorf("no usable storage account keys returned for %q", c.StorageAccount)
+	}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/pkg/clouds/azure/aci/blob.go` around lines 23 - 36, The newSharedKeyCredential method currently assumes ListKeys returns at least one key and dereferences keys.Keys[0].Value which can panic; update ContainerInstance.newSharedKeyCredential to (1) check that keys and keys.Keys are non-nil and non-empty, (2) iterate over keys.Keys and select the first entry whose Value is non-nil and non-empty, and (3) return a clear error from newSharedKeyCredential if no usable key is found instead of dereferencing a nil/empty value; reference ListKeys and keys.Keys when making these checks and the azblob.NewSharedKeyCredential call.
81-91: ⚠️ Potential issue | 🟠 Major: Bound blob downloads instead of reading them fully into memory.
Line 91 pulls the entire remote blob into memory with no size limit. A large or unexpected object can OOM the CLI here; cap the read or stream the body to the caller instead.
💾 Suggested change
```diff
+const maxBlobDownloadBytes = 32 << 20
+
 // DownloadBlob fetches the contents of a blob by name.
 func (c *ContainerInstance) DownloadBlob(ctx context.Context, blobName string) ([]byte, error) {
 	client, err := c.newBlobContainerClient(ctx)
 	if err != nil {
 		return nil, err
@@
 	if err != nil {
 		return nil, err
 	}
 	defer resp.Body.Close()
-	return io.ReadAll(resp.Body)
+	data, err := io.ReadAll(io.LimitReader(resp.Body, maxBlobDownloadBytes+1))
+	if err != nil {
+		return nil, err
+	}
+	if len(data) > maxBlobDownloadBytes {
+		return nil, fmt.Errorf("blob %q exceeds %d bytes", blobName, maxBlobDownloadBytes)
+	}
+	return data, nil
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/pkg/clouds/azure/aci/blob.go` around lines 81 - 91, DownloadBlob currently reads the entire blob into memory (resp.Body) which can OOM; change it to either stream the response or enforce a size cap. Two options: 1) Preferable: change DownloadBlob signature to return (io.ReadCloser, error), create client via newBlobContainerClient, call client.NewBlobClient(blobName).DownloadStream, and return resp.Body directly so callers stream the content. 2) If keeping []byte, introduce a constant (e.g. maxBlobDownloadBytes), wrap resp.Body with io.LimitReader and read via io.ReadAll, then error if the result equals the limit (indicating truncation). Update callers of ContainerInstance.DownloadBlob accordingly and ensure resp.Body is closed only when appropriate for streaming vs capped-read behavior.
src/pkg/clouds/azure/aci/upload_test.go (1)
29-29: ⚠️ Potential issue | 🟠 Major: Redact the SAS URL before logging it.
Line 29 prints the full upload URL, including its SAS query string. That leaks write credentials into test logs.
🔒 Suggested change
```diff
-	t.Logf("Upload URL: %s", url)
+	t.Log("Upload URL generated")
```
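If the URL should still be partially visible in logs, the redaction can be factored into a small helper; `redactSASURL` is a hypothetical name and the fallback placeholder is an assumption, not code from this PR:

```go
package main

import (
	"fmt"
	"net/url"
)

// redactSASURL drops the query string (where the SAS token lives) from a blob
// URL so the host and path can still be logged safely. If the URL does not
// parse, a fixed placeholder is returned instead of the raw input, which
// might contain the token.
func redactSASURL(raw string) string {
	u, err := url.Parse(raw)
	if err != nil {
		return "<unparseable URL redacted>"
	}
	if u.RawQuery != "" {
		u.RawQuery = "REDACTED"
	}
	return u.String()
}

func main() {
	sas := "https://acct.blob.core.windows.net/uploads/app.pb?sv=2024-01-01&sig=supersecret"
	fmt.Println(redactSASURL(sas)) // query string replaced with REDACTED
}
```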
Verify each finding against the current code and only fix it if needed. In `@src/pkg/clouds/azure/aci/upload_test.go` at line 29, the test currently logs the full upload URL (t.Logf("Upload URL: %s", url)) which exposes the SAS query string; fix it by parsing the url value (use net/url.Parse), strip or replace the RawQuery (e.g., set RawQuery = "" or replace with "REDACTED"), and log the sanitized version (e.g., safeURL) instead of the original url so no SAS credentials appear in test logs.
src/pkg/clouds/azure/aci/upload.go (1)
44-56: ⚠️ Potential issue | 🔴 Critical: Select a usable storage key instead of dereferencing `keys[0]`.
Line 55 can panic when `ListKeys` returns no keys or the first key has a nil/empty value. Iterate the returned keys and fail with a descriptive error when none are usable.
🛡️ Suggested change

```diff
 	keys, err := accountsClient.ListKeys(ctx, c.resourceGroupName, c.StorageAccount, nil)
 	if err != nil {
 		return "", err
 	}
-	storageKey = *keys.Keys[0].Value // or [1]?
+	for _, key := range keys.Keys {
+		if key.Value != nil && *key.Value != "" {
+			storageKey = *key.Value
+			break
+		}
+	}
+	if storageKey == "" {
+		return "", fmt.Errorf("no usable storage account keys returned for %q", c.StorageAccount)
+	}
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/pkg/clouds/azure/aci/upload.go` around lines 44 - 56, The code currently dereferences keys.Keys[0].Value which can panic if ListKeys returns zero keys or a nil/empty value; update the logic in the block that obtains storageKey (around storageKey declaration, c.NewStorageAccountsClient(), and accountsClient.ListKeys) to iterate over keys.Keys, pick the first non-nil/non-empty Value, assign it to storageKey, and return a descriptive error if no usable key is found (include context like c.resourceGroupName and c.StorageAccount in the error message).
🧹 Nitpick comments (10)
src/pkg/clouds/azure/acr/acr_test.go (1)
66-75: Log cleanup failures in `TestRunTask` for resource-leak diagnosability. This cleanup path currently swallows client/delete/poll errors, making leaked integration resources harder to investigate.
Proposed fix
```diff
 t.Cleanup(func() {
 	ctx := context.Background()
 	client, err := acr.newRegistriesClient()
 	if err != nil {
+		t.Logf("cleanup: failed to create client: %v", err)
 		return
 	}
-	poller, _ := client.BeginDelete(ctx, acr.resourceGroupName, registryName, nil)
+	poller, err := client.BeginDelete(ctx, acr.resourceGroupName, registryName, nil)
+	if err != nil {
+		t.Logf("cleanup: failed to delete registry: %v", err)
+		return
+	}
 	if poller != nil {
-		_, _ = poller.PollUntilDone(ctx, nil)
+		if _, err := poller.PollUntilDone(ctx, nil); err != nil {
+			t.Logf("cleanup: poll failed: %v", err)
+		}
 	}
 })
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/pkg/clouds/azure/acr/acr_test.go` around lines 66 - 75, the cleanup currently swallows errors; update the t.Cleanup closure to log any failures from acr.newRegistriesClient, client.BeginDelete and poller.PollUntilDone so leaked resources are diagnosable: call acr.newRegistriesClient() and if err != nil call t.Logf with a clear message including the error, then when calling client.BeginDelete(ctx, acr.resourceGroupName, registryName, nil) check its returned error and log it, and if poller is non-nil call poller.PollUntilDone(ctx, nil) and log any returned error as well (use t.Logf/t.Errorf to include the error values and context).
src/pkg/http/put.go (1)
24-29: Clone the header map before attaching it to the request. Line 29 hands the request the caller’s header object verbatim. `net/http.Header` is a `map[string][]string` and already exposes `Clone()`, so cloning here avoids sharing mutable header state between the caller and the in-flight request. (pkg.go.dev)
♻️ Suggested change

```diff
 func PutWithHeader(ctx context.Context, url string, header http.Header, body io.Reader) (*http.Response, error) {
 	req, err := http.NewRequestWithContext(ctx, http.MethodPut, url, body)
 	if err != nil {
 		return nil, err
 	}
-	req.Header = header
+	req.Header = header.Clone()
 	return DefaultClient.Do(req)
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/pkg/http/put.go` around lines 24 - 29, the PutWithHeader function assigns the caller's http.Header map directly to req.Header which can share mutable state; update PutWithHeader to clone the provided header before attaching it (use header.Clone()) so the request gets its own copy and the caller's header map isn't shared with the in-flight request.
src/pkg/clouds/azure/aci/setup.go (2)
108-133: Large block of commented-out code should be removed or documented. There's a significant block of commented-out code (lines 108-133) related to permission assignment and key regeneration. If this is work-in-progress, consider adding a TODO comment explaining the intent. Otherwise, remove dead code to improve maintainability.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/pkg/clouds/azure/aci/setup.go` around lines 108 - 133, The large commented-out block handling permission assignment and key regeneration (references: getCurrentUserObjectID, assignBlobDataContributor, NewStorageAccountsClient, ListKeys, RegenerateKey, resourceGroupName, storageAccount) should be removed or annotated: if this is WIP, replace the commented code with a single TODO comment explaining the planned work and linking any relevant issue/PR; otherwise delete the dead code to keep setup.go clean and maintainable. Ensure any removed logic is preserved in the PR description or an issue so it can be restored if needed.
52-53: TODO comment indicates incomplete teardown logic. The `TearDown` function has a TODO about deleting the storage account. This should either be implemented or tracked as a separate issue to ensure complete cleanup of Azure resources.
Would you like me to open an issue to track the storage account cleanup implementation?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/pkg/clouds/azure/aci/setup.go` around lines 52 - 53, The TODO in TearDown indicates the storage account isn't being cleaned up; update the TearDown method in src/pkg/clouds/azure/aci/setup.go to either implement deletion of the Azure storage account tied to this deployment (use the existing Azure client/credentials used elsewhere in this file to locate and delete the storage account resource) or, if deletion must be deferred, replace the TODO with a clear TODO comment and create a tracked issue (include the resource name/ID logic and steps required) and add a reference to that issue in the comment so resource cleanup is not forgotten.src/pkg/cli/client/byoc/azure/byoc.go (4)
461-516: Channel nil-check placement after `continue` may delay termination. When a channel closes and is set to nil (lines 466, 490), the code uses `continue` to restart the loop. This skips the termination check at lines 512-514, meaning the loop will iterate once more before checking if both channels are nil. While not a bug, refactoring to check immediately after setting a channel to nil would be clearer.
♻️ Clearer termination check

```diff
 	case entry, ok := <-cdCh:
 		if !ok {
 			cdCh = nil
-			continue
+			if acaCh == nil {
+				return
+			}
+			continue
 		}
```

Apply the same pattern for `acaCh`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/pkg/cli/client/byoc/azure/byoc.go` around lines 461 - 516, The loop should terminate immediately when both channels are closed instead of relying on the loop head after a continue; after you set cdCh = nil (in the cdCh case) and after you set acaCh = nil (in the acaCh case) add an immediate check if cdCh == nil && acaCh == nil and return if so (or replace the continue with that check + return), so the anonymous tailing function (the func returning from the caller) exits right away when both channels are closed; apply this change for both cdCh and acaCh handling in the yielding loop.
319-321: Status code check may be overly strict or include unexpected codes. Azure Blob Storage's "Put Blob" operation typically returns `201 Created` on success. The check includes `200`, which is unusual for this operation. Consider whether `201` alone is sufficient, or add a comment explaining when `200` is expected.
Verify each finding against the current code and only fix it if needed. In `@src/pkg/cli/client/byoc/azure/byoc.go` around lines 319 - 321, The status-code check in byoc.go currently treats both 200 and 201 as success (checking resp.StatusCode), but Azure Put Blob returns 201 Created; update the check to accept 201 only (e.g., if resp.StatusCode != 201 return error) and add a brief comment at that check explaining that Put Blob should return 201 and why 200 is not expected (or explicitly note any special-case where 200 may be acceptable) so future readers understand the choice; reference the resp.StatusCode/resp.Status validation in the upload path in byoc.go when making the change.
354-360: All errors wrapped as deployment failure may mask transient issues. `GetDeploymentStatus` wraps all errors from `GetContainerGroupStatus` as `ErrDeploymentFailed`. This could mask transient network errors or rate limiting that might succeed on retry. Consider distinguishing between permanent failures (container exited with error) and transient errors.
Verify each finding against the current code and only fix it if needed. In `@src/pkg/cli/client/byoc/azure/byoc.go` around lines 354 - 360, GetDeploymentStatus currently converts any error from driver.GetContainerGroupStatus into client.ErrDeploymentFailed which hides transient issues; update ByocAzure.GetDeploymentStatus to distinguish transient vs permanent failures by inspecting the returned error or status from b.driver.GetContainerGroupStatus: if the driver returns a sentinel/transient error (e.g., network/timeouts, rate-limit type or a driver-specific transient error type), return that error unchanged (or wrap it with context but preserve its type) so callers can retry, but only wrap errors as client.ErrDeploymentFailed when the container group status explicitly indicates a permanent deployment failure (use b.cdContainerGroup or the status value returned); use errors.As / errors.Is to detect known transient driver errors and document the behavior in the function comment.
233-234: Sensitive values (Pulumi backend/passphrase) passed as plain environment variables. The Pulumi backend credentials and config passphrase are passed as plain environment variables (lines 233-234). The TODO comments acknowledge this. Consider using Azure Key Vault references or secure environment variable injection before production use.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/pkg/cli/client/byoc/azure/byoc.go` around lines 233 - 234, The Pulumi backend and passphrase are being injected as plain env vars (pulumiBackendKey/pulumiBackendValue and "PULUMI_CONFIG_PASSPHRASE" using byoc.PulumiConfigPassphrase); replace this with secure secret retrieval instead of hard-setting env values: fetch the backend URL and passphrase from a secure store (e.g., Azure Key Vault or your secret manager) at runtime and populate them as secrets (or use Key Vault references / secret provider) so they are not exposed in process environment or source; update the initialization in byoc.go to use the secure-secret getter for pulumiBackendValue and byoc.PulumiConfigPassphrase and remove plain-text assignment (honor the existing TODO to make them secret).src/pkg/clouds/azure/aca/common.go (2)
61-64: URL path segments are not encoded. `appName`, `ResourceGroup`, and other values are interpolated directly into URLs without URL encoding. While Azure resource names have naming restrictions that typically prevent problematic characters, using `url.PathEscape` would be more defensive.
Verify each finding against the current code and only fix it if needed. In `@src/pkg/clouds/azure/aca/common.go` around lines 61 - 64, The URL is built by interpolating values directly into fmt.Sprintf, leaving path segments like appName, c.ResourceGroup, c.SubscriptionID and apiVersion unescaped; replace those direct interpolations with url.PathEscape(...) calls (e.g., url.PathEscape(appName), url.PathEscape(c.ResourceGroup), url.PathEscape(c.SubscriptionID), url.PathEscape(apiVersion)) when constructing the string in the URL assignment so all path segments are safely encoded before being inserted.
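The defensive encoding can be sketched as a self-contained helper; the path shape and the `buildContainerAppURL` name are illustrative, not the exact URL built in common.go:

```go
package main

import (
	"fmt"
	"net/url"
)

// buildContainerAppURL assembles an ARM-style management URL with every
// caller-supplied path segment escaped via url.PathEscape, and the
// api-version query value escaped via url.QueryEscape.
func buildContainerAppURL(subscriptionID, resourceGroup, appName, apiVersion string) string {
	return fmt.Sprintf(
		"https://management.azure.com/subscriptions/%s/resourceGroups/%s/providers/Microsoft.App/containerApps/%s?api-version=%s",
		url.PathEscape(subscriptionID),
		url.PathEscape(resourceGroup),
		url.PathEscape(appName),
		url.QueryEscape(apiVersion),
	)
}

func main() {
	// A space and a slash in the inputs stay confined to their own segments.
	fmt.Println(buildContainerAppURL("sub-123", "my rg", "app/1", "2024-03-01"))
}
```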
71-74: `http.DefaultClient` has no timeout, risking indefinite hangs. Direct REST calls using `http.DefaultClient.Do` have no timeout configured. If the Azure management endpoint becomes unresponsive, this could block indefinitely. Consider using a client with a configured timeout or deriving deadlines from the context.
♻️ Suggested approach

```diff
+var httpClient = &http.Client{Timeout: 30 * time.Second}
+
 func (c *ContainerApp) getAuthToken(ctx context.Context, appName string) (string, error) {
 	// ...
-	resp, err := http.DefaultClient.Do(req)
+	resp, err := httpClient.Do(req)
```

Apply the same change to `getEventStreamBase`. Alternatively, use `http.NewRequestWithContext` combined with context deadline propagation.
Verify each finding against the current code and only fix it if needed. In `@src/pkg/clouds/azure/aca/common.go` around lines 71 - 74, The call to http.DefaultClient.Do in the function that performs the Azure REST request should not use the default client (which has no timeout); update the code in the function (and similarly in getEventStreamBase) to create and use an http.Client with a sensible Timeout or use http.NewRequestWithContext and pass a context with deadline derived from the incoming ctx so the request respects cancellation/deadlines; ensure you replace http.DefaultClient.Do(req) with client.Do(req) (or use req = req.WithContext(ctx)) and propagate errors unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/pkg/clouds/azure/aca/common.go`:
- Around line 153-155: The code currently dereferences SDK pointers without nil
checks (e.g., app.Properties.Template.Containers[0].Name, list.Value[0].Name,
list.Value[0].Properties.Containers[0].Name); update the logic to defensively
validate each nullable field and slice before dereferencing: check that app,
app.Properties, app.Properties.Template, and Containers are non-nil and have
length>0 and that Container.Name is non-nil before using *Name; similarly check
list.Value length, list.Value[0].Properties non-nil, its Containers length, and
their Name pointers before dereferencing; if any check fails, use a safe
fallback (empty string or skip assignment) or return an explicit error.
In `@src/pkg/clouds/azure/aca/tail.go`:
- Around line 32-52: The code marks tailed apps in the known map and never
removes them, so when c.StreamLogs(...) returns the app is never retried; modify
startTailing (and the other tailing site around the similar 67-71 block) to
remove the app from known when the goroutine exits: add a sync.Mutex (or switch
known to a sync.Map) to protect concurrent access, lock around reads/writes to
known, and defer a deletion like defer func(){ mutex.Lock(); delete(known,
appName); mutex.Unlock() }() at the start of the goroutine so the app can be
retried after StreamLogs() ends (do this both on error and normal exit).
- Around line 54-64: The poll() closure currently swallows failures from
c.newContainerAppsClient() and pager.NextPage(ctx); change it to surface those
errors instead of silently returning by (1) logging the error with context
(include the error value and which call failed) and (2) propagating the failure
to the outer watch loop so the caller can react (e.g., send the error on the
existing watch/error channel or cancel the watch context). Update the code paths
around poll(), newContainerAppsClient(), NewListByResourceGroupPager, and
pager.NextPage(ctx) to ensure errors are logged and forwarded (or cause the
watcher to stop) rather than being ignored.
In `@src/pkg/clouds/azure/aci/setup.go`:
- Around line 139-143: The blob container creation can error if the container
already exists; update the setUp flow around containerClient.Create to treat the
specific "ContainerAlreadyExists" error as success: after calling
containerClient.Create, if err != nil inspect the returned error for Azure's
container-already-exists error code (e.g., "ContainerAlreadyExists" or check
azcore/armstorage error code) and only return a failure for other errors,
otherwise proceed to set c.BlobContainerName = *container.Name (or if container
is nil when existing, set c.BlobContainerName from the requested
blobContainerName) so repeated setUp runs are idempotent.
In `@src/pkg/clouds/azure/aci/upload.go`:
- Around line 32-34: The SAS token currently sets StartTime using the local
variable now (and computes expiry), which can cause 403s for immediate PUTs;
update the SAS creation in upload.go to omit StartTime (or set it to a short
time in the past, e.g., now.Add(-15*time.Minute)) so the generated SAS is valid
immediately for the upload used by upload_test; adjust the code that uses the
now and expiry variables and the StartTime field when building the SAS (the
SAS-generation block that assigns StartTime) accordingly.
In `@src/pkg/clouds/azure/acr/acr_test.go`:
- Line 101: The test currently logs the full SAS URL stored in logURL via t.Logf
which can expose credentials; change the test to avoid printing query parameters
— e.g., parse logURL, strip or mask the RawQuery before logging, or log only the
hostname/path or a fixed placeholder instead of the full URL; update the t.Logf
usage (the call referencing logURL) to output the redacted or partial URL so no
SAS token appears in CI logs.
---
Duplicate comments:
In `@src/pkg/clouds/azure/aci/blob.go`:
- Around line 23-36: The newSharedKeyCredential method currently assumes
ListKeys returns at least one key and dereferences keys.Keys[0].Value which can
panic; update ContainerInstance.newSharedKeyCredential to (1) check that keys
and keys.Keys are non-nil and non-empty, (2) iterate over keys.Keys and select
the first entry whose Value is non-nil and non-empty, and (3) return a clear
error from newSharedKeyCredential if no usable key is found instead of
dereferencing a nil/empty value; reference ListKeys and keys.Keys when making
these checks and the azblob.NewSharedKeyCredential call.
- Around line 81-91: DownloadBlob currently reads the entire blob into memory
(resp.Body) which can OOM; change it to either stream the response or enforce a
size cap. Two options: 1) Preferable: change DownloadBlob signature to return
(io.ReadCloser, error), create client via newBlobContainerClient, call
client.NewBlobClient(blobName).DownloadStream, and return resp.Body directly so
callers stream the content. 2) If keeping []byte, introduce a constant (e.g.
maxBlobDownloadBytes), wrap resp.Body with io.LimitReader and read via
io.ReadAll, then error if the result equals the limit (indicating truncation).
Update callers of ContainerInstance.DownloadBlob accordingly and ensure
resp.Body is closed only when appropriate for streaming vs capped-read behavior.
In `@src/pkg/clouds/azure/aci/upload_test.go`:
- Line 29: The test currently logs the full upload URL (t.Logf("Upload URL: %s",
url)) which exposes the SAS query string; fix it by parsing the url value (use
net/url.Parse), strip or replace the RawQuery (e.g., set RawQuery = "" or
replace with "REDACTED"), and log the sanitized version (e.g., safeURL) instead
of the original url so no SAS credentials appear in test logs.
In `@src/pkg/clouds/azure/aci/upload.go`:
- Around line 44-56: The code currently dereferences keys.Keys[0].Value which
can panic if ListKeys returns zero keys or a nil/empty value; update the logic
in the block that obtains storageKey (around storageKey declaration,
c.NewStorageAccountsClient(), and accountsClient.ListKeys) to iterate over
keys.Keys, pick the first non-nil/non-empty Value, assign it to storageKey, and
return a descriptive error if no usable key is found (include context like
c.resourceGroupName and c.StorageAccount in the error message).
In `@src/pkg/clouds/azure/acr/tail.go`:
- Around line 92-94: The current check in the function handling the blob log
fetch (look for the resp.StatusCode check around resp, offset in
src/pkg/clouds/azure/acr/tail.go) silently returns offset when the response is
not 200 or 206; change this so unexpected statuses are surfaced: when
resp.StatusCode is not http.StatusPartialContent and not http.StatusOK, read any
response body for context and either log the status+body via the existing logger
or return a descriptive error (including the status code and body) instead of
returning offset(nil) so callers can detect and handle failures (e.g., expired
SAS/403).
- Around line 107-113: Guard against empty body before indexing the last byte:
change the logic around variables text, lines and complete so you only access
text[len(text)-1] when len(text) > 0 (e.g., if len(text) == 0 then set complete
to an empty slice or lines and skip the byte check). Update the condition that
determines complete (currently using text[len(text)-1] != '\n') to first check
len(text) > 0, keeping variable names text, lines and complete to locate and
modify the code in tail.go.
---
Nitpick comments:
In `@src/pkg/cli/client/byoc/azure/byoc.go`:
- Around line 461-516: The loop should terminate immediately when both channels
are closed instead of relying on the loop head after a continue; after you set
cdCh = nil (in the cdCh case) and after you set acaCh = nil (in the acaCh case)
add an immediate check if cdCh == nil && acaCh == nil and return if so (or
replace the continue with that check + return), so the anonymous tailing
function (the func returning from the caller) exits right away when both
channels are closed; apply this change for both cdCh and acaCh handling in the
yielding loop.
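The nil-channel pattern with the suggested immediate exit check looks like this in isolation (channel and function names are illustrative):

```go
package main

import "fmt"

// drainBoth merges two channels; when one closes it is set to nil (a
// nil channel blocks forever in select), and the loop returns as soon
// as both are nil instead of spinning back to the loop head.
func drainBoth(cdCh, acaCh <-chan string) []string {
	var got []string
	for {
		select {
		case s, ok := <-cdCh:
			if !ok {
				cdCh = nil
				if cdCh == nil && acaCh == nil {
					return got // both closed: exit immediately
				}
				continue
			}
			got = append(got, s)
		case s, ok := <-acaCh:
			if !ok {
				acaCh = nil
				if cdCh == nil && acaCh == nil {
					return got
				}
				continue
			}
			got = append(got, s)
		}
	}
}

func main() {
	a := make(chan string, 2)
	b := make(chan string, 1)
	a <- "cd1"
	a <- "cd2"
	b <- "aca1"
	close(a)
	close(b)
	fmt.Println(len(drainBoth(a, b))) // 3
}
```

A nil channel blocking forever is what makes the pattern safe; the explicit combined check only saves a pointless extra trip through the loop once both sources are done.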
- Around line 319-321: The status-code check in byoc.go currently treats both
200 and 201 as success (checking resp.StatusCode), but Azure Put Blob returns
201 Created; update the check to accept 201 only (e.g., if resp.StatusCode !=
201 return error) and add a brief comment at that check explaining that Put Blob
should return 201 and why 200 is not expected (or explicitly note any
special-case where 200 may be acceptable) so future readers understand the
choice; reference the resp.StatusCode/resp.Status validation in the upload path
in byoc.go when making the change.
- Around line 354-360: GetDeploymentStatus currently converts any error from
driver.GetContainerGroupStatus into client.ErrDeploymentFailed which hides
transient issues; update ByocAzure.GetDeploymentStatus to distinguish transient
vs permanent failures by inspecting the returned error or status from
b.driver.GetContainerGroupStatus: if the driver returns a sentinel/transient
error (e.g., network/timeouts, rate-limit type or a driver-specific transient
error type), return that error unchanged (or wrap it with context but preserve
its type) so callers can retry, but only wrap errors as
client.ErrDeploymentFailed when the container group status explicitly indicates
a permanent deployment failure (use b.cdContainerGroup or the status value
returned); use errors.As / errors.Is to detect known transient driver errors and
document the behavior in the function comment.
- Around line 233-234: The Pulumi backend and passphrase are being injected as
plain env vars (pulumiBackendKey/pulumiBackendValue and
"PULUMI_CONFIG_PASSPHRASE" using byoc.PulumiConfigPassphrase); replace this with
secure secret retrieval instead of hard-setting env values: fetch the backend
URL and passphrase from a secure store (e.g., Azure Key Vault or your secret
manager) at runtime and populate them as secrets (or use Key Vault references /
secret provider) so they are not exposed in process environment or source;
update the initialization in byoc.go to use the secure-secret getter for
pulumiBackendValue and byoc.PulumiConfigPassphrase and remove plain-text
assignment (honor the existing TODO to make them secret).
In `@src/pkg/clouds/azure/aca/common.go`:
- Around line 61-64: The URL is built by interpolating values directly into
fmt.Sprintf, leaving path segments like appName, c.ResourceGroup,
c.SubscriptionID and apiVersion unescaped; replace those direct interpolations
with url.PathEscape(...) calls (e.g., url.PathEscape(appName),
url.PathEscape(c.ResourceGroup), url.PathEscape(c.SubscriptionID),
url.PathEscape(apiVersion)) when constructing the string in the URL assignment
so all path segments are safely encoded before being inserted.
- Around line 71-74: The call to http.DefaultClient.Do in the function that
performs the Azure REST request should not use the default client (which has no
timeout); update the code in the function (and similarly in getEventStreamBase)
to create and use an http.Client with a sensible Timeout or use
http.NewRequestWithContext and pass a context with deadline derived from the
incoming ctx so the request respects cancellation/deadlines; ensure you replace
http.DefaultClient.Do(req) with client.Do(req) (or use req =
req.WithContext(ctx)) and propagate errors unchanged.
In `@src/pkg/clouds/azure/aci/setup.go`:
- Around line 108-133: The large commented-out block handling permission
assignment and key regeneration (references: getCurrentUserObjectID,
assignBlobDataContributor, NewStorageAccountsClient, ListKeys, RegenerateKey,
resourceGroupName, storageAccount) should be removed or annotated: if this is
WIP, replace the commented code with a single TODO comment explaining the
planned work and linking any relevant issue/PR; otherwise delete the dead code
to keep setup.go clean and maintainable. Ensure any removed logic is preserved
in the PR description or an issue so it can be restored if needed.
- Around line 52-53: The TODO in TearDown indicates the storage account isn't
being cleaned up; update the TearDown method in
src/pkg/clouds/azure/aci/setup.go to either implement deletion of the Azure
storage account tied to this deployment (use the existing Azure
client/credentials used elsewhere in this file to locate and delete the storage
account resource) or, if deletion must be deferred, replace the TODO with a
clear TODO comment and create a tracked issue (include the resource name/ID
logic and steps required) and add a reference to that issue in the comment so
resource cleanup is not forgotten.
In `@src/pkg/clouds/azure/acr/acr_test.go`:
- Around line 66-75: The cleanup currently swallows errors; update the t.Cleanup
closure to log any failures from acr.newRegistriesClient, client.BeginDelete and
poller.PollUntilDone so leaked resources are diagnosable: call
acr.newRegistriesClient() and if err != nil call t.Logf with a clear message
including the error, then when calling client.BeginDelete(ctx,
acr.resourceGroupName, registryName, nil) check its returned error and log it,
and if poller is non-nil call poller.PollUntilDone(ctx, nil) and log any
returned error as well (use t.Logf/t.Errorf to include the error values and
context).
In `@src/pkg/http/put.go`:
- Around line 24-29: The PutWithHeader function assigns the caller's http.Header
map directly to req.Header which can share mutable state; update PutWithHeader
to clone the provided header before attaching it (use header.Clone()) so the
request gets its own copy and the caller's header map isn't shared with the
in-flight request.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 36b30cca-2bda-44f9-901e-9b7b665094c4
⛔ Files ignored due to path filters (2)
- `src/go.sum` is excluded by `!**/*.sum`
- `src/protos/io/defang/v1/fabric.pb.go` is excluded by `!**/*.pb.go`
📒 Files selected for processing (46)
- flake.nix
- pkgs/defang/cli.nix
- pkgs/npm/README.md
- src/README.md
- src/cmd/cli/command/estimate.go
- src/go.mod
- src/pkg/agent/tools/estimate_test.go
- src/pkg/cli/client/byoc/aws/byoc.go
- src/pkg/cli/client/byoc/azure/byoc.go
- src/pkg/cli/client/byoc/azure/login.go
- src/pkg/cli/client/byoc/gcp/byoc.go
- src/pkg/cli/client/provider_id.go
- src/pkg/cli/client/region.go
- src/pkg/cli/compose/context.go
- src/pkg/cli/connect.go
- src/pkg/cli/tailAndMonitor.go
- src/pkg/clouds/azure/aca/common.go
- src/pkg/clouds/azure/aca/tail.go
- src/pkg/clouds/azure/aci/blob.go
- src/pkg/clouds/azure/aci/common.go
- src/pkg/clouds/azure/aci/common_test.go
- src/pkg/clouds/azure/aci/run.go
- src/pkg/clouds/azure/aci/run_test.go
- src/pkg/clouds/azure/aci/setup.go
- src/pkg/clouds/azure/aci/setup_test.go
- src/pkg/clouds/azure/aci/status.go
- src/pkg/clouds/azure/aci/stop.go
- src/pkg/clouds/azure/aci/tail.go
- src/pkg/clouds/azure/aci/tail_test.go
- src/pkg/clouds/azure/aci/upload.go
- src/pkg/clouds/azure/aci/upload_test.go
- src/pkg/clouds/azure/acr/acr.go
- src/pkg/clouds/azure/acr/acr_test.go
- src/pkg/clouds/azure/acr/tail.go
- src/pkg/clouds/azure/acr/types.go
- src/pkg/clouds/azure/acr/types_test.go
- src/pkg/clouds/azure/appcfg/appcfg.go
- src/pkg/clouds/azure/common.go
- src/pkg/clouds/azure/location.go
- src/pkg/http/post.go
- src/pkg/http/put.go
- src/pkg/session/session.go
- src/pkg/stacks/selector_test.go
- src/pkg/stacks/wizard.go
- src/pkg/utils.go
- src/protos/io/defang/v1/fabric.proto
✅ Files skipped from review due to trivial changes (13)
- src/README.md
- pkgs/npm/README.md
- src/pkg/cli/client/byoc/gcp/byoc.go
- src/pkg/agent/tools/estimate_test.go
- src/pkg/cli/tailAndMonitor.go
- pkgs/defang/cli.nix
- src/pkg/cli/client/byoc/aws/byoc.go
- src/pkg/clouds/azure/acr/types_test.go
- src/pkg/cli/client/byoc/azure/login.go
- src/cmd/cli/command/estimate.go
- src/pkg/clouds/azure/acr/types.go
- src/pkg/clouds/azure/aci/status.go
- src/pkg/clouds/azure/location.go
🚧 Files skipped from review as they are similar to previous changes (17)
- src/pkg/session/session.go
- src/pkg/cli/client/region.go
- src/pkg/clouds/azure/aci/common_test.go
- src/pkg/stacks/wizard.go
- src/pkg/stacks/selector_test.go
- src/go.mod
- src/pkg/clouds/azure/aci/setup_test.go
- src/pkg/clouds/azure/aci/stop.go
- src/pkg/clouds/azure/aci/run_test.go
- src/pkg/cli/client/provider_id.go
- src/pkg/cli/connect.go
- src/pkg/http/post.go
- src/pkg/clouds/azure/aci/tail_test.go
- src/pkg/cli/compose/context.go
- src/pkg/clouds/azure/aci/run.go
- src/pkg/clouds/azure/common.go
- src/pkg/clouds/azure/acr/acr.go
```go
known := map[string]struct{}{}

startTailing := func(appName string) {
	go func() {
		appCh, err := c.StreamLogs(ctx, appName, "", "", "", true)
		if err != nil {
			select {
			case out <- ServiceLogEntry{AppName: appName, LogEntry: LogEntry{Err: err}}:
			case <-ctx.Done():
			}
			return
		}
		for entry := range appCh {
			select {
			case out <- ServiceLogEntry{AppName: appName, LogEntry: entry}:
			case <-ctx.Done():
				return
			}
		}
	}()
}
```
Allow a tailed app to be retried after the stream exits.
Once an app name is inserted into known, it is never removed. If StreamLogs() returns because the SSE connection drops, the replica rolls, or the stream ends for any other transient reason, later polls will skip that app forever and follow mode permanently loses its logs.
Also applies to: 67-71
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/pkg/clouds/azure/aca/tail.go` around lines 32 - 52, The code marks tailed
apps in the known map and never removes them, so when c.StreamLogs(...) returns
the app is never retried; modify startTailing (and the other tailing site around
the similar 67-71 block) to remove the app from known when the goroutine exits:
add a sync.Mutex (or switch known to a sync.Map) to protect concurrent access,
lock around reads/writes to known, and defer a deletion like defer func(){
mutex.Lock(); delete(known, appName); mutex.Unlock() }() at the start of the
goroutine so the app can be retried after StreamLogs() ends (do this both on
error and normal exit).
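The mutex-guarded lifecycle described in the prompt can be sketched as a small type (names are illustrative):

```go
package main

import (
	"fmt"
	"sync"
)

// tracker is a mutex-guarded set of app names currently being tailed.
type tracker struct {
	mu    sync.Mutex
	known map[string]struct{}
}

// start returns false if the app is already being tailed.
func (t *tracker) start(app string) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	if _, ok := t.known[app]; ok {
		return false
	}
	t.known[app] = struct{}{}
	return true
}

// done removes the app so the next poll can tail it again; in the real
// code this would be deferred at the top of the tailing goroutine.
func (t *tracker) done(app string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	delete(t.known, app)
}

func main() {
	t := &tracker{known: map[string]struct{}{}}
	fmt.Println(t.start("web")) // true
	fmt.Println(t.start("web")) // false: already tailing
	t.done("web")               // stream ended
	fmt.Println(t.start("web")) // true: retried
}
```

In the real tailer, `start` would replace the bare `known` map check in the poll loop, and `defer t.done(appName)` would be the first statement of the goroutine so the entry is cleared on both error and normal exit.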
```go
poll := func() {
	client, err := c.newContainerAppsClient()
	if err != nil {
		return
	}
	pager := client.NewListByResourceGroupPager(c.ResourceGroup, nil)
	for pager.More() {
		page, err := pager.NextPage(ctx)
		if err != nil {
			return
		}
```
Don’t silently swallow discovery failures.
If newContainerAppsClient() or NextPage() fails here, poll() just returns and the watch loop keeps running with no indication that app discovery is broken. In follow mode that leaves the caller with an apparently healthy stream that never picks up Container App logs.
🧭 Suggested change
```diff
 poll := func() {
 	client, err := c.newContainerAppsClient()
 	if err != nil {
+		select {
+		case out <- ServiceLogEntry{LogEntry: LogEntry{Err: fmt.Errorf("list container apps client: %w", err)}}:
+		case <-ctx.Done():
+		}
 		return
 	}
 	pager := client.NewListByResourceGroupPager(c.ResourceGroup, nil)
 	for pager.More() {
 		page, err := pager.NextPage(ctx)
 		if err != nil {
+			select {
+			case out <- ServiceLogEntry{LogEntry: LogEntry{Err: fmt.Errorf("list container apps page: %w", err)}}:
+			case <-ctx.Done():
+			}
 			return
 		}
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```go
poll := func() {
	client, err := c.newContainerAppsClient()
	if err != nil {
		select {
		case out <- ServiceLogEntry{LogEntry: LogEntry{Err: fmt.Errorf("list container apps client: %w", err)}}:
		case <-ctx.Done():
		}
		return
	}
	pager := client.NewListByResourceGroupPager(c.ResourceGroup, nil)
	for pager.More() {
		page, err := pager.NextPage(ctx)
		if err != nil {
			select {
			case out <- ServiceLogEntry{LogEntry: LogEntry{Err: fmt.Errorf("list container apps page: %w", err)}}:
			case <-ctx.Done():
			}
			return
		}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/pkg/clouds/azure/aca/tail.go` around lines 54 - 64, The poll() closure
currently swallows failures from c.newContainerAppsClient() and
pager.NextPage(ctx); change it to surface those errors instead of silently
returning by (1) logging the error with context (include the error value and
which call failed) and (2) propagating the failure to the outer watch loop so
the caller can react (e.g., send the error on the existing watch/error channel
or cancel the watch context). Update the code paths around poll(),
newContainerAppsClient(), NewListByResourceGroupPager, and pager.NextPage(ctx)
to ensure errors are logged and forwarded (or cause the watcher to stop) rather
than being ignored.
```go
if err != nil {
	t.Fatalf("GetRunLogURL failed: %v", err)
}
t.Logf("Log URL: %s", logURL)
```
Do not log full SAS log URLs.
Line 101 logs the full logURL, which includes the SAS token in query parameters and can leak credentials in CI logs.
Proposed fix
```diff
-t.Logf("Log URL: %s", logURL)
+t.Log("Log URL acquired")
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```go
t.Log("Log URL acquired")
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/pkg/clouds/azure/acr/acr_test.go` at line 101, The test currently logs
the full SAS URL stored in logURL via t.Logf which can expose credentials;
change the test to avoid printing query parameters — e.g., parse logURL, strip
or mask the RawQuery before logging, or log only the hostname/path or a fixed
placeholder instead of the full URL; update the t.Logf usage (the call
referencing logURL) to output the redacted or partial URL so no SAS token
appears in CI logs.
Actionable comments posted: 4
🧹 Nitpick comments (2)
src/pkg/clouds/azure/aci/upload_test.go (1)
30-30: Minor: Non-standard Content-Type value.

`"application/text"` is not a registered MIME type. While Azure Blob Storage may accept it, the standard type is `"text/plain"`.

♻️ Suggested fix

```diff
-header := http.Header{"Content-Type": []string{"application/text"}}
+header := http.Header{"Content-Type": []string{"text/plain"}}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/pkg/clouds/azure/aci/upload_test.go` at line 30, The test uses a non-standard MIME type string in the header variable (header := http.Header{"Content-Type": []string{"application/text"}}) — change it to the standard "text/plain" value; update the header declaration in src/pkg/clouds/azure/aci/upload_test.go and any assertions or mocks in the same test that expect "application/text" so they now expect "text/plain".

src/pkg/clouds/azure/aci/setup.go (1)

38-54: TODO: Storage account deletion not implemented in TearDown.

The TearDown function only deletes the resource group, but the storage account may have been created outside the resource group scope. The TODO on lines 52-53 indicates this is a known gap.
Would you like me to help implement storage account cleanup, or open an issue to track this task?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/pkg/clouds/azure/aci/setup.go` around lines 38 - 54, TearDown currently only removes the resource group but skips deleting the storage account; update ContainerInstance.TearDown to also delete the storage account when present by checking the instance fields that hold the storage account identity (e.g. c.storageAccountName and any storage account resource group field), create the storage accounts client (use the existing helper like c.newStorageAccountsClient or add one if missing), call BeginDelete(ctx, accountName) and PollUntilDone(ctx, nil), and return/report any errors; keep the resource-group deletion logic and only attempt storage account deletion when the storage account name is non-empty so we don't error for cases where no account was created.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/pkg/clouds/azure/aci/run_test.go`:
- Around line 24-29: The test registers a t.Cleanup but the actual teardown call
to containerInstance.TearDown(ctx) is commented out, which can leave Azure
resources orphaned; either uncomment and invoke containerInstance.TearDown(ctx)
inside the t.Cleanup and handle any error by calling t.Errorf("Failed to tear
down container instance: %v", err) (not t.Fatalf), or if teardown is
intentionally skipped for debugging, add a clear comment above the t.Cleanup
explaining why it’s omitted and how/when to re-enable it; locate the cleanup
block referencing containerInstance.TearDown and update accordingly.
In `@src/pkg/clouds/azure/acr/tail_test.go`:
- Around line 59-68: The test is asserting behavior for the old buggy
Split/offset logic; after fixing tail.go to ignore the trailing empty element
from strings.Split, update the test assertions: set expectedOffset to 18 (actual
byte length) instead of int64(len(body))+1 and assert len(yielded) == 3 (three
real lines) instead of 4; change the two t.Errorf checks that compare
expectedOffset/newOffset and len(yielded) to use these new expected values so
the test matches the corrected logic in tail.go.
In `@src/pkg/clouds/azure/acr/tail.go`:
- Around line 43-54: In the terminal-status branch in the function where
status.IsTerminal() is handled, stop ignoring the return values from
readLogChunk(ctx, logURL, offset, yield); capture its (offset, err) result and
if err indicates the consumer stopped (e.g., err == errStopped or a boolean
false from yield), return immediately so you do not call yield again with the
final failure message for runID; otherwise continue building the msg and call
yield as before. Ensure you reference readLogChunk and runID in the fix and
propagate/handle the err return to avoid violating the iterator contract.
- Around line 114-127: The offset calculation overcounts when text ends with
'\n' because strings.Split creates a trailing empty element; update the logic
around lines/complete to safely handle empty text and drop the trailing empty
element when text[len(text)-1] == '\n' instead of treating it as a real line.
Concretely: check len(text)>0 before indexing, set complete = lines; if
text[len(text)-1] == '\n' && len(lines) > 0 && lines[len(lines)-1] == "" {
complete = lines[:len(lines)-1] } else if len(lines) > 0 { complete =
lines[:len(lines)-1] } so offset increments (offset += int64(len(line)+1)) and
yield(line, nil) remain correct without producing the off-by-one.
---
Nitpick comments:
In `@src/pkg/clouds/azure/aci/setup.go`:
- Around line 38-54: TearDown currently only removes the resource group but
skips deleting the storage account; update ContainerInstance.TearDown to also
delete the storage account when present by checking the instance fields that
hold the storage account identity (e.g. c.storageAccountName and any storage
account resource group field), create the storage accounts client (use the
existing helper like c.newStorageAccountsClient or add one if missing), call
BeginDelete(ctx, accountName) and PollUntilDone(ctx, nil), and return/report any
errors; keep the resource-group deletion logic and only attempt storage account
deletion when the storage account name is non-empty so we don't error for cases
where no account was created.
In `@src/pkg/clouds/azure/aci/upload_test.go`:
- Line 30: The test uses a non-standard MIME type string in the header variable
(header := http.Header{"Content-Type": []string{"application/text"}}) — change
it to the standard "text/plain" value; update the header declaration in
src/pkg/clouds/azure/aci/upload_test.go and any assertions or mocks in the same
test that expect "application/text" so they now expect "text/plain".
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 78fa2c15-9753-402e-b452-4a391db2709b
📒 Files selected for processing (12)
- src/pkg/clouds/azure/aca/common.go
- src/pkg/clouds/azure/aci/blob.go
- src/pkg/clouds/azure/aci/run.go
- src/pkg/clouds/azure/aci/run_test.go
- src/pkg/clouds/azure/aci/setup.go
- src/pkg/clouds/azure/aci/stop.go
- src/pkg/clouds/azure/aci/tail.go
- src/pkg/clouds/azure/aci/tail_test.go
- src/pkg/clouds/azure/aci/upload.go
- src/pkg/clouds/azure/aci/upload_test.go
- src/pkg/clouds/azure/acr/tail.go
- src/pkg/clouds/azure/acr/tail_test.go
✅ Files skipped from review due to trivial changes (1)
- src/pkg/clouds/azure/aca/common.go
🚧 Files skipped from review as they are similar to previous changes (4)
- src/pkg/clouds/azure/aci/stop.go
- src/pkg/clouds/azure/aci/tail_test.go
- src/pkg/clouds/azure/aci/upload.go
- src/pkg/clouds/azure/aci/blob.go
```go
t.Cleanup(func() {
	// err := containerInstance.TearDown(ctx)
	// if err != nil {
	// 	t.Fatalf("Failed to tear down container instance: %v", err)
	// }
})
```
Commented-out teardown may leave orphaned Azure resources.
The cleanup callback registers but doesn't actually tear down the resource group. This will leave Azure resources behind after test runs, potentially incurring costs.
Consider either:
- Uncommenting the teardown code
- Adding a comment explaining why teardown is intentionally skipped (e.g., for debugging, shared test resources)
🧹 Proposed fix to enable cleanup
```diff
 t.Cleanup(func() {
-	// err := containerInstance.TearDown(ctx)
-	// if err != nil {
-	// 	t.Fatalf("Failed to tear down container instance: %v", err)
-	// }
+	if err := containerInstance.TearDown(context.Background()); err != nil {
+		t.Errorf("Failed to tear down container instance: %v", err)
+	}
 })
```

Note: Use t.Errorf instead of t.Fatalf in cleanup since t.Fatalf may not work correctly in cleanup functions.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/pkg/clouds/azure/aci/run_test.go` around lines 24 - 29, The test
registers a t.Cleanup but the actual teardown call to
containerInstance.TearDown(ctx) is commented out, which can leave Azure
resources orphaned; either uncomment and invoke containerInstance.TearDown(ctx)
inside the t.Cleanup and handle any error by calling t.Errorf("Failed to tear
down container instance: %v", err) (not t.Fatalf), or if teardown is
intentionally skipped for debugging, add a clear comment above the t.Cleanup
explaining why it’s omitted and how/when to re-enable it; locate the cleanup
block referencing containerInstance.TearDown and update accordingly.
```go
// strings.Split on a trailing-newline body produces one extra empty element,
// each element (including the empty one) adds len(line)+1 to the offset.
expectedOffset := int64(len(body)) + 1 // trailing empty element adds +1
if newOffset != expectedOffset {
	t.Errorf("expected offset %d, got %d", expectedOffset, newOffset)
}
// 3 real lines + 1 empty string from the trailing newline split
if len(yielded) != 4 {
	t.Errorf("expected 4 yielded elements, got %d: %v", len(yielded), yielded)
}
```
Test validates buggy offset calculation.
This test documents the off-by-one behavior flagged in tail.go. Once the implementation is fixed to exclude the trailing empty element from Split, update this test to expect:
- `expectedOffset = 18` (actual byte length)
- `len(yielded) == 3` (actual line count)
📝 Expected test changes after fix
```diff
-// strings.Split on a trailing-newline body produces one extra empty element,
-// each element (including the empty one) adds len(line)+1 to the offset.
-expectedOffset := int64(len(body)) + 1 // trailing empty element adds +1
+expectedOffset := int64(len(body))
 if newOffset != expectedOffset {
 	t.Errorf("expected offset %d, got %d", expectedOffset, newOffset)
 }
-// 3 real lines + 1 empty string from the trailing newline split
-if len(yielded) != 4 {
-	t.Errorf("expected 4 yielded elements, got %d: %v", len(yielded), yielded)
+if len(yielded) != 3 {
+	t.Errorf("expected 3 lines, got %d: %v", len(yielded), yielded)
 }
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```go
expectedOffset := int64(len(body))
if newOffset != expectedOffset {
	t.Errorf("expected offset %d, got %d", expectedOffset, newOffset)
}
if len(yielded) != 3 {
	t.Errorf("expected 3 lines, got %d: %v", len(yielded), yielded)
}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/pkg/clouds/azure/acr/tail_test.go` around lines 59 - 68, The test is
asserting behavior for the old buggy Split/offset logic; after fixing tail.go to
ignore the trailing empty element from strings.Split, update the test
assertions: set expectedOffset to 18 (actual byte length) instead of
int64(len(body))+1 and assert len(yielded) == 3 (three real lines) instead of 4;
change the two t.Errorf checks that compare expectedOffset/newOffset and
len(yielded) to use these new expected values so the test matches the corrected
logic in tail.go.
```go
if status.IsTerminal() {
	// Read any remaining log content
	readLogChunk(ctx, logURL, offset, yield)
	if !status.IsSuccess() {
		msg := string(status.Status)
		if status.ErrorMessage != "" {
			msg += ": " + status.ErrorMessage
		}
		yield("", fmt.Errorf("build %s: %s", runID, msg))
	}
	return
}
```
Ignored return value may cause yield to be called after consumer stopped.
Line 45 discards both return values from readLogChunk. If the consumer stops iteration (yield returns false), readLogChunk returns errStopped, but this is ignored and the code proceeds to potentially call yield again on line 51, violating the iterator contract.
🛡️ Proposed fix to respect early termination
```diff
 if status.IsTerminal() {
 	// Read any remaining log content
-	readLogChunk(ctx, logURL, offset, yield)
+	if _, err := readLogChunk(ctx, logURL, offset, yield); err != nil {
+		return // consumer stopped iteration
+	}
 	if !status.IsSuccess() {
 		msg := string(status.Status)
 		if status.ErrorMessage != "" {
 			msg += ": " + status.ErrorMessage
 		}
 		yield("", fmt.Errorf("build %s: %s", runID, msg))
 	}
 	return
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/pkg/clouds/azure/acr/tail.go` around lines 43 - 54, In the
terminal-status branch in the function where status.IsTerminal() is handled,
stop ignoring the return values from readLogChunk(ctx, logURL, offset, yield);
capture its (offset, err) result and if err indicates the consumer stopped
(e.g., err == errStopped or a boolean false from yield), return immediately so
you do not call yield again with the final failure message for runID; otherwise
continue building the msg and call yield as before. Ensure you reference
readLogChunk and runID in the fix and propagate/handle the err return to avoid
violating the iterator contract.
```go
lines := strings.Split(text, "\n")

// If the last byte is not a newline, the last line is incomplete — hold it back
complete := lines
if text[len(text)-1] != '\n' {
	complete = lines[:len(lines)-1]
}

for _, line := range complete {
	offset += int64(len(line) + 1) // +1 for the newline
	if !yield(line, nil) {
		return offset, errStopped
	}
}
```
Off-by-one in offset calculation when body ends with newline.
When text ends with \n, strings.Split produces a trailing empty string (e.g., "a\nb\n" → ["a", "b", ""]). The code includes this empty element in complete and adds +1 for a non-existent newline, causing offset to exceed actual content length by 1 byte.
For "line1\nline2\n" (12 bytes): offset becomes 13 instead of 12.
🐛 Proposed fix to exclude trailing empty element
```diff
 lines := strings.Split(text, "\n")
 // If the last byte is not a newline, the last line is incomplete — hold it back
 complete := lines
 if text[len(text)-1] != '\n' {
 	complete = lines[:len(lines)-1]
+} else if len(lines) > 0 && lines[len(lines)-1] == "" {
+	// Remove trailing empty element from Split
+	complete = lines[:len(lines)-1]
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/pkg/clouds/azure/acr/tail.go` around lines 114 - 127, The offset
calculation overcounts when text ends with '\n' because strings.Split creates a
trailing empty element; update the logic around lines/complete to safely handle
empty text and drop the trailing empty element when text[len(text)-1] == '\n'
instead of treating it as a real line. Concretely: check len(text)>0 before
indexing, set complete = lines; if text[len(text)-1] == '\n' && len(lines) > 0
&& lines[len(lines)-1] == "" { complete = lines[:len(lines)-1] } else if
len(lines) > 0 { complete = lines[:len(lines)-1] } so offset increments (offset
+= int64(len(line)+1)) and yield(line, nil) remain correct without producing the
off-by-one.
Description
Add support for Azure.
Linked Issues
Checklist
Summary by CodeRabbit