replace openai-access-gateway with litellm #2012
jordanstephens wants to merge 21 commits into main from
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the settings. Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 Walkthrough

Fetch provider AccountInfo once and thread it into model/provider fixups; switch model-provider services to LiteLLM (image, command, port 4000, LITELLM_MASTER_KEY behavior); change GCP BYOC log parsing to query live instance-group-manager labels; update test fixtures, warnings, docs, and supporting mocks.
Sequence Diagram(s)sequenceDiagram
participant CLI as CLI (FixupServices)
participant Provider as Provider (AccountInfo)
participant Fixup as fixupModel / fixupModelProvider
participant Maker as makeAccessGatewayService
participant Service as Dependent Service(s)
CLI->>Provider: AccountInfo(ctx)
Provider-->>CLI: *AccountInfo (may be empty)
CLI->>Fixup: fixupModel(..., accountInfo)
Fixup->>Maker: makeAccessGatewayService(..., accountInfo)
Maker->>Maker: choose image/port/command (litellm, port=4000) and derive LITELLM_MASTER_KEY
Maker->>Service: inject envs (LITELLM_MASTER_KEY / OPENAI_API_KEY fallback) and endpoint URLs
Service-->>CLI: updated service compose config
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/pkg/cli/compose/fixup.go (1)
362-387: ⚠️ Potential issue | 🔴 Critical: The generated *_URL is missing the port and uses an incorrect API path.

Line 366 generates http://<service>/api/v1/, but after switching to LiteLLM on port 4000 (line 387), dependents need both the port and the correct OpenAI-compatible API path. On the shared model_provider_private network, the URL without an explicit port defaults to port 80, so clients cannot reach the LiteLLM listener. Additionally, LiteLLM's OpenAI-compatible API uses /v1/, not /api/v1/. Dependent services will fail unless they can reach the correct endpoint on port 4000.

💡 Suggested fix

- urlVal := "http://" + svccfg.Name + "/api/v1/"
+ urlVal := fmt.Sprintf("http://%s:%d/v1", svccfg.Name, liteLLMPort)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/pkg/cli/compose/fixup.go` around lines 362 - 387, The generated endpoint URL in makeAccessGatewayService is incorrect: change urlVal to include the LiteLLM port and the OpenAI-compatible path (use "http://<service>:<port>/v1/"), referencing liteLLMPort and svccfg.Name, and assign it into the service environment under the endpointEnvVar (and ensure modelEnvVar is also set to the passed model if needed); update the env map population logic in makeAccessGatewayService so svccfg.Environment[endpointEnvVar] = urlVal (and svccfg.Environment[modelEnvVar] = model) instead of leaving the URL without port and using /api/v1/.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/pkg/cli/compose/fixup.go`:
- Around line 232-239: The current fixupLLM uses GetImageRepo which strips parts
incorrectly when a registry includes a port (e.g.
registry.example:5000/litellm:latest), causing the "/litellm" suffix check to
fail; change fixupLLM to compute the image path without removing a registry port
by stripping only the image tag/digest (remove the substring after the last ':'
or '@' only if it occurs after the last '/'), then run strings.HasSuffix on that
sanitized path; update references in fixupLLM to use svccfg.Image (or a local
sanitizedImage variable) for the suffix check and preserve adding the host port
using liteLLMPort if the suffix matches and svccfg.Ports is empty.
---
Outside diff comments:
In `@src/pkg/cli/compose/fixup.go`:
- Around line 362-387: The generated endpoint URL in makeAccessGatewayService is
incorrect: change urlVal to include the LiteLLM port and the OpenAI-compatible
path (use "http://<service>:<port>/v1/"), referencing liteLLMPort and
svccfg.Name, and assign it into the service environment under the endpointEnvVar
(and ensure modelEnvVar is also set to the passed model if needed); update the
env map population logic in makeAccessGatewayService so
svccfg.Environment[endpointEnvVar] = urlVal (and svccfg.Environment[modelEnvVar]
= model) instead of leaving the URL without port and using /api/v1/.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e90edff0-1afb-496e-b665-e6d2c7d6af51
📒 Files selected for processing (6)
- src/pkg/cli/compose/fixup.go
- src/testdata/llm/compose.yaml
- src/testdata/llm/compose.yaml.fixup
- src/testdata/llm/compose.yaml.golden
- src/testdata/models/compose.yaml.fixup
- src/testdata/provider/compose.yaml.fixup
  func fixupLLM(svccfg *composeTypes.ServiceConfig) {
  	image := GetImageRepo(svccfg.Image)
- 	if strings.HasSuffix(image, "/openai-access-gateway") && len(svccfg.Ports) == 0 {
+ 	if strings.HasSuffix(image, "/litellm") && len(svccfg.Ports) == 0 {
  		// HACK: we must have at least one host port to get a CNAME for the service
- 		var port uint32 = 80
+ 		// litellm listens on 4000 by default
+ 		var port uint32 = liteLLMPort
  		term.Debugf("service %q: adding LLM host port %d", svccfg.Name, port)
  		svccfg.Ports = []composeTypes.ServicePortConfig{{Target: port, Mode: Mode_HOST, Protocol: Protocol_TCP}}
Handle registry ports before matching /litellm.
This suffix check inherits GetImageRepo's split on the first ':', so registry.example:5000/litellm:latest is reduced to registry.example before the match. Those LiteLLM images will skip the 4000-port fixup entirely.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/pkg/cli/compose/fixup.go` around lines 232 - 239, The current fixupLLM
uses GetImageRepo which strips parts incorrectly when a registry includes a port
(e.g. registry.example:5000/litellm:latest), causing the "/litellm" suffix check
to fail; change fixupLLM to compute the image path without removing a registry
port by stripping only the image tag/digest (remove the substring after the last
':' or '@' only if it occurs after the last '/'), then run strings.HasSuffix on
that sanitized path; update references in fixupLLM to use svccfg.Image (or a
local sanitizedImage variable) for the suffix check and preserve adding the host
port using liteLLMPort if the suffix matches and svccfg.Ports is empty.
Force-pushed e614bb1 to eb7a99b
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/pkg/cli/compose/fixup.go (1)
437-452: ⚠️ Potential issue | 🟠 Major: Potential nil map assignment on dependency.Networks.

At line 442, dependency.Networks[modelProviderNetwork] = nil will panic if dependency.Networks is nil. The same pattern at line 466 has the same issue. The code at lines 400-406 correctly handles this case for svccfg.Networks.

🐛 Proposed fix

  for _, dependency := range project.Services {
  	if _, ok := dependency.DependsOn[svccfg.Name]; ok {
  		if dependency.Environment == nil {
  			dependency.Environment = make(composeTypes.MappingWithEquals)
  		}
+ 		if dependency.Networks == nil {
+ 			dependency.Networks = make(map[string]*composeTypes.ServiceNetworkConfig)
+ 		}
  		dependency.Networks[modelProviderNetwork] = nil

And similarly for the dependency.Models block around line 466:

  if dependency.Environment == nil {
  	dependency.Environment = make(composeTypes.MappingWithEquals)
  }
+ if dependency.Networks == nil {
+ 	dependency.Networks = make(map[string]*composeTypes.ServiceNetworkConfig)
+ }
  dependency.Networks[modelProviderNetwork] = nil

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/pkg/cli/compose/fixup.go` around lines 437 - 452, The code assigns into dependency.Networks without ensuring the map is initialized (causing a panic when nil); update the loop over project.Services that references dependency.Networks to check if dependency.Networks == nil and if so set dependency.Networks = make(composeTypes.MappingWithEquals) before setting dependency.Networks[modelProviderNetwork] = nil, and apply the same nil-check + initialization pattern used for svccfg.Networks to the other similar block that modifies dependency.Models (the second occurrence around the later block) so both maps are safely initialized before assignment.
🧹 Nitpick comments (3)
src/pkg/cli/subscribe.go (1)
78-85: Verbose logging on every message iteration.

This logs on every iteration of the subscribe loop, which could produce excessive output if the stream is chatty. Additionally, the log executes before the msg == nil check (line 87), so it prints even when there's nothing meaningful to process. Consider either:

- Moving this logging after the msg == nil check, or
- Only logging when pendingServices changes from the previous iteration, or
- Logging at a lower level (e.g., term.Debug) since tailAndMonitor.go now also logs a waiting message.

♻️ Option: Move logging after nil check and reduce verbosity

+ if msg == nil {
+ 	continue
+ }
+
  pendingServices := []string{}
  for _, service := range services {
  	if serviceStates[service] != targetState {
  		pendingServices = append(pendingServices, service)
  	}
  }
- term.Infof("Waiting for %q to be in state %s...\n", pendingServices, targetState) // TODO: don't print in Go-routine
+ term.Debugf("Pending services waiting for state %s: %v\n", targetState, pendingServices) // TODO: don't print in Go-routine
- if msg == nil {
- 	continue
- }

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/pkg/cli/subscribe.go` around lines 78 - 85, The current loop logs via term.Infof every iteration and does so before checking msg == nil, causing excessive/irrelevant output; move the term.Infof call so it executes after the nil-check for msg, and reduce verbosity by either using term.Debug (or a lower log level) or only emitting when pendingServices (computed from services, serviceStates, targetState) changes from its previous value; update the call site where term.Infof is used (the pendingServices computation and logging block) and ensure you track the lastPendingServices value to compare before logging if choosing the "only when changed" approach.

src/pkg/clouds/gcp/compute.go (1)
14-18: Consider caching or reusing the Compute client.

A new compute.Service client is created on every call to GetInstanceGroupManagerLabels. If this method is called frequently (e.g., for each audit log entry during a deployment), this could introduce unnecessary latency and resource overhead from repeated client initialization. This is likely acceptable for the current use case (sparse audit log entries), but if performance becomes a concern, consider passing a pre-initialized client or caching it at the Gcp struct level.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/pkg/clouds/gcp/compute.go` around lines 14 - 18, The GetInstanceGroupManagerLabels function creates a new compute.Service on every call; to reduce repeated client initialization, add a reusable client on the Gcp struct (e.g., a field computeSvc *compute.Service) or accept a pre-initialized client, initialize it lazily in a method like ensureComputeService() and reuse it in GetInstanceGroupManagerLabels, and protect lazy init with appropriate synchronization (e.g., sync.Mutex or sync.Once) to make compute.Service creation thread-safe.src/pkg/cli/compose/fixup.go (1)
40-44: Consider escalating the warning when AccountInfo is needed for model services.

When AccountInfo fails and the project contains model services, the resulting empty AccountInfo will cause provider-specific environment variables (like AWS_REGION, VERTEXAI_PROJECT, VERTEXAI_LOCATION) to be omitted. This could lead to runtime failures that are difficult to diagnose since the error is only logged at debug level. Consider logging a more visible warning when model services exist in the project and AccountInfo fails.

💡 Suggested improvement

  accountInfo, err := provider.AccountInfo(ctx)
  if err != nil {
  	term.Debugf("failed to get account info to fixup services: %v", err)
  	accountInfo = &client.AccountInfo{}
+ 	// Check if any model services exist that might need account info
+ 	hasModelServices := len(project.Models) > 0
+ 	if !hasModelServices {
+ 		for _, svc := range project.Services {
+ 			if svc.Provider != nil && svc.Provider.Type == "model" {
+ 				hasModelServices = true
+ 				break
+ 			}
+ 		}
+ 	}
+ 	if hasModelServices {
+ 		term.Warnf("failed to get account info; model services may not have required cloud provider configuration: %v", err)
+ 	}
  }

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/pkg/cli/compose/fixup.go` around lines 40 - 44, When provider.AccountInfo(ctx) fails, the code currently only calls term.Debugf and falls back to an empty client.AccountInfo, which hides failures that matter when the project contains model services; update the logic around provider.AccountInfo (the call to provider.AccountInfo, the accountInfo variable and the debug log) to check whether any model services are present and, if so, log a visible warning (e.g., term.Warnf/term.Warningf) that includes the error details and notes that provider-specific env vars (AWS_REGION, VERTEXAI_PROJECT, VERTEXAI_LOCATION) may be omitted, while still falling back to accountInfo = &client.AccountInfo{} to preserve current behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/pkg/cli/compose/fixup.go`:
- Around line 411-434: The LITELLM_MASTER_KEY derivation loop can panic and
continues after a fatal inconsistency: ensure you nil-check
service.Environment["OPENAI_API_KEY"] before dereferencing (i.e., verify key !=
nil before using *key) when accumulating into openAIKey, and on detecting
multiple different OPENAI_API_KEY values (where currently term.Errorf is called)
stop further processing immediately instead of break — propagate the error or
return from the enclosing function (or set an explicit failure path) so
liteLLMMasterKey is not set from an inconsistent first value; update references
in this block (openAIKey, service.Environment["OPENAI_API_KEY"], term.Errorf,
liteLLMMasterKey, svccfg.Environment) accordingly.
- Line 398: The image tag assigned to svccfg.Image is incorrect; update the
value of svccfg.Image from "litellm/litellm:v1.82.3-stable.patch.3" to the
available tag "litellm/litellm:v1.82.3-stable.patch.2" so the code uses the
existing Docker image.
---
Outside diff comments:
In `@src/pkg/cli/compose/fixup.go`:
- Around line 437-452: The code assigns into dependency.Networks without
ensuring the map is initialized (causing a panic when nil); update the loop over
project.Services that references dependency.Networks to check if
dependency.Networks == nil and if so set dependency.Networks =
make(composeTypes.MappingWithEquals) before setting
dependency.Networks[modelProviderNetwork] = nil, and apply the same nil-check +
initialization pattern used for svccfg.Networks to the other similar block that
modifies dependency.Models (the second occurrence around the later block) so
both maps are safely initialized before assignment.
---
Nitpick comments:
In `@src/pkg/cli/compose/fixup.go`:
- Around line 40-44: When provider.AccountInfo(ctx) fails, the code currently
only calls term.Debugf and falls back to an empty client.AccountInfo, which
hides failures that matter when the project contains model services; update the
logic around provider.AccountInfo (the call to provider.AccountInfo, the
accountInfo variable and the debug log) to check whether any model services are
present and, if so, log a visible warning (e.g., term.Warnf/term.Warningf) that
includes the error details and notes that provider-specific env vars
(AWS_REGION, VERTEXAI_PROJECT, VERTEXAI_LOCATION) may be omitted, while still
falling back to accountInfo = &client.AccountInfo{} to preserve current
behavior.
In `@src/pkg/cli/subscribe.go`:
- Around line 78-85: The current loop logs via term.Infof every iteration and
does so before checking msg == nil, causing excessive/irrelevant output; move
the term.Infof call so it executes after the nil-check for msg, and reduce
verbosity by either using term.Debug (or a lower log level) or only emitting
when pendingServices (computed from services, serviceStates, targetState)
changes from its previous value; update the call site where term.Infof is used
(the pendingServices computation and logging block) and ensure you track the
lastPendingServices value to compare before logging if choosing the “only when
changed” approach.
In `@src/pkg/clouds/gcp/compute.go`:
- Around line 14-18: The GetInstanceGroupManagerLabels function creates a new
compute.Service on every call; to reduce repeated client initialization, add a
reusable client on the Gcp struct (e.g., a field computeSvc *compute.Service) or
accept a pre-initialized client, initialize it lazily in a method like
ensureComputeService() and reuse it in GetInstanceGroupManagerLabels, and
protect lazy init with appropriate synchronization (e.g., sync.Mutex or
sync.Once) to make compute.Service creation thread-safe.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 67f36a31-3342-44b2-a2d6-5a69183e29d3
📒 Files selected for processing (23)
- pkgs/defang/cli.nix
- pkgs/npm/README.md
- src/README.md
- src/go.mod
- src/pkg/cli/client/byoc/gcp/byoc_test.go
- src/pkg/cli/client/byoc/gcp/query.go
- src/pkg/cli/client/byoc/gcp/query_test.go
- src/pkg/cli/client/byoc/gcp/stream.go
- src/pkg/cli/client/byoc/gcp/stream_test.go
- src/pkg/cli/client/mock.go
- src/pkg/cli/compose/fixup.go
- src/pkg/cli/subscribe.go
- src/pkg/cli/tailAndMonitor.go
- src/pkg/clouds/aws/login.go
- src/pkg/clouds/gcp/compute.go
- src/pkg/clouds/gcp/login.go
- src/testdata/llm/compose.yaml
- src/testdata/llm/compose.yaml.fixup
- src/testdata/llm/compose.yaml.golden
- src/testdata/models/compose.yaml.fixup
- src/testdata/models/compose.yaml.warnings
- src/testdata/provider/compose.yaml.fixup
- src/testdata/provider/compose.yaml.warnings
✅ Files skipped from review due to trivial changes (6)
- pkgs/npm/README.md
- src/README.md
- src/pkg/clouds/aws/login.go
- pkgs/defang/cli.nix
- src/testdata/models/compose.yaml.warnings
- src/testdata/provider/compose.yaml.warnings
🚧 Files skipped from review as they are similar to previous changes (5)
- src/testdata/llm/compose.yaml.fixup
- src/testdata/llm/compose.yaml.golden
- src/testdata/models/compose.yaml.fixup
- src/testdata/provider/compose.yaml.fixup
- src/testdata/llm/compose.yaml
Force-pushed 4013d22 to 67c00bf
♻️ Duplicate comments (2)
src/pkg/cli/compose/fixup.go (2)
238-246: ⚠️ Potential issue | 🟡 Minor: Registry port in image names breaks the /litellm suffix check.

GetImageRepo splits on the first ':', so registry.example:5000/litellm:latest becomes registry.example, losing the /litellm path. Such images won't get the port 4000 fixup. Consider splitting only on the last ':' that follows a '/' (to separate tag from path), or use a proper image reference parser.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/pkg/cli/compose/fixup.go` around lines 238 - 246, The suffix check in fixupLLM fails because GetImageRepo splits on the first ':' and drops the path for registry URLs (e.g., registry:5000/...); update GetImageRepo usage or its implementation so it preserves the repository path before checking suffix: either change GetImageRepo to strip the image tag by splitting on the last ':' only if that colon appears after the last '/' (so registry:port is not treated as a tag), or replace GetImageRepo with a proper image reference parser and then call fixupLLM using the parsed repository/name to correctly detect strings.HasSuffix(repo, "/litellm") in the fixupLLM function.
424-446: ⚠️ Potential issue | 🔴 Critical: Nil pointer dereference risk when reading OPENAI_API_KEY.

At line 431, *key is dereferenced without checking if key is nil. In compose files, environment variables can be declared without a value (OPENAI_API_KEY: or OPENAI_API_KEY: null), resulting in a nil pointer that will panic on dereference. Additionally, at lines 433-434, after logging the error about multiple different keys, execution continues with break and uses whatever openAIKey was found first, which is potentially inconsistent behavior.

🐛 Proposed fix

  if _, ok := service.DependsOn[svccfg.Name]; ok {
  	if key, ok := service.Environment["OPENAI_API_KEY"]; ok {
+ 		if key == nil {
+ 			continue // skip nil (unset) keys
+ 		}
  		if openAIKey == "" {
  			openAIKey = *key
  		} else if *key != openAIKey {
  			term.Errorf("multiple different OPENAI_API_KEY values found in services depending on %q", svccfg.Name)
- 			break
+ 			return // abort with inconsistent configuration
  		}
  	}
  }

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/pkg/cli/compose/fixup.go` around lines 424 - 446, The loop that reads OPENAI_API_KEY from dependent services (involving svccfg, project.Services and openAIKey) can dereference a nil *string (key) and may continue using an inconsistent first-found value after detecting differing keys; fix it by checking key != nil before using *key, skip nil/empty values, and when you detect multiple different non-nil keys (the condition where term.Errorf is called) clear openAIKey (e.g., set openAIKey = "") or otherwise mark it invalid and break so you don't keep using the first value, then proceed to set LITELLM_MASTER_KEY as before (falling back to the default when openAIKey is empty).
🧹 Nitpick comments (1)
src/testdata/llm/compose.yaml (1)
2-10: Good addition of alt-repo test case for custom registry.

This tests the scenario of litellm images from alternate registries. Note that line 9 uses target: "4000" as a string; compose-go handles both, but using an unquoted integer would be more conventional: target: 4000
Verify each finding against the current code and only fix it if needed. In `@src/testdata/llm/compose.yaml` around lines 2 - 10, The ports entry for the alt-repo service uses a string for target ("4000"); update the ports block for service name alt-repo so target is an integer (target: 4000) instead of a quoted string; locate the ports mapping under the alt-repo service in src/testdata/llm/compose.yaml and remove the quotes around 4000 to make it an unquoted integer.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@src/pkg/cli/compose/fixup.go`:
- Around line 238-246: The suffix check in fixupLLM fails because GetImageRepo
splits on the first ':' and drops the path for registry URLs (e.g.,
registry:5000/...); update GetImageRepo usage or its implementation so it
preserves the repository path before checking suffix: either change GetImageRepo
to strip the image tag by splitting on the last ':' only if that colon appears
after the last '/' (so registry:port is not treated as a tag), or replace
GetImageRepo with a proper image reference parser and then call fixupLLM using
the parsed repository/name to correctly detect strings.HasSuffix(repo,
"/litellm") in the fixupLLM function.
- Around line 424-446: The loop that reads OPENAI_API_KEY from dependent
services (involving svccfg, project.Services and openAIKey) can dereference a
nil *string (key) and may continue using an inconsistent first-found value after
detecting differing keys; fix it by checking key != nil before using *key, skip
nil/empty values, and when you detect multiple different non-nil keys (the
condition where term.Errorf is called) clear openAIKey (e.g., set openAIKey =
"") or otherwise mark it invalid and break so you don't keep using the first
value, then proceed to set LITELLM_MASTER_KEY as before (falling back to the
default when openAIKey is empty).
---
Nitpick comments:
In `@src/testdata/llm/compose.yaml`:
- Around line 2-10: The ports entry for the alt-repo service uses a string for
target ("4000"); update the ports block for service name alt-repo so target is
an integer (target: 4000) instead of a quoted string; locate the ports mapping
under the alt-repo service in src/testdata/llm/compose.yaml and remove the
quotes around 4000 to make it an unquoted integer.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: b4094c22-2f2d-4ea5-a0af-a3b0af72a866
📒 Files selected for processing (10)
src/pkg/cli/client/mock.gosrc/pkg/cli/compose/fixup.gosrc/pkg/cli/subscribe.gosrc/testdata/llm/compose.yamlsrc/testdata/llm/compose.yaml.fixupsrc/testdata/llm/compose.yaml.goldensrc/testdata/models/compose.yaml.fixupsrc/testdata/models/compose.yaml.warningssrc/testdata/provider/compose.yaml.fixupsrc/testdata/provider/compose.yaml.warnings
✅ Files skipped from review due to trivial changes (3)
- src/testdata/models/compose.yaml.warnings
- src/testdata/provider/compose.yaml.warnings
- src/testdata/provider/compose.yaml.fixup
🚧 Files skipped from review as they are similar to previous changes (4)
- src/pkg/cli/subscribe.go
- src/testdata/llm/compose.yaml.golden
- src/testdata/llm/compose.yaml.fixup
- src/testdata/models/compose.yaml.fixup
Force-pushed 67c00bf to 1fa0947
G101 is a gosec rule ID, not a standalone linter name. Using it in //nolint directives caused golangci-lint to warn about unknown linters. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
GCE allInstancesConfig.properties.labels is a map<string,string>, not a
list of {key,value} structs. The query filters were using the list format
(labels.key="defang-service" / labels.value="...") which never matched any
audit log entries, so gce_instance_group_manager events were never returned
by Cloud Logging. Even if events had arrived, the parser was iterating over
the field as a list (GetListInStruct) which always returned nil, leaving the
computeEngineRootTriggers map empty. As a result, all gce_instance_group
addInstances events were silently dropped and WaitServiceState never
received DEPLOYMENT_COMPLETED for Compute Engine services.
Fix the query to use map-style key access:
labels."defang-service"=~"^(svc)$"
Fix the parser to use GetValueInStruct with the label name as a path key,
replacing the 10-line list iteration with a single call.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
GCE audit logs for regionInstanceGroupManagers.patch only carry the fields that changed (e.g. the new instance template version). The allInstancesConfig.properties.labels — where the defang-service label lives — is absent from the request body for every update after the initial create. As a result, the computeEngineRootTriggers map was never populated and all gce_instance_group addInstances events were silently dropped, so WaitServiceState never received DEPLOYMENT_COMPLETED for Compute Engine services. Fix: instead of reading labels from the audit log request body, read the instance group manager name, project, and region from the always- present entry.Resource.Labels and call the GCE REST API to get the live resource's allInstancesConfig.properties.labels. This mirrors the fallback used by the server-side fabric_gcp.go implementation. Add GetInstanceGroupManagerLabels to GcpLogsClient and implement it using the already-present google.golang.org/api/compute/v1 dependency (no new deps required). Also add the missing isQuotaError helper to the gcpquota debug tool, which was preventing the pre-commit lint check from passing. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
PATCH requests for regionInstanceGroupManagers only carry changed fields (e.g. a new instance template reference). When Pulumi re-deploys a CE service, it patches the instance template without including allInstancesConfig.properties.labels in the request body. The Cloud Logging filter on those absent label fields never matched, so no gce_instance_group_manager events were returned for re-deploys, leaving computeEngineRootTriggers empty and causing all gce_instance_group addInstances events to be silently dropped. The parser already handles service-specific filtering by reading labels from the live MIG resource via GetInstanceGroupManagerLabels, so the query-level label filters are redundant and harmful. Remove them and keep only the method name and operation.first filters, consistent with how AddComputeEngineInstanceGroupAddInstances works. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Lio李歐 <lionello@users.noreply.github.com>
Cover the three bugs fixed in this branch:
- TestAddComputeEngineInstanceGroupInsertOrPatch: asserts the query contains no allInstancesConfig or defang-* label filters (guarding against the old list-format filters that never matched)
- TestActivityParser_GceInstanceGroupManager: table-driven tests for the gce_instance_group_manager parser path: happy path, API error, nil labels, missing defang-service label, and missing root_trigger_id
- TestActivityParser_GceInstanceGroupFlow: end-to-end test that a manager insert/patch entry populates the trigger map and a subsequent addInstances entry uses it to emit DEPLOYMENT_COMPLETED
- TestActivityParser_GceInstanceGroupDropsUnknownTrigger: events with an unrecognized root_trigger_id are silently dropped

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Instead of

* Waiting for ["app" "worker" "chat" "embedding"] to be in state DEPLOYMENT_COMPLETED...
* Waiting for ["app" "worker" "embedding"] to be in state DEPLOYMENT_COMPLETED...
* Waiting for ["app" "worker"] to be in state DEPLOYMENT_COMPLETED...

I prefer this message:

* Waiting for services to finish deploying: ["app" "worker" "chat" "embedding"]
* Waiting for services to finish deploying: ["app" "worker" "embedding"]
* Waiting for services to finish deploying: ["app" "worker"]
I always find myself grepping for 'with state', which only feels tangentially related. A better prefix is "Service update:"
Co-authored-by: Lio李歐 <lionello@users.noreply.github.com> Co-authored-by: Jordan Stephens <jordan@stephens.io>
Range-iterating project.Services yields value copies, so assigning a newly created DependsOn map to the local copy was silently dropped. Switch to keyed iteration and write the service back whenever it was changed. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Split makeAccessGatewayService into two focused helpers:
- configureAccessGateway: resolves the model for the target cloud provider, sets up the LiteLLM container (image, command, network, port), and derives LITELLM_MASTER_KEY
- wireDependentServices: injects URL/model/key env vars and network membership into every service that depends on the model service

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Force-pushed 0ee34a9 to 9352b23
Description
This PR replaces the openai-access-gateway with litellm when users deploy compose files with the docker model provider.
For example:
Will get transformed into:
Linked Issues
Checklist
Summary by CodeRabbit
New Features
Bug Fixes
Tests
Chores