Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughDefers commitments API construction until after Nova client init; adds Nova server listing and usage-reporting flow: a UsageCalculator that builds LIQUID usage reports, a POST /v1/commitments/projects/:project_id/report-usage endpoint, Prometheus monitors/metrics, API wiring to accept a Nova client, plus tests and alerting rules. Changes
Sequence Diagram(s)sequenceDiagram
rect rgba(200,220,255,0.5)
participant Client as Client
participant API as Commitments API
participant Calc as UsageCalculator
participant Knowledge as Knowledge CRD
participant K8s as Kubernetes (Reservations)
participant Nova as Nova Client
end
Client->>API: POST /v1/commitments/projects/:project_id/report-usage
API->>API: validate method, extract project_id, start timer
API->>Calc: CalculateUsage(ctx, logger, project_id, allAZs)
Calc->>Knowledge: List flavor groups
Knowledge-->>Calc: flavor group data
Calc->>K8s: List Reservations (project_id)
K8s-->>Calc: Commitments list
Calc->>Nova: ListProjectServers(project_id)
Nova-->>Calc: ServerDetail list
Calc->>Calc: build VMUsageInfo, sort VMs & commitments
Calc->>Calc: assign VMs → commitments (az:flavorGroup) or PAYG
Calc-->>API: ServiceUsageReport
API-->>Client: 200 OK + JSON report
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (2)
internal/scheduling/reservations/commitments/api_report_usage.go (1)
60-64: Minor: Error after WriteHeader cannot change HTTP status.After
WriteHeader(http.StatusOK)is called, any subsequent error fromjson.Encodecan only be logged—the HTTP status code cannot be changed. This is correct behavior, but consider buffering the response or using a response wrapper to detect encoding errors before committing the status.For this use case, the current approach is acceptable since JSON encoding of the report struct is unlikely to fail in practice.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/scheduling/reservations/commitments/api_report_usage.go` around lines 60 - 64, The handler in internal/scheduling/reservations/commitments/api_report_usage.go currently calls w.WriteHeader(http.StatusOK) before json.NewEncoder(w).Encode(report), so any encoding error cannot change the committed HTTP status; to fix, encode into a temporary buffer first (e.g., use bytes.Buffer and json.NewEncoder(&buf).Encode(report)) or use a ResponseWriter wrapper that buffers output, check for encode errors, then set headers and call w.WriteHeader/http.ServeContent and write the buffer only if encoding succeeded; update the code paths around w.WriteHeader(http.StatusOK) and json.NewEncoder(w).Encode(report) accordingly.internal/scheduling/reservations/commitments/api_report_usage_test.go (1)
584-594: Consider adding request timeout for robustness.The HTTP request uses
http.DefaultClientwithout a timeout. While acceptable in tests, a context with timeout would prevent test hangs if the server doesn't respond.Optional: Add context with timeout
- req, err := http.NewRequest(method, url, bytes.NewReader(reqJSON)) //nolint:noctx + ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second) + defer cancel() + req, err := http.NewRequestWithContext(ctx, method, url, bytes.NewReader(reqJSON))🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/scheduling/reservations/commitments/api_report_usage_test.go` around lines 584 - 594, The test currently creates a request with http.NewRequest and uses http.DefaultClient.Do(req) without any timeout, risking test hangs; update the request to use a context with timeout (e.g., context.WithTimeout) and create the request with that context (use http.NewRequestWithContext or attach ctx via req = req.WithContext(ctx)), ensure you call cancel() in a defer to release resources, and then perform the request (replace http.DefaultClient.Do(req) or create an http.Client with a Timeout if preferred) so the test will fail fast on slow/unresponsive servers; key identifiers: req, http.DefaultClient.Do, http.NewRequest (or http.NewRequestWithContext), context.WithTimeout, cancel.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/scheduling/nova/nova_client.go`:
- Around line 196-200: The HTTP response body is being deferred inside a loop
(resp, req from api.sc.HTTPClient.Do), causing many open bodies; change the code
in the loop to call resp.Body.Close() immediately after you're finished
reading/parsing the response (e.g., after io.ReadAll or json decoding) instead
of deferring, or refactor the request+read logic into a helper function (e.g.,
extract DoRequest/parsePage helper) so that a defer resp.Body.Close() is scoped
to that helper and the body is closed before the next iteration; ensure resp is
closed on all error paths as well.
In `@internal/scheduling/reservations/commitments/usage_test.go`:
- Around line 549-573: The reservation name generation in
makeUsageTestReservation uses string(rune('0'+slot)) which only yields correct
single-digit names and breaks for slot >= 10; change the name construction to
use fmt.Sprintf to format the slot as a decimal (e.g.
fmt.Sprintf("commitment-%s-%d", commitmentUUID, slot)) and add the "fmt" import;
update any tests relying on the exact name if needed.
In `@internal/scheduling/reservations/commitments/usage.go`:
- Around line 189-199: Remove the redundant fallback parse and simplify the time
parsing in the block that sets createdAt from server.Created: use only
time.Parse(time.RFC3339, server.Created) (referenced symbols: createdAt,
server.Created) and, if it returns an error, log the failure via log.V(1).Info
including server.ID and server.Created and set createdAt = time.Time{}; delete
the secondary time.Parse using the identical layout "2006-01-02T15:04:05Z".
---
Nitpick comments:
In `@internal/scheduling/reservations/commitments/api_report_usage_test.go`:
- Around line 584-594: The test currently creates a request with http.NewRequest
and uses http.DefaultClient.Do(req) without any timeout, risking test hangs;
update the request to use a context with timeout (e.g., context.WithTimeout) and
create the request with that context (use http.NewRequestWithContext or attach
ctx via req = req.WithContext(ctx)), ensure you call cancel() in a defer to
release resources, and then perform the request (replace
http.DefaultClient.Do(req) or create an http.Client with a Timeout if preferred)
so the test will fail fast on slow/unresponsive servers; key identifiers: req,
http.DefaultClient.Do, http.NewRequest (or http.NewRequestWithContext),
context.WithTimeout, cancel.
In `@internal/scheduling/reservations/commitments/api_report_usage.go`:
- Around line 60-64: The handler in
internal/scheduling/reservations/commitments/api_report_usage.go currently calls
w.WriteHeader(http.StatusOK) before json.NewEncoder(w).Encode(report), so any
encoding error cannot change the committed HTTP status; to fix, encode into a
temporary buffer first (e.g., use bytes.Buffer and
json.NewEncoder(&buf).Encode(report)) or use a ResponseWriter wrapper that
buffers output, check for encode errors, then set headers and call
w.WriteHeader/http.ServeContent and write the buffer only if encoding succeeded;
update the code paths around w.WriteHeader(http.StatusOK) and
json.NewEncoder(w).Encode(report) accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: a9f77294-ae30-4a5b-ac2a-bc4bb307d305
📒 Files selected for processing (11)
cmd/main.gointernal/scheduling/nova/deschedulings_executor_test.gointernal/scheduling/nova/detector_cycle_breaker_test.gointernal/scheduling/nova/nova_client.gointernal/scheduling/reservations/commitments/api.gointernal/scheduling/reservations/commitments/api_change_commitments_test.gointernal/scheduling/reservations/commitments/api_report_usage.gointernal/scheduling/reservations/commitments/api_report_usage_test.gointernal/scheduling/reservations/commitments/state.gointernal/scheduling/reservations/commitments/usage.gointernal/scheduling/reservations/commitments/usage_test.go
| resp, err := api.sc.HTTPClient.Do(req) | ||
| if err != nil { | ||
| return nil, err | ||
| } | ||
| defer resp.Body.Close() |
There was a problem hiding this comment.
Resource leak: defer inside loop accumulates closers.
Using defer resp.Body.Close() inside a loop means all response bodies stay open until the function returns. For paginated responses with many pages, this can exhaust file descriptors.
🐛 Proposed fix: close body immediately after use
resp, err := api.sc.HTTPClient.Do(req)
if err != nil {
return nil, err
}
- defer resp.Body.Close()
if resp.StatusCode != http.StatusOK {
+ resp.Body.Close()
return nil, fmt.Errorf("unexpected status code: %d", resp.StatusCode)
}
// ... decode logic ...
if err := json.NewDecoder(resp.Body).Decode(&list); err != nil {
+ resp.Body.Close()
return nil, err
}
+ resp.Body.Close()
// Convert to ServerDetailAlternatively, extract the HTTP request logic into a helper function where defer would be scoped correctly.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/scheduling/nova/nova_client.go` around lines 196 - 200, The HTTP
response body is being deferred inside a loop (resp, req from
api.sc.HTTPClient.Do), causing many open bodies; change the code in the loop to
call resp.Body.Close() immediately after you're finished reading/parsing the
response (e.g., after io.ReadAll or json decoding) instead of deferring, or
refactor the request+read logic into a helper function (e.g., extract
DoRequest/parsePage helper) so that a defer resp.Body.Close() is scoped to that
helper and the body is closed before the next iteration; ensure resp is closed
on all error paths as well.
| // makeUsageTestReservation creates a test reservation for UsageCalculator tests. | ||
| func makeUsageTestReservation(commitmentUUID, projectID, flavorGroup, az string, memoryBytes int64, slot int) *v1alpha1.Reservation { | ||
| name := "commitment-" + commitmentUUID + "-" + string(rune('0'+slot)) | ||
|
|
||
| return &v1alpha1.Reservation{ | ||
| ObjectMeta: metav1.ObjectMeta{ | ||
| Name: name, | ||
| Labels: map[string]string{ | ||
| v1alpha1.LabelReservationType: v1alpha1.ReservationTypeLabelCommittedResource, | ||
| }, | ||
| }, | ||
| Spec: v1alpha1.ReservationSpec{ | ||
| Type: v1alpha1.ReservationTypeCommittedResource, | ||
| AvailabilityZone: az, | ||
| Resources: map[hv1.ResourceName]resource.Quantity{ | ||
| "memory": *resource.NewQuantity(memoryBytes, resource.BinarySI), | ||
| }, | ||
| CommittedResourceReservation: &v1alpha1.CommittedResourceReservationSpec{ | ||
| CommitmentUUID: commitmentUUID, | ||
| ProjectID: projectID, | ||
| ResourceGroup: flavorGroup, | ||
| }, | ||
| }, | ||
| } | ||
| } |
There was a problem hiding this comment.
Bug: Slot naming breaks for values >= 10.
The expression string(rune('0'+slot)) only works for single-digit slots (0-9). For slot=10, this produces ':' (ASCII 58), not "10".
Proposed fix using fmt.Sprintf
func makeUsageTestReservation(commitmentUUID, projectID, flavorGroup, az string, memoryBytes int64, slot int) *v1alpha1.Reservation {
- name := "commitment-" + commitmentUUID + "-" + string(rune('0'+slot))
+ name := fmt.Sprintf("commitment-%s-%d", commitmentUUID, slot)Note: This requires adding "fmt" to the imports (it's not currently imported in this file).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/scheduling/reservations/commitments/usage_test.go` around lines 549
- 573, The reservation name generation in makeUsageTestReservation uses
string(rune('0'+slot)) which only yields correct single-digit names and breaks
for slot >= 10; change the name construction to use fmt.Sprintf to format the
slot as a decimal (e.g. fmt.Sprintf("commitment-%s-%d", commitmentUUID, slot))
and add the "fmt" import; update any tests relying on the exact name if needed.
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cmd/main.go`:
- Around line 336-339: The commitments HTTP API is currently only initialized
when the nova pipeline controller is enabled which removes /v1/commitments when
that controller is disabled and forces a required commitments.Config via
GetConfigOrDie; change this so commitments.NewAPIWithConfig and
commitmentsAPI.Init(mux, metrics.Registry) are always called regardless of the
nova-pipeline-controllers flag, obtain commitments.Config in a non-fatal way
(e.g., use a non-Die getter or nil/default config instead of GetConfigOrDie) and
pass nil for novaClient when the Nova pipeline controller is not enabled so the
API remains served while Nova integration stays optional (adjust code around
commitmentsConfig, commitments.NewAPIWithConfig, and commitmentsAPI.Init).
In `@internal/scheduling/reservations/commitments/api_report_usage.go`:
- Around line 71-79: The path parser currently allows extra trailing segments
because it only checks len(parts) < 5; change the validation to require an exact
match of parts length and values: require len(parts) == 5 and validate parts[0]
== "v1", parts[1] == "commitments", parts[2] == "projects" and parts[4] ==
"report-usage" before extracting projectID (parts[3]). This makes the route
exact and will reject paths like
/v1/commitments/projects/<id>/report-usage/extra.
In `@internal/scheduling/reservations/commitments/usage.go`:
- Around line 171-183: The current mapping can leave servers with unknown
flavors (flavorToGroup or flavorToSmallestMemory missing) which silently yields
FlavorGroup=="" and UsageMultiple==0; update the code that assigns
FlavorGroup/UsageMultiple (the place that reads server.FlavorName before calling
buildUsageResponse) to check for existence in flavorToGroup and
flavorToSmallestMemory and fail fast (return an error or log and return) when a
flavor is not found, including the server ID/name and the missing flavor name in
the error; ensure the same validation is applied for any similar logic that
computes UsageMultiple so buildUsageResponse never receives a server with an
empty FlavorGroup or zero UsageMultiple.
- Around line 132-140: The loop currently appends every reconstructed commitment
state (from FromReservations) regardless of timing; change it to skip
commitments that are not active now by checking the state's StartTime and
EndTime on the reconstructed commitment (the object returned by
FromReservations) and continue when now is before StartTime or now is after
EndTime; then only call NewCommitmentStateWithUsage and azFlavorGroupKey and
append to result for commitments whose StartTime <= now <= EndTime (use
time.Now() in UTC or the package's time source consistently).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: b5b048ba-b17d-4414-9d2d-210457ad6e35
📒 Files selected for processing (8)
cmd/main.gointernal/scheduling/reservations/commitments/api.gointernal/scheduling/reservations/commitments/api_change_commitments_test.gointernal/scheduling/reservations/commitments/api_report_usage.gointernal/scheduling/reservations/commitments/api_report_usage_test.gointernal/scheduling/reservations/commitments/state.gointernal/scheduling/reservations/commitments/usage.gointernal/scheduling/reservations/commitments/usage_test.go
🚧 Files skipped from review as they are similar to previous changes (2)
- internal/scheduling/reservations/commitments/api_change_commitments_test.go
- internal/scheduling/reservations/commitments/api.go
| // Initialize commitments API for LIQUID interface (with Nova client for usage reporting) | ||
| commitmentsConfig := conf.GetConfigOrDie[commitments.Config]() | ||
| commitmentsAPI := commitments.NewAPIWithConfig(multiclusterClient, commitmentsConfig, novaClient) | ||
| commitmentsAPI.Init(mux, metrics.Registry) |
There was a problem hiding this comment.
Don't gate the commitments HTTP API behind nova-pipeline-controllers.
This is the only commitmentsAPI.Init() in cmd/main.go, so deployments that do not enable that controller will stop serving /v1/commitments/... entirely. It also makes the Nova pipeline controller path hard-require commitments.Config via GetConfigOrDie(), even though the usage path already tolerates a nil Nova client.
Suggested direction
+var commitmentsNovaClient commitments.UsageNovaClient
+
if slices.Contains(mainConfig.EnabledControllers, "nova-pipeline-controllers") {
...
- commitmentsConfig := conf.GetConfigOrDie[commitments.Config]()
- commitmentsAPI := commitments.NewAPIWithConfig(multiclusterClient, commitmentsConfig, novaClient)
- commitmentsAPI.Init(mux, metrics.Registry)
+ commitmentsNovaClient = novaClient
...
}
+
+commitmentsConfig := conf.GetConfigOrDie[commitments.Config]()
+commitmentsAPI := commitments.NewAPIWithConfig(multiclusterClient, commitmentsConfig, commitmentsNovaClient)
+commitmentsAPI.Init(mux, metrics.Registry)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cmd/main.go` around lines 336 - 339, The commitments HTTP API is currently
only initialized when the nova pipeline controller is enabled which removes
/v1/commitments when that controller is disabled and forces a required
commitments.Config via GetConfigOrDie; change this so
commitments.NewAPIWithConfig and commitmentsAPI.Init(mux, metrics.Registry) are
always called regardless of the nova-pipeline-controllers flag, obtain
commitments.Config in a non-fatal way (e.g., use a non-Die getter or nil/default
config instead of GetConfigOrDie) and pass nil for novaClient when the Nova
pipeline controller is not enabled so the API remains served while Nova
integration stays optional (adjust code around commitmentsConfig,
commitments.NewAPIWithConfig, and commitmentsAPI.Init).
| parts := strings.Split(strings.Trim(path, "/"), "/") | ||
| // Expected: ["v1", "commitments", "projects", "<uuid>", "report-usage"] | ||
| if len(parts) < 5 { | ||
| return "", fmt.Errorf("path too short: %s", path) | ||
| } | ||
| if parts[2] != "projects" || parts[4] != "report-usage" { | ||
| return "", fmt.Errorf("unexpected path format: %s", path) | ||
| } | ||
| projectID := parts[3] |
There was a problem hiding this comment.
Reject trailing path segments in the route parser.
len(parts) < 5 accepts /v1/commitments/projects/<id>/report-usage/anything as if it were the canonical endpoint. Since this handler is mounted on a prefix, make the path exact here instead of ignoring the extra suffix.
Proposed fix
- if len(parts) < 5 {
- return "", fmt.Errorf("path too short: %s", path)
- }
- if parts[2] != "projects" || parts[4] != "report-usage" {
+ if len(parts) != 5 || parts[0] != "v1" || parts[1] != "commitments" || parts[2] != "projects" || parts[4] != "report-usage" {
return "", fmt.Errorf("unexpected path format: %s", path)
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| parts := strings.Split(strings.Trim(path, "/"), "/") | |
| // Expected: ["v1", "commitments", "projects", "<uuid>", "report-usage"] | |
| if len(parts) < 5 { | |
| return "", fmt.Errorf("path too short: %s", path) | |
| } | |
| if parts[2] != "projects" || parts[4] != "report-usage" { | |
| return "", fmt.Errorf("unexpected path format: %s", path) | |
| } | |
| projectID := parts[3] | |
| parts := strings.Split(strings.Trim(path, "/"), "/") | |
| // Expected: ["v1", "commitments", "projects", "<uuid>", "report-usage"] | |
| if len(parts) != 5 || parts[0] != "v1" || parts[1] != "commitments" || parts[2] != "projects" || parts[4] != "report-usage" { | |
| return "", fmt.Errorf("unexpected path format: %s", path) | |
| } | |
| projectID := parts[3] |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/scheduling/reservations/commitments/api_report_usage.go` around
lines 71 - 79, The path parser currently allows extra trailing segments because
it only checks len(parts) < 5; change the validation to require an exact match
of parts length and values: require len(parts) == 5 and validate parts[0] ==
"v1", parts[1] == "commitments", parts[2] == "projects" and parts[4] ==
"report-usage" before extracting projectID (parts[3]). This makes the route
exact and will reject paths like
/v1/commitments/projects/<id>/report-usage/extra.
| // Build flavor name -> flavor group lookup | ||
| flavorToGroup := make(map[string]string) | ||
| flavorToSmallestMemory := make(map[string]uint64) // for calculating usage multiples | ||
| for groupName, group := range flavorGroups { | ||
| for _, flavor := range group.Flavors { | ||
| flavorToGroup[flavor.Name] = groupName | ||
| } | ||
| // Smallest flavor in group determines the usage unit | ||
| if group.SmallestFlavor.Name != "" { | ||
| for _, flavor := range group.Flavors { | ||
| flavorToSmallestMemory[flavor.Name] = group.SmallestFlavor.MemoryMB | ||
| } | ||
| } |
There was a problem hiding this comment.
Fail fast when Nova returns a flavor that is missing from flavor-group knowledge.
If server.FlavorName is absent from flavorToGroup or flavorToSmallestMemory, this code leaves FlavorGroup == "" and UsageMultiple == 0. Those VMs are later skipped in buildUsageResponse(), so the report silently undercounts usage instead of failing loudly.
Proposed fix
- // Determine flavor group
- flavorGroup := flavorToGroup[server.FlavorName]
+ // Determine flavor group
+ flavorGroup, ok := flavorToGroup[server.FlavorName]
+ if !ok {
+ return nil, fmt.Errorf("server %s uses unknown flavor %q", server.ID, server.FlavorName)
+ }
// Calculate usage multiple (memory in units of smallest flavor)
- var usageMultiple uint64
- if smallestMem := flavorToSmallestMemory[server.FlavorName]; smallestMem > 0 {
- usageMultiple = (server.FlavorRAM + smallestMem - 1) / smallestMem // Round up
- }
+ smallestMem, ok := flavorToSmallestMemory[server.FlavorName]
+ if !ok || smallestMem == 0 {
+ return nil, fmt.Errorf("missing smallest-flavor memory for %q", server.FlavorName)
+ }
+ usageMultiple := (server.FlavorRAM + smallestMem - 1) / smallestMem // Round upAlso applies to: 201-208
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/scheduling/reservations/commitments/usage.go` around lines 171 -
183, The current mapping can leave servers with unknown flavors (flavorToGroup
or flavorToSmallestMemory missing) which silently yields FlavorGroup=="" and
UsageMultiple==0; update the code that assigns FlavorGroup/UsageMultiple (the
place that reads server.FlavorName before calling buildUsageResponse) to check
for existence in flavorToGroup and flavorToSmallestMemory and fail fast (return
an error or log and return) when a flavor is not found, including the server
ID/name and the missing flavor name in the error; ensure the same validation is
applied for any similar logic that computes UsageMultiple so buildUsageResponse
never receives a server with an empty FlavorGroup or zero UsageMultiple.
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (1)
internal/scheduling/reservations/commitments/api_report_usage.go (1)
92-98:⚠️ Potential issue | 🟡 MinorRequire the exact
report-usagepath shape here.Because
internal/scheduling/reservations/commitments/api.goLine 56 mounts this handler on a prefix,len(parts) < 5still accepts/v1/commitments/projects/<id>/report-usage/extraas valid. Tighten the check to the full 5-segment route before extractingparts[3].Minimal fix
parts := strings.Split(strings.Trim(path, "/"), "/") // Expected: ["v1", "commitments", "projects", "<uuid>", "report-usage"] - if len(parts) < 5 { - return "", fmt.Errorf("path too short: %s", path) - } - if parts[2] != "projects" || parts[4] != "report-usage" { + if len(parts) != 5 || parts[0] != "v1" || parts[1] != "commitments" || parts[2] != "projects" || parts[4] != "report-usage" { return "", fmt.Errorf("unexpected path format: %s", path) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/scheduling/reservations/commitments/api_report_usage.go` around lines 92 - 98, The handler currently allows extra path segments because it only checks len(parts) < 5; change the validation to require exactly five segments (len(parts) == 5) and keep the existing checks for parts[2] == "projects" and parts[4] == "report-usage" before accessing parts[3]; update the error messages accordingly so the route only accepts the exact shape "/v1/commitments/projects/<id>/report-usage" when extracting parts[3].
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/scheduling/reservations/commitments/api_report_capacity.go`:
- Around line 6-10: The handler currently treats any json.Decoder error as an
empty payload by defaulting to liquid.ServiceCapacityRequest{}, which hides
malformed JSON; update the JSON decode logic (where json.NewDecoder(...).Decode
is called) to only accept io.EOF as the empty-body case: if err == io.EOF then
proceed with an empty liquid.ServiceCapacityRequest{}, otherwise return HTTP 400
(bad request) and surface/log the decode error (and increment the existing
error/metric) so malformed payloads are rejected instead of silently accepted.
Ensure this same change is applied to the other decode site noted in the file
(the second decode block around lines 36-41).
In `@internal/scheduling/reservations/commitments/api.go`:
- Around line 34-36: NewAPI currently calls NewAPIWithConfig(client,
DefaultConfig(), nil) which lets HTTPAPI register HandleReportUsage even when
api.novaClient is nil; update NewAPI/NewAPIWithConfig usage so the report-usage
handler is only registered when a non-nil UsageNovaClient is provided (check
api.novaClient before registering HandleReportUsage in the constructor) or
change NewAPI to require and accept a UsageNovaClient parameter so an API cannot
be created without the dependency; refer to NewAPI, NewAPIWithConfig,
HandleReportUsage, UsageCalculator and api.novaClient when making the guard or
signature change.
---
Duplicate comments:
In `@internal/scheduling/reservations/commitments/api_report_usage.go`:
- Around line 92-98: The handler currently allows extra path segments because it
only checks len(parts) < 5; change the validation to require exactly five
segments (len(parts) == 5) and keep the existing checks for parts[2] ==
"projects" and parts[4] == "report-usage" before accessing parts[3]; update the
error messages accordingly so the route only accepts the exact shape
"/v1/commitments/projects/<id>/report-usage" when extracting parts[3].
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: ced3137b-b4b2-483b-a8cd-3aa5917bb434
📒 Files selected for processing (6)
helm/bundles/cortex-nova/alerts/nova.alerts.yamlinternal/scheduling/reservations/commitments/api.gointernal/scheduling/reservations/commitments/api_report_capacity.gointernal/scheduling/reservations/commitments/api_report_capacity_monitor.gointernal/scheduling/reservations/commitments/api_report_usage.gointernal/scheduling/reservations/commitments/api_report_usage_monitor.go
| import ( | ||
| "encoding/json" | ||
| "net/http" | ||
| "strconv" | ||
| "time" |
There was a problem hiding this comment.
Only treat io.EOF as the acceptable empty-body case.
Right now any JSON decode failure falls back to liquid.ServiceCapacityRequest{}, so malformed payloads get a 200 response instead of a 400. That breaks the API contract and also hides the client errors from the new Prometheus metrics/alerts.
Minimal fix
import (
"encoding/json"
+ "errors"
+ "io"
"net/http"
"strconv"
"time"
"github.com/sapcc/go-api-declarations/liquid"
)
@@
// Parse request body (may be empty or contain ServiceCapacityRequest)
var req liquid.ServiceCapacityRequest
if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
- // Empty body is acceptable for capacity reports
- req = liquid.ServiceCapacityRequest{}
+ // Empty body is acceptable for capacity reports.
+ if !errors.Is(err, io.EOF) {
+ statusCode = http.StatusBadRequest
+ http.Error(w, "Invalid request body: "+err.Error(), statusCode)
+ api.recordCapacityMetrics(statusCode, startTime)
+ return
+ }
+ req = liquid.ServiceCapacityRequest{}
}Also applies to: 36-41
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/scheduling/reservations/commitments/api_report_capacity.go` around
lines 6 - 10, The handler currently treats any json.Decoder error as an empty
payload by defaulting to liquid.ServiceCapacityRequest{}, which hides malformed
JSON; update the JSON decode logic (where json.NewDecoder(...).Decode is called)
to only accept io.EOF as the empty-body case: if err == io.EOF then proceed with
an empty liquid.ServiceCapacityRequest{}, otherwise return HTTP 400 (bad
request) and surface/log the decode error (and increment the existing
error/metric) so malformed payloads are rejected instead of silently accepted.
Ensure this same change is applied to the other decode site noted in the file
(the second decode block around lines 36-41).
| func NewAPI(client client.Client) *HTTPAPI { | ||
| return NewAPIWithConfig(client, DefaultConfig()) | ||
| return NewAPIWithConfig(client, DefaultConfig(), nil) | ||
| } |
There was a problem hiding this comment.
Don’t publish report-usage from the nil-dependency constructor.
Line 35 still passes nil into NewAPIWithConfig(), but Line 56 always registers HandleReportUsage(). Since HandleReportUsage() constructs a UsageCalculator with api.novaClient in internal/scheduling/reservations/commitments/api_report_usage.go Lines 61-63, NewAPI() now creates an API instance that exposes an endpoint it cannot actually serve.
Minimal guard
func (api *HTTPAPI) Init(mux *http.ServeMux, registry prometheus.Registerer) {
registry.MustRegister(&api.monitor)
registry.MustRegister(&api.usageMonitor)
registry.MustRegister(&api.capacityMonitor)
mux.HandleFunc("/v1/commitments/change-commitments", api.HandleChangeCommitments)
mux.HandleFunc("/v1/commitments/report-capacity", api.HandleReportCapacity)
mux.HandleFunc("/v1/commitments/info", api.HandleInfo)
- mux.HandleFunc("/v1/commitments/projects/", api.HandleReportUsage) // matches /v1/commitments/projects/:project_id/report-usage
+ if api.novaClient != nil {
+ mux.HandleFunc("/v1/commitments/projects/", api.HandleReportUsage) // matches /v1/commitments/projects/:project_id/report-usage
+ }
}A stronger fix would be to make NewAPI() require a UsageNovaClient as well, so the API cannot be constructed in an unusable state.
Also applies to: 49-56
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/scheduling/reservations/commitments/api.go` around lines 34 - 36,
NewAPI currently calls NewAPIWithConfig(client, DefaultConfig(), nil) which lets
HTTPAPI register HandleReportUsage even when api.novaClient is nil; update
NewAPI/NewAPIWithConfig usage so the report-usage handler is only registered
when a non-nil UsageNovaClient is provided (check api.novaClient before
registering HandleReportUsage in the constructor) or change NewAPI to require
and accept a UsageNovaClient parameter so an API cannot be created without the
dependency; refer to NewAPI, NewAPIWithConfig, HandleReportUsage,
UsageCalculator and api.novaClient when making the guard or signature change.
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (2)
internal/scheduling/reservations/commitments/usage.go (2)
131-141:⚠️ Potential issue | 🟠 MajorFilter out commitments that are not currently active.
The loop appends every reconstructed commitment regardless of its time bounds. Commitments with
StartTimein the future orEndTimein the past should be excluded to prevent inactive commitments from affecting usage calculations.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/scheduling/reservations/commitments/usage.go` around lines 131 - 141, The loop over reservationsByCommitment currently includes all reconstructed commitments; modify it to skip commitments that are not active by checking the CommitmentState's time bounds (use the state returned by FromReservations): compute now := time.Now().UTC() and if state.StartTime.After(now) or state.EndTime.Before(now) then continue so only commitments with StartTime <= now <= EndTime are converted via NewCommitmentStateWithUsage and added to result keyed by azFlavorGroupKey(state.AvailabilityZone, state.FlavorGroupName).
201-208:⚠️ Potential issue | 🟠 MajorHandle unknown flavors explicitly instead of silently skipping.
When
server.FlavorNameis absent fromflavorToGrouporflavorToSmallestMemory, the code setsFlavorGroup == ""andUsageMultiple == 0. These VMs are silently skipped inbuildUsageResponse()(line 337-339), causing the report to undercount usage without any indication of the problem.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/scheduling/reservations/commitments/usage.go` around lines 201 - 208, The code currently silently treats unknown server.FlavorName as empty by reading flavorToGroup and flavorToSmallestMemory without existence checks; update the block around flavorGroup and usageMultiple to check map membership (e.g., group, ok := flavorToGroup[server.FlavorName]; smallestMem, ok2 := flavorToSmallestMemory[server.FlavorName]) and if either is missing, emit an explicit warning/log entry (or record the unknown flavor in the buildUsageResponse diagnostics) and skip that server, rather than leaving flavorGroup=="" and usageMultiple==0; only compute usageMultiple when smallestMem>0 and ok2 is true, and ensure buildUsageResponse surfaces these warnings so missing flavors don’t silently undercount usage.
🧹 Nitpick comments (1)
internal/scheduling/reservations/commitments/capacity.go (1)
56-57: Update the comment to reflect the actual naming pattern.The comment on line 56 states "Resource name follows pattern: ram_" but the code now uses
commitmentResourceNamePrefix + groupName. If the prefix has changed from"ram_", this comment is stale and should be updated to accurately describe the naming pattern.📝 Proposed fix to update the comment
- // Resource name follows pattern: ram_<flavorgroup> + // Resource name follows pattern: <commitmentResourceNamePrefix><flavorgroup> resourceName := liquid.ResourceName(commitmentResourceNamePrefix + groupName)Or if the prefix is still
"ram_", keep the comment as-is.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/scheduling/reservations/commitments/capacity.go` around lines 56 - 57, The inline comment above the resourceName assignment is stale; update it to describe the actual pattern produced by commitmentResourceNamePrefix + groupName (or restore to "ram_<flavorgroup>" only if commitmentResourceNamePrefix is still "ram_"); locate the assignment using resourceName := liquid.ResourceName(commitmentResourceNamePrefix + groupName) and change the comment to explicitly state the prefix variable (commitmentResourceNamePrefix) and resulting pattern (e.g., "<prefix><groupName>") so it accurately documents the naming convention.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/scheduling/reservations/commitments/usage.go`:
- Around line 357-364: The call to liquid.SubresourceBuilder.Finalize()
currently drops errors and simply continues; modify buildUsageResponse to accept
a logr.Logger (pass it from callers) and, inside the block where you call
SubresourceBuilder[map[string]any]{...}.Finalize(), log the returned err with
context (e.g., vm.UUID and any relevant attribute keys) using the provided
logger before continuing; ensure you still continue on error but do not swallow
it silently so failures in SubresourceBuilder.Finalize() are recorded.
---
Duplicate comments:
In `@internal/scheduling/reservations/commitments/usage.go`:
- Around line 131-141: The loop over reservationsByCommitment currently includes
all reconstructed commitments; modify it to skip commitments that are not active
by checking the CommitmentState's time bounds (use the state returned by
FromReservations): compute now := time.Now().UTC() and if
state.StartTime.After(now) or state.EndTime.Before(now) then continue so only
commitments with StartTime <= now <= EndTime are converted via
NewCommitmentStateWithUsage and added to result keyed by
azFlavorGroupKey(state.AvailabilityZone, state.FlavorGroupName).
- Around line 201-208: The code currently silently treats unknown
server.FlavorName as empty by reading flavorToGroup and flavorToSmallestMemory
without existence checks; update the block around flavorGroup and usageMultiple
to check map membership (e.g., group, ok := flavorToGroup[server.FlavorName];
smallestMem, ok2 := flavorToSmallestMemory[server.FlavorName]) and if either is
missing, emit an explicit warning/log entry (or record the unknown flavor in the
buildUsageResponse diagnostics) and skip that server, rather than leaving
flavorGroup=="" and usageMultiple==0; only compute usageMultiple when
smallestMem>0 and ok2 is true, and ensure buildUsageResponse surfaces these
warnings so missing flavors don’t silently undercount usage.
---
Nitpick comments:
In `@internal/scheduling/reservations/commitments/capacity.go`:
- Around line 56-57: The inline comment above the resourceName assignment is
stale; update it to describe the actual pattern produced by
commitmentResourceNamePrefix + groupName (or restore to "ram_<flavorgroup>" only
if commitmentResourceNamePrefix is still "ram_"); locate the assignment using
resourceName := liquid.ResourceName(commitmentResourceNamePrefix + groupName)
and change the comment to explicitly state the prefix variable
(commitmentResourceNamePrefix) and resulting pattern (e.g.,
"<prefix><groupName>") so it accurately documents the naming convention.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: f7ae84a2-d27e-4a5e-a02a-d6a75afb69fe
📒 Files selected for processing (3)
internal/scheduling/reservations/commitments/api_info.gointernal/scheduling/reservations/commitments/capacity.gointernal/scheduling/reservations/commitments/usage.go
| subresource, err := liquid.SubresourceBuilder[map[string]any]{ | ||
| ID: vm.UUID, | ||
| Attributes: attributes, | ||
| }.Finalize() | ||
| if err != nil { | ||
| // This should never happen with valid attributes, skip this VM | ||
| continue | ||
| } |
There was a problem hiding this comment.
Log the error from SubresourceBuilder.Finalize() before skipping.
Silently continuing when Finalize() fails makes debugging impossible. Even if the error is unexpected, logging it helps diagnose issues in production.
🔧 Proposed fix
subresource, err := liquid.SubresourceBuilder[map[string]any]{
ID: vm.UUID,
Attributes: attributes,
}.Finalize()
if err != nil {
- // This should never happen with valid attributes, skip this VM
+ log.V(1).Info("failed to finalize subresource, skipping VM",
+ "vmUUID", vm.UUID, "error", err)
continue
}Note: This requires passing log logr.Logger to buildUsageResponse.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/scheduling/reservations/commitments/usage.go` around lines 357 -
364, The call to liquid.SubresourceBuilder.Finalize() currently drops errors and
simply continues; modify buildUsageResponse to accept a logr.Logger (pass it
from callers) and, inside the block where you call
SubresourceBuilder[map[string]any]{...}.Finalize(), log the returned err with
context (e.g., vm.UUID and any relevant attribute keys) using the provided
logger before continuing; ensure you still continue on error but do not swallow
it silently so failures in SubresourceBuilder.Finalize() are recorded.
Test Coverage ReportTest Coverage 📊: 68.3% |
No description provided.