Skip to content

Conversation

@86254860
Copy link
Collaborator

@86254860 86254860 commented Dec 10, 2025

Summary

Implements a cloud-agnostic, reusable Kubernetes sidecar container that monitors adapter operations and updates Job status based on execution results.

Changes

  • Core functionality: Monitors adapter container execution via file polling and container state watching
  • Failure handling: Detects and reports OOMKilled, crashes, timeouts, and invalid results
  • Status updates: Updates Kubernetes Job conditions with detailed execution information
  • Architecture: Zero-dependency on adapter implementation - uses simple JSON result contract
  • Infrastructure: Includes Dockerfile, Makefile, comprehensive tests
  • Unit tests pass with race detection, mock K8s client validates status update logic

Summary by CodeRabbit

  • New Features

    • Added a Status Reporter service that watches adapter result files, detects adapter/container outcomes, and updates Kubernetes Job status (success/failure/timeout).
  • Documentation

    • Expanded README with overview, components, adapter result contract/schema, shared volume guidance, quick start, repo structure, license, and contact.
  • Tests

    • Added extensive tests covering config, result parsing, reporter flows, Kubernetes interactions, and lifecycle/shutdown behavior.
  • Chores

    • Added build and image workflows plus developer tooling (build, lint, test, image targets).

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link

coderabbitai bot commented Dec 10, 2025

Walkthrough

Adds a new status-reporter subsystem that watches adapter result files, parses a JSON result contract, and updates a Kubernetes Job's status. Implements environment-driven configuration loading and validation, a Kubernetes client wrapper for Job status updates and pod/container inspection, a reporter orchestrator with polling, timeout and termination handling, a CLI entrypoint with graceful shutdown, container build artifacts (Dockerfile, Makefile), parser and result contract handling (size limits, UTF-8-safe truncation), and comprehensive unit tests across config, k8s, result, parser, reporter, and main packages. README expanded with documentation and adapter contract details.

Sequence Diagram

sequenceDiagram
    participant Main
    participant Config
    participant Reporter
    participant FS as Filesystem
    participant Parser
    participant K8sWrapper as K8s
    participant KubeAPI

    Main->>Config: Load() from env
    Config-->>Main: Validated Config
    Main->>Reporter: NewReporter(config)
    Reporter->>K8sWrapper: NewClient(namespace, jobName)
    K8sWrapper->>KubeAPI: in-cluster auth / connect
    KubeAPI-->>K8sWrapper: client ready
    Main->>Reporter: Run(ctx)

    loop Polling
      Reporter->>FS: Stat resultsPath
      alt result file found
        FS-->>Reporter: file bytes
        Reporter->>Parser: Parse(data)
        alt valid AdapterResult
          Parser-->>Reporter: AdapterResult
          Reporter->>K8sWrapper: UpdateJobStatus(condition)
          K8sWrapper->>KubeAPI: PATCH Job status
          KubeAPI-->>K8sWrapper: PATCH response
        else parse error
          Parser-->>Reporter: error
          Reporter->>K8sWrapper: UpdateJobStatus(invalid-result)
        end
        Reporter-->>Main: Exit (done)
      else no file and maxWait reached
        Reporter->>K8sWrapper: GetAdapterContainerStatus(pod, container)
        K8sWrapper->>KubeAPI: GET Pod
        KubeAPI-->>K8sWrapper: Pod/container state
        alt container terminated
          Reporter->>K8sWrapper: UpdateJobStatus(terminated-reason)
        else container running or unknown
          Reporter->>K8sWrapper: UpdateJobStatus(timeout-reason)
        end
        Reporter-->>Main: Exit (timeout)
      end
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45–90 minutes

  • Review focus:
    • status-reporter/pkg/reporter/reporter.go: concurrency, channel lifecycles, timeout/termination handling, condition mapping.
    • status-reporter/pkg/k8s/client.go: in-cluster client creation, Job status upsert with retry-on-conflict, LastTransitionTime handling.
    • status-reporter/pkg/config/config.go: env loading, validation rules, path safety checks and error typing.
    • status-reporter/pkg/result/* and parser.go: JSON contract validation, size limits, UTF-8-safe truncation.
    • cmd/reporter/main.go: signal handling and graceful shutdown sequencing.
    • Tests and mocks: completeness and correctness of mock behaviors and test coverage.
    • Build artifacts: Dockerfile and Makefile for multi-stage build and image workflows.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 52.17% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the main change: implementing a status reporter component for the GCP validator. It is specific, concise, and directly reflects the primary objective of the PR.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (7)
README.md (1)

210-221: Add a language identifier to the repository-structure code block

markdownlint (MD040) is flagging the fenced block starting at Line 210 because it lacks a language. Consider changing it to something like:

-```
+```text
 adapter-validation-gcp/
 ...

to keep linting clean.

</blockquote></details>
<details>
<summary>status-reporter/Dockerfile (1)</summary><blockquote>

`1-32`: **Align Dockerfile build flags with Makefile metadata (optional)**

The multi‑stage build and distroless nonroot runtime look solid. To keep version/commit/buildDate consistent with `make binary`, consider reusing the same `-ldflags` here (or driving image builds via the Makefile) instead of hard‑coding `-ldflags="-w -s"`.

</blockquote></details>
<details>
<summary>status-reporter/Makefile (1)</summary><blockquote>

`1-171`: **Consider adding an `all` target and optionally shortening long target bodies**

The Makefile is well‑structured and covers the usual workflows. To satisfy tools like checkmake and provide a conventional default:

```diff
+.PHONY: all
+all: verify

If you care about the maxbodylength warnings for lint and fmt, you could also move the shell bodies into small helper scripts or secondary targets, but that’s purely stylistic at this point.

status-reporter/pkg/reporter/reporter_test.go (1)

399-402: Use filepath.Join for path construction.

String concatenation for path construction can lead to issues on different platforms and with edge cases.

 		BeforeEach(func() {
 			tempDir = GinkgoT().TempDir()
-			resultsPath = tempDir + "/adapter-result.json"
+			resultsPath = filepath.Join(tempDir, "adapter-result.json")
 		})

You'll also need to add "path/filepath" to the imports.

status-reporter/pkg/k8s/client.go (1)

30-46: Consider validating required parameters.

The constructor doesn't validate that namespace and jobName are non-empty. While the Kubernetes API will return errors later, validating upfront provides clearer error messages.

 func NewClient(namespace, jobName string) (*Client, error) {
+	if namespace == "" {
+		return nil, fmt.Errorf("namespace is required")
+	}
+	if jobName == "" {
+		return nil, fmt.Errorf("jobName is required")
+	}
+
 	config, err := rest.InClusterConfig()
status-reporter/pkg/config/config.go (1)

162-170: Duplicate ValidationError type exists in result package.

There's a ValidationError type in both config and result packages (see relevant snippet from result/types.go lines 68-71). While they're in different packages, this could lead to confusion. Consider whether these could share a common type in a shared package, or if the duplication is intentional for package isolation.

status-reporter/pkg/reporter/reporter.go (1)

115-120: Using time.Sleep for goroutine cleanup is fragile.

The time.Sleep(goroutineCleanupDelay) approach for waiting for the polling goroutine to exit is timing-dependent and could fail under high load. Consider using a sync.WaitGroup or a done channel that the goroutine signals when it exits.

+import "sync"
+
 func (r *StatusReporter) Run(ctx context.Context) error {
 	// ... existing setup ...
 
 	resultChan := make(chan *result.AdapterResult, 1)
 	errorChan := make(chan error, 1)
 	done := make(chan struct{})
+	var wg sync.WaitGroup
+	wg.Add(1)
 
-	go r.pollForResults(timeoutCtx, resultChan, errorChan, done)
+	go func() {
+		defer wg.Done()
+		r.pollForResults(timeoutCtx, resultChan, errorChan, done)
+	}()
 
 	// ... select block ...
 
 	close(done)
 	cancel()
-	time.Sleep(goroutineCleanupDelay)
+	wg.Wait()
 
 	return reportErr
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e51f3f9 and bfb02ec.

⛔ Files ignored due to path filters (1)
  • status-reporter/go.sum is excluded by !**/*.sum
📒 Files selected for processing (20)
  • README.md (1 hunks)
  • status-reporter/Dockerfile (1 hunks)
  • status-reporter/Makefile (1 hunks)
  • status-reporter/cmd/reporter/main.go (1 hunks)
  • status-reporter/go.mod (1 hunks)
  • status-reporter/pkg/config/config.go (1 hunks)
  • status-reporter/pkg/config/config_suite_test.go (1 hunks)
  • status-reporter/pkg/config/config_test.go (1 hunks)
  • status-reporter/pkg/k8s/client.go (1 hunks)
  • status-reporter/pkg/k8s/client_test.go (1 hunks)
  • status-reporter/pkg/k8s/k8s_suite_test.go (1 hunks)
  • status-reporter/pkg/reporter/reporter.go (1 hunks)
  • status-reporter/pkg/reporter/reporter_suite_test.go (1 hunks)
  • status-reporter/pkg/reporter/reporter_test.go (1 hunks)
  • status-reporter/pkg/reporter/testhelpers/mock_k8s_client.go (1 hunks)
  • status-reporter/pkg/result/parser.go (1 hunks)
  • status-reporter/pkg/result/parser_test.go (1 hunks)
  • status-reporter/pkg/result/result_suite_test.go (1 hunks)
  • status-reporter/pkg/result/types.go (1 hunks)
  • status-reporter/pkg/result/types_test.go (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (9)
status-reporter/pkg/result/types_test.go (2)
status-reporter/pkg/result/types.go (6)
  • AdapterResult (21-33)
  • StatusSuccess (10-10)
  • StatusFailure (11-11)
  • DefaultReason (13-13)
  • DefaultMessage (14-14)
  • ValidationError (69-72)
status-reporter/pkg/config/config.go (1)
  • ValidationError (163-166)
status-reporter/pkg/k8s/client_test.go (1)
status-reporter/pkg/k8s/client.go (1)
  • JobCondition (49-55)
status-reporter/pkg/config/config_test.go (1)
status-reporter/pkg/config/config.go (2)
  • Load (48-96)
  • Config (13-23)
status-reporter/pkg/result/parser_test.go (2)
status-reporter/pkg/result/parser.go (2)
  • Parser (15-15)
  • NewParser (18-20)
status-reporter/pkg/result/types.go (4)
  • StatusSuccess (10-10)
  • StatusFailure (11-11)
  • DefaultReason (13-13)
  • DefaultMessage (14-14)
status-reporter/pkg/result/parser.go (1)
status-reporter/pkg/result/types.go (1)
  • AdapterResult (21-33)
status-reporter/cmd/reporter/main.go (2)
status-reporter/pkg/config/config.go (2)
  • Load (48-96)
  • Config (13-23)
status-reporter/pkg/reporter/reporter.go (1)
  • NewReporter (54-65)
status-reporter/pkg/result/types.go (1)
status-reporter/pkg/config/config.go (1)
  • ValidationError (163-166)
status-reporter/pkg/reporter/testhelpers/mock_k8s_client.go (1)
status-reporter/pkg/k8s/client.go (1)
  • JobCondition (49-55)
status-reporter/pkg/config/config.go (2)
status-reporter/pkg/reporter/reporter.go (1)
  • DefaultConditionType (18-18)
status-reporter/pkg/result/types.go (1)
  • ValidationError (69-72)
🪛 checkmake (0.2.2)
status-reporter/Makefile

[warning] 85-85: Target body for "lint" exceeds allowed length of 5 (8).

(maxbodylength)


[warning] 96-96: Target body for "fmt" exceeds allowed length of 5 (6).

(maxbodylength)


[warning] 170-170: Missing required phony target "all"

(minphony)

🪛 markdownlint-cli2 (0.18.1)
README.md

210-210: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🔇 Additional comments (23)
README.md (1)

5-205: Status reporter documentation is clear and adapter-focused

The overview, adapter contract, and status examples form a coherent spec for adapter authors; the JSON schema, validation rules, and Job condition outputs look self‑consistent and aligned with the described status‑reporter behavior.

status-reporter/pkg/k8s/k8s_suite_test.go (1)

1-13: Suite bootstrap for k8s tests looks standard

The TestK8s entrypoint cleanly wires the suite and matches the pattern used in your other packages; nothing to change here.

status-reporter/pkg/result/result_suite_test.go (1)

1-13: Result suite bootstrap is consistent and correct

The TestResult function provides the expected Ginkgo entrypoint for the result package and is consistent with the other suites.

status-reporter/pkg/k8s/client_test.go (1)

1-42: JobCondition tests cover key struct behaviors

The tests validate both a fully populated JobCondition and the zero LastTransitionTime case, which is a good minimal sanity check for this type; assertions look correct.

status-reporter/pkg/config/config_suite_test.go (1)

1-13: Config suite bootstrap is wired correctly

TestConfig registers the fail handler and runs specs under a clear suite name; this matches the other packages and looks good.

status-reporter/go.mod (1)

36-37: Double-check the go.yaml.in/yaml module paths

The indirect requirements

go.yaml.in/yaml/v2 v2.4.2 // indirect
go.yaml.in/yaml/v3 v3.0.4 // indirect

look unusual and may be a typo of the standard gopkg.in/yaml.v2 / gopkg.in/yaml.v3 modules (you already have gopkg.in/yaml.v3 listed separately at Line 50). If these paths are incorrect, go mod tidy / go build will fail to resolve them. Please verify and adjust or drop them as appropriate.

status-reporter/cmd/reporter/main.go (1)

86-102: LGTM!

The configuration logging is clear and helpful for debugging, with appropriate handling of the auto-detect case for the adapter container name.

status-reporter/pkg/reporter/reporter_suite_test.go (1)

1-13: LGTM!

Standard Ginkgo/Gomega test suite bootstrap following best practices.

status-reporter/pkg/result/types.go (2)

9-33: LGTM!

The constants and struct definition are well-designed. Using json.RawMessage for Details is the right choice for preserving arbitrary JSON data.


68-76: LGTM!

The ValidationError implementation is straightforward and follows Go conventions.

status-reporter/pkg/config/config_test.go (1)

1-271: LGTM!

Comprehensive test coverage for configuration loading and validation. Tests cover all important cases including required fields, defaults, validation constraints, and error conditions.

status-reporter/pkg/result/types_test.go (1)

1-198: LGTM!

Comprehensive test coverage for AdapterResult validation, JSON marshaling, and error formatting. The tests cover all important edge cases including length limits, whitespace handling, and nested JSON details.

status-reporter/pkg/reporter/testhelpers/mock_k8s_client.go (1)

1-35: LGTM!

The mock implementation is clean and suitable for testing. The injectable function fields provide good flexibility for test scenarios.

status-reporter/pkg/result/parser_test.go (1)

1-153: LGTM!

Comprehensive parser tests covering all important cases including file size limits, invalid JSON, validation errors, and default value application.

status-reporter/pkg/result/parser.go (1)

1-53: LGTM!

The parser implementation is clean and includes appropriate safety checks:

  • 1MB file size limit prevents memory exhaustion
  • Empty file detection
  • Clear error messages with context
status-reporter/pkg/reporter/reporter_test.go (1)

1-647: Comprehensive test coverage with good structure.

The test suite covers all the key paths including success/failure scenarios, timeout handling, error propagation, and edge cases like empty files and invalid JSON. The use of GinkgoT().TempDir() ensures proper cleanup.

status-reporter/pkg/k8s/client.go (2)

58-97: Well-implemented retry logic with proper conflict handling.

The UpdateJobStatus method correctly uses retry.RetryOnConflict for optimistic concurrency control, properly handles the NotFound case by breaking out of retry, and correctly updates or appends conditions.


116-129: Verify Go version requirement for loop variable safety.

Taking the address of cs inside the range loop (lines 119 and 127) depends on the Go version:

  • Go 1.22+: Safe — loop variables have per-iteration scope
  • Go < 1.22: Unsafe — all iterations share the same variable, so the returned pointer would reference the last element

Confirm the go.mod file specifies go 1.22 or later to ensure this pattern is safe.

status-reporter/pkg/config/config.go (2)

117-150: The .. check has false positives but is acceptable as defense-in-depth.

The check strings.Contains(c.ResultsPath, "..") on line 119 could trigger false positives for paths like foo..bar.json. However, since filepath.Clean is called afterward and the path must be under /results, this additional check is a reasonable defense-in-depth measure against traversal attacks.


48-96: Clean configuration loading with proper validation flow.

The Load function follows a good pattern: required variables first with early return on error, optional variables with sensible defaults, then validation. Error handling is consistent throughout.

status-reporter/pkg/reporter/reporter.go (3)

122-169: Well-structured polling with proper channel handling.

The pollForResults function correctly handles all exit paths (done signal, context cancellation, error, success) with non-blocking sends to channels. The ticker cleanup via defer is appropriate.


214-242: Good use of separate timeout context for container status check.

Using context.Background() with containerStatusCheckTimeout on line 219 is the right approach since the parent context may have already timed out. This ensures the container status check can complete and provide more specific error information.


244-274: Comprehensive handling of container termination states.

The function correctly distinguishes between OOMKilled, non-zero exit codes, and the edge case of exit code 0 without a result file (treated as a crash). The error messages provide useful debugging information.

@86254860 86254860 force-pushed the HYPERFLEET-264-status-reporter-for-gcp-validator branch from bfb02ec to ae37adf Compare December 10, 2025 06:17
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
status-reporter/cmd/reporter/main.go (1)

44-75: Signal shutdown path ignores reporter errors and always exits 0.

In the SIGTERM/SIGINT branch you cancel ctx and wait for doneChan or the shutdown timer, but you never read from errChan. If rep.Run(ctx) returns an error during shutdown (e.g., failed status update), the process still exits with status 0, which can hide real failures from Kubernetes/Job controllers.

If you want exit codes to consistently reflect reporter success/failure, consider reading from errChan when doneChan is closed in this branch as well (mirroring the non‑signal path) and exiting with 1 on error.

🧹 Nitpick comments (5)
status-reporter/pkg/errors/validation.go (1)

3-11: ValidationError implementation looks good; consider package naming.

The ValidationError type and its Error() method are straightforward and usable across config/result validation. One minor thought: using package name errors can be confusing alongside the stdlib errors package and forces callers to alias one of them; consider a more specific name (e.g., valerrors or similar) if it ever becomes annoying in callers.

status-reporter/Makefile (1)

84-96: Optional: add an all target and keep make checkers happy.

Tools like checkmake are flagging the missing all target and long bodies for lint/fmt. If you care about those checks, you could add a simple aggregator:

.PHONY: all
all: verify

and, if desired, factor helper recipes for lint/fmt to shorten their bodies. Not required for functionality, just to quiet static analysis.

Also applies to: 170-171

status-reporter/pkg/result/parser.go (1)

9-12: Size limit is enforced after full read; consider a streaming read if you worry about huge files.

ParseFile reads the whole file with os.ReadFile and only then enforces maxResultFileSize. That’s functionally fine, but it doesn’t actually prevent a single large read from consuming memory once.

If you anticipate untrusted or accidentally huge result files, consider switching to an os.Open + io.LimitReader (or similar) approach so you never read more than maxResultFileSize+ε bytes into memory before failing. Otherwise the current approach is OK, just keep in mind the comment slightly overstates the protection.

Also applies to: 22-38, 40-52

status-reporter/pkg/reporter/reporter.go (2)

89-122: Potential race between timeout and result delivery in Run.

Because Run does a single select over resultChan, errorChan, and timeoutCtx.Done(), if the adapter result arrives very close to maxWaitTime, both resultChan and timeoutCtx.Done() can be ready and the timeout case may win. That would incorrectly report a timeout even though a valid result was parsed.

If you care about that edge, consider restructuring to give precedence to an already‑arrived result/error over the timeout, e.g.:

  • Use a small loop that first drains resultChan/errorChan non‑blocking when the timeout fires, or
  • Drive everything off a single context but have pollForResults close a dedicated “finished” channel and let Run read the result/error after noticing completion, instead of racing it against the timeout case.

The current behavior is acceptable for most scenarios but may occasionally misclassify in near‑deadline situations.


219-246: Defensively handle nil ContainerStatus from GetAdapterContainerStatus.

UpdateFromTimeout assumes GetAdapterContainerStatus returns a non‑nil *corev1.ContainerStatus when err == nil:

containerStatus, err := r.k8sClient.GetAdapterContainerStatus(...)
if err != nil {
    ...
} else if containerStatus.State.Terminated != nil {
    ...
}

If an implementation ever returns (nil, nil) (e.g., “pod not found” encoded as no error + nil status), this will panic on containerStatus.State. It’s unlikely but cheap to guard against:

-} else if containerStatus.State.Terminated != nil {
+} else if containerStatus != nil && containerStatus.State.Terminated != nil {

This keeps the reporter robust against slightly looser client implementations.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bfb02ec and ae37adf.

⛔ Files ignored due to path filters (1)
  • status-reporter/go.sum is excluded by !**/*.sum
📒 Files selected for processing (21)
  • README.md (1 hunks)
  • status-reporter/Dockerfile (1 hunks)
  • status-reporter/Makefile (1 hunks)
  • status-reporter/cmd/reporter/main.go (1 hunks)
  • status-reporter/go.mod (1 hunks)
  • status-reporter/pkg/config/config.go (1 hunks)
  • status-reporter/pkg/config/config_suite_test.go (1 hunks)
  • status-reporter/pkg/config/config_test.go (1 hunks)
  • status-reporter/pkg/errors/validation.go (1 hunks)
  • status-reporter/pkg/k8s/client.go (1 hunks)
  • status-reporter/pkg/k8s/client_test.go (1 hunks)
  • status-reporter/pkg/k8s/k8s_suite_test.go (1 hunks)
  • status-reporter/pkg/reporter/reporter.go (1 hunks)
  • status-reporter/pkg/reporter/reporter_suite_test.go (1 hunks)
  • status-reporter/pkg/reporter/reporter_test.go (1 hunks)
  • status-reporter/pkg/reporter/testhelpers/mock_k8s_client.go (1 hunks)
  • status-reporter/pkg/result/parser.go (1 hunks)
  • status-reporter/pkg/result/parser_test.go (1 hunks)
  • status-reporter/pkg/result/result_suite_test.go (1 hunks)
  • status-reporter/pkg/result/types.go (1 hunks)
  • status-reporter/pkg/result/types_test.go (1 hunks)
✅ Files skipped from review due to trivial changes (2)
  • status-reporter/pkg/result/types_test.go
  • README.md
🚧 Files skipped from review as they are similar to previous changes (12)
  • status-reporter/pkg/config/config_suite_test.go
  • status-reporter/pkg/result/result_suite_test.go
  • status-reporter/pkg/result/parser_test.go
  • status-reporter/Dockerfile
  • status-reporter/pkg/k8s/client_test.go
  • status-reporter/pkg/reporter/testhelpers/mock_k8s_client.go
  • status-reporter/pkg/k8s/k8s_suite_test.go
  • status-reporter/go.mod
  • status-reporter/pkg/config/config_test.go
  • status-reporter/pkg/k8s/client.go
  • status-reporter/pkg/result/types.go
  • status-reporter/pkg/reporter/reporter_suite_test.go
🧰 Additional context used
🧬 Code graph analysis (4)
status-reporter/pkg/reporter/reporter_test.go (5)
status-reporter/pkg/reporter/reporter.go (10)
  • StatusReporter (42-51)
  • NewReporterWithClient (68-70)
  • DefaultConditionType (19-19)
  • ConditionStatusTrue (21-21)
  • ConditionStatusFalse (22-22)
  • ReasonAdapterCrashed (24-24)
  • ReasonAdapterOOMKilled (25-25)
  • ReasonAdapterExitedWithError (26-26)
  • ReasonAdapterTimeout (27-27)
  • ReasonInvalidResultFormat (28-28)
status-reporter/pkg/reporter/testhelpers/mock_k8s_client.go (2)
  • MockK8sClient (12-16)
  • NewMockK8sClient (18-20)
status-reporter/pkg/config/config.go (1)
  • DefaultConditionType (31-31)
status-reporter/pkg/result/types.go (3)
  • AdapterResult (24-36)
  • StatusSuccess (13-13)
  • StatusFailure (14-14)
status-reporter/pkg/k8s/client.go (1)
  • JobCondition (49-55)
status-reporter/pkg/result/parser.go (1)
status-reporter/pkg/result/types.go (1)
  • AdapterResult (24-36)
status-reporter/cmd/reporter/main.go (2)
status-reporter/pkg/config/config.go (2)
  • Load (50-98)
  • Config (15-25)
status-reporter/pkg/reporter/reporter.go (1)
  • NewReporter (54-65)
status-reporter/pkg/config/config.go (2)
status-reporter/pkg/reporter/reporter.go (1)
  • DefaultConditionType (19-19)
status-reporter/pkg/errors/validation.go (1)
  • ValidationError (4-7)
🪛 checkmake (0.2.2)
status-reporter/Makefile

[warning] 85-85: Target body for "lint" exceeds allowed length of 5 (8).

(maxbodylength)


[warning] 96-96: Target body for "fmt" exceeds allowed length of 5 (6).

(maxbodylength)


[warning] 170-170: Missing required phony target "all"

(minphony)

🔇 Additional comments (3)
status-reporter/Makefile (1)

19-26: Verify that ldflags targets (version/commit/buildDate/tag) exist in main.

The LDFLAGS inject main.version, main.commit, main.buildDate, and optionally main.tag. This works well if those are defined as string vars in the main package; otherwise go build will fail at link time.

Please double‑check that you have something like:

// in cmd/reporter (same main package)
var (
    version   string
    commit    string
    buildDate string
    tag       string
)

If these vars don’t exist yet, adding them (or dropping the corresponding -X flags) will avoid build issues.

Also applies to: 115-115

status-reporter/pkg/config/config.go (1)

49-98: Config loading/validation flow is clean and predictable.

The LoadValidate pipeline with ValidationError for env/field issues, plus GetPollInterval/GetMaxWaitTime helpers, is structured well and should be easy to reason about and test. Required envs, sensible defaults, and timing invariants (poll < maxWait) are all enforced in one place.

Also applies to: 100-117, 164-195

status-reporter/pkg/reporter/reporter_test.go (1)

20-648: Thorough reporter test coverage looks solid.

The suite exercises all the important paths (success/failure results, parse errors, timeouts, container termination variants, k8s update failures, and context cancellation) using the mock client and temp files. Poll intervals vs sleeps look conservative enough that flakiness risk is low. This should give high confidence in the reporter behavior.

@86254860 86254860 force-pushed the HYPERFLEET-264-status-reporter-for-gcp-validator branch from ae37adf to 3253941 Compare December 10, 2025 08:04
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (1)
status-reporter/pkg/config/config.go (1)

126-151: Tighten ResultsPath validation and remove broad .. substring check

Current validation still (a) rejects legitimate filenames containing .. and (b) does not enforce that the path stays under the intended /results area, which weakens the sandboxing story compared to the design intent in earlier review comments.

You can both avoid false positives and constrain the path to the results directory by relying on filepath.Clean plus a base-dir check, and dropping the raw strings.Contains(".."):

 func (c *Config) validateResultsPath() error {
-	if strings.Contains(c.ResultsPath, "..") {
-		return &ValidationError{
-			Field:   "ResultsPath",
-			Message: "path traversal (..) not allowed",
-		}
-	}
-
 	if strings.HasSuffix(c.ResultsPath, "/") {
 		return &ValidationError{
 			Field:   "ResultsPath",
 			Message: "path must be a file, not a directory",
 		}
 	}
 
 	cleanPath := filepath.Clean(c.ResultsPath)
 
 	if !filepath.IsAbs(cleanPath) {
 		return &ValidationError{
 			Field:   "ResultsPath",
 			Message: "path must be absolute",
 		}
 	}
+
+	base := filepath.Clean(filepath.Dir(DefaultResultsPath)) // e.g. "/results"
+	if cleanPath != base && !strings.HasPrefix(cleanPath, base+string(os.PathSeparator)) {
+		return &ValidationError{
+			Field:   "ResultsPath",
+			Message: fmt.Sprintf("path must be within %s directory", base),
+		}
+	}
 
 	return nil
 }

This preserves the existing invariants (absolute path, file not directory) while ensuring the configured path can’t escape the /results tree, and no longer rejects harmless names like /results/adapter..json.

🧹 Nitpick comments (2)
status-reporter/Makefile (1)

61-171: Consider adding a default "all" target.

Following Makefile conventions, projects typically define an "all" target that specifies the default build artifacts. This makes make (without arguments) predictable and serves as documentation for primary project deliverables.

Suggested implementation:

.PHONY: all
all: binary image ## Build binary and image (default)

This makes the project's primary outputs explicit and allows users to build everything with a single make command.

status-reporter/pkg/config/config.go (1)

25-44: Consider centralizing DefaultConditionType to avoid divergence

Relevant snippets show DefaultConditionType is also defined in status-reporter/pkg/reporter/reporter.go. It would be safer to have a single source of truth (e.g., keep it here in config and reference it from reporter) to avoid future drift.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ae37adf and 3253941.

⛔ Files ignored due to path filters (1)
  • status-reporter/go.sum is excluded by !**/*.sum
📒 Files selected for processing (20)
  • README.md (1 hunks)
  • status-reporter/Dockerfile (1 hunks)
  • status-reporter/Makefile (1 hunks)
  • status-reporter/cmd/reporter/main.go (1 hunks)
  • status-reporter/go.mod (1 hunks)
  • status-reporter/pkg/config/config.go (1 hunks)
  • status-reporter/pkg/config/config_suite_test.go (1 hunks)
  • status-reporter/pkg/config/config_test.go (1 hunks)
  • status-reporter/pkg/k8s/client.go (1 hunks)
  • status-reporter/pkg/k8s/client_test.go (1 hunks)
  • status-reporter/pkg/k8s/k8s_suite_test.go (1 hunks)
  • status-reporter/pkg/reporter/reporter.go (1 hunks)
  • status-reporter/pkg/reporter/reporter_suite_test.go (1 hunks)
  • status-reporter/pkg/reporter/reporter_test.go (1 hunks)
  • status-reporter/pkg/reporter/testhelpers/mock_k8s_client.go (1 hunks)
  • status-reporter/pkg/result/parser.go (1 hunks)
  • status-reporter/pkg/result/parser_test.go (1 hunks)
  • status-reporter/pkg/result/result.go (1 hunks)
  • status-reporter/pkg/result/result_suite_test.go (1 hunks)
  • status-reporter/pkg/result/result_test.go (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • README.md
🚧 Files skipped from review as they are similar to previous changes (9)
  • status-reporter/pkg/result/result_suite_test.go
  • status-reporter/pkg/config/config_suite_test.go
  • status-reporter/go.mod
  • status-reporter/pkg/reporter/reporter_suite_test.go
  • status-reporter/pkg/reporter/testhelpers/mock_k8s_client.go
  • status-reporter/cmd/reporter/main.go
  • status-reporter/pkg/k8s/k8s_suite_test.go
  • status-reporter/pkg/reporter/reporter.go
  • status-reporter/Dockerfile
🧰 Additional context used
🧬 Code graph analysis (7)
status-reporter/pkg/config/config_test.go (1)
status-reporter/pkg/config/config.go (2)
  • Load (57-105)
  • Config (13-23)
status-reporter/pkg/result/result_test.go (1)
status-reporter/pkg/result/result.go (6)
  • AdapterResult (32-44)
  • StatusSuccess (11-11)
  • StatusFailure (12-12)
  • DefaultReason (14-14)
  • DefaultMessage (15-15)
  • ResultError (22-25)
status-reporter/pkg/k8s/client_test.go (1)
status-reporter/pkg/k8s/client.go (1)
  • JobCondition (49-55)
status-reporter/pkg/result/parser_test.go (2)
status-reporter/pkg/result/parser.go (2)
  • Parser (15-15)
  • NewParser (18-20)
status-reporter/pkg/result/result.go (4)
  • StatusSuccess (11-11)
  • StatusFailure (12-12)
  • DefaultReason (14-14)
  • DefaultMessage (15-15)
status-reporter/pkg/result/parser.go (1)
status-reporter/pkg/result/result.go (1)
  • AdapterResult (32-44)
status-reporter/pkg/config/config.go (1)
status-reporter/pkg/reporter/reporter.go (1)
  • DefaultConditionType (19-19)
status-reporter/pkg/reporter/reporter_test.go (5)
status-reporter/pkg/reporter/reporter.go (10)
  • StatusReporter (42-51)
  • NewReporterWithClient (68-70)
  • DefaultConditionType (19-19)
  • ConditionStatusTrue (21-21)
  • ConditionStatusFalse (22-22)
  • ReasonAdapterCrashed (24-24)
  • ReasonAdapterOOMKilled (25-25)
  • ReasonAdapterExitedWithError (26-26)
  • ReasonAdapterTimeout (27-27)
  • ReasonInvalidResultFormat (28-28)
status-reporter/pkg/reporter/testhelpers/mock_k8s_client.go (2)
  • MockK8sClient (12-16)
  • NewMockK8sClient (18-20)
status-reporter/pkg/config/config.go (1)
  • DefaultConditionType (29-29)
status-reporter/pkg/result/result.go (3)
  • AdapterResult (32-44)
  • StatusSuccess (11-11)
  • StatusFailure (12-12)
status-reporter/pkg/k8s/client.go (1)
  • JobCondition (49-55)
🪛 checkmake (0.2.2)
status-reporter/Makefile

[warning] 85-85: Target body for "lint" exceeds allowed length of 5 (8).

(maxbodylength)


[warning] 96-96: Target body for "fmt" exceeds allowed length of 5 (6).

(maxbodylength)


[warning] 170-170: Missing required phony target "all"

(minphony)

🔇 Additional comments (34)
status-reporter/Makefile (1)

1-171: Well-structured Makefile with comprehensive build automation.

The Makefile demonstrates solid infrastructure practices: proper metadata capture (version, commit, build date), intelligent container runtime detection with fallback error handling, race-detection-enabled testing, coverage reporting, and multi-registry image push support. The phony target declarations and variable organization follow Make conventions well.

status-reporter/pkg/k8s/client_test.go (2)

14-29: LGTM!

The test correctly validates that all fields of JobCondition can be set and retrieved as expected.


31-40: LGTM!

The test correctly validates zero-value handling for LastTransitionTime.

status-reporter/pkg/result/parser_test.go (3)

21-25: LGTM!

The test correctly validates parser instantiation.


27-116: LGTM!

Excellent test coverage for ParseFile, including both valid inputs and comprehensive error scenarios (empty file, invalid JSON, invalid status, oversized file, nonexistent file). The use of temp directories with proper cleanup is appropriate.


118-153: LGTM!

The tests appropriately validate JSON parsing, default value application for missing fields, and error handling for malformed input. The coverage is thorough.

status-reporter/pkg/result/result_test.go (4)

14-29: LGTM!

The IsSuccess tests correctly validate all status scenarios (success, failure, invalid).


31-152: LGTM!

Excellent comprehensive test coverage for validation logic, including:

  • Valid success/failure results
  • Invalid status error handling
  • Default value application for empty/whitespace fields
  • Whitespace trimming
  • Length truncation enforcement

The truncation tests are particularly valuable given the UTF-8-safe implementation.


154-186: LGTM!

The JSON marshaling tests appropriately validate unmarshaling of basic results and results with optional details (including nested structures).


188-198: LGTM!

The ResultError formatting tests correctly validate the error message format.

status-reporter/pkg/config/config_test.go (4)

39-148: LGTM!

Excellent comprehensive test coverage for the Load function, including:

  • Valid configuration with defaults
  • Custom value overrides
  • Whitespace trimming
  • Missing required field error handling
  • Invalid integer parsing error handling

The nested Context organization improves readability.


150-246: LGTM!

Excellent validation test coverage, including:

  • Valid configuration acceptance
  • Timing parameter validation (positive values, poll < max wait)
  • Security-critical path validation (path traversal prevention, absolute path requirement, file type validation)

The security-focused path validation tests are particularly valuable.


248-253: LGTM!

The test correctly validates duration conversion for poll interval.


255-260: LGTM!

The test correctly validates duration conversion for max wait time.

status-reporter/pkg/result/parser.go (3)

9-20: LGTM!

The 1MB size limit appropriately prevents memory exhaustion attacks. The stateless Parser design is clean.


22-38: LGTM!

The ParseFile implementation correctly validates file size and content before parsing. The error messages appropriately include contextual information (path, size) for debugging.


40-53: LGTM!

The Parse implementation correctly unmarshals JSON and delegates validation to AdapterResult.Validate(). Good separation of concerns.

status-reporter/pkg/reporter/reporter_test.go (7)

41-52: LGTM!

The test appropriately validates that exported constants have the expected values.


54-80: LGTM!

The tests correctly validate reporter instantiation with custom and default condition types.


82-167: LGTM!

Excellent test coverage for UpdateFromResult, including success/failure outcomes, error propagation, and custom condition type usage. The tests appropriately validate all JobCondition fields.


169-198: LGTM!

The tests correctly validate UpdateFromError behavior, including the InvalidResultFormat reason and error propagation.


200-284: LGTM!

Excellent test coverage for UpdateFromTerminatedContainer, including OOMKilled, non-zero exit, and zero exit scenarios. The comment at lines 239-245 provides valuable context explaining why zero exit code is treated as AdapterCrashed in the timeout path.


286-392: LGTM!

Excellent comprehensive test coverage for UpdateFromTimeout, including various container termination states, fallback to timeout reason when status fetch fails, and error propagation.


394-648: LGTM!

Exceptional end-to-end test coverage for the Run method, including:

  • Immediate and delayed result file availability
  • Timeout scenarios
  • Invalid file content (malformed JSON, empty, invalid status)
  • Context cancellation
  • Error propagation from status updates
  • Container termination detection

The tests appropriately use temp directories, realistic delays, and goroutines to simulate concurrent behavior.

status-reporter/pkg/result/result.go (4)

10-29: LGTM!

The constants are well-defined with appropriate length limits. The ResultError implementation correctly follows the error interface pattern.


31-49: LGTM!

The AdapterResult struct is well-documented with clear field descriptions. The IsSuccess method correctly implements status checking.


51-77: LGTM!

The Validate method correctly:

  • Enforces valid status values
  • Normalizes Reason and Message (trim, default, truncate)
  • Uses UTF-8-safe truncation to avoid corruption

This addresses the past review concern about UTF-8 corruption.


79-94: LGTM!

The truncateUTF8 helper correctly prevents UTF-8 corruption by:

  • Checking length before truncation
  • Finding the last valid rune boundary before maxBytes using utf8.RuneStart
  • Including a safety fallback

This properly addresses the UTF-8 safety concern from the past review.

status-reporter/pkg/k8s/client.go (5)

17-27: LGTM!

The constant and Client struct are appropriately defined.


29-46: LGTM!

The NewClient implementation follows standard Kubernetes client-go patterns for in-cluster configuration with appropriate error handling.


48-97: LGTM!

The UpdateJobStatus implementation correctly follows Kubernetes best practices:

  • Uses RetryOnConflict for optimistic concurrency control
  • Handles NotFound errors explicitly
  • Defaults LastTransitionTime when not set
  • Implements proper upsert logic (update existing condition or append new)

99-107: LGTM!

The GetPodStatus implementation is straightforward with appropriate error context.


109-132: LGTM!

The GetAdapterContainerStatus implementation provides good flexibility:

  • Supports explicit container name lookup
  • Falls back to finding the first non-status-reporter container
  • Includes appropriate error context in messages
status-reporter/pkg/config/config.go (1)

56-124: Config loading and timing validation look solid

Load() and Validate() handle required envs, whitespace, integer parsing, and the PollIntervalSeconds < MaxWaitTimeSeconds invariant cleanly, returning structured ValidationErrors on misconfiguration. No issues from a correctness perspective.

@86254860 86254860 force-pushed the HYPERFLEET-264-status-reporter-for-gcp-validator branch from 3253941 to b748930 Compare December 10, 2025 08:33
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (6)
README.md (2)

57-186: Verify documentation claims against implementation for contract enforcement.

The Adapter Contract section provides comprehensive JSON schema, validation rules, and scenario examples. However, several implementation details require verification:

  • Line 20 claims the reporter "always exits successfully (never fails the parent Job)"; confirm the exit handler never returns non-zero
  • Line 62-64 specifies a 300-second default timeout and /results/adapter-result.json default path; ensure these match the implementation
  • Lines 79-83 define specific field validation (trimming, truncation, defaults); verify all rules are enforced in the result parser

Please verify the following:

  1. The status reporter's main entrypoint exits with code 0 regardless of adapter outcome
  2. Default timeout is 300 seconds (and configurable via env var)
  3. Default results path is /results/adapter-result.json (and configurable via RESULTS_PATH env var)
  4. The result parser enforces all documented validation rules:
    • reason is trimmed and truncated to 128 characters with default "NoReasonProvided"
    • message is trimmed and truncated to 1024 characters with default "No message provided"
    • status value is case-sensitive and only accepts "success" or "failure"

Consider adding a "Testing the Status Reporter" section with:

  • Example Kubernetes Job manifest integrating status reporter
  • Sample adapter container that writes valid/invalid results
  • Expected Job status outputs for each scenario

This would improve the onboarding experience for future validators.


187-207: Document Kubernetes volume ownership and cleanup expectations.

The shared volume configuration (lines 187–207) shows the volume mount but does not clarify:

  • Whether the status reporter should clean up the results file after reading
  • Implications if the adapter writes multiple result files
  • Whether persistent storage or emptyDir is required

Add clarification on:

  • Result file lifecycle (created by adapter, read by reporter, cleaned up by reporter or left for debugging?)
  • Volume cleanup behavior (emptyDir will be destroyed with Pod; is this intentional?)
  • Handling of partial or incomplete result files

This will prevent integration issues for teams deploying the status reporter.

status-reporter/Makefile (3)

84-93: Lint target needs simplification to comply with Makefile conventions.

The lint target body (lines 84–93) exceeds the standard 5-line body length limit for Makefile targets. While the implementation is correct, refactoring for readability and compliance is recommended.

Extract the golangci-lint check into a shell script or simplify the conditional:

.PHONY: lint
lint: ## Run golangci-lint
	@echo "Running golangci-lint..."
	@command -v golangci-lint >/dev/null || (echo "Error: golangci-lint not found"; exit 1)
	golangci-lint cache clean && golangci-lint run

Alternatively, document that golangci-lint must be pre-installed via CI setup.


95-102: Fmt target should be simplified to reduce body length.

The fmt target (lines 95–102) has conditional logic that slightly exceeds recommended body length. Simplify or extract the goimports logic.

Simplify by defaulting to goimports with a fallback:

.PHONY: fmt
fmt: ## Format code with gofmt and goimports
	@echo "Formatting code..."
	@command -v goimports >/dev/null && goimports -w . || gofmt -w .

This is more concise and achieves the same behavior.


110-171: Image building and verification targets are well-structured; missing "all" phony target.

The binary, clean, and image targets are clear and functional. The verify target chains lint and test appropriately. However, the Makefile lacks a default "all" phony target, which is a Makefile convention for projects with multiple build artifacts.

Add an "all" phony target to define the default build goal:

.PHONY: all
all: verify binary ## Default: verify code and build binary

.PHONY: verify
verify: lint test ## Run all verification checks (lint + test)

Place the "all" target near the help target (after line 61) and update the verify target to depend on it, or simply have "all" depend on verify and binary. This establishes a convention where make alone performs the most common build workflow.

status-reporter/pkg/config/config.go (1)

126-152: Simplify the path traversal check.

The strings.Contains(c.ResultsPath, "..") check on line 128 is overly broad and would reject legitimate filenames like /results/adapter..json. Since you already use filepath.Clean and filepath.IsAbs, path traversal sequences are normalized away, making this explicit check redundant.

Consider removing the ".." substring check:

-	if strings.Contains(c.ResultsPath, "..") {
-		return &ValidationError{
-			Field:   "ResultsPath",
-			Message: "path traversal (..) not allowed",
-		}
-	}
-

The combination of filepath.Clean and filepath.IsAbs provides sufficient protection against path traversal.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3253941 and b748930.

⛔ Files ignored due to path filters (1)
  • status-reporter/go.sum is excluded by !**/*.sum
📒 Files selected for processing (20)
  • README.md (1 hunks)
  • status-reporter/Dockerfile (1 hunks)
  • status-reporter/Makefile (1 hunks)
  • status-reporter/cmd/reporter/main.go (1 hunks)
  • status-reporter/go.mod (1 hunks)
  • status-reporter/pkg/config/config.go (1 hunks)
  • status-reporter/pkg/config/config_suite_test.go (1 hunks)
  • status-reporter/pkg/config/config_test.go (1 hunks)
  • status-reporter/pkg/k8s/client.go (1 hunks)
  • status-reporter/pkg/k8s/client_test.go (1 hunks)
  • status-reporter/pkg/k8s/k8s_suite_test.go (1 hunks)
  • status-reporter/pkg/reporter/reporter.go (1 hunks)
  • status-reporter/pkg/reporter/reporter_suite_test.go (1 hunks)
  • status-reporter/pkg/reporter/reporter_test.go (1 hunks)
  • status-reporter/pkg/reporter/testhelpers/mock_k8s_client.go (1 hunks)
  • status-reporter/pkg/result/parser.go (1 hunks)
  • status-reporter/pkg/result/parser_test.go (1 hunks)
  • status-reporter/pkg/result/result.go (1 hunks)
  • status-reporter/pkg/result/result_suite_test.go (1 hunks)
  • status-reporter/pkg/result/result_test.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (10)
  • status-reporter/pkg/k8s/client_test.go
  • status-reporter/pkg/config/config_suite_test.go
  • status-reporter/pkg/result/result_test.go
  • status-reporter/pkg/result/result_suite_test.go
  • status-reporter/pkg/result/parser_test.go
  • status-reporter/go.mod
  • status-reporter/pkg/k8s/client.go
  • status-reporter/Dockerfile
  • status-reporter/pkg/config/config_test.go
  • status-reporter/pkg/reporter/testhelpers/mock_k8s_client.go
🧰 Additional context used
🧬 Code graph analysis (4)
status-reporter/pkg/result/parser.go (1)
status-reporter/pkg/result/result.go (1)
  • AdapterResult (32-44)
status-reporter/cmd/reporter/main.go (2)
status-reporter/pkg/config/config.go (2)
  • Load (57-105)
  • Config (13-23)
status-reporter/pkg/reporter/reporter.go (1)
  • NewReporter (52-59)
status-reporter/pkg/reporter/reporter.go (3)
status-reporter/pkg/k8s/client.go (2)
  • JobCondition (49-55)
  • NewClient (30-46)
status-reporter/pkg/result/parser.go (2)
  • Parser (15-15)
  • NewParser (18-20)
status-reporter/pkg/result/result.go (1)
  • AdapterResult (32-44)
status-reporter/pkg/reporter/reporter_test.go (4)
status-reporter/pkg/reporter/reporter.go (9)
  • StatusReporter (40-49)
  • NewReporterWithClient (62-64)
  • ConditionStatusTrue (19-19)
  • ConditionStatusFalse (20-20)
  • ReasonAdapterCrashed (22-22)
  • ReasonAdapterOOMKilled (23-23)
  • ReasonAdapterExitedWithError (24-24)
  • ReasonAdapterTimeout (25-25)
  • ReasonInvalidResultFormat (26-26)
status-reporter/pkg/reporter/testhelpers/mock_k8s_client.go (2)
  • MockK8sClient (12-16)
  • NewMockK8sClient (18-20)
status-reporter/pkg/result/result.go (3)
  • AdapterResult (32-44)
  • StatusSuccess (11-11)
  • StatusFailure (12-12)
status-reporter/pkg/k8s/client.go (1)
  • JobCondition (49-55)
🪛 checkmake (0.2.2)
status-reporter/Makefile

[warning] 85-85: Target body for "lint" exceeds allowed length of 5 (8).

(maxbodylength)


[warning] 96-96: Target body for "fmt" exceeds allowed length of 5 (6).

(maxbodylength)


[warning] 170-170: Missing required phony target "all"

(minphony)

🔇 Additional comments (35)
README.md (1)

1-30: Documentation establishes clear component mission and cloud-agnostic design.

The Overview and Status Reporter sections accurately reflect the PR objectives: cloud-agnostic sidecar, JSON result contract, multiple failure scenario handling, and zero adapter implementation dependencies.

status-reporter/Makefile (4)

1-60: Build metadata, environment detection, and tooling configuration look solid.

The Makefile establishes clear metadata, embeds version/commit/buildDate into the binary, and robustly detects container runtimes (Docker → Podman → error). The PKG_DIRS discovery appropriately excludes vendor and test directories, and Go tooling is well-configured.


61-109: Test and coverage targets properly configured with race detection.

The test targets include race detection (-race) and comprehensive coverage reporting (HTML output). Test timeout is reasonable (30m).


42-55: Container runtime detection is robust with good error messaging.

The fallback from Docker to Podman and the explicit error message if neither is available provides good developer experience. The error exit ensures CI pipelines fail fast if the environment is misconfigured.


147-168: Dev image workflow requires QUAY_USER; validation is clear and user-friendly.

The image-dev target requires the QUAY_USER environment variable and provides helpful error messaging. The DEV_TAG based on GIT_COMMIT ensures unique tags for each iteration.

status-reporter/pkg/k8s/k8s_suite_test.go (1)

1-13: LGTM!

The test suite bootstrap follows the standard Ginkgo/Gomega pattern and is consistent with the other test suite files in the repository.

status-reporter/pkg/reporter/reporter_suite_test.go (1)

1-13: LGTM!

The test suite bootstrap follows the standard Ginkgo/Gomega pattern and is consistent with other test suite files in the repository.

status-reporter/cmd/reporter/main.go (4)

1-17: LGTM!

The imports and shutdown timeout constant are appropriate for the status reporter service.


19-56: LGTM!

The configuration loading, reporter setup, and goroutine initialization are correct. The channel setup properly handles the asynchronous reporter execution.


58-90: LGTM!

The graceful shutdown logic and exit code handling are correct. The code properly handles both signal-initiated shutdown and normal completion, exiting with appropriate status codes based on errors.


92-108: LGTM!

The configuration logging helper provides comprehensive visibility into the runtime configuration, with clear handling of the auto-detect case for the adapter container name.

status-reporter/pkg/result/parser.go (3)

1-20: LGTM!

The 1MB file size limit is a reasonable safeguard against memory exhaustion, and the stateless Parser design is clean.


22-38: LGTM!

The ParseFile method includes appropriate safeguards: empty file detection, size limits, and informative error messages.


40-53: LGTM!

The Parse method correctly unmarshals JSON and validates the result format, with proper error wrapping.

status-reporter/pkg/reporter/reporter_test.go (5)

1-51: LGTM!

The test setup and constants validation are well-structured. Testing exported constants ensures API stability.


53-166: LGTM!

The tests comprehensively cover reporter creation and result update flows, including error propagation and custom condition types.


168-283: LGTM!

The tests thoroughly cover error handling and container termination scenarios. The treatment of zero exit code without a result file as AdapterCrashed is well-reasoned and documented.


285-391: LGTM!

The timeout tests cover all scenarios including container termination, running state, and fallback behavior when container status cannot be retrieved.


393-647: LGTM!

The Run integration tests provide excellent coverage of the complete reporter workflow, including timing scenarios, invalid inputs, context cancellation, and error handling.

status-reporter/pkg/result/result.go (4)

1-29: LGTM!

The constants and error type are well-defined. The max length limits (128 for reason, 1024 for message) are appropriate for Kubernetes condition fields.


31-49: LGTM!

The AdapterResult structure is well-designed with clear field documentation. Using json.RawMessage for Details provides flexibility for adapter-specific data.


51-77: LGTM!

The validation logic correctly enforces the status contract and safely normalizes Reason and Message fields with UTF-8-aware truncation.


79-94: LGTM!

The truncateUTF8 helper correctly handles UTF-8 boundaries by finding the last valid rune start before the byte limit. This properly addresses the UTF-8 truncation concern from previous reviews.

status-reporter/pkg/config/config.go (5)

1-44: LGTM!

The Config struct and default constants are well-designed with sensible default values for polling and timeouts.


46-105: LGTM!

The Load function correctly handles required and optional environment variables and validates the configuration before returning.


107-124: LGTM!

The validation logic correctly enforces positive values and the relationship constraint between poll interval and max wait time.


154-162: LGTM!

The duration conversion helpers are straightforward and correct.


164-195: LGTM!

The environment variable helpers properly handle whitespace trimming, required fields, and integer parsing with clear error messages.

status-reporter/pkg/reporter/reporter.go (7)

1-37: LGTM!

The constants and K8sClientInterface are well-defined. The 10-second timeout for container status checks is appropriate.


39-77: LGTM!

The StatusReporter structure and constructors are well-designed. The separation between production and test constructors enables easy testing with mock clients.


79-120: LGTM!

The Run method's orchestration is excellent. The nested select on timeout (lines 104-112) cleverly handles race conditions by giving precedence to results/errors that arrive just before the timeout fires.


122-171: LGTM!

The polling loop correctly handles all edge cases with proper channel send protection using select statements to prevent blocking during shutdown.


173-195: LGTM!

The UpdateFromResult method correctly maps adapter results to Kubernetes condition status and propagates the adapter's reason and message.


197-214: LGTM!

The UpdateFromError method correctly marks the job as failed due to invalid result format and preserves the original parsing error for debugging.


216-276: LGTM!

The timeout handling is sophisticated and provides excellent diagnostics. The use of a fresh context for the container status check (line 221) ensures the check completes even if the parent context is cancelled. The termination reason mapping correctly distinguishes between OOMKilled, error exits, and logic bugs (zero exit without result file).

@86254860 86254860 force-pushed the HYPERFLEET-264-status-reporter-for-gcp-validator branch 2 times, most recently from 93ca488 to 1468e95 Compare December 10, 2025 10:21
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
status-reporter/pkg/config/config.go (1)

107-152: ResultsPath validation is mostly good; the "..") substring check is still overly broad

The timing validations and basic path checks (disallow directory-like values and require an absolute path) look good. The remaining rough edge is:

  • strings.Contains(c.ResultsPath, "..") will also reject harmless filenames like /results/adapter..json, not just paths with .. segments. Since you already normalize with filepath.Clean, you can either drop this check or tighten it to reject only actual .. path segments (e.g., splitting on os.PathSeparator and checking for a segment equal to "..").

If you later decide you need strict confinement to a base directory (e.g. /results), you could also reintroduce a prefix check in terms of filepath.Clean(ResultsBasePath) plus a path‑separator boundary as was discussed previously.

🧹 Nitpick comments (10)
status-reporter/pkg/result/result.go (1)

10-19: Result validation and UTF‑8 truncation look solid; clarify byte vs character limits

The Validate flow (status check, trimming/defaulting, then truncation via truncateUTF8) is coherent and matches how the contract is documented. The UTF‑8 helper safely avoids splitting multi‑byte runes by walking back to a utf8.RuneStart boundary, which is good.

Two small suggestions:

  • The limits (maxReasonLength = 128, maxMessageLength = 1024) are enforced in bytes, not runes. Since the README calls these “128 chars / 1024 chars”, non‑ASCII text will be truncated earlier in terms of visible characters. Either (a) switch the limits to be rune‑based, or (b) explicitly document that the limits are in bytes to avoid surprises.
  • For future‑proofing, you could add a quick guard in truncateUTF8 for non‑positive maxBytes (e.g. if maxBytes <= 0 { return "" }) so it remains safe if ever called with dynamic values.

Also applies to: 51-77, 79-93

README.md (1)

69-82: Minor contract/documentation nits: chars vs bytes wording and stray backtick

The README is nicely detailed and matches the code behavior for status/reason/message overall, but two small points:

  • The JSON schema section describes reason as “max 128 chars” and message as “max 1024 chars”, while the implementation enforces 128/1024 bytes (with UTF‑8‑safe truncation). For non‑ASCII text this can mean fewer visible characters. Consider either clarifying that these are byte limits or switching the enforcement to rune counts to match the current wording.
  • There’s a small typo in the failure example: Adapter writes to the result file:` has an extra backtick after “file”.

You could fix the typo with something like:

-   Adapter writes to the result file`:
+   Adapter writes to the result file:

Also applies to: 110-121

status-reporter/pkg/result/result_test.go (1)

129-150: Consider adding a UTF‑8 truncation test case

The truncation tests for Reason and Message use ASCII-only strings and assert len(...) == 128/1024, which is great for the byte-limit behavior. To fully lock in truncateUTF8’s intent, I’d add a case where Reason or Message contains multi-byte runes (e.g., accented or CJK characters), call Validate(), and assert that:

  • The resulting string is still valid UTF‑8 (no rune split), and
  • Its length does not exceed the configured limit.

This would guard against future regressions in the UTF‑8 truncation logic.

status-reporter/Dockerfile (1)

1-32: Multi-stage build and distroless runtime look solid

The Dockerfile cleanly builds a static binary and runs it in a nonroot distroless image with the expected path and entrypoint; nothing blocking here. If you want extra robustness, you could later pin the base images by digest to avoid surprises when upstream tags move.

status-reporter/pkg/result/parser.go (1)

9-37: Result parsing and validation are correct; consider stricter size enforcement

The Parser flow (read → empty check → max-size check → JSON unmarshal → AdapterResult.Validate) is sane and matches the intended contract. If, in the future, you expect untrusted or potentially large result files, you might consider switching from os.ReadFile to an io.LimitReader-style approach so the max-size check happens before allocating the full file into memory; for the current 1MB cap this is more of a hardening tweak than a blocker.

status-reporter/pkg/config/config_test.go (1)

13-260: Config tests give good coverage; consider reusing env key constants

The test suite does a nice job exercising Load, Validate, and the duration helpers, including error paths and path validation. To reduce duplication and keep tests in sync with the config package, you might consider importing and using the config.Env* constants instead of hardcoding "JOB_NAME", "POD_NAME", etc., in expectations and setup.

status-reporter/Makefile (1)

61-171: Makefile workflow is solid; minor checkmake nits if you care about that tool

The test, fmt, lint, and image targets are well-structured and should work fine as-is. If you plan to keep checkmake happy, you could:

  • Add a simple all target (e.g., all: verify) to satisfy the missing-phony warning.
  • Optionally move the longer lint and fmt shell blocks into small helper scripts (or secondary targets) to get under the configured max body length.

These are style-only and don’t affect correctness.

status-reporter/pkg/reporter/reporter_test.go (1)

20-646: Reporter tests give strong behavioral coverage

The test suite thoroughly exercises the reporter’s branching (success/failure results, parse errors, timeouts, container terminations, k8s update failures, and context cancellation) using the mock client—this should make refactoring much safer. The only nit is the "uses default condition type when empty" test name, which doesn’t currently assert the condition type itself; if that defaulting behavior is important to the API contract, you might add an assertion later.

status-reporter/pkg/reporter/reporter.go (2)

216-244: Consider tying the timeout container-status check to the caller’s context

UpdateFromTimeout currently uses:

checkCtx, cancel := context.WithTimeout(context.Background(), containerStatusCheckTimeout)

This means the container-status lookup ignores cancellation on the caller’s ctx and will always run up to containerStatusCheckTimeout. If you want the reporter to honor shutdown signals more strictly, you could instead parent this on the provided ctx:

checkCtx, cancel := context.WithTimeout(ctx, containerStatusCheckTimeout)

so that a canceled/expiring caller context also cancels the status lookup.


66-76: Optional: default the condition type when empty

newReporterWithClient passes conditionType straight through; if it’s an empty string (as in one of the tests), later JobCondition updates will use an empty type. This is harmless but not very useful. If you want the constructor to be more forgiving, you could treat "" as “use the default” (e.g., "Available" from the config package) rather than requiring callers to always supply a non-empty type.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b748930 and 93ca488.

⛔ Files ignored due to path filters (1)
  • status-reporter/go.sum is excluded by !**/*.sum
📒 Files selected for processing (20)
  • README.md (1 hunks)
  • status-reporter/Dockerfile (1 hunks)
  • status-reporter/Makefile (1 hunks)
  • status-reporter/cmd/reporter/main.go (1 hunks)
  • status-reporter/go.mod (1 hunks)
  • status-reporter/pkg/config/config.go (1 hunks)
  • status-reporter/pkg/config/config_suite_test.go (1 hunks)
  • status-reporter/pkg/config/config_test.go (1 hunks)
  • status-reporter/pkg/k8s/client.go (1 hunks)
  • status-reporter/pkg/k8s/client_test.go (1 hunks)
  • status-reporter/pkg/k8s/k8s_suite_test.go (1 hunks)
  • status-reporter/pkg/reporter/reporter.go (1 hunks)
  • status-reporter/pkg/reporter/reporter_suite_test.go (1 hunks)
  • status-reporter/pkg/reporter/reporter_test.go (1 hunks)
  • status-reporter/pkg/reporter/testhelpers/mock_k8s_client.go (1 hunks)
  • status-reporter/pkg/result/parser.go (1 hunks)
  • status-reporter/pkg/result/parser_test.go (1 hunks)
  • status-reporter/pkg/result/result.go (1 hunks)
  • status-reporter/pkg/result/result_suite_test.go (1 hunks)
  • status-reporter/pkg/result/result_test.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
  • status-reporter/pkg/reporter/reporter_suite_test.go
  • status-reporter/go.mod
  • status-reporter/pkg/reporter/testhelpers/mock_k8s_client.go
  • status-reporter/pkg/result/result_suite_test.go
🧰 Additional context used
🧬 Code graph analysis (8)
status-reporter/pkg/k8s/client_test.go (1)
status-reporter/pkg/k8s/client.go (1)
  • JobCondition (49-55)
status-reporter/cmd/reporter/main.go (2)
status-reporter/pkg/config/config.go (2)
  • Load (57-105)
  • Config (13-23)
status-reporter/pkg/reporter/reporter.go (1)
  • NewReporter (52-59)
status-reporter/pkg/result/result_test.go (1)
status-reporter/pkg/result/result.go (6)
  • AdapterResult (32-44)
  • StatusSuccess (11-11)
  • StatusFailure (12-12)
  • DefaultReason (14-14)
  • DefaultMessage (15-15)
  • ResultError (22-25)
status-reporter/pkg/result/parser_test.go (2)
status-reporter/pkg/result/parser.go (2)
  • Parser (15-15)
  • NewParser (18-20)
status-reporter/pkg/result/result.go (4)
  • StatusSuccess (11-11)
  • StatusFailure (12-12)
  • DefaultReason (14-14)
  • DefaultMessage (15-15)
status-reporter/pkg/config/config_test.go (1)
status-reporter/pkg/config/config.go (2)
  • Load (57-105)
  • Config (13-23)
status-reporter/pkg/reporter/reporter_test.go (4)
status-reporter/pkg/reporter/reporter.go (9)
  • StatusReporter (40-49)
  • NewReporterWithClient (62-64)
  • ConditionStatusTrue (19-19)
  • ConditionStatusFalse (20-20)
  • ReasonAdapterCrashed (22-22)
  • ReasonAdapterOOMKilled (23-23)
  • ReasonAdapterExitedWithError (24-24)
  • ReasonAdapterTimeout (25-25)
  • ReasonInvalidResultFormat (26-26)
status-reporter/pkg/reporter/testhelpers/mock_k8s_client.go (2)
  • MockK8sClient (12-16)
  • NewMockK8sClient (18-20)
status-reporter/pkg/result/result.go (3)
  • AdapterResult (32-44)
  • StatusSuccess (11-11)
  • StatusFailure (12-12)
status-reporter/pkg/k8s/client.go (1)
  • JobCondition (49-55)
status-reporter/pkg/reporter/reporter.go (3)
status-reporter/pkg/k8s/client.go (2)
  • JobCondition (49-55)
  • NewClient (30-46)
status-reporter/pkg/result/parser.go (2)
  • Parser (15-15)
  • NewParser (18-20)
status-reporter/pkg/result/result.go (1)
  • AdapterResult (32-44)
status-reporter/pkg/result/parser.go (1)
status-reporter/pkg/result/result.go (1)
  • AdapterResult (32-44)
🪛 checkmake (0.2.2)
status-reporter/Makefile

[warning] 85-85: Target body for "lint" exceeds allowed length of 5 (8).

(maxbodylength)


[warning] 96-96: Target body for "fmt" exceeds allowed length of 5 (6).

(maxbodylength)


[warning] 170-170: Missing required phony target "all"

(minphony)

🔇 Additional comments (7)
status-reporter/pkg/config/config_suite_test.go (1)

1-13: Ginkgo test suite bootstrap looks correct

TestConfig correctly registers the Gomega fail handler and runs the "Config Suite" specs from package config_test, which is the expected Ginkgo v2 pattern for black‑box tests. No changes needed.

status-reporter/pkg/k8s/client_test.go (1)

12-40: JobCondition tests are straightforward and sufficient

Both tests exercise construction of JobCondition with and without LastTransitionTime, and validate fields as expected (including IsZero() for the zero case). This gives good baseline coverage for the struct with minimal complexity.

status-reporter/pkg/k8s/k8s_suite_test.go (1)

1-13: K8s Ginkgo suite setup is correct

TestK8s properly wires Ginkgo/Gomega for the k8s package under package k8s_test. This matches the pattern used in the other suites; no changes needed.

status-reporter/pkg/result/parser_test.go (1)

14-153: Parser tests give good end‑to‑end coverage

These tests thoroughly exercise Parser construction, file handling (including size limit and missing file), JSON parsing failures, invalid status values, and defaulting behavior. The use of per‑test temporary directories and explicit error message checks makes the suite robust. No changes needed.

status-reporter/pkg/k8s/client.go (1)

29-132: Kubernetes client wrapper looks correct and idiomatic

NewClient, UpdateJobStatus, and the container-status helpers follow standard client-go patterns: retry-on-conflict around Job status updates, clean error wrapping, and sensible adapter container selection (named container when provided, otherwise first non‑status‑reporter container). I don’t see correctness issues here for the intended in-cluster use.

status-reporter/pkg/config/config.go (1)

154-195: Env loading and numeric parsing behavior look correct

Load()’s use of getRequiredEnv for required keys and getEnvIntOrDefault for numeric options gives clear failure modes: unset values fall back to sane defaults, while malformed integers produce explicit ValidationErrors. Combined with the duration helpers, this should make misconfiguration relatively easy to diagnose and avoids silent clamping.

status-reporter/pkg/reporter/reporter.go (1)

79-120: Run/poll orchestration looks robust; nice use of channels and WaitGroup

The lifecycle around RunpollForResults (buffered channels, done signal, timeoutCtx, and WaitGroup) is clean and avoids goroutine leaks, while still giving precedence to late-arriving results/errors on timeout. This is a good structure for a sidecar-style watcher.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
status-reporter/cmd/reporter/main.go (1)

66-77: Shutdown timeout can still block indefinitely on errChan.

After the select on lines 66-71, if the shutdown timer fires before rep.Run(ctx) returns, doneChan is still open and no value has been sent to errChan. The subsequent blocking read <-errChan on line 74 will hang indefinitely, defeating the purpose of the shutdown timeout.

You should only read from errChan when doneChan has actually closed (the graceful path), and treat a timeout as a forced exit.

Apply this diff to fix the blocking issue:

 		shutdownTimer := time.NewTimer(shutdownTimeout)
 		defer shutdownTimer.Stop()
 
 		select {
 		case <-doneChan:
+			// Reporter exited within the shutdown window
+			if err := <-errChan; err != nil {
+				log.Printf("Reporter finished with error during shutdown: %v", err)
+				os.Exit(1)
+			}
 			log.Println("Reporter finished during shutdown")
+			os.Exit(0)
 		case <-shutdownTimer.C:
+			// Reporter did not stop in time; force exit
 			log.Printf("Shutdown timeout (%s) exceeded", shutdownTimeout)
+			os.Exit(1)
 		}
-
-		// Check for errors even during signal shutdown
-		if err := <-errChan; err != nil {
-			log.Printf("Reporter finished with error: %v", err)
-			os.Exit(1)
-		}
-
-		log.Println("Shutdown complete")
-		os.Exit(0)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 93ca488 and 1468e95.

⛔ Files ignored due to path filters (1)
  • status-reporter/go.sum is excluded by !**/*.sum
📒 Files selected for processing (20)
  • README.md (1 hunks)
  • status-reporter/Dockerfile (1 hunks)
  • status-reporter/Makefile (1 hunks)
  • status-reporter/cmd/reporter/main.go (1 hunks)
  • status-reporter/go.mod (1 hunks)
  • status-reporter/pkg/config/config.go (1 hunks)
  • status-reporter/pkg/config/config_suite_test.go (1 hunks)
  • status-reporter/pkg/config/config_test.go (1 hunks)
  • status-reporter/pkg/k8s/client.go (1 hunks)
  • status-reporter/pkg/k8s/client_test.go (1 hunks)
  • status-reporter/pkg/k8s/k8s_suite_test.go (1 hunks)
  • status-reporter/pkg/reporter/reporter.go (1 hunks)
  • status-reporter/pkg/reporter/reporter_suite_test.go (1 hunks)
  • status-reporter/pkg/reporter/reporter_test.go (1 hunks)
  • status-reporter/pkg/reporter/testhelpers/mock_k8s_client.go (1 hunks)
  • status-reporter/pkg/result/parser.go (1 hunks)
  • status-reporter/pkg/result/parser_test.go (1 hunks)
  • status-reporter/pkg/result/result.go (1 hunks)
  • status-reporter/pkg/result/result_suite_test.go (1 hunks)
  • status-reporter/pkg/result/result_test.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (7)
  • status-reporter/go.mod
  • status-reporter/pkg/result/parser.go
  • status-reporter/pkg/result/parser_test.go
  • status-reporter/pkg/result/result_test.go
  • status-reporter/pkg/result/result.go
  • status-reporter/Dockerfile
  • status-reporter/pkg/config/config_suite_test.go
🧰 Additional context used
🧬 Code graph analysis (6)
status-reporter/pkg/k8s/client_test.go (1)
status-reporter/pkg/k8s/client.go (1)
  • JobCondition (49-55)
status-reporter/cmd/reporter/main.go (2)
status-reporter/pkg/config/config.go (2)
  • Load (57-105)
  • Config (13-23)
status-reporter/pkg/reporter/reporter.go (1)
  • NewReporter (52-59)
status-reporter/pkg/reporter/testhelpers/mock_k8s_client.go (1)
status-reporter/pkg/k8s/client.go (1)
  • JobCondition (49-55)
status-reporter/pkg/reporter/reporter_test.go (4)
status-reporter/pkg/reporter/reporter.go (9)
  • StatusReporter (40-49)
  • NewReporterWithClient (62-64)
  • ConditionStatusTrue (19-19)
  • ConditionStatusFalse (20-20)
  • ReasonAdapterCrashed (22-22)
  • ReasonAdapterOOMKilled (23-23)
  • ReasonAdapterExitedWithError (24-24)
  • ReasonAdapterTimeout (25-25)
  • ReasonInvalidResultFormat (26-26)
status-reporter/pkg/reporter/testhelpers/mock_k8s_client.go (2)
  • MockK8sClient (12-16)
  • NewMockK8sClient (18-20)
status-reporter/pkg/result/result.go (3)
  • AdapterResult (32-44)
  • StatusSuccess (11-11)
  • StatusFailure (12-12)
status-reporter/pkg/k8s/client.go (1)
  • JobCondition (49-55)
status-reporter/pkg/config/config_test.go (1)
status-reporter/pkg/config/config.go (2)
  • Load (57-105)
  • Config (13-23)
status-reporter/pkg/reporter/reporter.go (3)
status-reporter/pkg/k8s/client.go (2)
  • JobCondition (49-55)
  • NewClient (30-46)
status-reporter/pkg/result/parser.go (2)
  • Parser (15-15)
  • NewParser (18-20)
status-reporter/pkg/result/result.go (1)
  • AdapterResult (32-44)
🪛 checkmake (0.2.2)
status-reporter/Makefile

[warning] 85-85: Target body for "lint" exceeds allowed length of 5 (8).

(maxbodylength)


[warning] 96-96: Target body for "fmt" exceeds allowed length of 5 (6).

(maxbodylength)


[warning] 170-170: Missing required phony target "all"

(minphony)

🔇 Additional comments (16)
status-reporter/Makefile (1)

1-171: LGTM!

The Makefile provides comprehensive build automation with proper error handling, container runtime detection, and helpful user feedback. The static analysis warnings about target body length are acceptable given the conditional logic requirements, and the missing "all" target is optional for this use case.

status-reporter/pkg/k8s/client_test.go (1)

1-42: LGTM!

The tests properly validate JobCondition struct behavior, covering both complete initialization and zero-value cases.

status-reporter/pkg/result/result_suite_test.go (1)

1-13: LGTM!

Standard Ginkgo/Gomega test suite bootstrap following the established pattern across the codebase.

status-reporter/pkg/k8s/k8s_suite_test.go (1)

1-13: LGTM!

Standard Ginkgo/Gomega test suite bootstrap following the established pattern across the codebase.

status-reporter/pkg/reporter/reporter_suite_test.go (1)

1-13: LGTM!

Standard Ginkgo/Gomega test suite bootstrap following the established pattern across the codebase.

status-reporter/cmd/reporter/main.go (3)

19-42: LGTM!

Configuration loading and reporter initialization are properly structured with appropriate error handling.


82-89: LGTM!

The normal completion path correctly reads from errChan and exits with the appropriate status code based on whether an error occurred.


92-108: LGTM!

The configuration logging helper provides clear visibility into runtime settings, including the auto-detect behavior for adapter container name.

status-reporter/pkg/config/config_test.go (1)

1-250: LGTM!

Comprehensive test coverage with proper test isolation, validation of required/optional fields, error handling, and edge cases. The tests effectively validate the configuration loading and validation logic.

status-reporter/pkg/reporter/reporter_test.go (1)

1-647: Exemplary test coverage!

The test suite is comprehensive and well-structured, covering:

  • All public constants and constructors
  • Success and failure paths for result processing
  • Container termination scenarios (OOMKilled, error exits, crashes)
  • Timeout handling with various container states
  • Edge cases (empty files, invalid JSON, context cancellation)
  • Error propagation from the Kubernetes client

The use of Ginkgo/Gomega with a mock Kubernetes client enables deterministic, readable tests without requiring a live cluster.

status-reporter/pkg/k8s/client.go (1)

1-132: LGTM!

The Kubernetes client wrapper is well-implemented:

  • Proper retry-on-conflict handling for status updates (line 59)
  • Intelligent fallback logic in GetAdapterContainerStatus that auto-detects the adapter container when name is unspecified
  • Clear error messages with contextual information
  • Correct JobCondition upsert semantics (lines 81-92)
status-reporter/pkg/reporter/testhelpers/mock_k8s_client.go (1)

1-35: LGTM!

The mock client is clean and serves its testing purpose well. It properly records conditions for assertions and allows flexible behavior injection through function fields.

status-reporter/pkg/config/config.go (1)

126-145: Verify that path restriction removal is intentional.

The past review comments mentioned issues with ResultsPath validation (prefix check and .. traversal detection). The current implementation has simplified validation to only check:

  1. Path is not a directory (line 128)
  2. Path is absolute (line 137)

However, there's no longer any restriction on which directories can be accessed. If the intent was to sandbox file access to a specific directory (e.g., /results), this validation is insufficient.

If path sandboxing is required, consider adding validation to ensure the cleaned path is within an allowed base directory. If unrestricted absolute paths are intentional (adapter can specify any path), this is acceptable but should be documented as a security consideration.

Based on past review comments indicating path restriction was intended.

status-reporter/pkg/reporter/reporter.go (3)

79-120: Excellent concurrency pattern!

The Run method demonstrates strong concurrent programming practices:

  • Proper timeout context usage (line 87)
  • WaitGroup for goroutine coordination (line 93-95)
  • Graceful precedence handling on timeout (lines 104-112) that checks for late-arriving results
  • Correct cleanup sequence: close(done), cancel(), wg.Wait() (lines 115-117)

This prevents false timeouts when results arrive just before the deadline and ensures clean shutdown.


122-171: LGTM!

The polling implementation is correct:

  • Proper ticker cleanup with defer ticker.Stop() (line 127)
  • Distinguishes between file-not-exist (continue polling) and actual errors (line 140-141)
  • Non-blocking channel sends with done signal checks (lines 143-147, 154-158, 163-167) prevent goroutine leaks
  • Clean shutdown on context cancellation or done signal

173-276: LGTM!

The update methods are well-designed:

  • UpdateFromResult correctly maps adapter status to Kubernetes condition status
  • UpdateFromError properly preserves the original error while updating Job status (line 213)
  • UpdateFromTimeout intelligently checks container status first and delegates to UpdateFromTerminatedContainer when applicable
  • UpdateFromTerminatedContainer correctly distinguishes between OOMKilled, error exits, and the edge case of zero exit without result file (treated as crash)

The error propagation and status update logic is sound.

@86254860 86254860 force-pushed the HYPERFLEET-264-status-reporter-for-gcp-validator branch from 1468e95 to fd11719 Compare December 10, 2025 12:20
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1468e95 and fd11719.

⛔ Files ignored due to path filters (1)
  • status-reporter/go.sum is excluded by !**/*.sum
📒 Files selected for processing (22)
  • README.md (1 hunks)
  • status-reporter/Dockerfile (1 hunks)
  • status-reporter/Makefile (1 hunks)
  • status-reporter/cmd/reporter/main.go (1 hunks)
  • status-reporter/cmd/reporter/main_suite_test.go (1 hunks)
  • status-reporter/cmd/reporter/main_test.go (1 hunks)
  • status-reporter/go.mod (1 hunks)
  • status-reporter/pkg/config/config.go (1 hunks)
  • status-reporter/pkg/config/config_suite_test.go (1 hunks)
  • status-reporter/pkg/config/config_test.go (1 hunks)
  • status-reporter/pkg/k8s/client.go (1 hunks)
  • status-reporter/pkg/k8s/client_test.go (1 hunks)
  • status-reporter/pkg/k8s/k8s_suite_test.go (1 hunks)
  • status-reporter/pkg/reporter/reporter.go (1 hunks)
  • status-reporter/pkg/reporter/reporter_suite_test.go (1 hunks)
  • status-reporter/pkg/reporter/reporter_test.go (1 hunks)
  • status-reporter/pkg/reporter/testhelpers/mock_k8s_client.go (1 hunks)
  • status-reporter/pkg/result/parser.go (1 hunks)
  • status-reporter/pkg/result/parser_test.go (1 hunks)
  • status-reporter/pkg/result/result.go (1 hunks)
  • status-reporter/pkg/result/result_suite_test.go (1 hunks)
  • status-reporter/pkg/result/result_test.go (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (8)
  • status-reporter/pkg/result/result_test.go
  • README.md
  • status-reporter/pkg/result/parser_test.go
  • status-reporter/go.mod
  • status-reporter/pkg/config/config_suite_test.go
  • status-reporter/pkg/k8s/client.go
  • status-reporter/Dockerfile
  • status-reporter/pkg/k8s/client_test.go
🧰 Additional context used
🧬 Code graph analysis (5)
status-reporter/pkg/reporter/reporter_test.go (4)
status-reporter/pkg/reporter/reporter.go (8)
  • StatusReporter (40-49)
  • ConditionStatusTrue (19-19)
  • ConditionStatusFalse (20-20)
  • ReasonAdapterCrashed (22-22)
  • ReasonAdapterOOMKilled (23-23)
  • ReasonAdapterExitedWithError (24-24)
  • ReasonAdapterTimeout (25-25)
  • ReasonInvalidResultFormat (26-26)
status-reporter/pkg/reporter/testhelpers/mock_k8s_client.go (2)
  • MockK8sClient (12-16)
  • NewMockK8sClient (18-20)
status-reporter/pkg/result/result.go (3)
  • AdapterResult (32-44)
  • StatusSuccess (11-11)
  • StatusFailure (12-12)
status-reporter/pkg/k8s/client.go (1)
  • JobCondition (49-55)
status-reporter/pkg/config/config_test.go (1)
status-reporter/pkg/config/config.go (2)
  • Load (57-105)
  • Config (13-23)
status-reporter/pkg/result/parser.go (1)
status-reporter/pkg/result/result.go (1)
  • AdapterResult (32-44)
status-reporter/pkg/reporter/testhelpers/mock_k8s_client.go (1)
status-reporter/pkg/k8s/client.go (1)
  • JobCondition (49-55)
status-reporter/pkg/reporter/reporter.go (3)
status-reporter/pkg/k8s/client.go (2)
  • JobCondition (49-55)
  • NewClient (30-46)
status-reporter/pkg/result/parser.go (2)
  • Parser (15-15)
  • NewParser (18-20)
status-reporter/pkg/result/result.go (1)
  • AdapterResult (32-44)
🪛 checkmake (0.2.2)
status-reporter/Makefile

[warning] 85-85: Target body for "lint" exceeds allowed length of 5 (8).

(maxbodylength)


[warning] 96-96: Target body for "fmt" exceeds allowed length of 5 (6).

(maxbodylength)


[warning] 170-170: Missing required phony target "all"

(minphony)

🔇 Additional comments (24)
status-reporter/cmd/reporter/main_suite_test.go (1)

1-13: LGTM! Standard Ginkgo test suite bootstrap.

The test suite setup follows Ginkgo v2 conventions correctly.

status-reporter/pkg/k8s/k8s_suite_test.go (1)

1-13: LGTM! Standard Ginkgo test suite bootstrap.

The K8s test suite setup follows Ginkgo v2 conventions correctly.

status-reporter/pkg/result/result_suite_test.go (1)

1-13: LGTM! Standard Ginkgo test suite bootstrap.

The Result test suite setup follows Ginkgo v2 conventions correctly.

status-reporter/pkg/reporter/reporter_suite_test.go (1)

1-13: LGTM! Standard Ginkgo test suite bootstrap.

The Reporter test suite setup follows Ginkgo v2 conventions correctly.

status-reporter/cmd/reporter/main.go (4)

22-68: LGTM! Well-structured main function with proper error handling.

The main function correctly:

  • Sets up configuration and logging
  • Implements panic recovery for the reporter goroutine
  • Establishes signal handling for graceful shutdown
  • Delegates exit code determination to waitForCompletion

70-82: LGTM! Clean completion handling.

The waitForCompletion function properly coordinates between normal completion and signal-driven shutdown, returning appropriate exit codes.


84-120: LGTM! Shutdown handling correctly addresses past review feedback.

The completion and shutdown handlers properly:

  • Return correct exit codes (1 for errors, 0 for success)
  • Avoid blocking indefinitely on errChan by reading it only within the graceful shutdown case
  • Force exit on timeout without waiting for the done channel
  • Distinguish between expected context.Canceled errors and real errors during shutdown

The past review concerns about exit codes and shutdown timeout blocking have been successfully resolved.


122-138: LGTM! Clear configuration logging.

The logConfig function provides helpful visibility into the loaded configuration, including the auto-detection behavior for AdapterContainerName.

status-reporter/pkg/config/config_test.go (4)

13-37: LGTM! Proper test isolation with environment variable management.

The BeforeEach/AfterEach hooks correctly save, clear, and restore environment variables to ensure test independence.


39-148: LGTM! Comprehensive test coverage for config loading.

The Load() tests thoroughly cover:

  • Valid configurations with required and optional fields
  • Default value application
  • Custom value overrides
  • Whitespace trimming
  • Missing required environment variables
  • Invalid integer parsing

150-235: LGTM! Thorough validation test coverage.

The Validate() tests comprehensively verify:

  • Valid configurations pass
  • Timing parameters must be positive
  • PollIntervalSeconds must be less than MaxWaitTimeSeconds
  • ResultsPath must be absolute and point to a file

237-249: LGTM! Helper method tests verify duration conversions.

The tests for GetPollInterval() and GetMaxWaitTime() correctly verify the conversion from seconds to time.Duration.

status-reporter/pkg/result/parser.go (3)

1-20: LGTM! Sensible size limit and constructor pattern.

The 1MB maxResultFileSize is appropriate for JSON result files, and the constructor pattern provides flexibility for future enhancements.


22-38: LGTM! Robust file parsing with appropriate validation.

ParseFile correctly validates file contents by checking for empty files and enforcing the size limit before parsing, with descriptive error messages.


40-53: LGTM! Clean parsing with delegated validation.

Parse correctly unmarshals JSON and delegates validation to AdapterResult.Validate(), maintaining proper separation of concerns.

status-reporter/Makefile (4)

28-59: LGTM! Robust Go tooling and container runtime detection.

The Makefile correctly:

  • Configures race detection for tests
  • Detects available container runtimes (Docker/Podman)
  • Provides clear error messages when no runtime is found

61-109: LGTM! Well-structured development targets.

The test, lint, and formatting targets properly:

  • Enable race detection for tests
  • Generate coverage reports
  • Check for tool availability before use
  • Prefer goimports over gofmt when available

The static analysis warnings about target body length are stylistic and don't indicate actual problems.


110-146: LGTM! Image build and push targets with runtime validation.

The image targets correctly:

  • Validate container runtime availability
  • Build and tag images appropriately
  • Provide clear error messages and status indicators

Note: The binary target uses LDFLAGS that were flagged in a separate comment.


147-171: LGTM! Dev workflow targets with proper validation.

The image-dev target appropriately:

  • Requires QUAY_USER to be set
  • Provides clear usage instructions
  • Uses a dev-specific tag with git commit

The verify target usefully chains linting and testing. The static analysis warning about a missing "all" target is a convention, not a requirement.

status-reporter/pkg/reporter/reporter_test.go (1)

1-647: Excellent comprehensive test coverage!

The test suite thoroughly exercises the StatusReporter across all code paths:

  • Constructor variations with custom and default condition types
  • Result processing for success/failure cases
  • Error handling for invalid formats and K8s update failures
  • Container termination scenarios (OOMKilled, exit codes, crashes)
  • Timeout handling with various container states
  • End-to-end Run orchestration with file polling

The test structure is well-organized with clear Context blocks, and the use of the mock K8s client enables precise verification of status updates. The comment at lines 238-244 excellently explains the exit-code-zero-without-result-file edge case.

status-reporter/cmd/reporter/main_test.go (1)

1-382: Well-structured lifecycle tests with excellent edge case coverage.

The test suite comprehensively validates the reporter's lifecycle management:

  • Exit code handling for success, errors, and context cancellation
  • Graceful shutdown with signal handling (SIGTERM/SIGINT)
  • Shutdown timeout behavior and timer cleanup
  • Race conditions between completion and signals
  • Proper distinction between context.Canceled in shutdown (exit 0) vs normal paths (exit 1)

The tests cover critical operational scenarios that ensure the sidecar behaves correctly during Kubernetes pod lifecycle events.

status-reporter/pkg/result/result.go (1)

79-94: UTF-8 truncation correctly implemented.

The truncateUTF8 helper properly addresses the past review concern about splitting multi-byte UTF-8 characters. The implementation walks backward from maxBytes to find the last valid rune boundary using utf8.RuneStart, ensuring the truncated string remains valid UTF-8.

The logic correctly handles all cases:

  • Returns the original string if already within limits
  • Finds the last complete character that fits within maxBytes
  • Never splits multi-byte runes
status-reporter/pkg/reporter/testhelpers/mock_k8s_client.go (1)

1-35: Clean and effective mock implementation.

The MockK8sClient provides a straightforward test double with:

  • Injectable behavior via function fields for custom error scenarios
  • State capture via LastUpdatedCondition for assertion verification
  • Safe defaults (nil/no-error) when custom behavior isn't needed

This enables the comprehensive test coverage in reporter_test.go by allowing precise control over Kubernetes interactions.

status-reporter/pkg/reporter/reporter.go (1)

1-276: Excellent implementation with proper resource management and error handling.

The StatusReporter demonstrates several strong patterns:

Clean architecture:

  • K8sClientInterface abstraction enables testability
  • Dependency injection via constructors (production vs. test)
  • Clear separation of concerns across update methods

Robust orchestration:

  • Proper context usage with timeout and graceful cancellation
  • WaitGroup ensures goroutine cleanup before return
  • Double-select pattern (lines 104-112) gives precedence to results that arrive just before timeout
  • Ticker and context properly stopped via defers

Intelligent error handling:

  • UpdateFromTimeout creates a new context.Background() (line 221) for container status checks, allowing the query even if the parent context is cancelled - this is correct defensive programming
  • UpdateFromTerminatedContainer distinguishes between OOMKilled, non-zero exits, and the edge case of zero-exit without results
  • Error messages include relevant context (pod name, condition type, timing)

Comment on lines +20 to +26
LDFLAGS := -w -s
LDFLAGS += -X main.version=$(VERSION)
LDFLAGS += -X main.commit=$(GIT_COMMIT)
LDFLAGS += -X main.buildDate=$(BUILD_DATE)
ifneq ($(GIT_TAG),)
LDFLAGS += -X main.tag=$(GIT_TAG)
endif
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

LDFLAGS references non-existent variables in main.go.

The LDFLAGS attempt to inject version metadata (main.version, main.commit, main.buildDate, main.tag) via -X flags, but these variables are not declared in status-reporter/cmd/reporter/main.go.

If you want to embed build metadata, add these variables to main.go:

package main

import (
	// ... existing imports
)

var (
	version   = "dev"
	commit    = "unknown"
	buildDate = "unknown"
	tag       = ""
)

Alternatively, if version metadata is not needed, remove lines 20-26 from the Makefile.

🤖 Prompt for AI Agents
In status-reporter/Makefile around lines 20 to 26, the LDFLAGS use -X to set
main.version, main.commit, main.buildDate and main.tag which do not exist in
status-reporter/cmd/reporter/main.go; to fix either (A) add matching
package-level vars in main.go (e.g., version, commit, buildDate, tag) with
sensible defaults so the -X flags can set them at link time, or (B) if build
metadata is not required, remove the LDFLAGS lines 20–26 from the Makefile so it
no longer references non-existent symbols.

@86254860 86254860 force-pushed the HYPERFLEET-264-status-reporter-for-gcp-validator branch from fd11719 to 910c755 Compare December 12, 2025 06:56
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

♻️ Duplicate comments (2)
status-reporter/Makefile (1)

19-26: LDFLAGS -X main.* still needs confirmation in main.go.
This is the same concern as the prior review: ensure status-reporter/cmd/reporter/main.go declares version/commit/buildDate/tag variables, or remove these flags.

#!/bin/bash
set -euo pipefail

# Verify main package variables exist for ldflags injection
rg -n --no-heading '^\s*var\s*\(\s*$|^\s*version\s*=|^\s*commit\s*=|^\s*buildDate\s*=|^\s*tag\s*=' status-reporter/cmd/reporter/main.go || true
echo "----"
sed -n '1,120p' status-reporter/cmd/reporter/main.go
status-reporter/pkg/config/config.go (1)

126-145: Reintroduce /results sandboxing for ResultsPath (currently allows arbitrary absolute paths).
Right now any absolute file path is accepted; this is a security footgun and was previously flagged.

 const (
     DefaultResultsPath          = "/results/adapter-result.json"
+    ResultsBasePath             = "/results"
     DefaultPollIntervalSeconds  = 2
     DefaultMaxWaitTimeSeconds   = 300
     DefaultConditionType        = "Available"
     DefaultLogLevel             = "info"
     DefaultAdapterContainerName = ""
 )
 
 func (c *Config) validateResultsPath() error {
     if strings.HasSuffix(c.ResultsPath, "/") {
         return &ValidationError{
             Field:   "ResultsPath",
             Message: "path must be a file, not a directory",
         }
     }
 
     cleanPath := filepath.Clean(c.ResultsPath)
 
     if !filepath.IsAbs(cleanPath) {
         return &ValidationError{
             Field:   "ResultsPath",
             Message: "path must be absolute",
         }
     }
+
+    // Ensure cleanPath is within ResultsBasePath (avoid treating "/results2" as inside "/results")
+    base := filepath.Clean(ResultsBasePath)
+    if cleanPath != base && !strings.HasPrefix(cleanPath, base+string(os.PathSeparator)) {
+        return &ValidationError{
+            Field:   "ResultsPath",
+            Message: fmt.Sprintf("path must be within %s directory", ResultsBasePath),
+        }
+    }
 
     return nil
 }
🧹 Nitpick comments (5)
status-reporter/Dockerfile (1)

2-17: Avoid hardcoding amd64; prefer build args / multi-arch pattern.
If this image is expected to run on arm64 clusters too, GOARCH=amd64 will be a problem. Consider using ARG TARGETOS TARGETARCH and building with BuildKit’s target platform variables.

 FROM golang:1.25-alpine AS builder
+ARG TARGETOS=linux
+ARG TARGETARCH=amd64
 ...
-RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -ldflags="-w -s" -o status-reporter ./cmd/reporter
+RUN CGO_ENABLED=0 GOOS=$TARGETOS GOARCH=$TARGETARCH go build -ldflags="-w -s" -o status-reporter ./cmd/reporter
status-reporter/Makefile (1)

61-65: Consider adding all phony target to match common Make conventions.
This also addresses the checkmake “missing phony target all” warning.

+.PHONY: all
+all: help
+
 .PHONY: help
 help: ## Display this help message

Also applies to: 170-171

status-reporter/pkg/reporter/reporter_test.go (1)

529-555: Potential CI flakes: replace fixed sleeps / exact error strings with Eventually + sentinel errors.
The tests coordinate with time.Sleep(...) and sometimes assert err.Error() == "timeout waiting for adapter results", which is brittle if wording changes. Prefer Eventually around file existence/condition updates and typed/sentinel errors (or MatchError with a shared exported error).

Also applies to: 652-687, 747-842

status-reporter/pkg/k8s/client.go (1)

116-129: Avoid returning the address of a range variable (&cs).
This is a common Go pitfall and is easy to regress if the loop logic changes. Prefer indexing to take the address of the slice element.

- for _, cs := range podStatus.ContainerStatuses {
-     if cs.Name == containerName {
-         return &cs, nil
-     }
- }
+ for i := range podStatus.ContainerStatuses {
+     if podStatus.ContainerStatuses[i].Name == containerName {
+         return &podStatus.ContainerStatuses[i], nil
+     }
+ }
...
- for _, cs := range podStatus.ContainerStatuses {
-     if cs.Name != StatusReporterContainerName {
-         return &cs, nil
-     }
- }
+ for i := range podStatus.ContainerStatuses {
+     if podStatus.ContainerStatuses[i].Name != StatusReporterContainerName {
+         return &podStatus.ContainerStatuses[i], nil
+     }
+ }
status-reporter/pkg/reporter/reporter.go (1)

204-226: Consider checking LastTerminationState.Terminated (restart/crash-loop visibility) + clamp condition message size.
If the container restarts, State.Terminated may be nil while LastTerminationState.Terminated has the real failure reason; also, condition Message could grow large depending on parse errors.

Also applies to: 375-405

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fd11719 and 910c755.

⛔ Files ignored due to path filters (1)
  • status-reporter/go.sum is excluded by !**/*.sum
📒 Files selected for processing (22)
  • README.md (1 hunks)
  • status-reporter/Dockerfile (1 hunks)
  • status-reporter/Makefile (1 hunks)
  • status-reporter/cmd/reporter/main.go (1 hunks)
  • status-reporter/cmd/reporter/main_suite_test.go (1 hunks)
  • status-reporter/cmd/reporter/main_test.go (1 hunks)
  • status-reporter/go.mod (1 hunks)
  • status-reporter/pkg/config/config.go (1 hunks)
  • status-reporter/pkg/config/config_suite_test.go (1 hunks)
  • status-reporter/pkg/config/config_test.go (1 hunks)
  • status-reporter/pkg/k8s/client.go (1 hunks)
  • status-reporter/pkg/k8s/client_test.go (1 hunks)
  • status-reporter/pkg/k8s/k8s_suite_test.go (1 hunks)
  • status-reporter/pkg/reporter/reporter.go (1 hunks)
  • status-reporter/pkg/reporter/reporter_suite_test.go (1 hunks)
  • status-reporter/pkg/reporter/reporter_test.go (1 hunks)
  • status-reporter/pkg/reporter/testhelpers/mock_k8s_client.go (1 hunks)
  • status-reporter/pkg/result/parser.go (1 hunks)
  • status-reporter/pkg/result/parser_test.go (1 hunks)
  • status-reporter/pkg/result/result.go (1 hunks)
  • status-reporter/pkg/result/result_suite_test.go (1 hunks)
  • status-reporter/pkg/result/result_test.go (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • status-reporter/cmd/reporter/main_suite_test.go
🚧 Files skipped from review as they are similar to previous changes (9)
  • status-reporter/pkg/config/config_suite_test.go
  • status-reporter/pkg/result/result_suite_test.go
  • status-reporter/pkg/reporter/reporter_suite_test.go
  • status-reporter/go.mod
  • status-reporter/pkg/result/result_test.go
  • status-reporter/cmd/reporter/main.go
  • status-reporter/pkg/result/parser.go
  • status-reporter/pkg/result/parser_test.go
  • status-reporter/pkg/reporter/testhelpers/mock_k8s_client.go
🧰 Additional context used
🧬 Code graph analysis (4)
status-reporter/pkg/config/config_test.go (1)
status-reporter/pkg/config/config.go (2)
  • Load (57-105)
  • Config (13-23)
status-reporter/pkg/reporter/reporter.go (3)
status-reporter/pkg/k8s/client.go (2)
  • JobCondition (49-55)
  • NewClient (30-46)
status-reporter/pkg/result/result.go (1)
  • AdapterResult (32-44)
status-reporter/pkg/result/parser.go (2)
  • Parser (16-16)
  • NewParser (19-21)
status-reporter/pkg/reporter/reporter_test.go (4)
status-reporter/pkg/reporter/reporter.go (3)
  • StatusReporter (49-59)
  • NewReporterWithClient (72-74)
  • NewReporterWithClientAndIntervals (77-79)
status-reporter/pkg/reporter/testhelpers/mock_k8s_client.go (2)
  • MockK8sClient (12-16)
  • NewMockK8sClient (18-20)
status-reporter/pkg/result/result.go (3)
  • AdapterResult (32-44)
  • StatusSuccess (11-11)
  • StatusFailure (12-12)
status-reporter/pkg/k8s/client.go (1)
  • JobCondition (49-55)
status-reporter/pkg/k8s/client_test.go (1)
status-reporter/pkg/k8s/client.go (1)
  • JobCondition (49-55)
🪛 checkmake (0.2.2)
status-reporter/Makefile

[warning] 85-85: Target body for "lint" exceeds allowed length of 5 (8).

(maxbodylength)


[warning] 96-96: Target body for "fmt" exceeds allowed length of 5 (6).

(maxbodylength)


[warning] 170-170: Missing required phony target "all"

(minphony)

🔇 Additional comments (8)
README.md (1)

88-90: Doc typo from earlier review looks fixed.
The “Adapter writes to the result file:” line no longer has the stray backtick.

Also applies to: 110-112

status-reporter/pkg/k8s/k8s_suite_test.go (1)

1-13: Standard Ginkgo suite bootstrap looks correct.

status-reporter/pkg/k8s/client_test.go (1)

12-41: LGTM (basic coverage).
These are simple smoke tests; consider expanding later to cover behavior around status update helpers (if/when present).

status-reporter/Dockerfile (1)

2-2: Pin/validate base image tags against your supported toolchain.
Please confirm golang:1.25-alpine (and the distroless tag) match what you want for CI/prod and actually exist in your registries.

Also applies to: 19-19

status-reporter/pkg/config/config_test.go (1)

16-37: Env isolation approach looks solid.

status-reporter/cmd/reporter/main_test.go (1)

15-40: Coverage of signal/normal completion paths is strong.

Also applies to: 156-256

status-reporter/pkg/reporter/reporter_test.go (1)

20-843: Scenario coverage is thorough and matches the PR’s intent.
Good breadth across result parsing, termination, timeout, and status update propagation.

status-reporter/pkg/result/result.go (1)

51-94: UTF-8-safe truncation + normalization looks solid.
This addresses the prior “byte slice can corrupt UTF-8” issue while keeping the contract simple.

Comment on lines +97 to +106
Resulting Kubernetes Job status:
```yaml
status:
conditions:
- type: Available
status: "True"
reason: ValidationPassed
message: GCP environment validated successfully
lastTransitionTime: "2024-01-15T10:30:00Z"
```
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Consider using Job-native condition types in examples (or explicitly call out “custom”).
Using type: Available in Job status examples can confuse readers and some tooling; if custom condition types are intentional, clarify that in the doc examples (or switch examples to Complete/Failed).

Also applies to: 123-132, 137-145

🤖 Prompt for AI Agents
In README.md around lines 97-106 (also review examples at 123-132 and 137-145),
the Job status YAML uses a non-standard/custom condition type "Available" which
can confuse readers and tooling; either change the example condition types to
native Job conditions such as "Complete" or "Failed" with appropriate
status/messages, or explicitly label the snippet as using a custom condition
type (e.g., add a comment/note above the YAML stating "custom condition type
used for illustration"). Ensure the chosen fix is applied consistently to the
other referenced snippets (123-132 and 137-145).

Comment on lines +104 to +137
Context("when reporter exceeds shutdown timeout", func() {
It("returns exit code 1 after timeout", func() {
// Simulate reporter hanging (never sends to done)
go func() {
<-ctx.Done()
// Never send to done - simulating hang
time.Sleep(10 * time.Second)
}()

start := time.Now()
exitCode := handleShutdown(syscall.SIGTERM, cancel, done)
duration := time.Since(start)

Expect(exitCode).To(Equal(1))
Expect(duration).To(BeNumerically(">=", shutdownTimeout))
Expect(duration).To(BeNumerically("<", shutdownTimeout+200*time.Millisecond))
})

It("does not block indefinitely", func() {
go func() {
<-ctx.Done()
// Hang forever
select {}
}()

done := make(chan int, 1)
go func() {
exitCode := handleShutdown(syscall.SIGTERM, cancel, make(chan error, 1))
done <- exitCode
}()

Eventually(done, shutdownTimeout+1*time.Second).Should(Receive(Equal(1)))
})
})
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Reduce flake risk: loosen timing assertions + avoid forever goroutines.
The < shutdownTimeout+200ms bound is tight under load, and select {} will leak a goroutine. Prefer a goroutine that exits on ctx.Done() and use Eventually for time-based expectations.

-                go func() {
-                    <-ctx.Done()
-                    // Hang forever
-                    select {}
-                }()
+                go func() {
+                    <-ctx.Done()
+                    // Simulate hang until test cancels context
+                    <-ctx.Done()
+                }()

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In status-reporter/cmd/reporter/main_test.go around lines 104 to 137, the test
has a tight timing assertion and a goroutine that blocks forever causing
flakiness and leaks; change the first It to relax the upper timing bound (remove
the strict "< shutdownTimeout+200ms" and instead assert duration is >=
shutdownTimeout and <= shutdownTimeout+1s or use Eventually to allow variance),
and rewrite the hanging goroutines so they exit on ctx.Done() (replace select{}
with <-ctx.Done() or a sleep until ctx.Done()) and for the second It use
Eventually on the done channel with a timeout like shutdownTimeout+1s to assert
it receives 1, ensuring all goroutines can terminate and timing is more
tolerant.

Comment on lines +147 to +168
.PHONY: image-dev
image-dev: ## Build and push to personal Quay registry (requires QUAY_USER)
ifndef QUAY_USER
@echo "❌ ERROR: QUAY_USER is not set"
@echo ""
@echo "Usage: QUAY_USER=myuser make image-dev"
@echo ""
@echo "This will build and push to: quay.io/$$QUAY_USER/$(IMAGE_NAME):$(DEV_TAG)"
@exit 1
endif
ifeq ($(CONTAINER_RUNTIME),none)
@echo "❌ ERROR: No container runtime found"
@echo "Please install Docker or Podman"
@exit 1
else
@echo "Building dev image quay.io/$(QUAY_USER)/$(IMAGE_NAME):$(DEV_TAG)..."
$(CONTAINER_CMD) build -t quay.io/$(QUAY_USER)/$(IMAGE_NAME):$(DEV_TAG) .
@echo "Pushing dev image..."
$(CONTAINER_CMD) push quay.io/$(QUAY_USER)/$(IMAGE_NAME):$(DEV_TAG)
@echo ""
@echo "✅ Dev image pushed: quay.io/$(QUAY_USER)/$(IMAGE_NAME):$(DEV_TAG)"
endif
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Fix image-dev QUAY_USER guard (currently allows empty QUAY_USER).
Because QUAY_USER ?= defines the var, ifndef QUAY_USER won’t fire when it’s empty.

 .PHONY: image-dev
 image-dev: ## Build and push to personal Quay registry (requires QUAY_USER)
-ifndef QUAY_USER
+ifeq ($(strip $(QUAY_USER)),)
 	@echo "❌ ERROR: QUAY_USER is not set"
 	@echo ""
 	@echo "Usage: QUAY_USER=myuser make image-dev"
 	@echo ""
 	@echo "This will build and push to: quay.io/$$QUAY_USER/$(IMAGE_NAME):$(DEV_TAG)"
 	@exit 1
 endif

Comment on lines +56 to +65
It("uses default values for optional fields", func() {
cfg, err := config.Load()
Expect(err).NotTo(HaveOccurred())
Expect(cfg.ResultsPath).To(Equal("/results/adapter-result.json"))
Expect(cfg.PollIntervalSeconds).To(Equal(2))
Expect(cfg.MaxWaitTimeSeconds).To(Equal(300))
Expect(cfg.ConditionType).To(Equal("Available"))
Expect(cfg.LogLevel).To(Equal("info"))
Expect(cfg.AdapterContainerName).To(Equal(""))
})
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Avoid hardcoding defaults in tests; reference config constants (prevents drift).
AdapterContainerName default asserted as "" is especially likely to diverge from DefaultAdapterContainerName.

-                Expect(cfg.ResultsPath).To(Equal("/results/adapter-result.json"))
-                Expect(cfg.PollIntervalSeconds).To(Equal(2))
-                Expect(cfg.MaxWaitTimeSeconds).To(Equal(300))
-                Expect(cfg.ConditionType).To(Equal("Available"))
-                Expect(cfg.LogLevel).To(Equal("info"))
-                Expect(cfg.AdapterContainerName).To(Equal(""))
+                Expect(cfg.ResultsPath).To(Equal(config.DefaultResultsPath))
+                Expect(cfg.PollIntervalSeconds).To(Equal(config.DefaultPollIntervalSeconds))
+                Expect(cfg.MaxWaitTimeSeconds).To(Equal(config.DefaultMaxWaitTimeSeconds))
+                Expect(cfg.ConditionType).To(Equal(config.DefaultConditionType))
+                Expect(cfg.LogLevel).To(Equal(config.DefaultLogLevel))
+                Expect(cfg.AdapterContainerName).To(Equal(config.DefaultAdapterContainerName))

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In status-reporter/pkg/config/config_test.go around lines 56 to 65, the test
hardcodes default values (notably AdapterContainerName as "") which can drift
from the package defaults; update the assertions to compare against the config
package constants (e.g. config.DefaultAdapterContainerName and the other
Default* constants such as DefaultResultsPath, DefaultPollIntervalSeconds,
DefaultMaxWaitTimeSeconds, DefaultConditionType, DefaultLogLevel) instead of
literal values so the test always reflects the source-of-truth defaults.

Comment on lines +109 to +132
// GetAdapterContainerStatus finds the adapter container status
func (c *Client) GetAdapterContainerStatus(ctx context.Context, podName, containerName string) (*corev1.ContainerStatus, error) {
podStatus, err := c.GetPodStatus(ctx, podName)
if err != nil {
return nil, err
}

if containerName != "" {
for _, cs := range podStatus.ContainerStatuses {
if cs.Name == containerName {
return &cs, nil
}
}
return nil, fmt.Errorf("container not found: namespace=%s pod=%s container=%s", c.namespace, podName, containerName)
}

for _, cs := range podStatus.ContainerStatuses {
if cs.Name != StatusReporterContainerName {
return &cs, nil
}
}

return nil, fmt.Errorf("adapter container not found: namespace=%s pod=%s", c.namespace, podName)
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Find and examine the k8s client file and related code
find . -type f -name "*.go" | head -20

Repository: openshift-hyperfleet/adapter-validation-gcp

Length of output: 896


🏁 Script executed:

# Search for GetAdapterContainerStatus usage
rg "GetAdapterContainerStatus" --type go -B 2 -A 2

Repository: openshift-hyperfleet/adapter-validation-gcp

Length of output: 8709


🏁 Script executed:

# Search for StatusReporterContainerName definition
rg "StatusReporterContainerName" --type go -B 2 -A 2

Repository: openshift-hyperfleet/adapter-validation-gcp

Length of output: 729


🏁 Script executed:

# Search for ADAPTER_CONTAINER_NAME
rg "ADAPTER_CONTAINER_NAME" --type go -B 2 -A 2

Repository: openshift-hyperfleet/adapter-validation-gcp

Length of output: 1739


🏁 Script executed:

# Check the full client.go file to understand context
wc -l status-reporter/pkg/k8s/client.go

Repository: openshift-hyperfleet/adapter-validation-gcp

Length of output: 124


🏁 Script executed:

# Get the file structure and NewClient to understand initialization
head -150 status-reporter/pkg/k8s/client.go | tail -50

Repository: openshift-hyperfleet/adapter-validation-gcp

Length of output: 1594


🏁 Script executed:

# Check for README or deployment docs
find . -type f \( -name "README*" -o -name "*.md" \) | head -20

Repository: openshift-hyperfleet/adapter-validation-gcp

Length of output: 98


🏁 Script executed:

# Look at the main.go to see full context of how it's initialized
cat -n status-reporter/cmd/reporter/main.go

Repository: openshift-hyperfleet/adapter-validation-gcp

Length of output: 4757


🏁 Script executed:

# Check the config.go to see if AdapterContainerName has defaults
cat -n status-reporter/pkg/config/config.go

Repository: openshift-hyperfleet/adapter-validation-gcp

Length of output: 6163


🏁 Script executed:

# Check README for deployment guidance about container names
cat README.md

Repository: openshift-hyperfleet/adapter-validation-gcp

Length of output: 8281


Require explicit ADAPTER_CONTAINER_NAME when sidecar injection is present—the fallback "first non-status-reporter" container selection is fragile and non-deterministic.

If no container name is specified, the function returns the first container that is not "status-reporter". In Kubernetes clusters with sidecar injection (Istio, service meshes, etc.), container order in ContainerStatuses is undefined, making it possible to select an injected sidecar instead of the adapter container and silently misreport its status.

Either:

  • Require ADAPTER_CONTAINER_NAME to be explicitly set and fail fast with a clear error if the fallback path is triggered, or
  • Validate that exactly one non-status-reporter container exists before returning it.

The fallback is documented in logs as "(auto-detect)" but not enforced safely in code.

🤖 Prompt for AI Agents
In status-reporter/pkg/k8s/client.go around lines 109 to 132, the current
fallback that returns the first non-status-reporter container is unsafe with
sidecar injection; change the logic so that when containerName is empty you
collect all ContainerStatuses where Name != StatusReporterContainerName and
then: if the resulting slice length == 1 return that container, otherwise return
an explicit error that auto-detection failed and instructs the caller to set
ADAPTER_CONTAINER_NAME (include namespace, pod, and the count of non-status
containers in the message). This enforces deterministic behavior and fails fast
when multiple candidate containers (e.g. injected sidecars) exist.

Comment on lines +95 to +146
// Run starts the reporter and blocks until completion
func (r *StatusReporter) Run(ctx context.Context) error {
log.Printf("Status reporter starting...")
log.Printf(" Pod: %s", r.podName)
log.Printf(" Results path: %s", r.resultsPath)
log.Printf(" Poll interval: %s", r.pollInterval)
log.Printf(" Max wait time: %s", r.maxWaitTime)

timeoutCtx, cancel := context.WithTimeout(ctx, r.maxWaitTime)
defer cancel()

// Buffered channels (size 1) prevent goroutine leaks if the main select has already
// chosen another case when a sender tries to send
channels := &pollChannels{
result: make(chan *result.AdapterResult, 1),
error: make(chan error, 1),
terminated: make(chan *corev1.ContainerStateTerminated, 1),
done: make(chan struct{}),
}

var wg sync.WaitGroup
wg.Add(2)
go r.pollForResultFile(timeoutCtx, channels, &wg)
go r.monitorContainerStatus(timeoutCtx, channels, &wg)

var reportErr error
select {
case adapterResult := <-channels.result:
reportErr = r.UpdateFromResult(ctx, adapterResult)
case err := <-channels.error:
reportErr = r.UpdateFromError(ctx, err)
case terminated := <-channels.terminated:
reportErr = r.HandleTermination(ctx, terminated)
case <-timeoutCtx.Done():
// Give precedence to results/errors/termination that may have arrived just before timeout
select {
case adapterResult := <-channels.result:
reportErr = r.UpdateFromResult(ctx, adapterResult)
case err := <-channels.error:
reportErr = r.UpdateFromError(ctx, err)
case terminated := <-channels.terminated:
reportErr = r.HandleTermination(ctx, terminated)
default:
reportErr = r.UpdateFromTimeout(ctx)
}
}

close(channels.done)
wg.Wait()

return reportErr
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cd status-reporter && cat -n pkg/reporter/reporter.go | sed -n '95,146p'

Repository: openshift-hyperfleet/adapter-validation-gcp

Length of output: 2173


🏁 Script executed:

# Also check surrounding code to understand the full context
cd status-reporter && cat -n pkg/reporter/reporter.go | sed -n '80,160p'

Repository: openshift-hyperfleet/adapter-validation-gcp

Length of output: 3597


🏁 Script executed:

# Check what UpdateFromTimeout does
cd status-reporter && rg -A 10 "func.*UpdateFromTimeout"

Repository: openshift-hyperfleet/adapter-validation-gcp

Length of output: 1025


🏁 Script executed:

# Check what UpdateFromError does (mentioned in the review as potentially embedding unbounded errors)
cd status-reporter && rg -A 10 "func.*UpdateFromError"

Repository: openshift-hyperfleet/adapter-validation-gcp

Length of output: 778


🌐 Web query:

Go context package DeadlineExceeded vs Canceled behavior

💡 Result:

Short answer

  • ctx.Done() is closed in both cases; the difference is the error returned by ctx.Err():
    • context.Canceled — the context was explicitly canceled (Cancel/CancelFunc called or parent canceled). [1]
    • context.DeadlineExceeded — the context’s deadline/timeout expired. [1][2]

Details (practical points)

  • Use WithCancel to produce a cancellation that yields context.Canceled when you call the CancelFunc. [1]
  • Use WithDeadline/WithTimeout to produce a timeout that yields context.DeadlineExceeded when the deadline passes. Always still call the returned CancelFunc to release resources. [1]
  • Many libraries treat both as “stop work now” (Done closed), but they may handle them differently (e.g., logging, connection handling) because the Err value indicates whether it was caller-initiated cancellation or a timeout. [1][3]

Sources
[1] Go context package docs (pkg.go.dev).
[2] Go stdlib context overview (golang.org / pkg docs).
[3] Discussion/examples about cancellation vs timeout behavior (blog/issue references).


🏁 Script executed:

cd status-reporter && rg -A 20 "func.*UpdateFromTimeout"

Repository: openshift-hyperfleet/adapter-validation-gcp

Length of output: 1619


🏁 Script executed:

# Check what UpdateFromTerminatedContainer does
cd status-reporter && rg -A 15 "func.*UpdateFromTerminatedContainer"

Repository: openshift-hyperfleet/adapter-validation-gcp

Length of output: 1227


🏁 Script executed:

# Check the Job condition update mechanism to understand side effects
cd status-reporter && rg -A 10 "UpdateJobStatus" | head -50

Repository: openshift-hyperfleet/adapter-validation-gcp

Length of output: 3243


Distinguish "timeout" vs "context canceled" to avoid misreporting during shutdown/deletion.
timeoutCtx.Done() fires for both deadline-exceeded and parent context cancellation, but the code always runs the "timeout" path (unless a result/error arrived). When the parent context cancels (e.g., during pod/job deletion), this incorrectly marks the Job with ReasonAdapterTimeout instead of propagating the cancellation.

 func (r *StatusReporter) Run(ctx context.Context) error {
 ...
     case <-timeoutCtx.Done():
         // Give precedence to results/errors/termination that may have arrived just before timeout
         select {
         case adapterResult := <-channels.result:
             reportErr = r.UpdateFromResult(ctx, adapterResult)
         case err := <-channels.error:
             reportErr = r.UpdateFromError(ctx, err)
         case terminated := <-channels.terminated:
             reportErr = r.HandleTermination(ctx, terminated)
         default:
+            if errors.Is(timeoutCtx.Err(), context.DeadlineExceeded) {
+                reportErr = r.UpdateFromTimeout(ctx)
+            } else {
+                // Parent canceled (shutdown/deletion). Avoid writing misleading "timeout" status.
+                reportErr = timeoutCtx.Err()
+            }
-            reportErr = r.UpdateFromTimeout(ctx)
         }
     }
 ...
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Run starts the reporter and blocks until completion
func (r *StatusReporter) Run(ctx context.Context) error {
log.Printf("Status reporter starting...")
log.Printf(" Pod: %s", r.podName)
log.Printf(" Results path: %s", r.resultsPath)
log.Printf(" Poll interval: %s", r.pollInterval)
log.Printf(" Max wait time: %s", r.maxWaitTime)
timeoutCtx, cancel := context.WithTimeout(ctx, r.maxWaitTime)
defer cancel()
// Buffered channels (size 1) prevent goroutine leaks if the main select has already
// chosen another case when a sender tries to send
channels := &pollChannels{
result: make(chan *result.AdapterResult, 1),
error: make(chan error, 1),
terminated: make(chan *corev1.ContainerStateTerminated, 1),
done: make(chan struct{}),
}
var wg sync.WaitGroup
wg.Add(2)
go r.pollForResultFile(timeoutCtx, channels, &wg)
go r.monitorContainerStatus(timeoutCtx, channels, &wg)
var reportErr error
select {
case adapterResult := <-channels.result:
reportErr = r.UpdateFromResult(ctx, adapterResult)
case err := <-channels.error:
reportErr = r.UpdateFromError(ctx, err)
case terminated := <-channels.terminated:
reportErr = r.HandleTermination(ctx, terminated)
case <-timeoutCtx.Done():
// Give precedence to results/errors/termination that may have arrived just before timeout
select {
case adapterResult := <-channels.result:
reportErr = r.UpdateFromResult(ctx, adapterResult)
case err := <-channels.error:
reportErr = r.UpdateFromError(ctx, err)
case terminated := <-channels.terminated:
reportErr = r.HandleTermination(ctx, terminated)
default:
reportErr = r.UpdateFromTimeout(ctx)
}
}
close(channels.done)
wg.Wait()
return reportErr
}
// Run starts the reporter and blocks until completion
func (r *StatusReporter) Run(ctx context.Context) error {
log.Printf("Status reporter starting...")
log.Printf(" Pod: %s", r.podName)
log.Printf(" Results path: %s", r.resultsPath)
log.Printf(" Poll interval: %s", r.pollInterval)
log.Printf(" Max wait time: %s", r.maxWaitTime)
timeoutCtx, cancel := context.WithTimeout(ctx, r.maxWaitTime)
defer cancel()
// Buffered channels (size 1) prevent goroutine leaks if the main select has already
// chosen another case when a sender tries to send
channels := &pollChannels{
result: make(chan *result.AdapterResult, 1),
error: make(chan error, 1),
terminated: make(chan *corev1.ContainerStateTerminated, 1),
done: make(chan struct{}),
}
var wg sync.WaitGroup
wg.Add(2)
go r.pollForResultFile(timeoutCtx, channels, &wg)
go r.monitorContainerStatus(timeoutCtx, channels, &wg)
var reportErr error
select {
case adapterResult := <-channels.result:
reportErr = r.UpdateFromResult(ctx, adapterResult)
case err := <-channels.error:
reportErr = r.UpdateFromError(ctx, err)
case terminated := <-channels.terminated:
reportErr = r.HandleTermination(ctx, terminated)
case <-timeoutCtx.Done():
// Give precedence to results/errors/termination that may have arrived just before timeout
select {
case adapterResult := <-channels.result:
reportErr = r.UpdateFromResult(ctx, adapterResult)
case err := <-channels.error:
reportErr = r.UpdateFromError(ctx, err)
case terminated := <-channels.terminated:
reportErr = r.HandleTermination(ctx, terminated)
default:
if errors.Is(timeoutCtx.Err(), context.DeadlineExceeded) {
reportErr = r.UpdateFromTimeout(ctx)
} else {
// Parent canceled (shutdown/deletion). Avoid writing misleading "timeout" status.
reportErr = timeoutCtx.Err()
}
}
}
close(channels.done)
wg.Wait()
return reportErr
}
🤖 Prompt for AI Agents
In status-reporter/pkg/reporter/reporter.go around lines 95 to 146, the select
branch for <-timeoutCtx.Done() treats all cancellations as a deadline timeout;
change it to distinguish context.DeadlineExceeded from parent cancellation by
checking timeoutCtx.Err(): if errors.Is(timeoutCtx.Err(),
context.DeadlineExceeded) keep the existing inner select and call
r.UpdateFromTimeout(...) when nothing else arrived, otherwise propagate the
cancellation immediately (e.g., return timeoutCtx.Err() or call
r.UpdateFromError with the cancellation error) so pod/job deletion doesn't get
misreported as a timeout.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant