Conversation

@vertex451 vertex451 commented Aug 27, 2025

  1. Replaced the kcp-dev/controller-runtime fork with the new multicluster-provider as the k8s client.

Testing

  1. Tested in the openmfp local-setup by loading a Docker image with my changes and creating an account. Commit 00c6e87, date 02.10.2025.
  2. Tested ClusterAccess mode against standard Kubernetes; it works (queried ConfigMaps).

Summary by CodeRabbit

  • New Features

    • Manager-based KCP with provider lifecycle and dynamic per-request virtual workspace resolution via context helpers.
  • Refactor

    • Replaced single-reconciler flow with a controller-provider manager pattern and unified workspace/cluster path handling; coordinated startup/shutdown.
  • Bug Fixes

    • Improved workspace-path fallbacks, non-fatal per-workspace error handling, and clearer shutdown/error signaling.
  • Tests

    • Expanded/refactored tests, new test helpers, and removal/update of several autogenerated mocks.
  • Chores / Docs

    • Dependency upgrades, simplified CRD docs, updated virtual-workspaces docs, new default gateway URL pattern.

})
}

// Create client - use KCP-aware client only for KCP mode, standard client otherwise

Contributor Author

multicluster-runtime uses the standard client.WithWatch internally for each cluster it manages, so there is no need to differentiate between KCP and non-KCP modes here.
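For illustration, a minimal sketch of what that uniform construction looks like; the helper name is an assumption, not code from this PR:

package sketch

import (
	"k8s.io/client-go/rest"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// newClusterClient builds a watch-capable client from a per-cluster
// rest.Config. Because multicluster-runtime hands every managed cluster its
// own config, the same constructor can serve KCP workspaces and plain
// Kubernetes clusters alike.
func newClusterClient(cfg *rest.Config) (client.WithWatch, error) {
	// An empty client.Options uses the default scheme; see the scheme note
	// later in this review for passing an explicit one.
	return client.NewWithWatch(cfg, client.Options{})
}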

@vertex451 vertex451 marked this pull request as ready for review August 29, 2025 11:54
@vertex451 vertex451 requested a review from a team as a code owner August 29, 2025 11:54
@vertex451 vertex451 self-assigned this Aug 29, 2025

Contributor
@aaronschweig aaronschweig left a comment

Especially see the comment about the full migration to multicluster-runtime. Let's make sure that we have used the capabilities of mcruntime properly and not only built around it.

go.mod Outdated
k8s.io/client-go => k8s.io/client-go v0.32.4
sigs.k8s.io/controller-runtime => github.com/kcp-dev/controller-runtime v0.19.0-kcp.1.0.20250129100209-5eaf4c7b6056
// Pin controller-runtime to v0.21.0 to avoid compatibility issues
sigs.k8s.io/controller-runtime => sigs.k8s.io/controller-runtime v0.21.0

Contributor

Why do we need to pin it? Is there any way to use the wanted version? Otherwise we still have the same dependency pain as before.

What forces us to use controller-runtime at all? I thought we might be able to get rid of this dependency.

Member
@embik embik Sep 5, 2025

multicluster-runtime has a dependency on controller-runtime since it's an "addon", not a standalone project, but I agree with this being doable without replace directives. You probably just have to downgrade the k8s.io packages to be compatible with v0.21.0 of controller-runtime (currently I think controller-runtime gets upgraded because of k8s.io packages in go.mod being too new).
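For orientation only, a sketch of the go.mod shape this implies, reusing version numbers that appear elsewhere in this review (controller-runtime v0.21.0, k8s.io v0.33.3, multicluster-runtime v0.21.0-alpha.9) as illustrative values rather than a verified compatibility matrix:

require (
	k8s.io/api v0.33.3
	k8s.io/apimachinery v0.33.3
	k8s.io/client-go v0.33.3
	sigs.k8s.io/controller-runtime v0.21.0
	sigs.k8s.io/multicluster-runtime v0.21.0-alpha.9
)

// No replace block needed once the k8s.io modules sit on the minor that
// controller-runtime v0.21.x expects.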

Contributor

Got it - makes sense. Let's try and make it work without the replace then 💪

Contributor Author
@vertex451 vertex451 Sep 25, 2025

Removed replace lines

Comment on lines 104 to 107
reconcilerInstance.manager = &kcpManagerWrapper{
Manager: mcMgr.GetLocalManager(),
reconciler: reconcilerInstance,
}

Contributor

Why do we need to implement the Reconciler interface then, if there is no logic in here? Let's use a type that represents its usage.

Contributor Author

Removed unnecessary layer.

Comment on lines 252 to 431
func (w *kcpManagerWrapper) Start(ctx context.Context) error {
return w.reconciler.Start(ctx)
}

Contributor

we don't need that, as the manager is managed by the multiclusterruntime.Start

Contributor Author

even kcpManagerWrapper is not needed, removed it

)

// Context key types to avoid collisions
type LogicalClusterKey struct{}

Contributor

Can you double-check if there is a built-in type?

Let's maybe also go one step further and refactor the usage of the clients properly.

With the multicluster runtime Client there is a GetCluster method. We should use that and make sure to use the multicluster-runtime properly in general.

This might cause some implementation changes on the resolver side. I would appreciate it if we could investigate this in this PR to make a proper migration and not only migrate the listener, but also the gateway side.
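A minimal sketch of that direction, assuming the gateway holds a multicluster-runtime manager; the helper name and import alias are illustrative:

package sketch

import (
	"context"
	"fmt"

	"sigs.k8s.io/controller-runtime/pkg/client"
	mcmanager "sigs.k8s.io/multicluster-runtime/pkg/manager"
)

// clientForWorkspace resolves the per-request cluster through the
// multicluster-runtime manager instead of stashing it under a custom context
// key, and returns that cluster's client for resolver use.
func clientForWorkspace(ctx context.Context, mgr mcmanager.Manager, workspace string) (client.Client, error) {
	cl, err := mgr.GetCluster(ctx, workspace)
	if err != nil {
		return nil, fmt.Errorf("resolving cluster %q: %w", workspace, err)
	}
	return cl.GetClient(), nil
}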

@vertex451 vertex451 force-pushed the feat/multi-cluster-runtime branch 2 times, most recently from 302e2c0 to cdd8368 on September 18, 2025 07:29
@vertex451 vertex451 force-pushed the feat/multi-cluster-runtime branch from 24289d7 to 70e5a05 on September 19, 2025 12:54

Contributor

nexus49 commented Sep 22, 2025

@coderabbitai full review

coderabbitai bot commented Sep 22, 2025

✅ Actions performed

Full review triggered.

coderabbitai bot commented Sep 22, 2025

Walkthrough

Refactors the KCP reconciler into a KCPManager with multicluster-provider integration; replaces signal handling with cancellable contexts and errCh; centralizes gateway context keys; adds virtual-workspace and cluster-path helpers; removes multiple autogenerated mocks and APIBinding reconciler/tests; updates GraphQL/roundtripper context handling, tests, and dependencies.

Changes

Cohort / File(s) Summary
Entrypoint / coordinated shutdown
cmd/listener.go
Replace direct signal handling with a cancellable context and errCh; rename NewKCPReconciler to NewKCPManager; adapt the startup/shutdown flow to use ControllerProvider and error-channel-driven shutdown (a minimal sketch of this pattern follows the change list).
KCP manager & reconciler API
listener/reconciler/kcp/reconciler.go, listener/reconciler/kcp/types.go
Replace KCPReconciler/CustomReconciler with KCPManager/ControllerProvider; add NewKCPManager, apiexport provider wiring, providerRunnable, and manager lifecycle helpers (GetManager, StartVirtualWorkspaceWatching) plus helper reconcile/schema/workspace methods.
KCP reconciler tests & test exports
listener/reconciler/kcp/reconciler_test.go, listener/reconciler/kcp/export_test.go
Migrate tests to KCPManager, add test factories and logger usage; expose testing helpers/wrappers (ExportedKCPManager, provider test hooks); update constructors to accept logger.
APIBinding controller & tests removed
listener/reconciler/kcp/apibinding_controller.go, listener/reconciler/kcp/apibinding_controller_test.go
Remove APIBindingReconciler implementation and its extensive test suite (schema generation, discovery/REST mapper, filesystem IO).
Virtual workspaces: behavior, API, tests & docs
listener/reconciler/kcp/virtual_workspace.go, listener/reconciler/kcp/virtual_workspace_test.go, docs/virtual-workspaces.md
Add VirtualWorkspace.Validate, default-workspace resolution; change virtual config/client creation to accept targetWorkspace; reconciliation uses generic schema with per-request workspace resolution; update tests and docs for dynamic workspace targeting.
Cluster path resolver & helpers
listener/reconciler/kcp/cluster_path.go, listener/reconciler/kcp/cluster_path_test.go, listener/reconciler/kcp/helper_functions.go, listener/reconciler/kcp/helper_functions_test.go
Add logger to ClusterPathResolverProvider, make PathForCluster a receiver method, add PathForClusterFromConfig, implement stripAPIExportPath/extractAPIExportRef, and update tests for fallback semantics and config-based paths.
Mocks removed & test adjustments
listener/pkg/apischema/mocks/*, listener/reconciler/kcp/mocks/*, listener/pkg/apischema/crd_resolver_test.go, listener/pkg/apischema/resolver_test.go
Delete many autogenerated mock files (DiscoveryInterface, RESTMapper, ClusterPathResolver, DiscoveryFactory); update tests to use kcp mocks or alternate providers and adjust imports/usages.
Target-cluster client & GraphQL context updates
gateway/manager/targetcluster/cluster.go, gateway/manager/targetcluster/graphql.go, gateway/manager/targetcluster/graphql_test.go
Always initialize cluster client with client.NewWithWatch; replace kontext usage with centralized ctxkeys helpers; store/extract cluster name and token via ctxkeys.WithClusterName/WithToken and corresponding getters.
Gateway context helpers & registry updates
gateway/manager/context/keys.go, gateway/manager/targetcluster/registry.go, gateway/manager/targetcluster/registry_test.go, gateway/manager/context/keys_test.go
Add centralized context helpers (WithToken, TokenFromContext, WithKcpWorkspace, KcpWorkspaceFromContext, WithClusterName, ClusterNameFromContext); remove local context key usage and update registry and tests to use helpers; add comprehensive tests.
RoundTripper / virtual-workspace URL handling
gateway/manager/roundtripper/roundtripper.go, gateway/manager/roundtripper/roundtripper_test.go, gateway/manager/roundtripper/export_test.go
Remove public TokenKey; use ctxkeys.TokenFromContext; add workspace-aware URL rewrite/normalization (inject/strip /clusters/<workspace> for virtual workspaces), update discovery path handling, and add test helpers (NewUnauthorizedRoundTripperForTest, IsWorkspaceQualifiedForTest).
Config and CRD fixture updates
common/config/config.go, tests/gateway_test/testdata/crd/...core.platform-mesh.io_accounts.yaml
Add KcpWorkspacePattern to URL config; update CRD fixture controller-gen annotation and simplify status.conditions descriptions.
Command-level error handling & defaults
cmd/root.go, cmd/gateway.go
Add default gateway-url-kcp-workspace-pattern; change init helpers to return errors (instead of exiting) so Run handles fatal conditions.
Cluster access reconciler typing
listener/reconciler/clusteraccess/reconciler.go
Change exported constructors to return reconciler.ControllerProvider (previously CustomReconciler) to align with new reconciler abstraction.
Mocks: client / withWatch Apply additions
common/mocks/mock_Client.go, common/mocks/mock_WithWatch.go
Add typed Apply mock methods (with Call/Expecter helpers) to support mocking Apply(ctx, obj, opts...) error with variadic options.
Watcher tests parallelization
common/watcher/watcher_test.go
Enable t.Parallel() in several watcher tests and increase timing windows to improve stability.
Module dependency updates
go.mod
Add/upgrade multicluster-runtime/multicluster-provider and multiple Kubernetes modules; simplify prior replace blocks; bump related deps.
Misc test helpers & exporters
gateway/manager/targetcluster/export_test.go, common/auth/export_test.go, various tests
Add test/export wrappers (e.g., ExtractClusterNameFromPathForTest, ExtractCAFromKubeconfigB64ForTest) and adjust tests to use them.
Gateway schema scalar parsing fix
gateway/schema/scalars.go, gateway/schema/scalars_test.go
Fix ListValue parsing to only insert key/value pairs when both key and value are present; expand and reorganize scalar tests into subtests with additional cases.
Target cluster tests duplicate
gateway/manager/targetcluster/cluster_test.go
Two identical TestTargetCluster_GetName functions were added (duplicate declaration likely to cause compile error).
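For orientation, a minimal sketch of the errCh-driven shutdown described in the entrypoint row above; startManager is a hypothetical stand-in for the real manager/provider startup, not the project's API:

package main

import (
	"context"
	"errors"
	"fmt"
	"os"
	"os/signal"
	"syscall"
)

// startManager stands in for starting the controller-runtime manager and the
// multicluster provider; it blocks until the context is cancelled.
func startManager(ctx context.Context) error {
	<-ctx.Done()
	return ctx.Err()
}

func main() {
	// A signal-aware, cancellable context replaces direct signal handling.
	ctx, cancel := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
	defer cancel()

	// errCh carries the first fatal error from the manager goroutine.
	errCh := make(chan error, 1)
	go func() {
		if err := startManager(ctx); err != nil && !errors.Is(err, context.Canceled) {
			errCh <- err
		}
	}()

	select {
	case err := <-errCh:
		fmt.Fprintln(os.Stderr, "listener stopped with error:", err)
		os.Exit(1)
	case <-ctx.Done():
		fmt.Println("shutdown signal received, exiting")
	}
}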

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 0.00%, which is insufficient; the required threshold is 80.00%. Resolution: you can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Description Check (✅ Passed): Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title Check (✅ Passed): The title succinctly captures the primary change—adopting the multicluster-provider library in place of the previous KCP controller runtime—and follows conventional commit styling. It is concise, specific, and informative.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 73cc293 and 00c6e87.

📒 Files selected for processing (2)
  • gateway/manager/roundtripper/roundtripper.go (4 hunks)
  • gateway/manager/roundtripper/roundtripper_test.go (6 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
gateway/manager/roundtripper/roundtripper_test.go (5)
gateway/manager/context/keys.go (2)
  • WithToken (36-38)
  • WithKcpWorkspace (54-56)
gateway/manager/roundtripper/export_test.go (2)
  • NewUnauthorizedRoundTripperForTest (6-8)
  • IsWorkspaceQualifiedForTest (11-13)
gateway/manager/mocks/mock_RoundTripper.go (1)
  • MockRoundTripper (12-14)
common/config/config.go (1)
  • Config (3-37)
gateway/manager/roundtripper/roundtripper.go (1)
  • New (23-30)
gateway/manager/roundtripper/roundtripper.go (1)
gateway/manager/context/keys.go (2)
  • TokenFromContext (41-44)
  • KcpWorkspaceFromContext (59-62)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: pipe / dockerBuild / docker
  • GitHub Check: pipe / testSource / test
  • GitHub Check: Analyze (go)
🔇 Additional comments (12)
gateway/manager/roundtripper/roundtripper_test.go (5)

15-15: LGTM!

The import alias ctxkeys centralizes context key handling and aligns with the refactored token/workspace context injection throughout the tests.


84-84: LGTM!

The refactoring to use ctxkeys.WithToken instead of direct context.WithValue improves maintainability and consistency across the test suite.

Also applies to: 382-382, 536-536, 583-583


602-615: LGTM!

The new test TestUnauthorizedRoundTripper_RoundTrip provides coverage for the unauthorizedRoundTripper.RoundTrip method, verifying the correct status code, request passthrough, and empty body.


617-697: LGTM!

The new test TestIsWorkspaceQualified provides comprehensive coverage of the isWorkspaceQualified helper function, including edge cases for various path patterns. The test expectations align with the requirement that services paths must have at least 4 segments with a non-empty workspace to be considered qualified.


699-800: LGTM!

The new test TestRoundTripper_WorkspaceQualifiedPathDetection thoroughly validates workspace URL handling, including edge cases where the service name matches "clusters" and empty workspace scenarios. The test expectations correctly align with the path qualification logic.

gateway/manager/roundtripper/roundtripper.go (7)

12-12: LGTM!

The import alias ctxkeys centralizes context key handling and aligns with the refactored token and workspace context extraction throughout the implementation.


47-48: LGTM!

The placement of handleVirtualWorkspaceURL at the start of RoundTrip ensures virtual workspace URL transformations occur before any authentication or routing decisions.


63-63: LGTM!

The refactoring to use ctxkeys.TokenFromContext encapsulates token extraction logic and maintains consistency with the centralized context key handling pattern.


117-143: LGTM!

The refactored isDiscoveryRequest correctly normalizes paths by stripping workspace prefixes before evaluating discovery patterns, ensuring consistent behavior across standard and workspace-qualified requests.


145-156: LGTM!

The stripWorkspacePrefix helper correctly normalizes paths by stripping various workspace-related prefixes, supporting both direct /clusters/<workspace>/ paths and virtual workspace /services/<service>/clusters/<workspace>/ paths.


173-216: LGTM!

The handleVirtualWorkspaceURL method correctly transforms virtual workspace URLs by injecting the workspace path between the service name and the API path. The implementation properly handles edge cases including empty workspaces, already-qualified paths, and non-services paths.


158-171: Clarify workspace qualification for /clusters paths

Current code (lines 164–166) and tests (roundtripper_test.go 633–642) treat "/clusters" and "/clusters/" as workspace-qualified. If the intent is to require a workspace segment, update the condition to:

len(segments) >= 2 && segments[0] == "clusters" && segments[1] != ""

and adjust tests accordingly. Otherwise, no change is needed. Please confirm the desired behavior.



@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 6

🧹 Nitpick comments (22)
tests/gateway_test/testdata/crd/core.platform-mesh.io_accounts.yaml (1)

155-155: Make conditions a map list keyed by type to prevent duplicates.

Kubernetes idiom is a map list for conditions keyed by “type”. Add x‑kubernetes list hints for safer merges and deduplication.

Apply:

   conditions:
-    items:
+    x-kubernetes-list-type: map
+    x-kubernetes-list-map-keys:
+    - type
+    items:
       description: Condition contains details for one aspect of the current
         state of this API Resource.
       properties:
         ...
       type:
-        description: type of condition in CamelCase or in foo.example.com/CamelCase.
+        description: type of condition in CamelCase or in foo.example.com/CamelCase.
         maxLength: 316
         pattern: ^([a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*/)?(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])$
         type: string
gateway/manager/targetcluster/graphql_test.go (2)

199-201: Tests should not depend on the key type; use the accessor.

Accessing targetcluster.LogicalClusterKey("cluster") ties the test to an implementation detail. Use ClusterNameFromContext instead.

Apply this diff:

-				clusterFromCtx, ok := result.Context().Value(targetcluster.LogicalClusterKey("cluster")).(logicalcluster.Name)
-				if !ok || clusterFromCtx != logicalcluster.Name(tt.workspace) {
+				clusterFromCtx, ok := targetcluster.ClusterNameFromContext(result.Context())
+				if !ok || clusterFromCtx != logicalcluster.Name(tt.workspace) {
 					t.Errorf("expected cluster %q in context, got %q", tt.workspace, clusterFromCtx)
 				}

199-201: Require resolver tests to assert the resolver actually calls ClusterManager.GetCluster (use mock or integration test)

Current test only checks a cluster name in context; instead inject the existing MockClusterManager (or run an integration-style test) and assert GetCluster(name) is invoked and used by the resolver. Verified multicluster-runtime/provider are in go.mod and a ClusterManager + mock exist.

Locations: gateway/manager/targetcluster/graphql_test.go (≈lines 197–201); gateway/manager/mocks/mock_ClusterManager.go; gateway/manager/interfaces.go; listener/reconciler/kcp/reconciler.go:184 (example mcMgr.GetCluster usage).

gateway/manager/targetcluster/cluster.go (3)

147-151: Honor ClusterMetadata.Path when building the rest.Config (APIExport/virtual workspace endpoints).

metadata.Path is currently ignored; clusters that supply base paths (e.g., APIExport virtual workspaces) will break unless they embed the path into Host. Join host+path here to make both forms work.

@@
-	// Use common auth package
-	config, err := auth.BuildConfigFromMetadata(metadata.Host, authType, token, kubeconfig, certData, keyData, caData)
+	// Use common auth package. Allow either full host with path or separate fields.
+	host := strings.TrimRight(metadata.Host, "/")
+	if metadata.Path != "" {
+		host = host + "/" + strings.TrimLeft(metadata.Path, "/")
+	}
+	config, err := auth.BuildConfigFromMetadata(host, authType, token, kubeconfig, certData, keyData, caData)

Note: strings is already imported.


221-224: Make SSE detection robust to multi-value Accept headers.

Equality check can miss valid requests like text/event-stream, text/html;q=0.9. Use substring match.

-	if r.Header.Get("Accept") == "text/event-stream" {
+	if strings.Contains(r.Header.Get("Accept"), "text/event-stream") {
 		tc.graphqlServer.HandleSubscription(w, r, tc.handler.Schema)
 		return
 	}

100-103: Breaking change: schema metadata is now required. Verify all schemas and update docs.

Enforcing ClusterMetadata presence will fail previously valid setups. Confirm all shipped/expected schemas include x-cluster-metadata, and document the requirement/migration.

I can draft a short migration note and a JSON Schema snippet for validation if helpful.

listener/pkg/apischema/crd_resolver_test.go (2)

15-15: Avoid cross-package test coupling: move common mocks to shared testutil

Good switch to kcp mocks, but apischema tests now depend on reconciler/kcp internals. Consider relocating DiscoveryInterface/RESTMapper mocks to a shared test-only module (e.g., listener/testutil/mocks or internal/test/mocks) to keep package boundaries clean.


303-305: Add compile-time assertions for kcp mocks; optional parallel subtests

Search shows MockDiscoveryInterface and MockRESTMapper live in listener/reconciler/kcp/mocks and are used by tests in both packages (package apischema and apischema_test). Add small test-only files (no extra build tag required — use _test.go) to catch mock drift at compile time.

Suggested files (under listener/pkg/apischema):

listener/pkg/apischema/mock_assertions_package_test.go

package apischema

import (
	k8sMeta "k8s.io/apimachinery/pkg/api/meta"
	k8sDiscovery "k8s.io/client-go/discovery"

	kcpMocks "github.com/platform-mesh/kubernetes-graphql-gateway/listener/reconciler/kcp/mocks"
)

var _ k8sDiscovery.DiscoveryInterface = (*kcpMocks.MockDiscoveryInterface)(nil)
var _ k8sMeta.RESTMapper = (*kcpMocks.MockRESTMapper)(nil)

listener/pkg/apischema/mock_assertions_external_test.go

package apischema_test

import (
	k8sMeta "k8s.io/apimachinery/pkg/api/meta"
	k8sDiscovery "k8s.io/client-go/discovery"

	kcpMocks "github.com/platform-mesh/kubernetes-graphql-gateway/listener/reconciler/kcp/mocks"
)

var _ k8sDiscovery.DiscoveryInterface = (*kcpMocks.MockDiscoveryInterface)(nil)
var _ k8sMeta.RESTMapper = (*kcpMocks.MockRESTMapper)(nil)

Optional: mark per-test t.Run subtests with t.Parallel() if and only if your gomock controllers/mocks are safe for concurrent use.

listener/pkg/apischema/resolver_test.go (2)

12-12: Centralize mocks to reduce layering leakage

Same note: importing mocks from reconciler/kcp ties apischema tests to another layer. Prefer a shared testutil package for DiscoveryInterface/RESTMapper mocks.


67-69: Add OpenAPI error-path test; compile-time assertion present

  • Compile-time assertion already present in listener/pkg/apischema/resolver_test.go: var _ Resolver = (*ResolverProvider)(nil).
  • Add a subtest that exercises the OpenAPIV3().Paths error branch (add fmt to imports), e.g.:
{
  name: "openapi_paths_error",
  preferredResources: []*metav1.APIResourceList{{
    GroupVersion: "v1",
    APIResources: []metav1.APIResource{{Name: "pods", Kind: "Pod", Namespaced: true}},
  }},
  openAPIPaths: map[string]openapi.GroupVersion{"/api/v1": apischemaMocks.NewMockGroupVersion(t)},
  openAPIErr:  fmt.Errorf("boom"),
  wantErr:     true,
}
  • Optionally parallelize subtests with t.Parallel().
  • Quick grep: apischemaMocks and kcpMocks are present and correctly used; no lingering old mock imports found.
listener/reconciler/types.go (2)

12-18: Split provider vs reconciler concerns; don’t force Reconcile on all providers

Having Reconcile on ControllerProvider couples “manager providers” with reconcilers. Prefer two interfaces to keep roles clear and avoid requiring non‑reconciler coordinators to stub Reconcile.

Apply:

-type ControllerProvider interface {
-	Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error)
-	SetupWithManager(mgr ctrl.Manager) error
-	GetManager() ctrl.Manager
-}
+type ManagerProvider interface {
+	SetupWithManager(mgr ctrl.Manager) error
+	GetManager() ctrl.Manager
+}
+
+// For components that also reconcile:
+type ReconcilerProvider interface {
+	ManagerProvider
+	Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error)
+}

20-22: Deprecation plan for alias

CustomReconciler alias is fine; add an explicit removal timeline and deprecation notice to avoid drift.

-// TODO: Migrate usages to ControllerProvider and remove this alias
+// DEPRECATED: migrate usages to ManagerProvider/ReconcilerProvider by 2025‑12‑31, then remove.
listener/reconciler/kcp/helper_functions.go (2)

8-25: Preserve RawPath and normalize path handling

When stripping the path, also clear RawPath and normalize to avoid double slashes and odd encodings.

-		parsedURL.Path = ""
-		return parsedURL.String()
+		parsedURL.Path = ""
+		parsedURL.RawPath = ""
+		return strings.TrimRight(parsedURL.String(), "/")

27-48: Harden parsing; tolerate trailing slashes and odd forms

Use path.Clean and trim to avoid empty parts and handle “/services/apiexport///”.

-	pathParts := strings.Split(strings.Trim(parsedURL.Path, "/"), "/")
+	clean := strings.Trim(path.Clean(parsedURL.Path), "/")
+	pathParts := strings.Split(clean, "/")

Also consider supporting alternative prefixes if KCP evolves (e.g., constants).

listener/reconciler/kcp/cluster_path.go (2)

27-48: Preserve RawPath when rewriting Host

ConfigForKCPCluster should clear RawPath to avoid leaking an encoded path.

-	clusterCfgURL.Path = fmt.Sprintf("/clusters/%s", clusterName)
+	clusterCfgURL.Path = fmt.Sprintf("/clusters/%s", clusterName)
+	clusterCfgURL.RawPath = ""

84-102: Silent fallback hides discovery failures

Swallowing Get errors entirely makes triage harder. Consider wrapping and returning ErrGetLogicalCluster (or at least debug log via an injected logger) behind a feature flag.

listener/reconciler/kcp/export_test.go (1)

14-16: Avoid unsafe type asserts in test exports

NewClusterPathResolverExported should accept *runtime.Scheme directly to prevent panics.

-func NewClusterPathResolverExported(cfg *rest.Config, scheme interface{}) (*ClusterPathResolverProvider, error) {
-	return NewClusterPathResolver(cfg, scheme.(*runtime.Scheme))
-}
+func NewClusterPathResolverExported(cfg *rest.Config, scheme *runtime.Scheme) (*ClusterPathResolverProvider, error) {
+	return NewClusterPathResolver(cfg, scheme)
+}
listener/reconciler/kcp/cluster_path_test.go (2)

47-54: Align error expectation with exported constant to avoid flaky Contains().

Message casing/wording may differ; assert with the exported error string.

-            errContains: "failed to parse host URL",
+            errContains: kcp.ErrParseHostURLExported.Error(),

411-487: Prune unused fields in table (workspaces, listError) or exercise them.

They aren’t used by the test body anymore (the flow now mocks LogicalCluster Get). Removing them reduces confusion.

-   tests := []struct {
-       name         string
-       clusterHash  string
-       workspaces   []kcptenancy.Workspace
-       listError    error
-       expectedPath string
-       expectError  bool
-   }{
+   tests := []struct {
+       name         string
+       clusterHash  string
+       expectedPath string
+       expectError  bool
+   }{
listener/reconciler/kcp/reconciler_test.go (1)

171-230: Watcher test: consider slightly longer timeouts to de-flake CI.

100–200ms can be tight on loaded runners. 500ms–1s reduces false negatives without slowing the suite much.

listener/reconciler/kcp/reconciler.go (2)

55-56: Nit: update log copy to “manager”.

Message still says “reconciler”.

- log.Info().Msg("Setting up KCP reconciler with multicluster-provider")
+ log.Info().Msg("Setting up KCP manager with multicluster-provider")

297-304: Use errors.Is for not-exist checks instead of string matching.

String contains on error text is brittle; prefer os/fs sentinels.

+   // at top: 
+   // import (
+   //   "errors"
+   //   "io/fs"
+   //   "os"
+   // )
@@
- if err != nil && !strings.Contains(err.Error(), "file does not exist") && !strings.Contains(err.Error(), "no such file") {
+ if err != nil && !errors.Is(err, os.ErrNotExist) && !errors.Is(err, fs.ErrNotExist) {
     return fmt.Errorf("failed to read existing schema: %w", err)
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5db5aeb and 9919cf0.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (22)
  • cmd/listener.go (1 hunks)
  • gateway/manager/targetcluster/cluster.go (1 hunks)
  • gateway/manager/targetcluster/graphql.go (2 hunks)
  • gateway/manager/targetcluster/graphql_test.go (1 hunks)
  • go.mod (6 hunks)
  • listener/pkg/apischema/crd_resolver_test.go (2 hunks)
  • listener/pkg/apischema/mocks/mock_DiscoveryInterface.go (0 hunks)
  • listener/pkg/apischema/mocks/mock_RESTMapper.go (0 hunks)
  • listener/pkg/apischema/resolver_test.go (2 hunks)
  • listener/reconciler/kcp/apibinding_controller.go (0 hunks)
  • listener/reconciler/kcp/apibinding_controller_test.go (0 hunks)
  • listener/reconciler/kcp/cluster_path.go (2 hunks)
  • listener/reconciler/kcp/cluster_path_test.go (5 hunks)
  • listener/reconciler/kcp/export_test.go (2 hunks)
  • listener/reconciler/kcp/helper_functions.go (1 hunks)
  • listener/reconciler/kcp/mocks/mock_ClusterPathResolver.go (0 hunks)
  • listener/reconciler/kcp/mocks/mock_DiscoveryFactory.go (0 hunks)
  • listener/reconciler/kcp/reconciler.go (3 hunks)
  • listener/reconciler/kcp/reconciler_test.go (3 hunks)
  • listener/reconciler/kcp/virtual_workspace.go (2 hunks)
  • listener/reconciler/types.go (1 hunks)
  • tests/gateway_test/testdata/crd/core.platform-mesh.io_accounts.yaml (3 hunks)
💤 Files with no reviewable changes (6)
  • listener/reconciler/kcp/apibinding_controller_test.go
  • listener/reconciler/kcp/apibinding_controller.go
  • listener/reconciler/kcp/mocks/mock_DiscoveryFactory.go
  • listener/reconciler/kcp/mocks/mock_ClusterPathResolver.go
  • listener/pkg/apischema/mocks/mock_RESTMapper.go
  • listener/pkg/apischema/mocks/mock_DiscoveryInterface.go
🔇 Additional comments (15)
tests/gateway_test/testdata/crd/core.platform-mesh.io_accounts.yaml (2)

113-114: Description text changed: audit tests/snapshots and docs.

If any tests or docs assert the previous verbose condition description, update them to match the concise wording to prevent brittle failures.


6-6: Controller-gen version bump: ensure toolchain and regen are in sync.

tests/gateway_test/testdata/crd/core.platform-mesh.io_accounts.yaml:6 contains controller-gen.kubebuilder.io/version: v0.19.0; no repository pin for controller-gen was found. Confirm Makefile/CI/tools.go/.github/workflows (or equivalent) pin controller-gen to v0.19.0 and re-generate all CRDs to avoid drift.

gateway/manager/targetcluster/graphql.go (2)

63-71: Stashing “cluster” via ad‑hoc key won’t be honored by multicluster‑runtime; use typed helpers (and plan to integrate provider.Get/manager.GetCluster).

The multicluster runtime typically resolves cluster-scoped clients via its own APIs (e.g., mgr.GetCluster(ctx, name) / provider.Get(ctx, name)). Storing a cluster name under a stringly typed key won’t be consumed by those APIs and is brittle. Store it via the typed helper and have resolvers fetch the actual cluster client using the provider/manager.

Apply this minimal change now:

-		// Store the logical cluster name in context using logicalcluster.Name as key
-		r = r.WithContext(context.WithValue(r.Context(), LogicalClusterKey("cluster"), logicalcluster.Name(kcpWorkspaceName)))
+		// Store the logical cluster name in context using a private key.
+		r = r.WithContext(WithClusterName(r.Context(), logicalcluster.Name(kcpWorkspaceName)))

Follow-up (separate PR or in this one):

  • In resolvers, replace direct context key reads with:
    • name := ... // from ClusterNameFromContext(ctx)
    • cl, err := provider.Get(ctx, name.String()) or mgr.GetCluster(ctx, name.String())
    • use cl.GetClient() for k8s ops.

63-71: kcpWorkspaceKey is defined and used — keep the override.
kcpWorkspaceKey is declared in gateway/manager/targetcluster/registry.go (lines 22–24) and is written into the request context in the same file (lines 332–335); tests reference it in gateway/manager/targetcluster/registry_test.go (lines 215 and 269). Leave the override in graphql.go.

gateway/manager/targetcluster/cluster.go (1)

122-123: Pass an explicit Scheme into client.NewWithWatch (gateway/manager/targetcluster/cluster.go:122–123)

Make GVK/Scheme resolution explicit and future‑proof by passing client.Options{Scheme: } when creating the per‑cluster client. Repo inspection shows resolvers predominantly use unstructured.Unstructured (so current behavior is working today) and AddToScheme calls exist elsewhere (cmd/tests). If you expect typed CRDs, construct a runtime.NewScheme(), register core + apiextensions/custom APIs with AddToScheme, and pass that Scheme here.
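A minimal sketch of that suggestion, with names chosen for illustration (core types plus apiextensions registered; custom APIs would be added the same way):

package sketch

import (
	apiextensionsv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
	"k8s.io/apimachinery/pkg/runtime"
	clientgoscheme "k8s.io/client-go/kubernetes/scheme"
	"k8s.io/client-go/rest"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// newTypedClusterClient builds the scheme explicitly and passes it to
// client.NewWithWatch instead of relying on the package default.
func newTypedClusterClient(cfg *rest.Config) (client.WithWatch, error) {
	scheme := runtime.NewScheme()
	if err := clientgoscheme.AddToScheme(scheme); err != nil {
		return nil, err
	}
	if err := apiextensionsv1.AddToScheme(scheme); err != nil {
		return nil, err
	}
	return client.NewWithWatch(cfg, client.Options{Scheme: scheme})
}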

listener/reconciler/kcp/virtual_workspace.go (2)

228-231: Good diagnostic: include name and URL in failure log

The added warn log improves observability without changing control flow. LGTM.


62-66: Require absolute URL, trim trailing slash, and keep TLS behaviour consistent

  • Use the parsed URL (u, err := url.Parse(workspace.URL)) and return ErrInvalidVirtualWorkspaceURL if u.Scheme=="" || u.Host=="". (listener/reconciler/kcp/virtual_workspace.go:63-66)
  • Avoid double slashes: build Host with strings.TrimRight(workspace.URL, "/") + "/clusters/root" in both kubeconfig and minimal-config branches. (listener/reconciler/kcp/virtual_workspace.go:81-90)
  • Do NOT flip TLSClientConfig.Insecure to false in this file without cross-module coordination — the project currently defaults to insecure (common/auth/config.go:30-31) and unit tests expect it (listener/reconciler/kcp/virtual_workspace_test.go:182-184). Gate “trust self-signed” behind appCfg or a central auth config and update tests if you change the default.
  • Add unit tests for missing scheme/host and trailing-slash handling.
go.mod (1)

3-3: Fix go.mod: use valid go directive and align controller-runtime / k8s versions

  • Update go directive to major.minor only.
-go 1.24.3
+go 1.24
  • Controller-runtime mismatch: require lists v0.22.1 but replace pins v0.21.0. Choose one:

    • Option A (preferred): remove the replace for sigs.k8s.io/controller-runtime and bump all k8s.* modules to the matching minor for controller-runtime v0.22.x.
    • Option B: change the require to v0.21.0 and align all k8s.* modules to the v0.21.x compatibility matrix.
  • Drop the golang.org/x/net replace; go.sum shows an indirect at v0.43.0, the replace pins v0.40.0 and should be removed unless strictly required.

-	golang.org/x/net => golang.org/x/net v0.40.0
+	// remove; allow indirect v0.43.0
  • Align k8s.io/* minors: go.mod replaces k8s.* to v0.33.3 while go.sum shows k8s.io/apiserver/component-base at v0.34.0 (indirect). Pick a single minor for all k8s modules and apply consistently (replace or require).

  • Location: go.mod (top-level go directive; replace block and require entries).

Confirm controller-runtime ↔ Kubernetes compatibility and which controller-runtime version sigs.k8s.io/multicluster-runtime v0.21.0-alpha.9 requires before applying Option A.

cmd/listener.go (2)

110-114: KCPManager integration: good switch.

Constructor/use swap to KCPManager reads clean and aligns with the multicluster provider refactor.


138-170: Manager boot wiring LGTM.

Health/ready checks and Start are correctly placed behind SetupWithManager().

listener/reconciler/kcp/cluster_path_test.go (2)

277-280: Fallback-to-cluster-name behavior: tests read well.

Updated expectations for missing annotation and client errors are consistent with the new fallback semantics.

Also applies to: 288-291


321-397: Nice coverage for PathForClusterFromConfig() patterns.

Good table of base/cluster/apiexport URLs and fallbacks.

listener/reconciler/kcp/reconciler_test.go (2)

47-58: Helper setup: solid.

Temp schema dir + test logger keeps tests hermetic.


96-101: Nil logger case covered.

Constructor validates logger; test correctly asserts error.

listener/reconciler/kcp/reconciler.go (1)

153-177: Verify multicluster manager startup
The current providerRunnable.Start only calls p.provider.Run(ctx, p.mcMgr) and never invokes mcMgr.Start. Confirm that provider.Run(ctx, mcMgr) internally starts the multicluster manager; if not, explicitly call m.mcMgr.Start(ctx) (via a runnable or within providerRunnable.Start) to ensure registered controllers reconcile.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docs/virtual-workspaces.md (1)

79-86: Route prefix is inconsistent: “virtual-workspace” vs “virtualworkspace”.

The pattern shows /virtual-workspace/..., but the example uses /virtualworkspace/.... Pick one and update all references; otherwise users will hit 404s.

Provide both forms if both are supported, or add a redirect note.

🧹 Nitpick comments (4)
gateway/manager/roundtripper/roundtripper.go (1)

141-153: Consider treating OpenAPI endpoints as discovery as well.

client-go and schema generators often hit /openapi/v2 or /openapi/v3. Whitelist them similarly to avoid 401s when tokens aren’t attached.

Apply this incremental diff:

@@
   switch {
   case len(parts) == 1 && (parts[0] == "api" || parts[0] == "apis"):
     return true // /api or /apis (root discovery endpoints)
@@
   case len(parts) == 3 && parts[0] == "apis":
     return true // /apis/<group>/<version> (group version discovery)
+  case len(parts) == 2 && parts[0] == "openapi" && (parts[1] == "v2" || strings.HasPrefix(parts[1], "v3")):
+    return true // /openapi/v2 or /openapi/v3 (schema discovery)
   default:
     return false
   }
docs/virtual-workspaces.md (3)

24-29: Config/env naming inconsistency.

Text references gateway-url-default-kcp-workspace, but later the env var is GATEWAY_URL_DEFAULT_KCP_WORKSPACE. Use one naming convention (prefer the env var name or document both clearly with precedence).


44-52: Duplicate/ambiguous “Configuration Options” section.

There are two “Configuration Options” headers. Consider merging them or renaming the second to “Global Defaults” to avoid confusion.


14-23: YAML robustness for targetWorkspace values with colons.

Colons in plain scalars are allowed but easy to get wrong. Recommend quoting values like targetWorkspace: "root:orgs:production" in examples to reduce copy/paste parse errors.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9919cf0 and fea6b34.

📒 Files selected for processing (3)
  • docs/virtual-workspaces.md (2 hunks)
  • gateway/manager/roundtripper/roundtripper.go (1 hunks)
  • listener/reconciler/kcp/virtual_workspace.go (9 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • listener/reconciler/kcp/virtual_workspace.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Analyze (go)
🔇 Additional comments (2)
gateway/manager/roundtripper/roundtripper.go (2)

129-139: Handle APIExport virtual-workspace prefixes by locating the first api|apis segment under /services/...

Stripping only /services/<service> misclassifies APIExport VWs (e.g. /services/apiexport///api[s]/...), causing discovery requests (sent without tokens) to be rejected (401) instead of being whitelisted.

  • Change in gateway/manager/roundtripper/roundtripper.go (≈ lines 129–139): if parts[0] == "services" then
    • if len(parts) >= 5 && parts[2] == "clusters" -> parts = parts[4:]
    • else find the first index i where parts[i] == "api" || parts[i] == "apis" and set parts = parts[i:]; return false if no such segment.
  • Add unit tests for:
    • /services/apiexport/root/export/api
    • /services/apiexport/root/export/apis
    • /services/apiexport/root/export/apis/group
    • /services/apiexport/root/export/apis/group/v1

Verification note: repository read failed in the sandbox; confirm the change and tests apply cleanly.
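A sketch of the suggested normalization, written against the test cases above; the function name and the exact set of whitelisted shapes are assumptions, not the file's current code:

package sketch

import "strings"

// isDiscoveryPath jumps, for /services/... URLs, to the segment after
// /clusters/<workspace> when present, otherwise to the first "api"/"apis"
// segment, and then evaluates the usual discovery shapes.
func isDiscoveryPath(p string) bool {
	parts := strings.Split(strings.Trim(p, "/"), "/")
	if len(parts) > 0 && parts[0] == "services" {
		if len(parts) >= 5 && parts[2] == "clusters" {
			parts = parts[4:]
		} else {
			idx := -1
			for i, s := range parts {
				if s == "api" || s == "apis" {
					idx = i
					break
				}
			}
			if idx == -1 {
				return false
			}
			parts = parts[idx:]
		}
	}
	switch {
	case len(parts) == 1 && (parts[0] == "api" || parts[0] == "apis"):
		return true // /api or /apis (root discovery)
	case len(parts) == 2 && (parts[0] == "api" || parts[0] == "apis"):
		return true // /api/v1 or /apis/<group>
	case len(parts) == 3 && parts[0] == "apis":
		return true // /apis/<group>/<version> (group-version discovery)
	default:
		return false
	}
}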


67-75: Incorrect review comment: adminRT does not cause privilege escalation in non-impersonation mode.

The review comment misunderstands the authentication flow. In non-impersonation mode (line 73), adminRT receives the user's bearer token via req.Header.Set("Authorization", "Bearer "+token) (line 69), which replaces any existing admin credentials. The Kubernetes API server authenticates based on this user token, not any admin credentials that may have been used to construct the transport.

Looking at the codebase:

  • adminRT is created by rest.HTTPClientFor() with cluster-specific TLS config (lines 263 in registry.go, 118 in cluster.go)
  • The roundtripper factory wraps this transport but preserves the bearer token authentication pattern
  • Discovery requests (line 57) legitimately use admin credentials since they don't carry user tokens
  • Local development mode (line 49) intentionally uses admin credentials

The authentication concern is unfounded - bearer tokens take precedence over transport-level credentials in Kubernetes client-go.

Likely an incorrect or invalid review comment.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 330ffe4 and 5b62aa1.

📒 Files selected for processing (4)
  • cmd/listener.go (4 hunks)
  • listener/reconciler/clusteraccess/reconciler.go (2 hunks)
  • listener/reconciler/kcp/reconciler.go (3 hunks)
  • listener/reconciler/types.go (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
cmd/listener.go (5)
listener/reconciler/types.go (1)
  • ControllerProvider (14-18)
listener/reconciler/kcp/reconciler.go (1)
  • NewKCPManager (52-158)
listener/pkg/workspacefile/io_handler.go (1)
  • NewIOHandler (27-35)
listener/reconciler/clusteraccess/reconciler.go (1)
  • NewClusterAccessReconciler (35-63)
listener/pkg/apischema/resolver.go (1)
  • NewResolver (31-33)
listener/reconciler/clusteraccess/reconciler.go (1)
listener/reconciler/types.go (1)
  • ControllerProvider (14-18)
listener/reconciler/kcp/reconciler.go (6)
listener/pkg/workspacefile/io_handler.go (1)
  • IOHandler (17-21)
listener/pkg/apischema/resolver.go (1)
  • Resolver (23-25)
listener/reconciler/kcp/cluster_path.go (4)
  • ClusterPathResolverProvider (58-63)
  • NewClusterPathResolver (65-81)
  • PathForClusterFromConfig (145-208)
  • ConfigForKCPCluster (31-50)
listener/reconciler/types.go (1)
  • ReconcilerOpts (21-27)
listener/reconciler/kcp/discovery_client.go (1)
  • NewDiscoveryFactory (36-44)
listener/reconciler/kcp/schema_generator.go (1)
  • SchemaGenerationParams (15-20)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: pipe / testSource / test
  • GitHub Check: pipe / dockerBuild / docker
  • GitHub Check: Analyze (go)
🔇 Additional comments (5)
listener/reconciler/kcp/reconciler.go (2)

258-279: Use the resolved workspace path for discovery/config.

After resolving workspacePath, the code still builds the rest.Config and discovery clients with clusterName (the hash). That recreates the exact bug we’re fixing—KCP rejects /clusters/<hash>, so schema generation fails every time. Build everything with workspacePath instead.

-	workspaceConfig, err := ConfigForKCPCluster(clusterName, m.mcMgr.GetLocalManager().GetConfig())
+	workspaceConfig, err := ConfigForKCPCluster(workspacePath, m.mcMgr.GetLocalManager().GetConfig())
@@
-	discoveryClient, err := discoveryFactory.ClientForCluster(clusterName)
+	discoveryClient, err := discoveryFactory.ClientForCluster(workspacePath)
@@
-	restMapper, err := discoveryFactory.RestMapperForCluster(clusterName)
+	restMapper, err := discoveryFactory.RestMapperForCluster(workspacePath)

212-216: Do not swallow non-NotFound APIBinding errors.

client.Get can fail for reasons other than a missing resource (e.g., RBAC, transport issues). Right now every error is treated as “not found,” so reconciliation silently skips and never retries. Please distinguish NotFound from real errors and propagate the latter.

@@
-	err = client.Get(ctx, req.NamespacedName, apiBinding)
-	if err != nil {
-		logger.Info().Msg("APIBinding not found, skipping reconciliation")
-		return ctrl.Result{}, nil
-	}
+	err = client.Get(ctx, req.NamespacedName, apiBinding)
+	if err != nil {
+		if apierrors.IsNotFound(err) {
+			logger.Info().Msg("APIBinding not found, skipping reconciliation")
+			return ctrl.Result{}, nil
+		}
+		logger.Error().Err(err).Msg("failed to get APIBinding")
+		return ctrl.Result{}, err
+	}

Remember to add:

+	apierrors "k8s.io/apimachinery/pkg/api/errors"
listener/reconciler/types.go (1)

12-18: Rename aligns with the new controller-provider abstraction.

The interface rename cleanly reflects the updated manager/provider model; the method contract stays consistent. No concerns from my side.

listener/reconciler/clusteraccess/reconciler.go (2)

35-43: Factory now returning ControllerProvider looks correct.

The constructor continues to hand back the concrete ClusterAccessReconciler, and its implementation already satisfies the new interface. All good here.


94-100: Signature update keeps external API in sync.

NewReconciler now advertising the ControllerProvider interface matches the rest of the migration and doesn’t alter behavior. Looks good.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
common/watcher/watcher_test.go (1)

393-394: Reduce time-based flakiness: prefer Eventually over fixed sleeps

Fixed sleeps are brittle under parallel scheduling. Replace the post‑write sleep with an Eventually check.

Apply this diff:

-	// Give more time for file change to be detected and debounced (increased for parallel execution)
-	time.Sleep(300 * time.Millisecond) // 50ms debounce + extra buffer for parallel tests
+	// Wait until the change is observed instead of fixed sleeping
+	require.Eventually(t, func() bool {
+		return len(handler.OnFileChangedCalls) >= 1
+	}, 2*time.Second, 20*time.Millisecond, "Expected at least 1 file change call")

Also applies to: 400-401

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5b62aa1 and 3ab0b6e.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (4)
  • common/mocks/mock_Client.go (1 hunks)
  • common/mocks/mock_WithWatch.go (1 hunks)
  • common/watcher/watcher_test.go (4 hunks)
  • go.mod (6 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: pipe / dockerBuild / docker
  • GitHub Check: pipe / lint / lint
  • GitHub Check: pipe / testSource / test
  • GitHub Check: Analyze (go)
🔇 Additional comments (8)
go.mod (1)

3-3: Fix the invalid go directive.

Modules must pin go to major.minor; go 1.24.3 makes go mod tidy fail. Switch it to go 1.24 (no patch) so tooling works.

-go 1.24.3
+go 1.24
common/mocks/mock_Client.go (2)

32-55: LGTM! The Apply mock method is correctly implemented.

The implementation follows the established pattern for handling variadic options and properly supports both direct error returns and function-based returns. The panic on missing return value ensures test failures are caught early.


57-92: LGTM! Typed helper methods follow consistent patterns.

The MockClient_Apply_Call helper and its associated methods (Run, Return, RunAndReturn) correctly implement the typed shadow pattern used throughout this mock file, providing type safety for test expectations.

common/mocks/mock_WithWatch.go (2)

34-57: LGTM! Apply method implementation is consistent with MockClient.

The implementation correctly handles the runtime.ApplyConfiguration interface and maintains consistency with the pattern established in mock_Client.go.


59-94: LGTM! Helper methods properly implement the typed mock pattern.

The MockWithWatch_Apply_Call structure and its methods provide proper type safety for test expectations, following the same pattern as other methods in this file.

common/watcher/watcher_test.go (3)

365-366: t.Parallel is fine here; verify CI flake risk

Good call enabling parallelism. These tests hit fsnotify and the FS; please sanity‑check with high parallelism to avoid FD limits/flakiness (e.g., run with -race -count=50 -parallel=8 in CI) and confirm stability.


631-632: Same parallelism caveat as above

See comment on Line 365 about verifying parallel stability with -race/-count/-parallel.


983-984: Same parallelism caveat as above

See comment on Line 365 about verifying parallel stability with -race/-count/-parallel.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (9)
listener/reconciler/kcp/reconciler_test.go (9)

3-9: Use errors.Is for context errors; add stdlib errors import

This enables robust checks for context cancellation/deadline cases used later.

-import (
-	"context"
+import (
+	"context"
+	"errors"
 	"os"
 	"path/filepath"
 	"testing"
 	"time"

37-46: Re-use a single Scheme instance for opts and manager

Avoids subtle divergence between client and manager schemes; keeps type registrations identical.

 func createTestReconcilerOpts() reconciler.ReconcilerOpts {
-	return reconciler.ReconcilerOpts{
-		Config: &rest.Config{Host: "https://kcp.example.com/services/apiexport/root/core.platform-mesh.io"},
-		Scheme: createTestScheme(),
-		ManagerOpts: ctrl.Options{
-			Metrics: server.Options{BindAddress: "0"},
-			Scheme:  createTestScheme(),
-		},
-	}
+	scheme := createTestScheme()
+	return reconciler.ReconcilerOpts{
+		Config: &rest.Config{Host: "https://kcp.example.com/services/apiexport/root/core.platform-mesh.io"},
+		Scheme: scheme,
+		ManagerOpts: ctrl.Options{
+			Metrics: server.Options{BindAddress: "0"},
+			Scheme:  scheme,
+		},
+	}
 }

63-127: Avoid brittle string matching on error messages

The “invalid_path” case asserts on a specific error substring. Prefer checking only that an error occurred, or match a typed/sentinel error if one exists. Error text is implementation- and OS‑dependent and may change.


167-226: Deflake virtual workspace watcher test: increase timeouts and assert expected cancellation

Short 100–200ms timeouts are brittle on CI. Also, assert cancellation semantics instead of only logging.

-			timeout:     100 * time.Millisecond,
+			timeout:     1 * time.Second,
@@
-			timeout:   200 * time.Millisecond,
+			timeout:   2 * time.Second,
@@
-			timeout:     100 * time.Millisecond,
+			timeout:     1 * time.Second,
@@
-			if tt.expectErr {
-				assert.Error(t, err)
-			} else {
-				// Should either succeed or be cancelled by context timeout
-				if err != nil {
-					t.Logf("Got error (possibly expected): %v", err)
-				}
-			}
+			if tt.expectErr {
+				assert.Error(t, err)
+			} else if err != nil {
+				assert.True(t, errors.Is(err, context.DeadlineExceeded) || errors.Is(err, context.Canceled))
+			}

228-272: Optionally strengthen assertions for workspace path resolution

If feasible, assert concrete invariants (e.g., “root” maps to a normalized path form) to catch regressions, rather than only length.

Would you like me to add table cases asserting specific normalized outputs if the resolver has a stable contract?


274-319: Don’t swallow errors; assert behavior explicitly (or at least no panics)

The test sets expectErr but doesn’t use it, and logs any error without assertion. If the intent is “should not panic regardless of connectivity,” assert NotPanics.

-			err := exported.GenerateAndWriteSchemaForWorkspace(ctx, tt.workspacePath, tt.clusterName)
-
-			if tt.expectErr {
-				assert.Error(t, err)
-			} else {
-				// The function may return an error due to connection issues in tests,
-				// but we're testing that it doesn't panic and handles the parameters correctly
-				t.Logf("Schema generation result for %s: %v", tt.workspacePath, err)
-			}
+			assert.NotPanics(t, func() {
+				_ = exported.GenerateAndWriteSchemaForWorkspace(ctx, tt.workspacePath, tt.clusterName)
+			})

321-363: Use errors.Is for context error comparison

Equality on context errors is brittle; prefer errors.Is and keep the require/assert flow tight.

-			if tt.expectErr {
-				assert.Error(t, err)
-				// Should return context.DeadlineExceeded or context.Canceled
-				assert.True(t, err == context.DeadlineExceeded || err == context.Canceled)
-			} else {
+			if tt.expectErr {
+				require.Error(t, err)
+				assert.True(t, errors.Is(err, context.DeadlineExceeded) || errors.Is(err, context.Canceled))
+			} else {
 				assert.NoError(t, err)
 			}

365-383: Consider deduping “edge cases” with earlier tests

These checks duplicate prior assertions; consolidating reduces runtime and maintenance.


384-426: Add a positive-path test for ReconcileAPIBinding

A success case (with a known cluster in the provider) would better guard behavior alongside the “not found” and “empty” cases. If direct wiring is hard, consider a fake/mocked multicluster manager.

I can provide a lightweight fake provider/manager to exercise the happy path if you share the ReconcileAPIBinding contract.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3225d75 and d01584a.

📒 Files selected for processing (2)
  • listener/reconciler/kcp/export_test.go (2 hunks)
  • listener/reconciler/kcp/reconciler_test.go (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
listener/reconciler/kcp/export_test.go (2)
listener/reconciler/kcp/cluster_path.go (4)
  • ConfigForKCPCluster (31-50)
  • ClusterPathResolverProvider (58-63)
  • NewClusterPathResolver (65-81)
  • PathForClusterFromConfig (145-208)
listener/reconciler/kcp/reconciler.go (1)
  • KCPManager (40-50)
listener/reconciler/kcp/reconciler_test.go (4)
listener/reconciler/types.go (1)
  • ReconcilerOpts (21-27)
common/config/config.go (1)
  • Config (3-37)
listener/reconciler/kcp/reconciler.go (2)
  • KCPManager (40-50)
  • NewKCPManager (52-158)
listener/reconciler/kcp/export_test.go (1)
  • ExportedKCPManager (55-57)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: pipe / testSource / test
  • GitHub Check: Analyze (go)
🔇 Additional comments (6)
listener/reconciler/kcp/export_test.go (1)

20-22: Fix the unsafe scheme type assertion

scheme.(*runtime.Scheme) will still panic when a caller passes anything but *runtime.Scheme—exactly the issue flagged in the previous review. Please gate the cast and return a descriptive error instead of crashing.

Apply this diff:

@@
-import (
-	"context"
+import (
+	"context"
+	"fmt"
@@
-func NewClusterPathResolverExported(cfg *rest.Config, scheme interface{}, log *logger.Logger) (*ClusterPathResolverProvider, error) {
-	return NewClusterPathResolver(cfg, scheme.(*runtime.Scheme), log)
+func NewClusterPathResolverExported(cfg *rest.Config, scheme interface{}, log *logger.Logger) (*ClusterPathResolverProvider, error) {
+	runtimeScheme, ok := scheme.(*runtime.Scheme)
+	if !ok {
+		return nil, fmt.Errorf("scheme must be of type *runtime.Scheme, got %T", scheme)
+	}
+	return NewClusterPathResolver(cfg, runtimeScheme, log)
 }
listener/reconciler/kcp/reconciler_test.go (5)

30-35: LGTM: Scheme helper is clear and scoped

Registration for kcp APIs is fine for tests.


48-59: LGTM: Test manager factory

Isolated temp dir and hidden logger keep tests hermetic.


129-134: LGTM: GetManager happy-path

Covers initialized state.


136-157: LGTM: Reconcile is a no-op and stable across calls

Good defensive checks.


159-165: LGTM: SetupWithManager against own manager

Covers basic wiring.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (16)
listener/reconciler/kcp/cluster_path_test.go (3)

287-291: Consider documenting the error handling fallback.

Similar to the missing path annotation case, client errors now fall back to returning the cluster name. This could mask underlying connectivity issues. Consider logging warnings when falling back to help with debugging.
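A minimal sketch of the suggested warning, assuming a zerolog-style logger API; the helper name and shape are illustrative, not the actual code in cluster_path.go:

package kcp

import "github.com/rs/zerolog"

// resolvePathOrFallback logs a warning before falling back to the cluster name,
// so connectivity problems show up in the logs instead of being masked silently.
func resolvePathOrFallback(log zerolog.Logger, clusterName string, resolve func() (string, error)) string {
	path, err := resolve()
	if err != nil || path == "" {
		log.Warn().Err(err).Str("cluster", clusterName).
			Msg("could not resolve workspace path, falling back to cluster name")
		return clusterName
	}
	return path
}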


433-435: Document the empty string return for malformed paths.

The function returns an empty string for paths with empty segments (//empty), which differs from other fallback cases that return the cluster name. This inconsistency might be confusing. Consider either:

  1. Documenting this special case behavior
  2. Making it consistent by returning the cluster name as fallback
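A small sketch of option 2, treating paths with empty segments like the other fallback cases; the helper name is illustrative:

import "strings"

// pathOrFallback returns the resolved path, or the cluster name when the path
// is empty or contains an empty segment such as "//empty".
func pathOrFallback(rawPath, clusterName string) string {
	if rawPath == "" || strings.Contains(rawPath, "//") {
		return clusterName
	}
	return rawPath
}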

276-280: Document fallback behavior to cluster name
Add a note in the public docs (e.g. README or the GoDoc for GetClusterPath in listener/reconciler/kcp/cluster_path.go) explaining that when the path annotation is missing, the cluster name is now returned as a fallback instead of producing an error.
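A possible GoDoc wording, offered as a sketch only; the actual receiver and signature of GetClusterPath may differ:

// GetClusterPath resolves the logical workspace path for the given cluster.
//
// If the workspace path annotation is missing, or the LogicalCluster object
// cannot be read, the cluster name itself is returned as a fallback instead of
// an error, so callers always receive a usable identifier.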

listener/reconciler/kcp/export_test.go (1)

68-74: Consider adding nil check for provider and mcMgr.

The CreateProviderRunnableForTesting method directly accesses KCPManager.provider and KCPManager.mcMgr without checking if they're initialized. This could cause nil pointer dereference in tests if the manager isn't fully initialized.

 func (e *ExportedKCPManager) CreateProviderRunnableForTesting(log *logger.Logger) ProviderRunnableInterface {
+	if e.KCPManager.provider == nil || e.KCPManager.mcMgr == nil {
+		return nil
+	}
 	return &providerRunnable{
 		provider: e.KCPManager.provider,
 		mcMgr:    e.KCPManager.mcMgr,
 		log:      log,
 	}
 }
listener/reconciler/kcp/helper_functions_test.go (8)

3-8: Import require and use it for fatal assertions

Use require for error/no-error checks to stop executing subsequent assertions in failing branches.

 import (
 	"testing"
 
 	"github.com/platform-mesh/kubernetes-graphql-gateway/listener/reconciler/kcp"
-	"github.com/stretchr/testify/assert"
+	"github.com/stretchr/testify/assert"
+	"github.com/stretchr/testify/require"
 )

10-74: Confirm intent: preserving query/fragment on stripped base URL

The expectations keep query/fragment (e.g., "https://kcp.example.com?timeout=30s"). If this base URL feeds rest.Config.Host or HTTP clients, query/fragment are typically undesired. Please confirm; if not intended, adjust expectations to drop them.

If dropping query/fragment is desired, update expected values:

-			expected: "https://kcp.example.com?timeout=30s",
+			expected: "https://kcp.example.com",
...
-			expected: "https://kcp.example.com#section",
+			expected: "https://kcp.example.com",

10-10: Enable parallel test execution

These tests are pure and can safely run in parallel to speed up CI.

 func TestStripAPIExportPath(t *testing.T) {
+	t.Parallel()

68-73: Run subtests in parallel

Further parallelization per case.

 	for _, tt := range tests {
 		t.Run(tt.name, func(t *testing.T) {
+			t.Parallel()
 			result := kcp.StripAPIExportPathExported(tt.input)
 			assert.Equal(t, tt.expected, result)
 		})
 	}

76-188: Avoid brittle string matching for errors

Asserting substrings in error messages is fragile. Prefer sentinel errors and errors.Is, or exported typed errors, so refactors don’t break tests.

Follow-up options:

  • Export sentinel vars (e.g., kcp.ErrNotAPIExportURL, kcp.ErrInvalidAPIExportURLFormat) and use require.ErrorIs/assert.ErrorIs.
  • If not exporting, at least standardize messages in one place and reference constants in tests.

Also consider adding a couple of edge cases:

  • No-scheme host (e.g., "kcp.example.com/services/apiexport/root/export")
  • Percent-encoded workspace segments (e.g., "root:teams:my%2Fteam")
  • Double slashes in path (".../services/apiexport//root/export")
  • Upper/lowercase path components sensitivity
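A minimal sketch of the sentinel-error option above; all names are hypothetical and the extraction logic is only illustrative of how the sentinel would be wrapped and asserted:

package kcp

import (
	"errors"
	"fmt"
	"strings"
)

var ErrNotAPIExportURL = errors.New("not an APIExport virtual workspace URL")

// extractAPIExportRef wraps the sentinel so callers and tests can rely on
// errors.Is instead of matching message substrings.
func extractAPIExportRef(raw string) (workspace, export string, err error) {
	const marker = "/services/apiexport/"
	idx := strings.Index(raw, marker)
	if idx < 0 {
		return "", "", fmt.Errorf("%w: %q", ErrNotAPIExportURL, raw)
	}
	parts := strings.Split(strings.Trim(raw[idx+len(marker):], "/"), "/")
	if len(parts) < 2 {
		return "", "", fmt.Errorf("%w: %q", ErrNotAPIExportURL, raw)
	}
	return parts[0], parts[1], nil
}

// In the test, the assertion then becomes:
//   _, _, err := kcp.ExtractAPIExportRefExported(tt.input)
//   require.ErrorIs(t, err, kcp.ErrNotAPIExportURL)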

76-76: Enable parallel test execution

Parallelize the second test suite as well.

 func TestExtractAPIExportRef(t *testing.T) {
+	t.Parallel()

190-206: Use require for error/no-error checks to short-circuit

Prevents follow-up asserts from running in invalid states.

-			if tt.expectErr {
-				assert.Error(t, err)
+			if tt.expectErr {
+				require.Error(t, err)
 				if tt.errContains != "" {
 					assert.Contains(t, err.Error(), tt.errContains)
 				}
 				assert.Empty(t, workspace)
 				assert.Empty(t, export)
 			} else {
-				assert.NoError(t, err)
+				require.NoError(t, err)
 				assert.Equal(t, tt.expectedWorkspace, workspace)
 				assert.Equal(t, tt.expectedExport, export)
 			}

190-206: Run subtests in parallel

Safe here as well.

 	for _, tt := range tests {
 		t.Run(tt.name, func(t *testing.T) {
+			t.Parallel()
 			workspace, export, err := kcp.ExtractAPIExportRefExported(tt.input)
gateway/manager/context/keys_test.go (4)

20-32: Also assert parent context is unmodified

After creating ctxWithToken, verify the original ctx does not contain the token to reinforce immutability semantics.

 		// Store token in context
 		ctxWithToken := ctxkeys.WithToken(ctx, token)
 		assert.NotEqual(t, ctx, ctxWithToken, "Context should be different after adding token")
 
+		// Parent context must remain unmodified
+		origToken, origOK := ctxkeys.TokenFromContext(ctx)
+		assert.False(t, origOK, "Parent context should not contain token")
+		assert.Empty(t, origToken)
+
 		// Retrieve token from context
 		retrievedToken, ok := ctxkeys.TokenFromContext(ctxWithToken)
 		assert.True(t, ok, "Token should be found in context")
 		assert.Equal(t, token, retrievedToken, "Retrieved token should match stored token")

173-227: Strengthen overwrite test: verify old contexts keep old values

Keep intermediate contexts to assert that older contexts still return the previous values, while the latest returns the new ones.

 	t.Run("overwriting_values", func(t *testing.T) {
 		ctx := context.Background()
 
 		// Store initial values
 		initialToken := "initial-token"
 		initialWorkspace := "initial-workspace"
 
-		ctx = ctxkeys.WithToken(ctx, initialToken)
-		ctx = ctxkeys.WithKcpWorkspace(ctx, initialWorkspace)
+		ctx1 := ctxkeys.WithToken(ctx, initialToken)
+		ctx1 = ctxkeys.WithKcpWorkspace(ctx1, initialWorkspace)
 
 		// Overwrite with new values
 		newToken := "new-token"
 		newWorkspace := "new-workspace"
 
-		ctx = ctxkeys.WithToken(ctx, newToken)
-		ctx = ctxkeys.WithKcpWorkspace(ctx, newWorkspace)
+		ctx2 := ctxkeys.WithToken(ctx1, newToken)
+		ctx2 = ctxkeys.WithKcpWorkspace(ctx2, newWorkspace)
 
-		// Verify new values are retrieved
-		retrievedToken, tokenOk := ctxkeys.TokenFromContext(ctx)
-		retrievedWorkspace, workspaceOk := ctxkeys.KcpWorkspaceFromContext(ctx)
+		// Old context still returns initial values
+		oldToken, oldTokenOk := ctxkeys.TokenFromContext(ctx1)
+		oldWorkspace, oldWorkspaceOk := ctxkeys.KcpWorkspaceFromContext(ctx1)
+		assert.True(t, oldTokenOk, "Token should be retrievable from old context")
+		assert.True(t, oldWorkspaceOk, "Workspace should be retrievable from old context")
+		assert.Equal(t, initialToken, oldToken, "Old context should return initial token")
+		assert.Equal(t, initialWorkspace, oldWorkspace, "Old context should return initial workspace")
+
+		// New context returns new values
+		retrievedToken, tokenOk := ctxkeys.TokenFromContext(ctx2)
+		retrievedWorkspace, workspaceOk := ctxkeys.KcpWorkspaceFromContext(ctx2)
 
 		assert.True(t, tokenOk, "Token should be retrievable")
 		assert.True(t, workspaceOk, "Workspace should be retrievable")
 
 		assert.Equal(t, newToken, retrievedToken, "Should get new token value")
 		assert.Equal(t, newWorkspace, retrievedWorkspace, "Should get new workspace value")
 	})

173-227: Add a safety check that context keys are distinct

Validates no accidental key collisions across token/workspace/cluster name.

 	t.Run("different_keys_dont_interfere", func(t *testing.T) {
 		...
 	})
 
+	t.Run("keys_are_distinct", func(t *testing.T) {
+		assert.NotEqual(t, ctxkeys.TokenKey(), ctxkeys.KcpWorkspaceKey(), "TokenKey and KcpWorkspaceKey must be different")
+		assert.NotEqual(t, ctxkeys.TokenKey(), ctxkeys.ClusterNameKey(), "TokenKey and ClusterNameKey must be different")
+		assert.NotEqual(t, ctxkeys.KcpWorkspaceKey(), ctxkeys.ClusterNameKey(), "KcpWorkspaceKey and ClusterNameKey must be different")
+	})
+
 	t.Run("overwriting_values", func(t *testing.T) {
 		...
 	})

13-13: Run tests in parallel for faster CI

Top-level test functions can safely run in parallel. Add t.Parallel() at the start of each.

 func TestTokenContextHelpers(t *testing.T) {
+	t.Parallel()

Apply similarly to:

  • TestKcpWorkspaceContextHelpers
  • TestClusterNameContextHelpers
  • TestContextKeyIsolation
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d01584a and bcc07f3.

📒 Files selected for processing (4)
  • gateway/manager/context/keys_test.go (1 hunks)
  • listener/reconciler/kcp/cluster_path_test.go (7 hunks)
  • listener/reconciler/kcp/export_test.go (2 hunks)
  • listener/reconciler/kcp/helper_functions_test.go (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
gateway/manager/context/keys_test.go (1)
gateway/manager/context/keys.go (9)
  • TokenKey (31-33)
  • WithToken (36-38)
  • TokenFromContext (41-44)
  • KcpWorkspaceKey (49-51)
  • WithKcpWorkspace (54-56)
  • KcpWorkspaceFromContext (59-62)
  • ClusterNameKey (67-69)
  • WithClusterName (72-74)
  • ClusterNameFromContext (77-80)
listener/reconciler/kcp/cluster_path_test.go (1)
listener/reconciler/kcp/export_test.go (3)
  • NewClusterPathResolverExported (20-22)
  • NewClusterPathResolverProviderWithFactory (90-97)
  • PathForClusterFromConfigExported (24-26)
listener/reconciler/kcp/helper_functions_test.go (1)
listener/reconciler/kcp/export_test.go (2)
  • StripAPIExportPathExported (86-86)
  • ExtractAPIExportRefExported (87-87)
listener/reconciler/kcp/export_test.go (2)
listener/reconciler/kcp/cluster_path.go (4)
  • ConfigForKCPCluster (31-50)
  • ClusterPathResolverProvider (58-63)
  • NewClusterPathResolver (65-81)
  • PathForClusterFromConfig (145-208)
listener/reconciler/kcp/reconciler.go (1)
  • KCPManager (40-50)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: pipe / testSource / test
  • GitHub Check: Analyze (go)
🔇 Additional comments (9)
listener/reconciler/kcp/cluster_path_test.go (4)

125-125: LGTM!

The test correctly validates the new logger requirement for the ClusterPathResolver constructor.


179-179: LGTM!

The test properly initializes the resolver with the logger parameter.


299-305: LGTM!

The test correctly instantiates the resolver with the required logger and validates the PathForCluster method behavior through the provider interface.


327-451: Comprehensive test coverage for path resolution scenarios.

The test suite thoroughly covers various URL patterns including cluster URLs, virtual workspace URLs, edge cases with query parameters and fragments. Good coverage of fallback behavior when paths cannot be extracted.

listener/reconciler/kcp/export_test.go (3)

20-22: Type assertion needs safety check.

The interface{} cast to *runtime.Scheme could panic if wrong type is passed. Add type assertion safety.


55-78: LGTM! Well-structured test helper wrapper.

The ExportedKCPManager wrapper cleanly exposes private methods for testing without modifying the production API surface. The delegation pattern is appropriate for test utilities.


90-97: LGTM!

The factory function properly accepts and assigns the logger parameter to the provider, maintaining consistency with the updated constructor pattern.

listener/reconciler/kcp/helper_functions_test.go (1)

1-209: Exported test shims and error messages verified
StripAPIExportPathExported and ExtractAPIExportRefExported are present, and helper_functions.go error messages contain the expected substrings.

gateway/manager/context/keys_test.go (1)

1-11: LGTM: correct package boundary and imports

External test package usage is clean, and the logicalcluster v3 import matches best practices. Based on learnings.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bcc07f3 and 988e7b7.

📒 Files selected for processing (1)
  • gateway/schema/scalars_test.go (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
gateway/schema/scalars_test.go (1)
gateway/schema/exports_test.go (2)
  • StringMapScalarForTest (5-5)
  • JSONStringScalarForTest (6-6)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: pipe / testSource / test
  • GitHub Check: pipe / dockerBuild / docker
  • GitHub Check: Analyze (go)

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (6)
common/auth/metadata_injector_test.go (1)

631-706: Prune the unused expectError flag.

The table-driven cases define expectError, but the loop never reads it, so it just adds noise and suggests missing assertions. Drop the field or add assertions that exercise the error path.

gateway/manager/targetcluster/export_test.go (1)

34-37: Test-only wrapper is fine; consider removing if not needed

Because registry_test.go uses the same package (targetcluster), it can call unexported methods directly. Keep this wrapper only if you intend to exercise extractClusterNameFromPath from external tests (package targetcluster_test).

gateway/manager/targetcluster/cluster_test.go (1)

359-395: Run subtests in parallel and avoid range variable capture

Make the table-driven subtests parallel and rebind tt to avoid the classic capture pitfall.

Apply this diff:

 func TestTargetCluster_GetName(t *testing.T) {
+	t.Parallel()
 	tests := []struct {
 		name        string
 		clusterName string
 	}{
 		{
 			name:        "simple_name",
 			clusterName: "production",
 		},
 		{
 			name:        "name_with_dashes",
 			clusterName: "staging-cluster",
 		},
 		{
 			name:        "virtual_workspace_name",
 			clusterName: "virtual-workspace/my-workspace",
 		},
 		{
 			name:        "empty_name",
 			clusterName: "",
 		},
 		{
 			name:        "single_character",
 			clusterName: "a",
 		},
 	}
 
 	for _, tt := range tests {
-		t.Run(tt.name, func(t *testing.T) {
+		tt := tt
+		t.Run(tt.name, func(t *testing.T) {
+			t.Parallel()
 			tc := targetcluster.NewTestTargetCluster(tt.clusterName)
 
 			result := tc.GetName()
 
 			assert.Equal(t, tt.clusterName, result)
 		})
 	}
 }
gateway/manager/targetcluster/registry_test.go (3)

225-243: Add t.Parallel and a concurrent Close test

Mark the test parallel and consider exercising Close concurrently to catch potential race conditions in the registry’s shutdown.

Apply this diff:

 func TestClusterRegistry_Close(t *testing.T) {
+	t.Parallel()
 	log := testlogger.New().HideLogOutput().Logger
 	appCfg := appConfig.Config{}
 
 	registry := NewClusterRegistry(log, appCfg, nil)
 
 	// Test that Close() doesn't panic and returns no error
 	err := registry.Close()
 
 	if err != nil {
 		t.Errorf("Close() returned error: %v", err)
 	}
 
 	// Test that Close() can be called multiple times without error
 	err = registry.Close()
 	if err != nil {
 		t.Errorf("Second Close() call returned error: %v", err)
 	}
 }

Optionally add:

t.Run("concurrent_close", func(t *testing.T) {
  t.Parallel()
  registry := NewClusterRegistry(log, appCfg, nil)
  t.Cleanup(func() { _ = registry.Close() })
  var wg sync.WaitGroup
  for i := 0; i < 8; i++ {
    wg.Add(1)
    go func() { defer wg.Done(); _ = registry.Close() }()
  }
  wg.Wait()
})

245-342: Normalize path separators; adjust Windows-path expectation

The “windows_style_path” case currently expects the full path minus extension, which deviates from other cases that yield the base filename (cluster name). Prefer normalizing backslashes to forward slashes, then extracting the base sans extension so Windows-style inputs still resolve to “cluster-name.”

Apply this diff to the test expectation:

-		{
-			name:           "windows_style_path",
-			schemaFilePath: "C:\\path\\definitions\\cluster-name.json",
-			expected:       "C:\\path\\definitions\\cluster-name",
-		},
+		{
+			name:           "windows_style_path",
+			schemaFilePath: "C:\\path\\definitions\\cluster-name.json",
+			expected:       "cluster-name",
+		},

And update the implementation (outside this file) along these lines:

// inside (*ClusterRegistry).extractClusterNameFromPath
p := strings.ReplaceAll(schemaFilePath, "\\", "/")
base := path.Base(p)                 // import "path"
name := strings.TrimSuffix(base, path.Ext(base))
if name == "" {
    return "."
}
return name

245-252: Mark this test parallel

Small speedup for the suite.

Apply this diff:

-func TestClusterRegistry_ExtractClusterNameFromPath(t *testing.T) {
+func TestClusterRegistry_ExtractClusterNameFromPath(t *testing.T) {
+	t.Parallel()
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 988e7b7 and b808fd2.

📒 Files selected for processing (7)
  • common/auth/export_test.go (1 hunks)
  • common/auth/metadata_injector_test.go (1 hunks)
  • gateway/manager/roundtripper/export_test.go (1 hunks)
  • gateway/manager/roundtripper/roundtripper_test.go (6 hunks)
  • gateway/manager/targetcluster/cluster_test.go (1 hunks)
  • gateway/manager/targetcluster/export_test.go (1 hunks)
  • gateway/manager/targetcluster/registry_test.go (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • gateway/manager/roundtripper/roundtripper_test.go
🧰 Additional context used
🧬 Code graph analysis (5)
gateway/manager/targetcluster/export_test.go (1)
gateway/manager/targetcluster/registry.go (1)
  • ClusterRegistry (23-29)
common/auth/export_test.go (1)
common/auth/metadata_injector.go (1)
  • MetadataInjector (32-35)
gateway/manager/targetcluster/registry_test.go (2)
gateway/manager/context/keys.go (3)
  • KcpWorkspaceFromContext (59-62)
  • WithKcpWorkspace (54-56)
  • TokenFromContext (41-44)
gateway/manager/targetcluster/registry.go (1)
  • NewClusterRegistry (32-43)
common/auth/metadata_injector_test.go (1)
common/auth/metadata_injector.go (1)
  • NewMetadataInjector (38-43)
gateway/manager/targetcluster/cluster_test.go (1)
gateway/manager/targetcluster/export_test.go (1)
  • NewTestTargetCluster (16-20)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: pipe / testSource / test
  • GitHub Check: Analyze (go)
🔇 Additional comments (2)
common/auth/export_test.go (1)

3-6: Helper wrapper looks good.

Exposing extractCAFromKubeconfigB64 through this test-only wrapper keeps the production API clean while enabling focused tests.

gateway/manager/targetcluster/registry_test.go (1)

214-221: Good move to centralized context helpers

Using ctxkeys.KcpWorkspaceFromContext, WithKcpWorkspace, and TokenFromContext improves consistency and avoids hardcoded context keys.

Also applies to: 387-388, 400-407

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (2)
gateway/schema/scalars.go (1)

33-44: Consistency nit: unify any vs interface{} usage

Minor style nit: this file mixes interface{} and any. Consider standardizing on any for Go 1.18+ across Serialize/ParseValue/ParseLiteral signatures for consistency.

Also applies to: 74-118

go.mod (1)

20-20: Pseudo‑version for multicluster‑provider

Looks fine short‑term. When a stable tag is cut, consider switching to a semver tag to ease reproducibility and dependency hygiene.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b808fd2 and e8beffd.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (3)
  • gateway/schema/scalars.go (1 hunks)
  • gateway/schema/scalars_test.go (3 hunks)
  • go.mod (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • gateway/schema/scalars_test.go
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: pipe / testSource / test
  • GitHub Check: pipe / lint / lint
  • GitHub Check: Analyze (go)
🔇 Additional comments (4)
go.mod (4)

52-56: Indirect bumps look OK; tidy afterward

The indirect upgrades (go‑restful, cbor) are consistent with newer k8s and apiserver stacks. Remember to run go mod tidy to keep go.sum in sync.


133-140: Core k8s ecosystem updates acknowledged

apiserver/component‑base and structured‑merge‑diff/v6 align with your k8s v0.34 pins. No action needed.


3-3: Invalid go directive (patch not allowed)

Use a minor-only version; fix to 1.24.

-go 1.24.3
+go 1.24

34-41: Ensure multicluster-runtime v0.21.0-alpha.9 is compatible with controller-runtime v0.22.1
Confirmed go.mod and go list pick sigs.k8s.io/controller-runtime v0.22.1 on Go 1.24.3; run go build ./... to catch any API drift. If it fails, bump multicluster-runtime to a CR-v0.22-compatible tag or downgrade controller-runtime to v0.21.

Comment on lines +5 to +7
// this PR introduces newer version of graphiQL that supports headers
// https://github.com/graphql-go/handler/pull/93
replace github.com/graphql-go/handler => github.com/vertex451/handler v0.0.0-20250124125145-ed328e3cf42a

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Personal fork in replace: document risk and exit plan

Replacing github.com/graphql-go/handler with a user fork poses supply‑chain and reproducibility risks. Prefer:

  • Hosting the fork under the org (e.g., platform‑mesh/handler) with branch protection; or
  • Avoiding the fork once upstream merges the needed GraphiQL headers support.

At minimum:

  • Keep the reason + link (you did), add an issue to track removal, and confirm the fork is public and immutable (tag or commit signed). I can help open the tracking issue.

🏁 Script executed:

#!/bin/bash
# Check if the fork repo is public
curl -s https://api.github.com/repos/vertex451/handler | jq '{private: .private}'
# List existing tags (show first 10 for brevity)
curl -s https://api.github.com/repos/vertex451/handler/tags | jq '.[].name' | head -n 10
# Verify that the replace commit exists in the fork
curl -s https://api.github.com/repos/vertex451/handler/commits/ed328e3cf42a | jq '{sha: .sha}'
# Check upstream PR #93 status
curl -s https://api.github.com/repos/graphql-go/handler/pulls/93 | jq '{state: .state, merged: .merged, merged_at: .merged_at}'

Length of output: 140


Document fork risk and ensure immutability

  • The fork vertex451/handler is public and contains commit ed328e3cf42ae911f3477beb503ae630d2c02f31.
  • Upstream PR #93 on graphql-go/handler remains open; open an issue to track removal of this replace once it’s merged.
  • Tag or GPG-sign the fork’s commit to guarantee immutability.

@pteich pteich left a comment

I am ok with the changes in general. I left some suggestions, maybe for future improvements.

cmd/gateway.go Outdated

if err := initializeSentry(ctx, log); err != nil {
log.Fatal().Err(err).Msg("Failed to initialize Sentry")
log.Error().Err(err).Msg("Failed to initialize Sentry")

Just out of curiosity: what is the reason to switch to log.Error(), since log.Fatal() also does the os.Exit(1)? Is it the log level?

Contributor Author

After your comment I double-checked the behavior: log.Fatal does basically the same as log.Error + os.Exit, but additionally it flushes any buffered messages.
Replaced log.Error + os.Exit with log.Fatal.


// Determine exit reason: error-triggered vs. normal signal
select {
case err := <-errCh:

To be honest, I am not fully happy with the context and error handling; maybe we will find a better solution in the future.

Contributor Author

I can fix the context propagation, but I am not sure what you would improve in this select statement.
Please give details if you have any, and I will create an issue.


Attention, untested: here is an idea using errgroup:

	Run: func(cmd *cobra.Command, args []string) {
		log.Info().Str("LogLevel", log.GetLevel().String()).Msg("Starting the Listener...")

		// Set up signal handling
		signalCtx := ctrl.SetupSignalHandler()
		ctx, cancel := context.WithCancel(signalCtx)
		defer cancel()

		restCfg := ctrl.GetConfigOrDie()

		mgrOpts := ctrl.Options{
			Scheme:                 scheme,
			Metrics:                metricsServerOptions,
			WebhookServer:          webhookServer,
			HealthProbeBindAddress: defaultCfg.HealthProbeBindAddress,
			LeaderElection:         defaultCfg.LeaderElection.Enabled,
			LeaderElectionID:       "72231e1f.platform-mesh.io",
		}

		clt, err := client.New(restCfg, client.Options{
			Scheme: scheme,
		})
		if err != nil {
			log.Fatal().Err(err).Msg("failed to create client from config")
		}

		reconcilerOpts := reconciler.ReconcilerOpts{
			Scheme:                 scheme,
			Client:                 clt,
			Config:                 restCfg,
			ManagerOpts:            mgrOpts,
			OpenAPIDefinitionsPath: appCfg.OpenApiDefinitionsPath,
		}

		eg, eCtx := errgroup.WithContext(ctx)

		// Create the appropriate reconciler based on configuration
		var reconcilerInstance reconciler.ControllerProvider
		if appCfg.EnableKcp {
			kcpManager, err := kcp.NewKCPManager(appCfg, reconcilerOpts, log)
			if err != nil {
				log.Fatal().Err(err).Msg("unable to create KCP manager")
			}
			reconcilerInstance = kcpManager

			// Start virtual workspace watching if path is configured
			if appCfg.Listener.VirtualWorkspacesConfigPath != "" {
				eg.Go(func() error {
					if err := kcpManager.StartVirtualWorkspaceWatching(eCtx, appCfg.Listener.VirtualWorkspacesConfigPath); err != nil {
						return err
					}

					return nil
				})
			}
		} else {
			ioHandler, err := workspacefile.NewIOHandler(appCfg.OpenApiDefinitionsPath)
			if err != nil {
				log.Fatal().Err(err).Msg("unable to create IO handler")
			}

			reconcilerInstance, err = clusteraccess.NewClusterAccessReconciler(ctx, appCfg, reconcilerOpts, ioHandler, apischema.NewResolver(log), log)
			if err != nil {
				log.Fatal().Err(err).Msg("unable to create cluster access reconciler")
			}
		}

		// Setup reconciler with its own manager and start everything
		// Use the original context for the manager - it will be cancelled if watcher fails
		eg.Go(func() error {
			if err := startManagerWithReconciler(eCtx, reconcilerInstance); err != nil {
				log.Error().Err(err).Msg("failed to start manager with reconciler")
				return err
			}

			return nil
		})

		err = eg.Wait()
		if err != nil {
			log.Fatal().Err(err).Msg("exiting due to critical component failure")
		}

		log.Info().Msg("graceful shutdown complete")
	},

Maybe we don't even need our own context and can use only the signal context. The errgroup returns its own context that is cancelled if one function returns with an error.

})

t.Run("TokenFromContext_empty_context", func(t *testing.T) {
ctx := context.Background()

Better to use t.Context(), as in the other tests.

Contributor Author

Replaced all entries in new tests.

parts := strings.Split(path, "/")

// Remove workspace prefixes to get the actual API path
if len(parts) >= 5 && parts[0] == "services" && parts[2] == "clusters" {

For such if-else patterns, you can also consider using a switch statement for better readability.
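A minimal sketch of the switch-based decomposition; the helper name and the exact prefix shapes are illustrative, not necessarily what ended up in the PR:

import "strings"

// stripWorkspacePrefix removes a virtual-workspace prefix and returns the plain
// Kubernetes API path.
func stripWorkspacePrefix(path string) string {
	parts := strings.Split(strings.Trim(path, "/"), "/")
	switch {
	// services/<name>/clusters/<workspace>/api/... -> /api/...
	case len(parts) >= 5 && parts[0] == "services" && parts[2] == "clusters":
		return "/" + strings.Join(parts[4:], "/")
	// clusters/<workspace>/api/... -> /api/...
	case len(parts) >= 3 && parts[0] == "clusters":
		return "/" + strings.Join(parts[2:], "/")
	default:
		return path
	}
}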

Contributor Author

Decomposed it into a separate function and added examples in comments for easier tracking.

}

// Parse the current URL
parsedURL, err := url.Parse(req.URL.String())

This does not really make sense to me: req.URL should already be the same as url.Parse(req.URL.String()). If needed, you could do a shallow copy like so: parsedURL := *req.URL
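A small sketch of that simplification, assuming the path rewrite is already factored out into a function; http.Request.Clone deep-copies the URL, so the original request stays untouched (a shallow copy of *req.URL achieves the same thing):

import "net/http"

// withRewrittenPath clones the request and rewrites only the path of the copy.
func withRewrittenPath(req *http.Request, rewrite func(string) string) *http.Request {
	out := req.Clone(req.Context())
	out.URL.Path = rewrite(out.URL.Path)
	return out
}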

Contributor Author

You are right, this URL parsing is redundant. Simplified it.

lc := &kcpcore.LogicalCluster{}
if err := clt.Get(context.TODO(), client.ObjectKey{Name: "cluster"}, lc); err != nil {
return "", errors.Join(ErrGetLogicalCluster, err)
err := clt.Get(context.TODO(), client.ObjectKey{Name: "cluster"}, lc)

Can't we switch to proper context handling? I don't really want to have a context.TODO() on the main branch.
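A minimal sketch of threading the caller's context through instead of context.TODO(); the function name and return value are illustrative, reusing the imports and ErrGetLogicalCluster already present in this file:

func getLogicalCluster(ctx context.Context, clt client.Client) (*kcpcore.LogicalCluster, error) {
	lc := &kcpcore.LogicalCluster{}
	if err := clt.Get(ctx, client.ObjectKey{Name: "cluster"}, lc); err != nil {
		return nil, errors.Join(ErrGetLogicalCluster, err)
	}
	return lc, nil
}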

Contributor Author

Let's deal with it in a separate PR, because we have several occurrences of context.TODO() and it would increase the git diff too much.
I created a separate issue for that:
#62

// Returns either a workspace path or cluster hash depending on the URL type.
// This is an alternative approach when LogicalCluster resource is not accessible.
func PathForClusterFromConfig(clusterName string, cfg *rest.Config) (string, error) {
if clusterName == "root" {

It's used several times; consider using a constant.
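A tiny sketch of the suggested constant (the names are illustrative):

// rootClusterName is the name of the kcp root logical cluster.
const rootClusterName = "root"

func isRootCluster(clusterName string) bool {
	return clusterName == rootClusterName
}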

Contributor Author

Same here, created a separate issue:
#63

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
cmd/listener.go (1)

130-141: Critical: Context cancellation still treated as fatal error.

The previous review concern (lines 130-139) has not been addressed. When a normal SIGINT/SIGTERM is received, cancel() is called (line 91), causing StartVirtualWorkspaceWatching to return context.Canceled. The current code treats this as a fatal error, sends it to errCh, and exits with status 1 (line 164).

This defeats the graceful shutdown mechanism. context.Canceled and context.DeadlineExceeded must be treated as expected shutdown, not errors.

Apply this diff to handle context cancellation gracefully:

+import (
+	"errors"
 	"context"
 	"crypto/tls"
+)

 			if appCfg.Listener.VirtualWorkspacesConfigPath != "" {
 				go func() {
 					if err := kcpManager.StartVirtualWorkspaceWatching(ctx, appCfg.Listener.VirtualWorkspacesConfigPath); err != nil {
+						if errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded) {
+							log.Info().Msg("virtual workspace watching stopped due to context cancellation")
+							return
+						}
 						log.Error().Err(err).Msg("virtual workspace watching failed, initiating graceful shutdown")
 						select {
 						case errCh <- err:
 						default:
 						}
 						cancel() // Trigger coordinated shutdown
 					}
 				}()
 			}

Based on past review comments.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 779f42c and 409185a.

📒 Files selected for processing (4)
  • cmd/gateway.go (2 hunks)
  • cmd/listener.go (3 hunks)
  • gateway/manager/context/keys_test.go (1 hunks)
  • gateway/manager/roundtripper/roundtripper.go (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • cmd/gateway.go
  • gateway/manager/context/keys_test.go
🧰 Additional context used
🧬 Code graph analysis (2)
gateway/manager/roundtripper/roundtripper.go (1)
gateway/manager/context/keys.go (2)
  • TokenFromContext (41-44)
  • KcpWorkspaceFromContext (59-62)
cmd/listener.go (2)
listener/reconciler/types.go (1)
  • ControllerProvider (14-18)
listener/reconciler/kcp/reconciler.go (1)
  • NewKCPManager (52-158)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: pipe / dockerBuild / docker
  • GitHub Check: pipe / testSource / test
  • GitHub Check: Analyze (go)
🔇 Additional comments (4)
cmd/listener.go (4)

80-93: LGTM! Clean coordinated shutdown setup.

The signal handling and context setup properly enables graceful shutdown with both signal-based and error-based cancellation paths. The buffered error channel prevents blocking when sending shutdown errors.


121-127: LGTM! Variable shadowing issue resolved.

The KCP manager creation correctly avoids the variable shadowing issue flagged in previous reviews. The separate kcpManager variable and explicit assignment to reconcilerInstance ensures proper initialization.


155-169: Exit logic is correct, but depends on goroutine fix.

The exit determination logic properly distinguishes error-triggered shutdown from normal signal-based shutdown. However, it will only work correctly after the context cancellation handling in the goroutine (lines 130-141) is fixed to not send context.Canceled to errCh.


173-198: LGTM! Manager startup function is well-structured.

The function signature correctly uses reconciler.ControllerProvider and properly propagates the context for cancellation. Health checks and error handling are appropriate.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
gateway/manager/roundtripper/roundtripper.go (1)

69-106: Fix auth flow: don’t set user token when impersonating + verify JWT before deriving username

  • You set Authorization to user token before checking ShouldImpersonate, then reuse the same req for impersonation. This can send user tokens to the apiserver with impersonation headers, which typically fails and is unsafe.
  • You parse JWT with ParseUnverified; this permits attackers to forge claims and impersonate arbitrary users.

Do both:

  1. Only set the user Authorization header in the non‑impersonation branch.
  2. In impersonation, strip any user Authorization and verify the JWT signature/claims before using it.

Apply this restructuring:

-  // No we are going to use token based auth only, so we are reassigning the headers
-  req.Header.Del("Authorization")
-  req.Header.Set("Authorization", "Bearer "+token)
-
-  if !rt.appCfg.Gateway.ShouldImpersonate {
-    rt.log.Debug().Str("path", req.URL.Path).Msg("Using bearer token authentication")
-
-    return rt.adminRT.RoundTrip(req)
-  }
+  if !rt.appCfg.Gateway.ShouldImpersonate {
+    // Direct token pass-through
+    req2 := req.Clone(req.Context())
+    req2.Header.Del("Authorization")
+    req2.Header.Set("Authorization", "Bearer "+token)
+    rt.log.Debug().Str("path", req2.URL.Path).Msg("Using bearer token authentication")
+    return rt.adminRT.RoundTrip(req2)
+  }

   // Impersonation mode: extract user from token and impersonate
   rt.log.Debug().Str("path", req.URL.Path).Msg("Using impersonation mode")
-  claims := jwt.MapClaims{}
-  _, _, err := jwt.NewParser().ParseUnverified(token, claims)
-  if err != nil {
+  // Ensure we don't forward a user token to the admin transport.
+  req.Header.Del("Authorization")
+  claims := jwt.MapClaims{}
+  // TODO: provide a verified keyfunc (e.g., JWKS / OIDC). Parsing unverified is insecure.
+  _, err := jwt.NewParser().ParseWithClaims(token, claims, rt.jwtKeyfunc /* implement */, jwt.WithValidMethods([]string{"RS256","ES256"}))
+  if err != nil {
     rt.log.Error().Err(err).Str("path", req.URL.Path).Msg("Failed to parse token for impersonation, denying request")
     return rt.unauthorizedRT.RoundTrip(req)
   }

Optionally, implement a helper to encapsulate verification (outside this hunk):

// Example helper to keep RoundTrip lean. Wire jwtKeyfunc from config.
func (rt *roundTripper) extractImpersonationUser(token, claim string) (string, error) {
  claims := jwt.MapClaims{}
  _, err := jwt.NewParser().ParseWithClaims(token, claims, rt.jwtKeyfunc, jwt.WithValidMethods([]string{"RS256","ES256"}))
  if err != nil { return "", err }
  v, ok := claims[claim].(string)
  if !ok || v == "" { return "", fmt.Errorf("claim %q missing/invalid", claim) }
  return v, nil
}

If OIDC/JWKS is not yet available, prefer disabling impersonation until verification is in place to avoid privilege escalation. Based on learnings

🧹 Nitpick comments (6)
cmd/listener.go (2)

132-146: Minor redundancy and comment clarity improvements.

The error handling logic is correct and addresses past review feedback. Two optional refinements:

  1. Redundant cancel call at line 136: When context is cancelled (normal signal), calling cancel() again is safe but unnecessary since the signal handler already triggered cancellation. Consider removing it for clarity.

  2. Comment at line 138: The comment "initiating graceful shutdown" might be misleading—this is a failure-triggered shutdown, not a graceful one. Consider rephrasing to "initiating shutdown due to error" or similar.

Apply this diff if you choose to refactor:

 				go func() {
 					if err := kcpManager.StartVirtualWorkspaceWatching(ctx, appCfg.Listener.VirtualWorkspacesConfigPath); err != nil {
 						if errors.Is(err, context.Canceled) || errors.Is(err, context.DeadlineExceeded) {
 							log.Info().Msg("virtual workspace watching stopped due to context cancellation")
-							cancel()
 						} else {
-							log.Error().Err(err).Msg("virtual workspace watching failed, initiating graceful shutdown")
+							log.Error().Err(err).Msg("virtual workspace watching failed, initiating shutdown due to error")
 							select {
 							case errCh <- err:
 							default:
 							}
 							cancel()
 						}
 					}
 				}()

166-174: Consider using os.Exit(1) instead of log.Fatal() for explicit error exit.

The exit logic correctly distinguishes error-driven from signal-driven shutdown. However, log.Fatal() at line 170 will call os.Exit(1) after logging, which skips any deferred cleanup (e.g., defer cancel() at line 83). While this is likely harmless in this context since the manager has already stopped, using log.Error().Err(err).Msg(...) followed by os.Exit(1) would be more explicit and consistent with typical shutdown patterns.

Apply this diff if you prefer explicit error exit:

 		select {
 		case err := <-errCh:
 			if err != nil {
-				log.Fatal().Err(err).Msg("exiting due to critical component failure")
+				log.Error().Err(err).Msg("exiting due to critical component failure")
+				os.Exit(1)
 			}
 		default:
 			log.Info().Msg("graceful shutdown complete")
 		}

Add the import:

 import (
 	"context"
 	"crypto/tls"
 	"errors"
+	"os"
gateway/manager/roundtripper/roundtripper.go (4)

173-216: Also rewrite non-/services paths for virtual workspace (prefix /clusters/)

handleVirtualWorkspaceURL only handles /services/... paths. For standard Kubernetes-style paths (/api, /apis, /api/v1/namespaces/...), you should add the /clusters/ prefix when a kcpWorkspace is present and the path isn’t already qualified.

Apply:

   if strings.HasPrefix(parsedURL.Path, "/services/") {
     // existing logic...
     if len(parts) >= 3 {
       // ...
     }
-  }
+  } else {
+    // Generic path: prefix with clusters/<workspace>
+    if !strings.HasPrefix(parsedURL.Path, "/") {
+      parsedURL.Path = "/" + parsedURL.Path
+    }
+    parsedURL.Path = "/clusters/" + kcpWorkspace + parsedURL.Path
+    rt.log.Debug().
+      Str("originalPath", req.URL.Path).
+      Str("modifiedPath", parsedURL.Path).
+      Str("workspace", kcpWorkspace).
+      Msg("Prefixed path with /clusters/<workspace> for virtual workspace")
+  }

Please confirm whether your gateway ever issues non-/services requests; if yes, this change is required. If it never does, add a comment clarifying the assumption.


122-131: Normalize paths with path.Clean before splitting to handle duplicate slashes

Using Trim + Split can mis-handle accidental “//” or extra “/./”. Prefer path.Clean, then split. Improves robustness for isDiscoveryRequest, stripWorkspacePrefix, and isWorkspaceQualified.

Minimal change example for isDiscoveryRequest:

- path := req.URL.Path
- path = strings.Trim(path, "/")
+ p := path.Clean("/" + req.URL.Path)
+ path := strings.Trim(p, "/")

And similarly before splitting in isWorkspaceQualified:

- path = strings.Trim(path, "/")
+ path = strings.Trim(path.Clean("/"+path), "/")

Import:

+ "path"

Also applies to: 145-156, 158-171


109-115: Add WWW-Authenticate header to 401 responses

Improves client UX and standards compliance.

 return &http.Response{
   StatusCode: http.StatusUnauthorized,
   Request:    req,
-  Body:       http.NoBody,
+  Body:       http.NoBody,
+  Header:     http.Header{"WWW-Authenticate": []string{`Bearer realm="k8s-gateway"`}},
 }, nil

63-68: OK to pull token from context; add quick nil/empty logging once

Looks good. Consider logging the presence (not value) once per request if diagnostics are needed.
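A small sketch of logging token presence without leaking the value, assuming a zerolog-style logger like the one used elsewhere in this file (names are illustrative):

import (
	"net/http"

	"github.com/rs/zerolog"
)

// logTokenPresence records whether a token was found for the request, never the token itself.
func logTokenPresence(log zerolog.Logger, req *http.Request, token string, found bool) {
	log.Debug().
		Bool("tokenPresent", found && token != "").
		Str("path", req.URL.Path).
		Msg("resolved request credentials from context")
}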

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 409185a and 73cc293.

📒 Files selected for processing (5)
  • cmd/listener.go (4 hunks)
  • gateway/manager/roundtripper/export_test.go (1 hunks)
  • gateway/manager/roundtripper/roundtripper.go (4 hunks)
  • gateway/manager/roundtripper/roundtripper_test.go (6 hunks)
  • listener/reconciler/kcp/export_test.go (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • gateway/manager/roundtripper/roundtripper_test.go
🧰 Additional context used
🧬 Code graph analysis (3)
cmd/listener.go (2)
listener/reconciler/types.go (1)
  • ControllerProvider (14-18)
listener/reconciler/kcp/reconciler.go (1)
  • NewKCPManager (52-158)
gateway/manager/roundtripper/roundtripper.go (1)
gateway/manager/context/keys.go (2)
  • TokenFromContext (41-44)
  • KcpWorkspaceFromContext (59-62)
listener/reconciler/kcp/export_test.go (2)
listener/reconciler/kcp/cluster_path.go (4)
  • ConfigForKCPCluster (31-50)
  • ClusterPathResolverProvider (58-63)
  • NewClusterPathResolver (65-81)
  • PathForClusterFromConfig (145-208)
listener/reconciler/kcp/reconciler.go (1)
  • KCPManager (40-50)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: pipe / testSource / test
  • GitHub Check: pipe / dockerBuild / docker
  • GitHub Check: Analyze (go)
🔇 Additional comments (5)
cmd/listener.go (4)

6-6: LGTM!

The errors package import is correctly added to support errors.Is checks in the watcher goroutine error handling.


81-94: LGTM!

The coordinated shutdown setup is well-structured:

  • Cancellable context and error channel properly initialized.
  • Signal handler triggers graceful cancellation.
  • Error channel will correctly distinguish error-driven vs. signal-driven shutdown.

122-128: LGTM!

The reconciler instance creation correctly avoids variable shadowing. The kcpManager is properly created and assigned to reconcilerInstance.


179-179: LGTM!

The function signature correctly updated to accept reconciler.ControllerProvider, aligning with the refactored architecture that uses KCPManager and the new interface.

gateway/manager/roundtripper/export_test.go (1)

5-13: Test helpers look good

Both helpers are appropriate for testing unexported symbols within the same package.

Please add tests covering:

  • service named "clusters" (e.g., /services/clusters/api/v1/...), which should NOT be treated as workspace-qualified.
  • services path with missing workspace (e.g., /services/foo/clusters), which should NOT be treated as workspace-qualified after the suggested fix.
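A sketch of the two requested cases, assuming an exported shim such as IsWorkspaceQualifiedExported in export_test.go (the shim name is hypothetical):

func TestIsWorkspaceQualified_EdgeCases(t *testing.T) {
	tests := []struct {
		name string
		path string
		want bool
	}{
		{
			name: "service_named_clusters_is_not_workspace_qualified",
			path: "/services/clusters/api/v1/configmaps",
			want: false,
		},
		{
			name: "services_path_with_missing_workspace_is_not_workspace_qualified",
			path: "/services/foo/clusters",
			want: false,
		},
	}
	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			assert.Equal(t, tt.want, roundtripper.IsWorkspaceQualifiedExported(tt.path))
		})
	}
}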

Comment on lines +71 to +79
// WithClusterName stores a logical cluster name in the context
func WithClusterName(ctx context.Context, name logicalcluster.Name) context.Context {
return context.WithValue(ctx, clusterNameKey, name)
}

// ClusterNameFromContext retrieves a logical cluster name from the context
func ClusterNameFromContext(ctx context.Context) (logicalcluster.Name, bool) {
name, ok := ctx.Value(clusterNameKey).(logicalcluster.Name)
return name, ok


🟢 I think this might be replaced by the dedicated functions from this package (package link).

Contributor Author

They are similar, but they have different purposes: my package is for gateway request processing, and it uses logicalcluster.Name instead of strings.
So I am keeping it.

@vertex451 vertex451 requested a review from OlegErshov October 2, 2025 14:32