NPEP-311: Best Practices for Multi-Cluster NetworkPolicy in a Flat Network #313
Conversation
…twork Model document some of the existing common practices for applying existing network policies across clusters without increasing the scope of the core Kubernetes APIs. Change-Id: I91e8791b441b93dc6958617eb0828d0b6c879ab7
### Recommendation

Each namespace within a cluster should be labeled with a key that identifies its
parent cluster. The recommended label is:
"should be labeled" by who?
(In general, you should never use should/must in the passive voice in a specification. If something must be done, be clear about who must do it.)
added " by the cluster administrator "
## Alternatives

The primary alternative is the absence of a documented best practice. This leads
I would say that the primary alternative would be adding clusterSelector.
In particular, reading use cases like "I want to apply a NetworkPolicy to my application and be confident that it only affects traffic within my local cluster", I keep thinking there needs to be an easy way to write a policy that refers to "this cluster" without having to identify the cluster by name, and an obvious way to do that would be to just extend the existing two-tiered podSelector/namespaceSelector system to three tiers:
- a peer with just a `podSelector` selects some pods in the same namespace as the policy (unchanged).
- a peer with just a `namespaceSelector` selects all pods in some namespaces in the same cluster as the policy (unchanged).
- a peer with a `podSelector` and a `namespaceSelector` selects some pods in some namespaces in the same cluster as the policy (unchanged).
- a peer with just a `clusterSelector` selects all pods in all namespaces in some clusters.
- a peer with a `namespaceSelector` and a `clusterSelector` selects all pods in some namespaces in some clusters.
- a peer with a `podSelector`, a `namespaceSelector` and a `clusterSelector` selects some pods in some namespaces in some clusters. (Or possibly, that's not allowed.)
NPs that do not include a clusterSelector would continue to behave exactly like they do now (in particular, only selecting traffic from the local cluster).
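For illustration, a peer using all three tiers might look like this (a sketch only: `clusterSelector` does not exist in any current API, and all names here are made up):

```yaml
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: hypothetical-three-tier-policy
  namespace: alpha
spec:
  podSelector: {}
  ingress:
  - from:
    - clusterSelector:        # hypothetical third tier: which clusters
        matchLabels:
          region: us-west
      namespaceSelector:      # which namespaces in those clusters
        matchLabels:
          kubernetes.io/metadata.name: beta
      podSelector:            # which pods in those namespaces
        matchLabels:
          app: frontend
```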
OTOH, having not paid much attention to SIG MC, I'm not sure what would be consistent with other multi-cluster APIs.
In SIG-MC, namespaces aren’t constrained to single clusters, so clusters and namespaces should be considered orthogonal: a clusterset admin familiar with MCS (multicluster services) would expect a multicluster-aware network policy controller to allow traffic across clusters following namespace constraints. (Yes, network administrators tend not to like this.)
End users ask about cross-cluster behaviour regularly, so it would be good to have some clarity; relying on a clusterSelector breaks the orthogonality I just mentioned but I can’t think of a better way right now. So clusterSelector: {} would select all clusters (applying namespace and pod selectors as appropriate), and as you say, policies with no clusterSelector would only apply to the local cluster.
This also avoids surprising administrators: existing network policies only apply to the local cluster (and if there’s a deny-all, no traffic is allowed across the clusterset).
> In SIG-MC, namespaces aren’t constrained to single clusters

What does that mean? A namespace can span multiple clusters? The namespace foo in cluster 1 is considered to be equivalent to the namespace foo in cluster 2?

> would expect a multicluster-aware network policy controller to allow traffic across clusters following namespace constraints
>
> ...
>
> existing network policies only apply to the local cluster

These seem contrary? Network traffic should normally ignore cluster boundaries, but NetworkPolicies shouldn't?

> if there’s a deny-all, no traffic is allowed across the clusterset

That's automatic, given the semantics of "isolation" in NP.
Conceptually yes, namespaces span multiple clusters; more precisely, namespaces with the same name in multiple clusters are considered effectively equivalent. The main consequence for SIG-MC is that a service made available across clusters in one namespace is always the same, regardless of the cluster it is hosted by. This means that, from a SIG-MC perspective, granting access at the network level to a service (or rather, its endpoints) in a given cluster is the same as granting access at the network level to the same service in all clusters.
The apparent contradiction is because my first paragraph tried to explain the SIG-MC worldview, while the second paragraph was about what would be acceptable (in my mind) for multicluster network policies. A purely SIG-MC interpretation of network policies, with what we call “namespace sameness”, would imply that namespace-based network policies should apply across a clusterset; however I also think that that’s unacceptable from a “principle of least surprise” perspective, and that existing network policies should be cluster-local.
I think this approach, based on @fasaxc's feedback, builds on namespace sameness.

To sum up, a network policy has two parts:
- The selector that applies the network policy to the Pods in the namespace where the NP exists. I think this has to be local, without doubt: not only because of blast-radius problems, but also because doing otherwise implies object propagation across clusters. If the NP only exists in ClusterA, why would the pods with the same namespace and labels in ClusterB need to apply the same policy?
- The selectors that apply to the Ingress and Egress traffic of the selected pods. These are the ones that follow namespace "sameness" and apply across clusters. Technically this works if the local policy engine has the metadata associated with the IP addresses from the other clusters (see the sketch below).

This is where the point Dan raises comes in: do we want to embed the cluster concept in the API, or do we follow the "namespace sameness" concept?

If we embed the cluster concept, it looks to me like we definitely need a new CRD; we cannot make the core APIs cluster-aware without extending the scope to all objects (we've seen in multi-network how such a feature easily becomes viral and touches everything). I personally do not have time to pursue this, and after working on some related projects I do not think it is feasible or worth the effort.

If we treat this like the existing Multi Cluster Services, based on the "namespace sameness" concept, I think that is valid and backed by the existing Calico implementation and the Multi Cluster Services KEP. If we follow this path, does it raise any red flags, or are we OK with iterating? If so, do we document it in SIG MC or in this subproject of SIG Network? And if we document it, is it a problem if kube-network-policies implements it?
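To make the two parts concrete, a minimal sketch (the namespace and label names are illustrative):

```yaml
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: sameness-example
  namespace: alpha          # the NP exists only in this cluster's "alpha"
spec:
  podSelector:              # part 1: always local; selects pods only in
    matchLabels:            # this cluster's "alpha" namespace
      app: backend
  ingress:
  - from:
    - namespaceSelector:    # part 2: under namespace "sameness", this peer
        matchLabels:        # would match namespace "beta" in every cluster
          kubernetes.io/metadata.name: beta
```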
@@ -0,0 +1,134 @@

# NPEP-311: Best Practices for Multi-Cluster NetworkPolicy in a Flat Network
I don't feel like this is a "best practice".
If we want to document "some pod network implementations are doing something like this", then maybe we can document that, but if we think this use case is important, then I think we should support it with proper API.
I couldn't find the right name

> but if we think this use case is important, then I think we should support it with proper API.

that is what I'm trying to figure out: whether people just need this stretched network policy mode, whether it needs something else, or whether it is a very custom thing ...
### Recommendation

Each namespace within a cluster should be labeled with a key that identifies its
parent cluster. The recommended label is:
What’s meant by “parent cluster”?
the cluster that contains the namespace
So presumably it will have the same value for all namespaces in a given cluster, won’t it? What’s the purpose of the label then? Couldn’t multicluster network policy support require KEP-2149 instead?
I opened this PR for exploration, and @skitt you are far more expert than anybody here, so if you have ideas I'm all ears. I will take a look at https://github.com/kubernetes/enhancements/tree/master/keps/sig-multicluster/2149-clusterid
The idea is that the NetworkPolicies can refer to namespaces in other clusters via this label. So you can say "allow from namespace foo (in this cluster)", or "allow from namespace foo in cluster bar" or "allow from namespace foo in all clusters", etc.
@aojea I understand perfectly that this is about having a discussion 😉
Thinking about this more, I’m not convinced namespace sameness should limit what network policies can do — the way I see it, network policies apply on top of networking defaults, and exist to provide more control over those. If the multicluster default (in the SIG-MC worldview anyway) is that namespaces are the same across a clusterset, that doesn’t prevent network policies from clamping down on what’s possible there. In the same way that network policies nowadays allow administrators to say “I only want traffic between this and this”, it makes sense to me to consider that policies extended to multiple clusters should allow administrators to say “I only want traffic between this cluster and this other cluster”.

That being said, @danwinship I don’t understand your comment about using the label to refer to namespaces in other clusters. Does that mean that the idea here is to create namespaces in a local cluster to represent namespaces in another cluster, with the label indicating that the namespaces are for remote clusters and not the local one? That would indeed be inconsistent with other multicluster APIs where multicluster connectivity (or at least discovery) depends on having the same namespace in multiple clusters…
Put another way, given the practices described here, what would the CRs look like to create a network policy allowing a pod in a local cluster to send traffic to all pods in a given namespace in a remote cluster?
> Does that mean that the idea here is to create namespaces in a local cluster to represent namespaces in another cluster, with the label indicating that the namespaces are for remote clusters and not the local one?
No, nothing is "representing" anything. This is just about making it possible to refer to namespaces in other clusters. If you have a NetworkPolicy in cluster-one that says:
```yaml
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: sample-multi-cluster-policy
  namespace: alpha
spec:
  podSelector: {}
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          cluster.clusterset.k8s.io: cluster-two
          kubernetes.io/metadata.name: beta
```

That says that all pods in the namespace `beta` in the cluster `cluster-two` can connect to all pods in the namespace `alpha` in the cluster `cluster-one`. There is nothing "representing" `beta` in `cluster-one`. The policy refers directly (by name) to the remote namespace.
> what would the CRs look like to create a network policy allowing a pod in a local cluster to send traffic to all pods in a given namespace in a remote cluster?
So keep in mind that NetworkPolicy is "single-sided": you can say "Pod X is allowed to send traffic to Pod Y", but that doesn't automatically imply that Pod Y will accept packets from Pod X. It just means that if Pod X was "deny-all-egress" before, then it becomes "deny-all-egress-except-egress-to-Pod-Y" now.
So the policy you're asking about is basically just what I gave above, except with ingress and from changed to egress and to. But if the remote namespace has its own NetworkPolicies, then it might not accept the packets.
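Spelled out, that egress variant would look something like this (a sketch reusing the names from the example above; the explicit `policyTypes` is added here so the policy only affects egress):

```yaml
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: sample-multi-cluster-egress-policy
  namespace: alpha
spec:
  podSelector: {}
  policyTypes:
  - Egress                  # only isolate the selected pods for egress
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          cluster.clusterset.k8s.io: cluster-two
          kubernetes.io/metadata.name: beta
```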
In a single-cluster, with AdminNetworkPolicy, you can write "double-sided" policies that actually say "Pod X can send traffic to Pod Y", which implies both that egress from Pod X to Pod Y and ingress to Pod Y from Pod X are allowed. However, in a multi-cluster context, AdminNetworkPolicy would have the same problem plain NP does with cross-cluster policies. Each AdminNetworkPolicy would only have power over traffic in its own cluster, so an ANP in one cluster would not be able to override policies in a remote cluster.
Ah, thanks, I think I get it — this is just a way of reusing the existing label selectors to match on clusters in addition to everything else. So apart from requiring a “policy engine” somewhere that’s aware of other clusters’ namespaces, it doesn’t require anything, it’s just an agreement on label use so that policies can be written consistently across clusters. Is that correct?
On the one hand I find the approach a bit icky because it requires adding information that’s already known to each namespace; on the other hand I like that it requires an action from the administrator before any given namespace can participate in network-policy-controlled traffic…
> it’s just an agreement on label use so that policies can be written consistently across clusters. Is that correct?
yeah, the question is whether creating these implicit semantics simplifies something, or whether people will just prefer to operate with their own labeling, treating the cluster as stretched ... it would be nice to have some end-user feedback
Each namespace within a cluster should be labeled by the cluster administrator with a key that identifies its
parent cluster. The recommended label is:

`cluster.clusterset.k8s.io: <cluster-id>`

The `<cluster-id>` value should correspond to the unique name of the cluster
within the ClusterSet (e.g., cluster-a, us-west-2). This practice enables policy
authors to create selectors that precisely target peers from specific clusters.
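For instance, the cluster administrator might apply the recommended label like this (a sketch; the namespace name `alpha` and cluster id `cluster-a` are illustrative):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: alpha
  labels:
    # Set by the cluster administrator: identifies the cluster that
    # contains this namespace, using the cluster's ClusterSet id.
    cluster.clusterset.k8s.io: cluster-a
```

Equivalently, `kubectl label namespace alpha cluster.clusterset.k8s.io=cluster-a`.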
FYI we also have something somewhat similar in Cilium ClusterMesh: https://docs.cilium.io/en/latest/network/clustermesh/policy/ (CiliumNetworkPolicies are described there, but the same behavior affects regular NetworkPolicy too). The main difference is that we operate at the pod level, not the namespace level (though technically everything is translated into "pod"-level things in the end in Cilium...).
It boils down to the following points:

- We sync the relevant info for all pods from remote clusters so that it is available in the local cluster, and in terms of Pod IPs the network is flat too (pod CIDRs from the clusters must not collide)
- A label `io.cilium.k8s.policy.cluster` is automatically/implicitly added to all the pods (with respect to NetworkPolicies; labels aren't really added to the actual Pod objects), at the pod level instead of the namespace level, with the "cluster name" as a value ("cluster name" here is similar to what is defined as "cluster id" in SIG-MC terminology, and we could add some layer of compat for `cluster.clusterset.k8s.io` to maybe translate to our Cilium-specific label); see the sketch after this list
- There is no mechanism to auto-add a NetworkPolicy across clusters; a NetworkPolicy can only affect pods in the local cluster (so essentially `spec.podSelector` does not select pods from remote clusters, but things in `ingress` and `egress` can use pod info from remote clusters)
- We are "now" defaulting pod selectors to select the local cluster name label by automatically/implicitly adding `io.cilium.k8s.policy.cluster=$localClusterName`, unless the selector already uses the label `io.cilium.k8s.policy.cluster` in some way, in which case we don't implicitly "change" the selector to select the local cluster
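As an illustration of that label in use, a cross-cluster allow rule along the lines of the linked docs might look like this (a sketch; the names are illustrative, and `k8s:io.kubernetes.pod.namespace` is the label Cilium derives for a pod's namespace):

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-from-remote-cluster
  namespace: alpha
spec:
  endpointSelector:         # local pods only (podSelector-equivalent)
    matchLabels:
      app: backend
  ingress:
  - fromEndpoints:
    - matchLabels:
        # implicit per-pod label carrying the peer's cluster name
        io.cilium.k8s.policy.cluster: cluster-two
        k8s:io.kubernetes.pod.namespace: beta
```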
Also, this is a more advanced topic than just the cluster id, but still semi-related: we are wondering how to add labels at the cluster level. Here is the relevant issue on the Cilium side: cilium/cilium#40413 (there are mostly implementation considerations in that issue though). To do that we are most likely going to rely on the About API: https://multicluster.sigs.k8s.io/concepts/about-api/. But in our case we are a bit worried that each cluster would self-declare its properties/labels, while the cluster name in our case is much more strictly enforced on each cluster (the local cluster knows its name and the name of each remote cluster too, and for anything imported from remote clusters we enforce that it has the correct cluster name)... But yeah, probably this is an entirely different "problem"?
Document some of the existing common practices for applying existing network policies across clusters without increasing the scope of the core Kubernetes APIs.

Based on the discussions we had in the SIG Network Policy API meeting on Aug 26 2025, and with feedback from @fasaxc about an existing implementation in production, it will be good to get these covered in a community place.
Fixes: #311