NPEP-311: Best Practices for Multi-Cluster NetworkPolicy in a Flat Network #313
Conversation
…twork Model document some of the existing common practices for applying existing network policies across clusters without increasing the scope of the core Kubernetes APIs. Change-Id: I91e8791b441b93dc6958617eb0828d0b6c879ab7
### Recommendation

Each namespace within a cluster should be labeled with a key that identifies its
parent cluster. The recommended label is:
"should be labeled" by who?
(In general, you should never use should/must in the passive voice in a specification. If something must be done, be clear about who must do it.)
added " by the cluster administrator "
## Alternatives

The primary alternative is the absence of a documented best practice. This leads
I would say that the primary alternative would be adding clusterSelector.
In particular, reading use cases like "I want to apply a NetworkPolicy to my application and be confident that it only affects traffic within my local cluster", I keep thinking there needs to be an easy way to write a policy that refers to "this cluster" without having to identify the cluster by name, and an obvious way to do that would be to just extend the existing two-tiered podSelector/namespaceSelector system to three tiers:
- a peer with just a `podSelector` selects some pods in the same namespace as the policy (unchanged).
- a peer with just a `namespaceSelector` selects all pods in some namespaces in the same cluster as the policy (unchanged).
- a peer with a `podSelector` and a `namespaceSelector` selects some pods in some namespaces in the same cluster as the policy (unchanged).
- a peer with just a `clusterSelector` selects all pods in all namespaces in some clusters.
- a peer with a `namespaceSelector` and a `clusterSelector` selects all pods in some namespaces in some clusters.
- a peer with a `podSelector`, a `namespaceSelector` and a `clusterSelector` selects some pods in some namespaces in some clusters. (Or possibly, that's not allowed.)
NPs that do not include a clusterSelector would continue to behave exactly like they do now (in particular, only selecting traffic from the local cluster).
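For illustration, a peer using all three tiers might look like this (a sketch only: `clusterSelector` does not exist in any current API, and all names here are made up):

```yaml
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: hypothetical-three-tier-policy
  namespace: alpha
spec:
  podSelector: {}
  ingress:
  - from:
    - clusterSelector:        # hypothetical third tier: which clusters
        matchLabels:
          region: us-west
      namespaceSelector:      # which namespaces in those clusters
        matchLabels:
          kubernetes.io/metadata.name: beta
      podSelector:            # which pods in those namespaces
        matchLabels:
          app: frontend
```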
OTOH, having not paid much attention to SIG MC, I'm not sure what would be consistent with other multi-cluster APIs.
In SIG-MC, namespaces aren’t constrained to single clusters, so clusters and namespaces should be considered orthogonal: a clusterset admin familiar with MCS (multicluster services) would expect a multicluster-aware network policy controller to allow traffic across clusters following namespace constraints. (Yes, network administrators tend not to like this.)
End users ask about cross-cluster behaviour regularly, so it would be good to have some clarity; relying on a clusterSelector breaks the orthogonality I just mentioned but I can’t think of a better way right now. So clusterSelector: {} would select all clusters (applying namespace and pod selectors as appropriate), and as you say, policies with no clusterSelector would only apply to the local cluster.
This also avoids surprising administrators: existing network policies only apply to the local cluster (and if there’s a deny-all, no traffic is allowed across the clusterset).
> In SIG-MC, namespaces aren’t constrained to single clusters

What does that mean? A namespace can span multiple clusters? The namespace foo in cluster 1 is considered to be equivalent to the namespace foo in cluster 2?

> would expect a multicluster-aware network policy controller to allow traffic across clusters following namespace constraints
>
> ...
>
> existing network policies only apply to the local cluster

These seem contrary? Network traffic should normally ignore cluster boundaries, but NetworkPolicies shouldn't?

> if there’s a deny-all, no traffic is allowed across the clusterset

That's automatic, given the semantics of "isolation" in NP.
Conceptually yes, namespaces span multiple clusters; more precisely, namespaces with the same name in multiple clusters are considered effectively equivalent. The main consequence for SIG-MC is that a service made available across clusters in one namespace is always the same, regardless of the cluster it is hosted by. This means that, from a SIG-MC perspective, granting access at the network level to a service (or rather, its endpoints) in a given cluster is the same as granting access at the network level to the same service in all clusters.
The apparent contradiction is because my first paragraph tried to explain the SIG-MC worldview, while the second paragraph was about what would be acceptable (in my mind) for multicluster network policies. A purely SIG-MC interpretation of network policies, with what we call “namespace sameness”, would imply that namespace-based network policies should apply across a clusterset; however I also think that that’s unacceptable from a “principle of least surprise” perspective, and that existing network policies should be cluster-local.
I think this approach, based on @fasaxc's feedback, builds on namespace sameness.

To sum up, a network policy has two parts:
- The selector that applies the network policy to the Pods in the namespace where the NP exists. I think this has to be local, without doubt: not only because of blast-radius problems, but also because doing otherwise implies object propagation across clusters. If the NP only exists in ClusterA, why would the pods with the same namespace and labels in ClusterB need to apply the same policy?
- The selectors that apply to the Ingress and Egress traffic of the selected pods. These are the ones that follow namespace "sameness" and apply across clusters. Technically this works if the local policy engine has the metadata associated with the IP addresses from the other clusters (see the sketch below).

This is where the point Dan raises comes in: do we want to embed the cluster concept in the API, or do we follow the "namespace sameness" concept?

If we embed the cluster concept, it looks to me like we definitely need a new CRD; we cannot make the core APIs cluster-aware without extending the scope to all objects (we've seen in multi-network how such a feature easily becomes viral and touches everything). I personally do not have time to pursue this, and after working on some related projects I do not think it is feasible or worth the effort.

If we treat this like the existing Multi Cluster Services, based on the "namespace sameness" concept, I think that is valid and backed by the existing Calico implementation and the Multi Cluster Services KEP. If we follow this path, does it raise any red flags, or are we OK with iterating? If so, do we document it in SIG MC or in this subproject of SIG Network? And if we document it, is it a problem if kube-network-policies implements it?
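To make the two parts concrete, a minimal sketch (the namespace and label names are illustrative):

```yaml
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: sameness-example
  namespace: alpha          # the NP exists only in this cluster's "alpha"
spec:
  podSelector:              # part 1: always local; selects pods only in
    matchLabels:            # this cluster's "alpha" namespace
      app: backend
  ingress:
  - from:
    - namespaceSelector:    # part 2: under namespace "sameness", this peer
        matchLabels:        # would match namespace "beta" in every cluster
          kubernetes.io/metadata.name: beta
```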
@@ -0,0 +1,134 @@

# NPEP-311: Best Practices for Multi-Cluster NetworkPolicy in a Flat Network
I don't feel like this is a "best practice".
If we want to document "some pod network implementations are doing something like this", then maybe we can document that, but if we think this use case is important, then I think we should support it with proper API.
I couldn't find the right name

> but if we think this use case is important, then I think we should support it with proper API.

that is what I'm trying to figure out: whether people just need this stretched network policy mode, whether it needs something else, or whether it is a very custom thing ...
### Recommendation

Each namespace within a cluster should be labeled with a key that identifies its
parent cluster. The recommended label is:
What’s meant by “parent cluster”?
the cluster that contains the namespace
So presumably it will have the same value for all namespaces in a given cluster, won’t it? What’s the purpose of the label then? Couldn’t multicluster network policy support require KEP-2149 instead?
I opened this PR for exploration, and @skitt you are far more expert than anybody here, so if you have ideas I'm all ears. I will take a look at https://github.com/kubernetes/enhancements/tree/master/keps/sig-multicluster/2149-clusterid
The idea is that the NetworkPolicies can refer to namespaces in other clusters via this label. So you can say "allow from namespace foo (in this cluster)", or "allow from namespace foo in cluster bar" or "allow from namespace foo in all clusters", etc.
@aojea I understand perfectly that this is about having a discussion 😉
Thinking about this more, I’m not convinced namespace sameness should limit what network policies can do — the way I see it, network policies apply on top of networking defaults, and exist to provide more control over those. If the multicluster default (in the SIG-MC worldview anyway) is that namespaces are the same across a clusterset, that doesn’t prevent network policies from clamping down on what’s possible there. In the same way that network policies nowadays allow administrators to say “I only want traffic between this and this”, it makes sense to me to consider that policies extended to multiple clusters should allow administrators to say “I only want traffic between this cluster and this other cluster”.

That being said, @danwinship I don’t understand your comment about using the label to refer to namespaces in other clusters. Does that mean that the idea here is to create namespaces in a local cluster to represent namespaces in another cluster, with the label indicating that the namespaces are for remote clusters and not the local one? That would indeed be inconsistent with other multicluster APIs where multicluster connectivity (or at least discovery) depends on having the same namespace in multiple clusters…
Put another way, given the practices described here, what would the CRs look like to create a network policy allowing a pod in a local cluster to send traffic to all pods in a given namespace in a remote cluster?
> Does that mean that the idea here is to create namespaces in a local cluster to represent namespaces in another cluster, with the label indicating that the namespaces are for remote clusters and not the local one?
No, nothing is "representing" anything. This is just about making it possible to refer to namespaces in other clusters. If you have a NetworkPolicy in cluster-one that says:
```yaml
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: sample-multi-cluster-policy
  namespace: alpha
spec:
  podSelector: {}
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          cluster.clusterset.k8s.io: cluster-two
          kubernetes.io/metadata.name: beta
```

That says that all pods in the namespace `beta` in the cluster `cluster-two` can connect to all pods in the namespace `alpha` in the cluster `cluster-one`. There is nothing "representing" `beta` in `cluster-one`. The policy refers directly (by name) to the remote namespace.
> what would the CRs look like to create a network policy allowing a pod in a local cluster to send traffic to all pods in a given namespace in a remote cluster?
So keep in mind that NetworkPolicy is "single-sided": you can say "Pod X is allowed to send traffic to Pod Y", but that doesn't automatically imply that Pod Y will accept packets from Pod X. It just means that if Pod X was "deny-all-egress" before, then it becomes "deny-all-egress-except-egress-to-Pod-Y" now.
So the policy you're asking about is basically just what I gave above, except with ingress and from changed to egress and to. But if the remote namespace has its own NetworkPolicies, then it might not accept the packets.
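Spelled out, that egress variant would look something like this (a sketch reusing the names from the example above; the explicit `policyTypes` is added here so the policy only affects egress):

```yaml
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: sample-multi-cluster-egress-policy
  namespace: alpha
spec:
  podSelector: {}
  policyTypes:
  - Egress                  # only isolate the selected pods for egress
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          cluster.clusterset.k8s.io: cluster-two
          kubernetes.io/metadata.name: beta
```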
In a single-cluster, with AdminNetworkPolicy, you can write "double-sided" policies that actually say "Pod X can send traffic to Pod Y", which implies both that egress from Pod X to Pod Y and ingress to Pod Y from Pod X are allowed. However, in a multi-cluster context, AdminNetworkPolicy would have the same problem plain NP does with cross-cluster policies. Each AdminNetworkPolicy would only have power over traffic in its own cluster, so an ANP in one cluster would not be able to override policies in a remote cluster.
Ah, thanks, I think I get it — this is just a way of reusing the existing label selectors to match on clusters in addition to everything else. So apart from requiring a “policy engine” somewhere that’s aware of other clusters’ namespaces, it doesn’t require anything, it’s just an agreement on label use so that policies can be written consistently across clusters. Is that correct?
On the one hand I find the approach a bit icky because it requires adding information that’s already known to each namespace; on the other hand I like that it requires an action from the administrator before any given namespace can participate in network-policy-controlled traffic…
> it’s just an agreement on label use so that policies can be written consistently across clusters. Is that correct?
yeah, the question is whether creating these implicit semantics simplifies something, or whether people will just prefer to operate with their own labeling, treating the cluster as stretched ... it would be nice to have some end-user feedback
Each namespace within a cluster should be labeled by the cluster administrator with a key that identifies its
parent cluster. The recommended label is:

`cluster.clusterset.k8s.io: <cluster-id>`

The `<cluster-id>` value should correspond to the unique name of the cluster
within the ClusterSet (e.g., cluster-a, us-west-2). This practice enables policy
authors to create selectors that precisely target peers from specific clusters.
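For instance, the cluster administrator might apply the recommended label like this (a sketch; the namespace name `alpha` and cluster id `cluster-a` are illustrative):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: alpha
  labels:
    # Set by the cluster administrator: identifies the cluster that
    # contains this namespace, using the cluster's ClusterSet id.
    cluster.clusterset.k8s.io: cluster-a
```

Equivalently, `kubectl label namespace alpha cluster.clusterset.k8s.io=cluster-a`.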
FYI we also have something somewhat similar in Cilium ClusterMesh: https://docs.cilium.io/en/latest/network/clustermesh/policy/ (CiliumNetworkPolicies are described there, but the same behavior affects regular NetworkPolicy too). The main difference is that we operate at the pod level, not the namespace level (though technically everything is translated into "pod"-level things in the end in Cilium...).
It boils down to the following points:

- We sync the relevant info for all pods from remote clusters so that it is available in the local cluster, and in terms of Pod IPs the network is flat too (pod CIDRs from the clusters must not collide)
- A label `io.cilium.k8s.policy.cluster` is automatically/implicitly added to all the pods (with respect to NetworkPolicies; labels aren't really added to the actual Pod objects), at the pod level instead of the namespace level, with the "cluster name" as a value ("cluster name" here is similar to what is defined as "cluster id" in SIG-MC terminology, and we could add some layer of compat for `cluster.clusterset.k8s.io` to maybe translate to our Cilium-specific label); see the sketch after this list
- There is no mechanism to auto-add a NetworkPolicy across clusters; a NetworkPolicy can only affect pods in the local cluster (so essentially `spec.podSelector` does not select pods from remote clusters, but things in `ingress` and `egress` can use pod info from remote clusters)
- We are "now" defaulting pod selectors to select the local cluster name label by automatically/implicitly adding `io.cilium.k8s.policy.cluster=$localClusterName`, unless the selector already uses the label `io.cilium.k8s.policy.cluster` in some way, in which case we don't implicitly "change" the selector to select the local cluster
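As an illustration of that label in use, a cross-cluster allow rule along the lines of the linked docs might look like this (a sketch; the names are illustrative, and `k8s:io.kubernetes.pod.namespace` is the label Cilium derives for a pod's namespace):

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-from-remote-cluster
  namespace: alpha
spec:
  endpointSelector:         # local pods only (podSelector-equivalent)
    matchLabels:
      app: backend
  ingress:
  - fromEndpoints:
    - matchLabels:
        # implicit per-pod label carrying the peer's cluster name
        io.cilium.k8s.policy.cluster: cluster-two
        k8s:io.kubernetes.pod.namespace: beta
```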
Also, this is a more advanced topic than just the cluster id, but still semi-related: we are wondering how to add labels at the cluster level. Here is the relevant issue on the Cilium side: cilium/cilium#40413 (there are mostly implementation considerations in that issue though). To do that we are most likely going to rely on the About API: https://multicluster.sigs.k8s.io/concepts/about-api/. But in our case we are a bit worried that each cluster would self-declare its properties/labels, while the cluster name in our case is much more strictly enforced on each cluster (the local cluster knows its name and the name of each remote cluster too, and for anything imported from remote clusters we enforce that it has the correct cluster name)... But yeah, probably this is an entirely different "problem"?
Document some of the existing common practices for applying existing network policies across clusters without increasing the scope of the core Kubernetes APIs.

Based on the discussions we had in the SIG Network Policy API meeting on Aug 26 2025, and with feedback from @fasaxc about an existing implementation in production, it will be good to get these covered in a community place.
Fixes: #311