-
Notifications
You must be signed in to change notification settings - Fork 244
KEP-2676: Support Istio ambient in Kubeflow #891
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Add proposal for supporting Istio ambient in Kubeflow. This has been discussed for more than a year in the Kubeflow community (see kubeflow/manifests#2676). Given that right now, there is an ongoing GSOC project working on this (see kubeflow#809 (comment)), this proposal aims to facilitate the project and: * defines the changes needed to make Kubeflow component ambient-compatible * outlines a setup that uses Istio in ambient and enables all the existing Authorization and Authentication use cases. Ref kubeflow/manifests#2676 Signed-off-by: Orfeas Kourkakis <[email protected]>
2797df2 to
55c124c
Compare
| * A high-level architecture of Kubeflow that uses Istio ambient and showcases how to: | ||
| * Use HTTPRoutes instead of VirtualServices | ||
| * Enable the user login flow | ||
| * Use waypoints to enforce namespace isolation (L7 and L4 traffic) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| * Use waypoints to enforce namespace isolation (L7 and L4 traffic) | |
| * Use waypoints to enforce namespace isolation (L7 and L4 traffic) | |
| * Provide Zero overhead namespaces, so without actual user workloads the Pods, CPU and memory used should be 0 (after the seaweedfs PR is merged). So no waypoint proxies are allowed in kubeflow profile namespaces. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is also related to #891 (comment).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is not needed since it's not an actual goal of this proposal.
| To enable authorization for Kubeflow components, the whole `kubeflow` namespace is included in the mesh. | ||
|
|
||
| ##### Waypoints | ||
| To get the best performance out of ambient mode[^6][^7], services are enrolled to a waypoint only when the L7 features are required. For the Kubeflow control plane, only authorizationPolicies targeting `ml-pipeline` and `ml-pipeline-ui` make use of L7 attributes. Thus, a waypoint is deployed in the `kubeflow` namespace while only the services of those two components are enrolled to the waypoint. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure that notebooks do not use L7 ? I mean everyone with the kubeflow-userid header uses L7 so i guess almost all components in Kubeflow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. All those components trust the ingress gateway and the headers it provides. Thus, they accept all requests from the ingress gateway without any filtering done at the AuthorizationPolicy, like kfp components do (api server)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was talking about the jupyterlab itself in the customer namespace. They should be created on demand based on the notebook custom resource.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you refer to the AuthorizationPolicy created by the profile-controller? This is created dynamically when a new user namespace is created and it also allows/denies traffic to jupyterlab server pods. This looks like this, which indeed has L7 rules.
| ##### One waypoint per namespace | ||
| User namespaces are secured via an AuthorizationPolicy that is created dynamically by kubeflow's `profile-controller`. Most of the rules in this AuthorizationPolicy have L7 features[^8]. In sidecar mode, this AuthorizationPolicy is enforced to all traffic in the namespace by having no `selector`. This is due to the nature of those namespaces, where services are created dynamically. In Istio ambient mode, the only way to enforce a namespace-wide L7 AuthorizationPolicy is by proxying traffic to the whole namespace through the waypoint and then attaching the AuthorizationPolicy to the waypoint. This results in the authorization rules being enforced for all traffic proxied by the waypoint. This means that **a separate waypoint needs to be deployed in every user namespace**. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, we want to have exactly the opposite. Never ever waypoint proxies in user namespaces. Also the istio documentation and my talk with @kimwnasptd and the istio/solo.io founder at Kubecon supports that you do not need them in user namespaces. See https://istio.io/latest/docs/ambient/usage/waypoint/#configure-resources-to-use-a-cross-namespace-waypoint-proxy. That is the most essential part that you do not have waypoint proxies there and do not destroy zero overhead user namespaces.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(replied in #891 (comment))
| #### One waypoint per namespace | ||
| Apparently, deploying one waypoint per namespace adds a resource overhead for every user namespace, while it also reduces network efficiency for requests to that namespace. As mentioned above, this architecture is required by Istio's ambient model to enforce namespace-wide L7 AuthorizationPolicies. Since the `profile-controller` creates a default policy with L7 rules (e.g. path or headers based), one waypoint must correspond only to one namespace. The same Authorization model is not possible using a single cross-namespace waypoint. [Asking in Istio slack](https://istio.slack.com/archives/C041EQL1XMY/p1751905575407449?thread_ts=1751636618.350839&cid=C041EQL1XMY), they confirmed that: | ||
| >Typically, the authorization policy would be applied to a service so the destination would be "the thing my policy is attached to". | ||
| >... | ||
| >I don't think there is a shorthand expression for "all services in X namespace". | ||
|
|
||
| This leaves no other way to enforce an L7 AuthorizationPolicy to all services in a namespace, where services are being created dynamically. | ||
|
|
||
| Note that the introduced resource overhead affects cases with a large number of **idle** namespaces. For namespaces that are being actively used, the resource footprint is still greatly reduced, given that there are no sidecar containers injected. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sounds really problematic. So i would rather stay with Istio-cni until this is possible. Follow up of https://github.com/kubeflow/community/pull/891/files#r2248473959
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should block on this. Of course, providing zero overhead namespaces without any namespaced resources should be the north star we 're after. The fact though that Istio ambient does not enable us to do so should not be a reason for Kubeflow to not be ambient compatible. Both istio-cni and ambient mode have their pros and cons and Kubeflow should provide administrators the option to decide on their own which mode suits them. For example, clusters with few namespaces but hundreds of workloads per namespace would benefit greatly from using ambient, contrary to clusters with hundreds of idle namespaces and a few workloads per namespace.
As mentioned in the proposal, instead of blocking, I suggest:
- Implement code that does not provide enforce any high-level architecture. That means that it will be configurable enough to support both waypoint per namespace and cross-namespace waypoint architectures. If Istio ambient starts supporting L7 namespace isolation using a single cross-namespace waypoint, then the code is already compatible or at minimum needs a few tweaks rather than the whole "sidecar -> istio ambient & gateway API" migration effort.
- Provide an
ambientoverlay in manifests. The Istio-cni will still be the default one but providing an overlay enables users and distributions to try ambient and experience benefits and downsides. This can greatly help the next point. - Ask feature from Istio to support L7 namespace isolation using a single cross-namespace waypoint. I 've already asked them in Slack and plan to bring this up in their meetings too. However, I suspect that they have a good reason this is not supported right now. If there is adoption of Istio ambient by kubeflow users, then this puts us in a better position by providing valid use cases for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with all of @orfeas-k's points.
While having waypoints per namespace will have a different resource footprint from CNI, still this is up to the vendors to decide if they are OK with or not rather than not supporting it in Kubeflow.
The example can keep the CNI, but the project should still give the option to end users to have Kubeflow with K8s Gateway API and ambient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, i agree, lets not make istio ambient the default until this is possible. So for the time being istio-cni is the default and istio-ambient opt-in overlay of istio-cni.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think technically they 'll need to be two separate overlays but let's keep this implementation detail for implementation time. We all agree that Istio-cni will be the default one, provided in the example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just want to keep a single istio folder. In there we also have the legacy insecure overlay to disable istio-cni. https://github.com/kubeflow/manifests/tree/master/common/istio/istio-install/overlays/insecure. please lets not create multiple istio versions to maintain, but one base version with overlays.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was brought up in the Istio community meeting by @ca-scribner and it looks like they don't see any blockers to this. It was not an intentional omission, just something that hasn't been implemented or decided on. There was nothing brought up that was a hard blocker, more just challenges to overcome. It would though be a big effort, so before they actually action it they'd want to see more than a single person requesting the feature.
I filed a feature request here https://github.com/istio/istio.io/issues/16781. What we need to do now is to:
- get more consensus/demand, which can only be done by getting more people to use the ambient overlay (at least from Kubeflow's side)
- bring it up again in a future community meeting once the demand is obvious
I 'm updating the proposal with this information to make sure it is documented.
Include information about the feature request in Istio. Signed-off-by: Orfeas Kourkakis <[email protected]>
Add proposal for supporting Istio ambient in Kubeflow.
Context
This has been discussed for more than a year in the Kubeflow community (see kubeflow/manifests#2676). Currently, there is an ongoing GSOC project working on implementing this (see
#809 (comment)), which will be greatly facilitated by agreeing on expected changes.
What is this proposal about?
This proposal:
Ref kubeflow/manifests#2676
cc @kimwnasptd @juliusvonkohout @thesuperzapper