-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Istio Fork for Fixing Multicluster Registry Bug #1
base: release-1.0
Are you sure you want to change the base?
Conversation
FAIR.md
Outdated
@@ -0,0 +1,32 @@ | |||
# Why this Fork | |||
In a mutual TLS-disabled multicluster Kubernetes setup, gRPC liveness and readiness probes don't work in remote clusters. However, this behavior could not be replicated on the Kubernetes cluster running the Istio control plane. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
its not grpc specific
FAIR.md
Outdated
|
||
After diving into the Istio source, we only found that 3333 and 9999 were ports returned from the [Envoy debug registry](https://github.com/istio/istio/blob/1.0.2/pilot/pkg/proxy/envoy/v2/debug.go#L324-L337). | ||
|
||
In the Istio logs, we could see that a remote cluster registry was being picked up. However, based off of the [server bootup order]*(https://github.com/istio/istio/blob/1.0.2/pilot/pkg/bootstrap/server.go#L211-L219) (note that Discovery service is always init'd before Multicluster), it looked like the debug registry would always be loaded before the multicluster ones. Since the debug registry would always return entries for the management ports, it meant that the debug management ports would always be used instead of the remote cluster ones, meaning that liveness and readiness probes for remote clusters would NEVER work. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You've got a * which is breaking the link [server bootup order]*(https://...)
should be [server bootup order](https://...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would elaborate a tiny bit to explain that their are various providers that the aggregate controller uses to get ManagementPorts, and that it iterates over them sequently, taking the first one that returns a non-nil result.
And then further highlight that the MultiCluster one is appended AFTER the debug one, which means that the remote one will never be called since debug always returns a value.
You kind of already said that, but just being explicit about how those providers are iterated over i think makes it clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoa this is really weird. I don't see that in my text editor (the *). Good catch.
No description provided.