-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore: rework Reconciler
to use proper http.Transport
#929
base: main
Are you sure you want to change the base?
Conversation
logger: logger, | ||
lbLogger: logger.WithOptions(zap.IncreaseLevel(zapcore.ErrorLevel)), | ||
logLevel: logLevel, | ||
trans := &http.Transport{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's use https://github.com/hashicorp/go-cleanhttp instead of hand-rolling it
ExpectContinueTimeout: 1 * time.Second, | ||
} | ||
|
||
if err := http2.ConfigureTransport(trans); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as we dial without TLS, I'm not sure it has much value
trans := &http.Transport{ | ||
Proxy: http.ProxyFromEnvironment, | ||
ForceAttemptHTTP2: true, | ||
MaxIdleConns: 100, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to consider idle connections in a specific way, as the host would be service name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there's also stuff like MaxConnsPerHost
return proxy, clusterID, nil | ||
// hostPortForAlias returns a unique IP:port for the given cluster and alias. | ||
func hostPortForAlias(clusterID resource.ID, alias string) string { | ||
return fmt.Sprintf("%s_%s:4242", clusterID, alias) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we have some kind of guarantee that _
can't be in the clusterID
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good question. Maybe pick some other (perhaps unicode) symbol? But there is no guarantee that it will not be picked in the future either.There is one in ClusterDiscoveryConfig.Validate
and one in Config.Validate
checks in Talos, but both only ensure that ClusterID is not empty. As for Omni, I'm currently try to dig usages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we pick a very obscure character, we can also disallow it in the validations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use net.JoinHostPort
} | ||
|
||
func (b *backend) HealthCheck(context.Context) (upstream.Tier, error) { return 0, nil } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that should be a Dial check actually ?
|
||
return startErr | ||
} | ||
func (registry *Reconciler) removeLB(cluster resource.ID, alias string) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should call this on some "idle" timeout - that is if we don't have DialContext for the cluster for some timeout, we can shut it down
06b6aae
to
754b393
Compare
75c07ff
to
e752ce3
Compare
- [x] Ensure `Reconciler` is internally consistent on all variations of `Reconcile` call (including parallel). Track aliases and clusters side by side. - [x] Add tests for the above. - [x] Replace HealthCheck logic with the actual tcp probing. - [ ] Replace probing port on alias removal. That is - if we lost an alias to the probing port, find a new one and use it. - [ ] Find a way to test the dialer logic. Pass the cache as DI? Or add probes factory? - [ ] Ensure that active probes are actually deleted on time expiration. - [?] On setting upstream.List[T] ensure that "down" clusters are added with the negative score so we don't see those on dial attempt. (Looks like it's already the case). - [ ] Expose metrics. Specifically for `idleConnections` and `inuse` connections. Register those in prometheus. - [ ] Use latency checker from standard prometheus exporter. GotConn and WriteReq are important. GotResponse maybe interesting for their workload. Count can be used for idleConnection because in-flight requests and in-flight connections. For siderolabs#886 Signed-off-by: Dmitriy Matrenichev <[email protected]>
e752ce3
to
ee5db60
Compare
chore: rework
Reconciler
to use properhttp.Transport
Reconciler
is internally consistent on all variations ofReconcile
call (including parallel). Track aliases and clustersside by side.
idleConnections
andinuse
connections. Register those in prometheus.For #886
Signed-off-by: Dmitriy Matrenichev [email protected]