workload proxy refactoring #886
This should resolve a memory leak related to |
This was referenced Jan 29, 2025
DmitriyMV added a commit to DmitriyMV/omni that referenced this issue Feb 13, 2025
Remove inmemory connection, and use custom `Director` and `DialContext` to proxify connections. For siderolabs#886 Signed-off-by: Dmitriy Matrenichev <[email protected]>
DmitriyMV added a commit to DmitriyMV/omni that referenced this issue Feb 23, 2025
- [x] Ensure `Reconciler` is internally consistent on all variations of `Reconcile` call (including parallel). Track aliases and clusters side by side.
- [x] Add tests for the above.
- [ ] Replace probing port on alias removal. That is - if we lost an alias to the probing port, find a new one and use it.
- [ ] Find a way to test the dialer logic. Pass the cache as DI? Or add probes factory?
- [ ] Replace HealthCheck logic with the actual tcp probing.
- [ ] Ensure that active probes are actually deleted on time expiration.
- [?] On setting upstream.List[T] ensure that "down" clusters are added with the negative score so we don't see those on dial attempt. (Looks like it's already the case).
- [ ] Expose metrics. Specifically for `idleConnections` and `inuse` connections. Register those in prometheus.
- [ ] Use latency checker from standard prometheus exporter. GotConn and WriteReq are important. GotResponse maybe interesting for their workload. Count can be used for idleConnection because in-flight requests and in-flight connections.
For siderolabs#886
Signed-off-by: Dmitriy Matrenichev <[email protected]>
DmitriyMV added a commit to DmitriyMV/omni that referenced this issue Feb 23, 2025
- [x] Ensure `Reconciler` is internally consistent on all variations of `Reconcile` call (including parallel). Track aliases and clusters side by side.
- [x] Add tests for the above.
- [x] Replace HealthCheck logic with the actual tcp probing.
- [ ] Replace probing port on alias removal. That is - if we lost an alias to the probing port, find a new one and use it.
- [ ] Find a way to test the dialer logic. Pass the cache as DI? Or add probes factory?
- [ ] Ensure that active probes are actually deleted on time expiration.
- [?] On setting upstream.List[T] ensure that "down" clusters are added with the negative score so we don't see those on dial attempt. (Looks like it's already the case).
- [ ] Expose metrics. Specifically for `idleConnections` and `inuse` connections. Register those in prometheus.
- [ ] Use latency checker from standard prometheus exporter. GotConn and WriteReq are important. GotResponse maybe interesting for their workload. Count can be used for idleConnection because in-flight requests and in-flight connections.
For siderolabs#886
Signed-off-by: Dmitriy Matrenichev <[email protected]>
Ignoring Healthchecks
1. A request comes in for `<uniqueID>.proxy.omni`; the `uniqueID` can be mapped to a `cluster`, `workload service` pair.
2. Map the `cluster`,`service` pair to a list of `[IP:port, ...]`.
3. Rewrite the request to `http://<cluster>.<service>/<original URL>`.
4. Use a custom `.Dial` function to "resolve" `<cluster>.<service>` to one of the `[IP:port]` pairs.

The benefit: `http.Client` can reuse outgoing HTTP connections, including HTTP/1.1 keep-alive, as long as the host part of the URL is the same.
We can also control the size of the HTTP transport's idle connection pool, etc.
With Healthchecks
Running healthchecks has a cost: it does a `Dial` on every configured endpoint at a configured interval, which consumes both CPU and network resources.

We can run healthchecks on demand: if some `cluster-service` pair is being used, we start healthchecking it; if it is idle for some time, we shut its healthchecks down. We can bootstrap the initial healthcheck state from the machine connection status.

In the flow above, on step (4), when we `Dial`, we call `upstreams.Pick()` to get a random healthy machine.