-
Notifications
You must be signed in to change notification settings - Fork 48
ADR-40: Custom server pool and reconnecting to specific server #388
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Piotr Piotrowski <[email protected]>
|
|
||
| #### Custom server pool | ||
|
|
||
| Client should have a way to specify custom server pool for reconnection |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For sure...clients should have a way to specify a custom server pool for ALL connection attempts including the original connection and reconnection attempts.
It also seems logical that the pool is part of connection options. Is this document saying that the pool can be replaced at any time? Much like an auth handler, I think the pool should be able to change it's behavior itself, for instance in response to outside input, RTTs or any other thing without having to replace the entire pool with another instance...but this may be a language specific problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it could be more clear, but yes, the idea is to have a method which sets a server pool which you can invoke whenever you want. It does not force reconnect itself, it just configures the pool and you can call it whenever and however many times you want.
As for the initial server pool - it's already part of establishing connection in all clients right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just saying... Java only allows you to set the server pool implementation as part of connection options. I'm not entirely against allowing the dev to set this at any time, I'm just saying it would make more sense if the pool implementation is dynamic in a similar way to an auth handler that reads from a file every time. Just set it at the beginning of the connection and you are done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In simpler words: calling this API replaces all the servers, with the new provided servers.
| // before attempting connection. If false, attempt connection immediately. | ||
| type ReconnectToServerHandler func([]Server, ServerInfo) (*url.URL, bool) | ||
| ``` | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There also needs to be feedback to the pool.
- Give the pool discovered servers whenever the client receives them.
- Tell the pool when a connection failed or succeeded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Give the pool discovered servers whenever the client receives them
Agree, we should probably add discovered servers o the pool (with information on whether a server was added from server INFO or by the user)
Tell the pool when a connection failed or succeeded.
Implementation may vary, but in go all servers in pool have a reconnect counter which is reset on successful connect (which is used for tracking MaxReconnects), so there is feedback.
|
|
||
| New pool should be used for all reconnection attempts until the client is | ||
| restarted or a new pool is specified. New pool is subject to the same rules as | ||
| the default one - randomization unless disabled, max reconnect attempts, etc. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The pool should not have to care about anything of the normal options or behavior, but it can care if it choses. This is the point of suppling a custom pool, to allow it to do anything it wants.
This means that randomization and the act of resolving ip addresses are part of the pool behavior.
Max reconnect attempts is not part of the pool behavior, it's part of the behavior that uses the pool. The behavior should ask for a resolved ip address, but the pool can chose what the result of a resolved ip address request is, so for instance it could give a resolved address or not when asked for an address
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ie - once the server list is set, those are the only servers - on connect, the client will learn new servers - if the client doesn't want to use gossiped servers, it should specify its option to ignore cluster updates.
| INFO. | ||
|
|
||
| The callback should return the server URL to which the client should attempt to | ||
| reconnect (a url outside of the pool should not be allowed). Additionally, the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we care if the reconnect to a specific server wants to connect to something outside of the pool?
What happens when a reconnect to a specific server fails?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A retry like normal.
We want to clearly separate concerns here:
- Managing the pool
- Picking from the pool
These are different things, the callback is picking from the pool only.
|
|
||
| The callback should return the server URL to which the client should attempt to | ||
| reconnect (a url outside of the pool should not be allowed). Additionally, the | ||
| callback should allow to specify whether client should include a delay before |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The delay should not be part of the callback - it's already configured for the connection options. The pool should simply give a server address to connect to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really true, we have various dynamic reconnect delay options already.
The use case here for example when you get information about primary and backup clusters from a service discovery system. You would have a primary and backup cluster, you'd try the primary cluster a few times (with backoff) and finally decide on some value to switch to the standby cluster - that switch should essentially reset the backoff because now you're retrying against a new cluster.
I think probably the simple boolean here is not enough, but we do need the flexibility to influence delay+target selection as above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, I'm also not sure a boolean is enough here, but given that we already have either ReconnectWait (which is a duration) or a callback which can customize it, I think it's reasonable to be able to reuse what's there. Also (at least in go), user may want to decide to add a sleep in the reconnect callback.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a different concern than the pool that is already addressed, granted in a way that the user may not be able override in certain implementations.
It may be simple enough to allow a callback for this specific behavior. For instance the Java code already has a functional callback, outside of the pool, that can have dynamic behavior, and the developer has the ability to have that callback be aware or influenced by the pool.
This is the model I propose. Keep the pool the pool. Keep other behaviors separate. It's up to the user to coordinate that behavior.
| // or a new server URL. New servers will be automatically added to the pool. | ||
| // - bool: If true, apply reconnect delay (ReconnectWait + jitter or CustomReconnectDelayCB) | ||
| // before attempting connection. If false, attempt connection immediately. | ||
| type ReconnectToServerHandler func([]Server, ServerInfo) (*url.URL, bool) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should simply return the server - this is a "select" server - if this is done any client flag for selection/retry etc is getting ignored - not a fan.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment above:
The use case here for example when you get information about primary and backup clusters from a service discovery system. You would have a primary and backup cluster, you'd try the primary cluster a few times (with backoff) and finally decide on some value to switch to the standby cluster - that switch should essentially reset the backoff because now you're retrying against a new cluster.
I think probably the simple boolean here is not enough, but we do need the flexibility to influence delay+target selection as above
|
|
||
| As arguments, it should receive the list of servers that are in the pool | ||
| (together with reconnect attempt number), as well as the most up-to-date server | ||
| INFO. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not quite sure why we want to expose all the metrics - the callbacks for reconnecting/reconnect/disconnect have info on what is happening, and grabbing the info for their investigation is available on connection - this means that they can grab/store/use their own data to supplement the selection the "server selector".
Effectively, I would move my internal selection logic to such function and use it internally, this way how it's handled is uniform.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When you get information from SRV records you get priority and weight as part of the SRV answer, this means reconnections have to proceed respecting those. It's metrics entirely outside of the clients knowledge, only by knowing the recent reconnection counts etc can this pluggable behaviour make those decisions
| attempts, ignoring any advertised servers from the Server. | ||
|
|
||
| New pool should be used for all reconnection attempts until the client is | ||
| restarted or a new pool is specified. New pool is subject to the same rules as |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to clarify here what happens if a custom pool it set at run time and gossip learn is on, do they merge like they would normally from the initial url list or not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is what I meant by feedback. Basically the pool cannot be just "give me the next server" There has to be a way for the pool to be given the discovered servers, although the pool can ignore them if it wants.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we could just add gossiped servers to the pool with an indicator whether it was configured by the user or came from auto-discovery, just like we do without the custom pool/
|
For reference here is the current Java behavior. The pool and re/connect behavior is based on the original go reference implementation. Not here: details about default pool behavior tracking failures and default pool management. Pool behavior:
Reconnect delay handler / callback:
Initial connect behavior:
Reconnect behavior:
|
Signed-off-by: Piotr Piotrowski [email protected]