-
Notifications
You must be signed in to change notification settings - Fork 48
ADR-40: Custom server pool and reconnecting to specific server #388
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -7,11 +7,12 @@ | |
| | Status | Implemented | | ||
| | Tags | client, server, spec | | ||
|
|
||
| | Revision | Date | Author | Info | | ||
| | -------- | ---------- | -------- | ---------------------------- | | ||
| | 1 | 2023-10-12 | @Jarema | Initial draft | | ||
| | 2 | 2024-6-24 | @aricart | Added protocol error section | | ||
| | 3 | 2024-9-30 | @Jarema | Add connection Statistics | | ||
| | Revision | Date | Author | Info | | ||
| | -------- | ---------- | --------- | -------------------------------------------- | | ||
| | 1 | 2023-10-12 | @Jarema | Initial draft | | ||
| | 2 | 2024-6-24 | @aricart | Added protocol error section | | ||
| | 3 | 2024-9-30 | @Jarema | Add connection Statistics | | ||
| | 4 | 2025-11-05 | @piotrpio | Add custom server pool and reconnect handler | | ||
|
|
||
| ## Summary | ||
|
|
||
|
|
@@ -125,6 +126,46 @@ all reconnect options are respected. | |
| For most clients, that means having a `reconnect` method on the | ||
| Client/Connection handle. | ||
|
|
||
| #### Custom server pool | ||
|
|
||
| Client should have a way to specify custom server pool for reconnection | ||
| attempts. When specified, client should use only those servers for reconnection | ||
| attempts, ignoring any advertised servers from the Server. | ||
|
|
||
| New pool should be used for all reconnection attempts until the client is | ||
| restarted or a new pool is specified. New pool is subject to the same rules as | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Need to clarify here what happens if a custom pool it set at run time and gossip learn is on, do they merge like they would normally from the initial url list or not?
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is what I meant by feedback. Basically the pool cannot be just "give me the next server" There has to be a way for the pool to be given the discovered servers, although the pool can ignore them if it wants.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, we could just add gossiped servers to the pool with an indicator whether it was configured by the user or came from auto-discovery, just like we do without the custom pool/ |
||
| the default one - randomization unless disabled, max reconnect attempts, etc. | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The pool should not have to care about anything of the normal options or behavior, but it can care if it choses. This is the point of suppling a custom pool, to allow it to do anything it wants. This means that randomization and the act of resolving ip addresses are part of the pool behavior. Max reconnect attempts is not part of the pool behavior, it's part of the behavior that uses the pool. The behavior should ask for a resolved ip address, but the pool can chose what the result of a resolved ip address request is, so for instance it could give a resolved address or not when asked for an address
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ie - once the server list is set, those are the only servers - on connect, the client will learn new servers - if the client doesn't want to use gossiped servers, it should specify its option to ignore cluster updates. |
||
|
|
||
| ```go | ||
| // SetServerPool sets a new server pool for reconnection attempts. | ||
| // It replaces any existing pool, including advertised servers. | ||
| func (nc *Conn) SetServerPool(servers []string) error | ||
| ``` | ||
|
|
||
| #### Reconnect to specific server | ||
|
|
||
| Client should expose an option to configure a callback that will be called | ||
| before each reconnection attempt. | ||
|
|
||
| As arguments, it should receive the list of servers that are in the pool | ||
| (together with reconnect attempt number), as well as the most up-to-date server | ||
| INFO. | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. not quite sure why we want to expose all the metrics - the callbacks for reconnecting/reconnect/disconnect have info on what is happening, and grabbing the info for their investigation is available on connection - this means that they can grab/store/use their own data to supplement the selection the "server selector". Effectively, I would move my internal selection logic to such function and use it internally, this way how it's handled is uniform.
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. When you get information from SRV records you get priority and weight as part of the SRV answer, this means reconnections have to proceed respecting those. It's metrics entirely outside of the clients knowledge, only by knowing the recent reconnection counts etc can this pluggable behaviour make those decisions |
||
|
|
||
| The callback should return the server URL to which the client should attempt to | ||
| reconnect (a url outside of the pool should not be allowed). Additionally, the | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why do we care if the reconnect to a specific server wants to connect to something outside of the pool? What happens when a reconnect to a specific server fails?
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. A retry like normal. We want to clearly separate concerns here:
These are different things, the callback is picking from the pool only. |
||
| callback should allow to specify whether client should include a delay before | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The delay should not be part of the callback - it's already configured for the connection options. The pool should simply give a server address to connect to.
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not really true, we have various dynamic reconnect delay options already. The use case here for example when you get information about primary and backup clusters from a service discovery system. You would have a primary and backup cluster, you'd try the primary cluster a few times (with backoff) and finally decide on some value to switch to the standby cluster - that switch should essentially reset the backoff because now you're retrying against a new cluster. I think probably the simple boolean here is not enough, but we do need the flexibility to influence delay+target selection as above
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Agreed, I'm also not sure a boolean is enough here, but given that we already have either
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is a different concern than the pool that is already addressed, granted in a way that the user may not be able override in certain implementations. It may be simple enough to allow a callback for this specific behavior. For instance the Java code already has a functional callback, outside of the pool, that can have dynamic behavior, and the developer has the ability to have that callback be aware or influenced by the pool. This is the model I propose. Keep the pool the pool. Keep other behaviors separate. It's up to the user to coordinate that behavior. |
||
| attempting to reconnect to the selected server (based on what's configured in | ||
| the client options). | ||
|
|
||
| ```go | ||
| // Return values: | ||
| // - url.URL: The server URL to connect to. Can be from the provided servers slice | ||
| // or a new server URL. New servers will be automatically added to the pool. | ||
| // - bool: If true, apply reconnect delay (ReconnectWait + jitter or CustomReconnectDelayCB) | ||
| // before attempting connection. If false, attempt connection immediately. | ||
| type ReconnectToServerHandler func([]Server, ServerInfo) (*url.URL, bool) | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this should simply return the server - this is a "select" server - if this is done any client flag for selection/retry etc is getting ignored - not a fan.
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. See comment above:
|
||
| ``` | ||
|
|
||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There also needs to be feedback to the pool.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Agree, we should probably add discovered servers o the pool (with information on whether a server was added from server INFO or by the user)
Implementation may vary, but in go all servers in pool have a reconnect counter which is reset on successful connect (which is used for tracking |
||
| #### Detecting disconnection | ||
|
|
||
| There are two methods that clients should use to detect disconnections: | ||
|
|
||
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For sure...clients should have a way to specify a custom server pool for ALL connection attempts including the original connection and reconnection attempts.
It also seems logical that the pool is part of connection options. Is this document saying that the pool can be replaced at any time? Much like an auth handler, I think the pool should be able to change it's behavior itself, for instance in response to outside input, RTTs or any other thing without having to replace the entire pool with another instance...but this may be a language specific problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it could be more clear, but yes, the idea is to have a method which sets a server pool which you can invoke whenever you want. It does not force reconnect itself, it just configures the pool and you can call it whenever and however many times you want.
As for the initial server pool - it's already part of establishing connection in all clients right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just saying... Java only allows you to set the server pool implementation as part of connection options. I'm not entirely against allowing the dev to set this at any time, I'm just saying it would make more sense if the pool implementation is dynamic in a similar way to an auth handler that reads from a file every time. Just set it at the beginning of the connection and you are done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In simpler words: calling this API replaces all the servers, with the new provided servers.