failover
Usually, a command is routed to the managing node that created the resource the command targets (identified by the serverId part of the resource's id), executed there, and then distributed to subscribers. But when that managing node goes down or becomes unreachable, temporarily or permanently, nothing can be updated, which can be a frustrating user experience.
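The routing rule above can be sketched as follows. This is only an illustration: the `Command` class, the `route` function, and the `<serverId>/<localId>` id layout are assumptions, since the text only says that the id contains a serverId part.

```python
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class Command:
    name: str
    resource_id: str  # assumed layout: "<serverId>/<localId>"


def server_id_of(resource_id: str) -> str:
    """Extract the serverId part of an id (the '/' separator is an assumption)."""
    server_id, sep, local_id = resource_id.partition("/")
    if not sep or not server_id or not local_id:
        raise ValueError(f"malformed resource id: {resource_id!r}")
    return server_id


def route(command: Command, reachable_nodes: Set[str]) -> str:
    """Return the serverId of the managing node that should execute the command."""
    target = server_id_of(command.resource_id)
    if target not in reachable_nodes:
        # Without siblings, an unreachable managing node blocks every update.
        raise ConnectionError(f"managing node {target!r} is unreachable")
    return target
```

With this sketch, `route(Command("update", "node-a/42"), {"node-a"})` selects `node-a`, and the same call fails with a `ConnectionError` as soon as `node-a` drops out of the reachable set.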
To mitigate this, there are so-called sibling managing nodes, which are also used to load-balance heavily requested resources and to reduce response time. Essentially, they are complete or partial clones of the main managing node, located close to it or anywhere else in the world. The managing node replicates every update it receives to them and only responds with an accepted response once 51% of them have acknowledged the update.
Example:
graph LR;
USER-- update --> managingNode
managingNode-- update-->siblingNode1[sibling Node 1]
managingNode-- update-->siblingNode2
managingNode-- update-->siblingNode3
siblingNode1-- ok-->managingNode
siblingNode3-- ok-->managingNode
managingNode-- ok --> USER
linkStyle 6 stroke:#99f
linkStyle 5 stroke:#99f
linkStyle 4 stroke:#99f
- The main managing node receives the update and distributes it to its sibling nodes.
- It receives oks from 2 of 3 (66%) sibling nodes.
- It sends an ok to the executor of the command.
The main managing node will continue to attempt to send the updating command to sibling Node 2.
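The acknowledgement rule from this example could look roughly like the sketch below. The `replicate` function and the callable-per-sibling model are hypothetical. Only the 51% threshold comes from the text. How the real system transports updates and retries is not specified here.

```python
from typing import Callable, Dict, List, Tuple

# A sibling is modelled as a callable that returns True when it acknowledges
# the update and raises ConnectionError when it cannot be reached (assumption).
Sibling = Callable[[str], bool]


def replicate(update: str, siblings: Dict[str, Sibling],
              quorum: float = 0.51) -> Tuple[bool, List[str]]:
    """Send an update to all siblings; return (accepted, siblings still pending)."""
    acked = 0
    pending: List[str] = []
    for name, send in siblings.items():
        try:
            ok = send(update)
        except ConnectionError:
            ok = False
        if ok:
            acked += 1
        else:
            pending.append(name)  # the main managing node keeps retrying these
    # Assumption: with no siblings registered there is nothing to wait for.
    accepted = not siblings or acked / len(siblings) >= quorum
    return accepted, pending
```

For the diagram above, two acknowledgements out of three siblings give 2/3, which is above the threshold, so the update is accepted while sibling Node 2 stays in the pending list.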
If the main managing node is unreachable, sibling managing node 1 is promoted to main managing node by one of the location controllers of the location where the main managing node is located. It then starts accepting, executing, and distributing incoming commands.
graph LR;
USER-- update --> managingNode
USER-- update --> siblingNode1[sibling Node 1]
siblingNode1-- update-->managingNode
siblingNode1-- update-->siblingNode2
siblingNode1-- update-->siblingNode3
siblingNode3-- ok-->siblingNode1
siblingNode2-- ok-->siblingNode1
siblingNode1-- ok --> USER
style managingNode stroke:#900
linkStyle 2 stroke:#900
linkStyle 0 stroke:#900
The red lines are failing because the main managing node is unreachable.
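A minimal sketch of the promotion step, as a location controller might perform it, is shown below. The `ManagingGroup` class, the `promote_if_unreachable` function, and the rule of picking the first reachable sibling are assumptions for illustration. The text only states that sibling node 1 is promoted. Keeping the failed node as a replication target mirrors the diagram, where the promoted node still tries to send updates to it.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ManagingGroup:
    main: str
    siblings: List[str] = field(default_factory=list)


def promote_if_unreachable(group: ManagingGroup,
                           is_reachable: Callable[[str], bool]) -> str:
    """Return the node that acts as main managing node, promoting a sibling if needed."""
    if is_reachable(group.main):
        return group.main
    for index, candidate in enumerate(group.siblings):
        if is_reachable(candidate):
            failed = group.main
            group.main = candidate
            # The failed node stays in the list so it keeps receiving update attempts.
            group.siblings = (group.siblings[:index]
                              + group.siblings[index + 1:]
                              + [failed])
            return group.main
    raise ConnectionError("no reachable managing node in this group")
```

Calling it on a group with main `managingNode` and siblings 1 to 3, with only `managingNode` unreachable, makes sibling node 1 the new main and moves `managingNode` to the end of the sibling list.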
You can add a sibling node at almost any time via the registerSibling command on the location controller (which is responsible for controlling the node resource representation). It will verify your request and then tell the main managing node to add the new server. You will see syncResource commands invoked by the main managing node until you have all resources, then immediately every new updating command to those resources, and soon after that the first reading commands from clients.
- request
- syncing (while the managing node records updating commands)
- replay of the recorded updating commands
- normal operation
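The four phases can be sketched as a small state machine. Everything named here (`Phase`, `NewSibling`, `register_sibling`, and the snapshot and recorded-update arguments) is hypothetical. The real registerSibling and syncResource commands are only named in the text, not specified, so plain dictionary writes stand in for them.

```python
from enum import Enum
from typing import Dict, List, Tuple


class Phase(Enum):
    REQUESTED = "request"
    SYNCING = "syncing"
    REPLAYING = "replay of recorded updates"
    NORMAL = "normal operation"


class NewSibling:
    def __init__(self) -> None:
        self.phase = Phase.REQUESTED
        self.resources: Dict[str, str] = {}

    def apply(self, resource_id: str, value: str) -> None:
        self.resources[resource_id] = value


def register_sibling(sibling: NewSibling,
                     snapshot: Dict[str, str],
                     recorded_updates: List[Tuple[str, str]]) -> List[Phase]:
    """Walk a new sibling through all phases and return the phases it visited."""
    visited = [sibling.phase]

    sibling.phase = Phase.SYNCING
    visited.append(sibling.phase)
    for resource_id, value in snapshot.items():
        sibling.apply(resource_id, value)  # stands in for a syncResource command

    sibling.phase = Phase.REPLAYING
    visited.append(sibling.phase)
    for resource_id, value in recorded_updates:
        # Updates recorded by the managing node while the sync was running.
        sibling.apply(resource_id, value)

    sibling.phase = Phase.NORMAL
    visited.append(sibling.phase)
    return visited
```

Applying the recorded updates after the snapshot means an update that arrived during syncing overwrites the older synced value, so the sibling ends up with the latest state before it enters normal operation.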
This design is achieved with the redundant reference concept, which internally is just a list of sibling managing nodes.