failover

Usually a command is routed to the managing node that created the resource the command targets (identified by the serverId part of the resource's id), executed there, and then distributed to subscribers. But when the managing node goes down temporarily or permanently, or becomes unreachable, nothing can be updated, which can be a frustrating user experience.

To mitigate this, there are so-called sibling managing nodes, which are also used for load balancing heavily requested resources or for reducing response time. Essentially, they are complete or partial clones of the main managing node, located close to it or anywhere else in the world. The managing node replicates every update it receives to them, and it only answers with an accepted response once 51% of them have acknowledged the update.

Example:

graph LR;

USER-- update --> managingNode
managingNode-- update-->siblingNode1[sibling Node 1]
managingNode-- update-->siblingNode2
managingNode-- update-->siblingNode3
siblingNode1-- ok-->managingNode
siblingNode3-- ok-->managingNode
managingNode-- ok --> USER

linkStyle 6 stroke:#99f
linkStyle 5 stroke:#99f
linkStyle 4 stroke:#99f

The main managing node receives the update and distributes it.
It receives oks from 2 of 3 (66%) sibling nodes.
It sends an ok to the executor.

The main managing node will continue to attempt to send the update command to sibling Node 2.
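
The acknowledgement rule from the example can be sketched in a few lines of Python. This is only an illustration: `has_quorum` and `PendingUpdate` are hypothetical names, and reading "51%" as "at least 51% of all sibling nodes" is an assumption about the intended rule.

```python
from dataclasses import dataclass, field


def has_quorum(acks: int, siblings: int, threshold_percent: int = 51) -> bool:
    """Return True once enough siblings have acknowledged an update."""
    if siblings == 0:
        # Assumption: with no siblings there is nothing to wait for.
        return True
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return acks * 100 >= threshold_percent * siblings


@dataclass
class PendingUpdate:
    """Hypothetical bookkeeping for one update on the main managing node."""

    siblings: list[str]
    acked: set[str] = field(default_factory=set)

    def record_ok(self, sibling: str) -> bool:
        """Record an ok and report whether the update can now be accepted."""
        if sibling in self.siblings:
            self.acked.add(sibling)
        return has_quorum(len(self.acked), len(self.siblings))

    def outstanding(self) -> list[str]:
        """Siblings that still need the update re-sent to them."""
        return [s for s in self.siblings if s not in self.acked]
```

With three siblings, the first ok is not enough, the second one reaches the threshold, and `outstanding()` then lists the one sibling the main managing node keeps retrying.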

Main managing node fails

If the main managing node is unreachable, sibling managing node 1 is promoted to main managing node by one of the location controllers of the location where the main managing node is located. It then starts accepting, executing, and distributing incoming commands.

graph LR;
USER-- update --> managingNode
USER-- update --> siblingNode1[sibling Node 1]
siblingNode1-- update-->managingNode
siblingNode1-- update-->siblingNode2
siblingNode1-- update-->siblingNode3
siblingNode3-- ok-->siblingNode1
siblingNode2-- ok-->siblingNode1
siblingNode1-- ok --> USER

style managingNode stroke:#900
linkStyle 2 stroke:#900
linkStyle 0 stroke:#900

The red lines are failing because the main managing node is unreachable.
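
The promotion decision a location controller has to make could look roughly like the sketch below. The function name `choose_main`, the `is_reachable` callback, and the rule "take the first reachable sibling in list order" are assumptions made for illustration; the text above only states that sibling node 1 is promoted.

```python
from typing import Callable, Optional


def choose_main(
    main: str,
    siblings: list[str],
    is_reachable: Callable[[str], bool],
) -> Optional[str]:
    """Return the node that should act as main managing node, or None."""
    if is_reachable(main):
        # Nothing to do while the current main managing node answers.
        return main
    # Assumption: siblings are tried in list order, so sibling 1 wins
    # whenever it is reachable.
    for sibling in siblings:
        if is_reachable(sibling):
            return sibling
    # No node is reachable, so no promotion is possible.
    return None
```

For example, with `managingNode` marked as down, `choose_main("managingNode", ["siblingNode1", "siblingNode2", "siblingNode3"], check)` returns `"siblingNode1"`, matching the diagram.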

Adding a sibling node

You can add a sibling node at almost any time via the registerSibling command on the location controller (which is responsible for controlling the node resource representation). It will verify your request and then tell the main managing node to add the new server. You will see syncResource commands invoked by the main managing node until you have all resources, then immediately every new updating command for those resources, and soon after that the first reading commands from clients.

request
syncing (while the managing node records updating commands)
replay of the recorded updating commands
normal operation
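
The four phases above can be modelled as a small state machine. The following is a simplified, in-memory sketch, not the actual protocol: the class and method names are hypothetical, resources are reduced to a dictionary, and the syncResource commands are stood in for by copying a snapshot.

```python
from enum import Enum


class Phase(Enum):
    REQUEST = "request"
    SYNCING = "syncing"
    REPLAY = "replay"
    NORMAL = "normal operation"


class SiblingOnboarding:
    """Tracks one new sibling from the main managing node's point of view."""

    def __init__(self, resources: dict[str, str]):
        self.source = resources                     # state on the main managing node
        self.copy: dict[str, str] = {}              # state built up on the new sibling
        self.recorded: list[tuple[str, str]] = []   # updates seen during syncing
        self._snapshot: dict[str, str] = {}
        self.phase = Phase.REQUEST

    def start_sync(self) -> None:
        """Request accepted: freeze a snapshot and start recording updates."""
        self._snapshot = dict(self.source)
        self.phase = Phase.SYNCING

    def update(self, resource_id: str, value: str) -> None:
        """Apply an updating command on the main managing node."""
        self.source[resource_id] = value
        if self.phase is Phase.SYNCING:
            # The sibling is not ready yet, so keep the command for later.
            self.recorded.append((resource_id, value))
        elif self.phase is Phase.NORMAL:
            # The sibling is live and receives the command directly.
            self.copy[resource_id] = value

    def finish_sync(self) -> None:
        """Transfer the snapshot, replay recorded commands, then go live."""
        self.copy.update(self._snapshot)  # stands in for syncResource commands
        self.phase = Phase.REPLAY
        for resource_id, value in self.recorded:
            self.copy[resource_id] = value
        self.recorded.clear()
        self.phase = Phase.NORMAL
```

Recording updates during the sync and replaying them afterwards is what keeps the new sibling from missing commands that arrive while the snapshot is still being transferred.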

Implementation

This design is achieved with the redundant reference concept, which internally is just a list of sibling managing nodes.
