diff --git a/commands/cluster-cancelslotmigrations.md b/commands/cluster-cancelslotmigrations.md new file mode 100644 index 00000000..4ded3e3b --- /dev/null +++ b/commands/cluster-cancelslotmigrations.md @@ -0,0 +1,7 @@ +`CLUSTER CANCELSLOTMIGRATIONS` cancels all in-progress +[atomic slot migrations](../topics/atomic-slot-migration.md) initiated through +[`CLUSTER MIGRATESLOTS`](cluster-migrateslots.md). + +Only slot migrations initiated on this node are cancelled. If this node is the +target of a slot migration, the cancellation must be performed on the source +node. diff --git a/commands/cluster-getslotmigrations.md b/commands/cluster-getslotmigrations.md new file mode 100644 index 00000000..b425237f --- /dev/null +++ b/commands/cluster-getslotmigrations.md @@ -0,0 +1,93 @@ +`CLUSTER GETSLOTMIGRATIONS` returns an array of information about in-progress +and recently completed +[atomic slot migrations](../topics/atomic-slot-migration.md). + +Each job previously started by [`CLUSTER MIGRATESLOTS`](cluster-migrateslots.md) +creates a single entry. The number of visible slot migration entries depends on +the configured `cluster-slot-migration-log-max-len`. After the limit is reached, +the oldest inactive entry is removed. Active slot migrations are +always visible through `CLUSTER GETSLOTMIGRATIONS`, even if there are more +entries than the configured limit. + +Information about slot migration operations is stored in memory and is not +persisted across restarts. Slot migration entries are visible on the primary +nodes of both the source and target shards, and on the replica nodes of the +target shard. + +The following information is reported for each slot migration entry: + +- `name`: A unique 40-byte name for the slot migration +- `operation`: The operation performed by the slot migration job on this node + (either `EXPORT` or `IMPORT`) +- `slot_ranges`: The range(s) of slots being migrated, with both start and end + inclusive.
The start and end slots are separated by `-` in each range, and + multiple ranges are joined together with a space ("` `"). +- `target_node`: The primary node receiving ownership of the slots as part of the + migration. This information is only reported on the primary nodes of the + participating shards. +- `source_node`: The primary node transferring ownership of the slots as part of the + migration. This information is only reported on the primary nodes of the + participating shards. +- `create_time`: The Unix timestamp (in seconds) when the slot + migration was started. +- `last_update_time`: The Unix timestamp (in seconds) when the slot + migration's status was last changed. +- `last_ack_time`: The Unix timestamp (in seconds) when the slot + migration last received a heartbeat. +- `state`: The current state of the slot migration. The terminal states are + `success` if completed successfully, `failed` if an unexpected failure + occurred, and `cancelled` if a `CLUSTER CANCELSLOTMIGRATIONS` request was + received during the migration. Any other state denotes an active slot + migration. +- `message`: Either a human-readable status message, if there is more + information to display about the state, or an empty string if no message is + available. +- `cow_size`: The copy-on-write overhead accumulated while the migration was + in progress, in bytes.
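When a client reads the reply over RESP2, each entry arrives as a flat array of alternating field names and values, and `slot_ranges` is a single string. The Python sketch below shows one way to turn an entry into a dictionary and decode its slot ranges. It assumes the reply has already been decoded into Python `str` and `int` values by a client library. The helper names are hypothetical and are not part of any Valkey API.

```python
from typing import Any, Dict, List, Tuple

# States that end a slot migration; any other state means it is still active.
TERMINAL_STATES = {"success", "failed", "cancelled"}


def entry_to_dict(flat: List[Any]) -> Dict[str, Any]:
    """Turn one RESP2 entry [field1, value1, field2, value2, ...] into a dict."""
    if len(flat) % 2 != 0:
        raise ValueError("expected alternating field names and values")
    # Even positions hold field names, odd positions hold the matching values.
    return dict(zip(flat[0::2], flat[1::2]))


def parse_slot_ranges(value: str) -> List[Tuple[int, int]]:
    """Split a slot_ranges string such as "0-9 20-29" into inclusive (start, end) pairs."""
    ranges = []
    for part in value.split():  # multiple ranges are separated by spaces
        start, sep, end = part.partition("-")
        # Assumption: a bare number without "-" is treated as a one-slot range.
        ranges.append((int(start), int(end) if sep else int(start)))
    return ranges


def slot_count(ranges: List[Tuple[int, int]]) -> int:
    """Count the slots covered; both ends of each range are inclusive."""
    return sum(end - start + 1 for start, end in ranges)


def is_active(entry: Dict[str, Any]) -> bool:
    """An entry is active unless its state is one of the terminal states."""
    return entry["state"] not in TERMINAL_STATES
```

Because both ends are inclusive, the range `0-10` shown in the sample responses decodes to `[(0, 10)]`, which covers 11 slots.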
+ + ## Examples + +### Response in RESP 2 + +``` +127.0.0.1:30001> CLUSTER GETSLOTMIGRATIONS +1) 1) "name" + 2) "5371b28997de6fd0bbe813ad8ebdfdf2faadb308" + 3) "operation" + 4) "EXPORT" + 5) "slot_ranges" + 6) "0-10" + 7) "target_node" + 8) "4b4f12fdfb58d5e30fef7b9ad3f1651dacbbaba9" + 9) "source_node" + 10) "93941e777e17fcbc92d4398cc957ffea888f472b" + 11) "create_time" + 12) (integer) 1754870400 + 13) "last_update_time" + 14) (integer) 1754870400 + 15) "last_ack_time" + 16) (integer) 1754870400 + 17) "state" + 18) "success" + 19) "message" + 20) "" + 21) "cow_size" + 22) (integer) 0 +``` + +### Response in RESP 3 + +``` +127.0.0.1:30001> CLUSTER GETSLOTMIGRATIONS +1) 1# "name" => "5371b28997de6fd0bbe813ad8ebdfdf2faadb308" + 2# "operation" => "EXPORT" + 3# "slot_ranges" => "0-10" + 4# "target_node" => "4b4f12fdfb58d5e30fef7b9ad3f1651dacbbaba9" + 5# "source_node" => "93941e777e17fcbc92d4398cc957ffea888f472b" + 6# "create_time" => (integer) 1754870400 + 7# "last_update_time" => (integer) 1754870400 + 8# "last_ack_time" => (integer) 1754870400 + 9# "state" => "success" + 10# "message" => "" + 11# "cow_size" => (integer) 0 +``` diff --git a/commands/cluster-migrateslots.md b/commands/cluster-migrateslots.md new file mode 100644 index 00000000..bb888c30 --- /dev/null +++ b/commands/cluster-migrateslots.md @@ -0,0 +1,22 @@ +`CLUSTER MIGRATESLOTS` initiates an asynchronous migration of the designated +slot range(s) to the specified target node using +[atomic slot migration](../topics/atomic-slot-migration.md). + +This command accepts multiple slot ranges in a single migration by repeating +start and end slot pairs within the `SLOTSRANGE` block. It also supports +multiple migrations in one command by repeating the `SLOTSRANGE` and `NODE` +blocks.
For example: + +``` +CLUSTER MIGRATESLOTS SLOTSRANGE 0 9 20 29 NODE SLOTSRANGE 10 19 NODE +``` + +This initiates two slot migration jobs, one to `` with 20 slots (0-9 +inclusive, 20-29 inclusive) and another to `` with 10 slots (10-19 +inclusive). + +`OK` is returned if all slot migrations are successfully initiated; otherwise, an +error message is returned and no slot migrations are initiated. + +To check the progress of the slot migrations, use the +[`CLUSTER GETSLOTMIGRATIONS`](cluster-getslotmigrations.md) command. diff --git a/commands/cluster-setslot.md b/commands/cluster-setslot.md index 4fb29a83..08ac6f20 100644 --- a/commands/cluster-setslot.md +++ b/commands/cluster-setslot.md @@ -1,4 +1,8 @@ -`CLUSTER SETSLOT` is responsible for changing the state of a hash slot in the receiving node in different ways. It can, depending on the subcommand used: +`CLUSTER SETSLOT` is responsible for changing the state of a hash slot in the receiving node in different ways. It is part of the legacy mechanism for cluster resharding. + +**Note:** For live resharding, the newer [atomic slot migration](../topics/atomic-slot-migration.md) mechanism using `CLUSTER MIGRATESLOTS` is recommended because it is faster, more reliable, and has less impact on client applications. + +`CLUSTER SETSLOT` can, depending on the subcommand used: 1. `MIGRATING` subcommand: Set a hash slot in *migrating* state. 2. `IMPORTING` subcommand: Set a hash slot in *importing* state. diff --git a/commands/cluster-syncslots.md b/commands/cluster-syncslots.md new file mode 100644 index 00000000..0e164db7 --- /dev/null +++ b/commands/cluster-syncslots.md @@ -0,0 +1,4 @@ +Internal command used to navigate the atomic slot migration state machine. + +For more information about atomic slot migration in Valkey, see the +[atomic slot migration](../topics/atomic-slot-migration.md) page.
diff --git a/topics/atomic-slot-migration.md b/topics/atomic-slot-migration.md new file mode 100644 index 00000000..972b036b --- /dev/null +++ b/topics/atomic-slot-migration.md @@ -0,0 +1,120 @@ +--- +title: Atomic slot migration +description: Overview of atomic slot migration +--- + +In [Valkey Cluster](cluster-spec.md), you can use a process known as slot +migration to scale your cluster in or out. During slot migration, one or more of +the 16384 hash slots are moved from a source node to a target node. Valkey 9.0 +introduced a new option for migrating hash slots known as **atomic slot +migration**, which is faster, more reliable, and has less impact on client +applications than the legacy `CLUSTER SETSLOT`-based migration. + +## Performing an atomic slot migration using `CLUSTER MIGRATESLOTS` + +Valkey 9.0 keeps the legacy slot migration mechanism and +adds atomic slot migration as a second option. To perform an atomic slot +migration, an operator takes the following steps: + +1. Send `` + `CLUSTER MIGRATESLOTS SLOTSRANGE NODE ` +2. Poll `` for progress using `CLUSTER GETSLOTMIGRATIONS` + +`CLUSTER MIGRATESLOTS` initiates a migration of the designated slot range to the +specified target node. The slot migration process is then performed +asynchronously. + +For more details on `CLUSTER MIGRATESLOTS`, see the +[command documentation](../commands/cluster-migrateslots.md). + +## Polling atomic slot migrations + +The `CLUSTER GETSLOTMIGRATIONS` command allows you to poll the status of your +migration. `CLUSTER GETSLOTMIGRATIONS` can be executed on either the source node +or the target node. In-progress migrations are always shown, and recently +completed migrations remain visible up to a configurable limit +(`cluster-slot-migration-log-max-len`). If a migration fails, its entry also +includes a short description of the failure in the `message` field to help +operators decide whether to retry.
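The polling step described above can be automated with a loop that stops once no entry is active. In this Python sketch, `fetch` is a hypothetical callable that sends `CLUSTER GETSLOTMIGRATIONS` to the source node and returns each entry as a dictionary. The one-second interval and ten-minute timeout are arbitrary illustrative defaults, not Valkey recommendations. Treating `success`, `failed`, and `cancelled` as terminal follows the state descriptions in the `CLUSTER GETSLOTMIGRATIONS` reference.

```python
import time
from typing import Any, Callable, Dict, List

# Terminal states documented for CLUSTER GETSLOTMIGRATIONS.
TERMINAL_STATES = {"success", "failed", "cancelled"}

Entry = Dict[str, Any]


def wait_for_migrations(
    fetch: Callable[[], List[Entry]],
    interval: float = 1.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[Entry]:
    """Poll until no entry is active, then return the final list of entries."""
    deadline = clock() + timeout
    while True:
        entries = fetch()
        # Any state other than the terminal ones means a migration is still running.
        if all(e["state"] in TERMINAL_STATES for e in entries):
            return entries
        if clock() >= deadline:
            raise TimeoutError("slot migrations are still in progress")
        sleep(interval)


def unsuccessful(entries: List[Entry]) -> List[Entry]:
    """Entries that failed or were cancelled; `message` may describe the failure."""
    return [e for e in entries if e["state"] != "success"]
```

With redis-py or valkey-py, for example, `fetch` could call `client.execute_command("CLUSTER", "GETSLOTMIGRATIONS")` on a connection to the source node and pair up each flat RESP2 entry into a dictionary. Injecting `sleep` and `clock` keeps the loop testable without a running cluster.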
+ +For more details on `CLUSTER GETSLOTMIGRATIONS`, see the +[command documentation](../commands/cluster-getslotmigrations.md). + +## Canceling atomic slot migrations + +If you need to cancel a slot migration after it has started, Valkey +provides the `CLUSTER CANCELSLOTMIGRATIONS` command to cancel all active atomic +slot migrations for which the receiving node is the source node. Sending this command +to every node in the cluster cancels all slot migrations everywhere. + +For more details on `CLUSTER CANCELSLOTMIGRATIONS`, see the +[command documentation](../commands/cluster-cancelslotmigrations.md). + +## Behind the scenes of atomic slot migration + +Atomic slot migration uses a completely different process from +`CLUSTER SETSLOT`-based migrations: + +1. Immediately after the source node receives `CLUSTER MIGRATESLOTS`, it + initiates a connection to the target node and performs authentication, + similar to how a replication link is initialized. +2. Once the connection is established, the source node uses a new internal command, + `CLUSTER SYNCSLOTS`, to inform the target of the migration. +3. The source node then forks a child process to take a one-time snapshot of the + slot contents. The child process iterates over all migrating hash slots and serializes their contents + over the slot migration link. The contents are subsequently replicated to any + replicas of the target node. +4. While the child process is taking the snapshot, the parent process tracks all + mutations performed on the migrating hash slots. +5. Once the child process finishes the snapshot, the parent process sends all + accumulated mutations. Any new mutations received during this step are also + sent. +6. Once the amount of in-flight mutations falls below a configured threshold, the + parent process temporarily pauses write commands to allow final + synchronization of the hash slots. +7. Once the target node is completely caught up, it takes over the hash slots + and broadcasts its ownership to the cluster. +8.
When the source node learns that the target now owns the hash slots, it deletes the keys in + the migrated hash slots and unpauses write commands. Clients now receive `MOVED` + redirections to the target node, which owns the hash slots. The slot + migration is complete. + +### Isolation of importing hash slots from clients + +Since slot ownership is not moved until the very end of the migration, commands +targeting migrating hash slots on the target node receive `MOVED` +redirections per the cluster specification. However, some commands +operate on the entire database: + +1. `KEYS`/`SCAN`: These commands allow a client to list all keys on a shard. +2. `DBSIZE`/`INFO`: These commands provide statistical information about how + many keys are on a shard. +3. `FLUSHDB`/`FLUSHALL`: These commands allow a client to drop all data in one + database, or in all databases, on a node. + +To handle this, all importing hash slots are specially marked and hidden from +these read operations on both the target primary and its replicas. + +`FLUSHDB` and `FLUSHALL` are a special case: they fail the slot migration +when executed on **either the source node or the target node**. Operators are +expected to retry the migration after flushing, and the retry should succeed +almost instantly because the database is now empty. + +## Configuring atomic slot migration + +Some configuration parameters may be worth tuning based on your workload: + +- `client-output-buffer-limit`: Since atomic slot migration uses the + replication process to migrate the slots, the mutations accumulated + during the snapshot could exceed the configured replica output + buffer limit. Both the hard and soft limits of the `replica` client output + buffer class should be large enough to hold the accumulated + mutations.
+- `slot-migration-max-failover-repl-bytes`: By default, atomic slot migration + only pauses writes on the source node once all in-flight + mutations have been sent to the target node. However, for workloads with + persistently high write throughput, atomic slot migration can be configured to + pause writes as soon as the in-flight mutations fall below a given threshold. +- `cluster-slot-migration-log-max-len`: Atomic slot migration keeps track of all + in-progress migrations and recently completed or failed migrations. These can + be viewed with `CLUSTER GETSLOTMIGRATIONS`. This setting controls how many + recently completed migrations are retained; increase it to keep a longer history. diff --git a/topics/cluster-spec.md b/topics/cluster-spec.md index 561f50a4..2c998cdc 100644 --- a/topics/cluster-spec.md +++ b/topics/cluster-spec.md @@ -407,7 +407,25 @@ later in this document, otherwise it is not a complete Valkey Cluster client. ### Live resharding -Valkey Cluster supports the ability to add and remove nodes while the cluster +#### Atomic slot migration + +Valkey 9.0 introduced a server-side mechanism for live resharding called +[atomic slot migration](atomic-slot-migration.md), which is the recommended method. + +Compared to legacy slot migration, atomic slot migration allows you to +perform live resharding faster, with higher reliability and less client-side +impact. Atomic slot migration is started using the `CLUSTER MIGRATESLOTS` command.
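Each `SLOTSRANGE ... NODE ...` block of `CLUSTER MIGRATESLOTS` describes one migration job, so a tool that plans a reshard can assemble the command arguments from a mapping of target nodes to inclusive slot ranges. The Python sketch below is hypothetical tooling and not part of Valkey. It assumes the `NODE` argument is a node ID string; check the `CLUSTER MIGRATESLOTS` reference for the exact form. It validates each range against the 16384-slot space and returns the argument list.

```python
from typing import Dict, List, Sequence, Tuple

NUM_SLOTS = 16384  # Valkey Cluster hash slots are numbered 0..16383


def migrateslots_args(plan: Dict[str, Sequence[Tuple[int, int]]]) -> List[str]:
    """Build CLUSTER MIGRATESLOTS arguments: one SLOTSRANGE ... NODE ... block per target."""
    if not plan:
        raise ValueError("plan must contain at least one target node")
    args = ["CLUSTER", "MIGRATESLOTS"]
    for node_id, ranges in plan.items():  # dicts keep insertion order
        if not ranges:
            raise ValueError(f"no slot ranges for node {node_id}")
        args.append("SLOTSRANGE")
        for start, end in ranges:
            # Both ends are inclusive and must lie inside the hash slot space.
            if not 0 <= start <= end < NUM_SLOTS:
                raise ValueError(f"invalid slot range {start}-{end}")
            args += [str(start), str(end)]
        args += ["NODE", node_id]
    return args
```

For a plan that moves slots 0-9 and 20-29 to one node and 10-19 to another, the result mirrors the two-job example on the `CLUSTER MIGRATESLOTS` page. With a redis-py or valkey-py client connected to the source node, `client.execute_command(*migrateslots_args(plan))` could send it. As documented, the reply is `OK` only if every job was initiated. The sketch does not check for overlapping ranges across jobs.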
+ +You can find more details about atomic slot migration in the following pages: + +* [Atomic slot migration overview](atomic-slot-migration.md) +* [`CLUSTER MIGRATESLOTS` command documentation](../commands/cluster-migrateslots.md) +* [`CLUSTER GETSLOTMIGRATIONS` command documentation](../commands/cluster-getslotmigrations.md) +* [`CLUSTER CANCELSLOTMIGRATIONS` command documentation](../commands/cluster-cancelslotmigrations.md) + +#### Legacy slot migration + +Valkey Cluster also supports a legacy, client-driven mechanism to add and remove nodes while the cluster is running. Adding or removing a node is abstracted into the same operation: moving a hash slot from one node to another. This means that the same basic mechanism can be used in order to rebalance the cluster, add diff --git a/topics/index.md b/topics/index.md index b41a8fe8..03162ee3 100644 --- a/topics/index.md +++ b/topics/index.md @@ -72,6 +72,7 @@ It's released under the * [Sentinel client spec](sentinel-clients.md): How to build clients for Valkey Sentinel. * [Cluster tutorial](cluster-tutorial.md): A gentle introduction to Valkey Cluster, a deployment mode for horizontal scaling and high availability. * [Cluster specification](cluster-spec.md): The more formal description of the behavior and algorithms used in Valkey Cluster. +* [Atomic slot migration](atomic-slot-migration.md): An overview of atomic slot migration in Valkey Cluster. ### Security * [Security](security.md): An overview of Valkey's security. diff --git a/wordlist b/wordlist index ce95d685..7be5ade6 100644 --- a/wordlist +++ b/wordlist @@ -1066,4 +1066,6 @@ namespaces expirations pluggable Json -uptime \ No newline at end of file +uptime +unpause +unpauses \ No newline at end of file