Commit 2d19710

Merge pull request #6549 from EnterpriseDB/docs/edits_to_pgd_pr6480
Edits to DOCS-1195 Failover slots #6480
2 parents 08d15f0 + 88d1423 commit 2d19710

1 file changed

+23
-25
lines changed

product_docs/docs/pgd/5.7/cdc-failover.mdx

+23-25
@@ -11,35 +11,35 @@ This is a PGD 5.7 and later feature. It is not supported on earlier versions of

 ## Background

-Earlier versions of PGD have allowed the creation of logical replication slots on nodes that can provide a feed of the logical changes happening to the data within the database. These logical replication slots have been local to the node and not replicated. Apart from only replicating changes on the particular node, this has presented challenges when faced with node failover within the cluster. In that scenario, a consumer of the logical replication off a node which fails has no replica of the slot on another node to continue consuming from.
+Earlier versions of PGD have allowed the creation of logical replication slots on nodes that can provide a feed of the logical changes happening to the data in the database. These logical replication slots have been local to the node and not replicated. Apart from only replicating changes on the particular node, this behavior has presented challenges when faced with node failover in the cluster. In that scenario, a consumer of the logical replication off a node that fails has no replica of the slot on another node to continue consuming from.

-While solutions to this can be engineered using a subscriber-only node as an intermediary, it does significantly raise the cost of logical replication.
+While solutions to this can be engineered using a subscriber-only node as an intermediary, it significantly raises the cost of logical replication.

 ## CDC Failover support

-To address this need, PGD 5.7 introduces CDC Failover support. This is an optionally enabled feature which activates automatic logical slot replication across the cluster. This, in turn, allows a consumer of a logical slot’s replication to receive change data from any node when a failure occurs.
+To address this need, PGD 5.7 introduces CDC Failover support. This is an optionally enabled feature that activates automatic logical slot replication across the cluster. This, in turn, allows a consumer of a logical slot’s replication to receive change data from any node when a failure occurs.

 ### How CDC Failover works

-When a logical slot is created on a node with CDC Failover support enabled, the slot is replicated across the cluster. This means that the slot is available for consumption on any node in the cluster. When a node fails, the slot can be consumed from another node in the cluster. This allows for the continuation of the logical replication stream without interruption.
+When a logical slot is created on a node with CDC Failover support enabled, the slot is replicated across the cluster. This means that the slot is available for consumption on any node in the cluster. When a node fails, the slot can be consumed from another node in the cluster. This allows for continuing the logical replication stream without interruption.

-If, though, the consumer of the slot connects to a different node in the cluster, the previous connection the consumer had will be closed by PGD. This is to ensure that the slot is not being consumed from multiple nodes at the same time. In the background, PGD is using its Raft consensus protocol to ensure that the slot is only being consumed from one node at a time; this does mean that the guarantee of only one slot being consumed at a time does not hold in split-brain scenarios.
+If, though, the consumer of the slot connects to a different node in the cluster, the previous connection the consumer had will be closed by PGD. This behavior is to ensure that the slot isn't being consumed from multiple nodes at the same time. In the background, PGD is using its Raft consensus protocol to ensure that the slot is being consumed from only one node at a time. This means that the guarantee of only one slot being consumed at a time doesn't hold in split-brain scenarios.

-Currently CDC Failover support is a global option that is controlled by a top-group option. The `failover_slot_scope` top-group option can currently be set to (and defaults to) local which disables replication of logical slots or `global` which enables the replication of all non-temporary logical slots created in the PGD database.
+Currently CDC Failover support is a global option that's controlled by a top-group option. The `failover_slot_scope` top-group option can currently be set to (and defaults to) `local`, which disables replication of logical slots, or `global`. The `global` setting enables the replication of all non-temporary logical slots created in the PGD database.

-Temporary logical slots will not be replicated as they have a lifetime scoped to the session that created them and will go away when that session ends.
+Temporary logical slots aren't replicated, as they have a lifetime scoped to the session that created them and will go away when that session ends.

 ### At-least-once delivery guarantees

-CDC Failover support takes steps to ensure that the consumer receives all changes at least once. This is done by holding back slots until delivery has been confirmed, at which point the slot is then advanced on all nodes in an asynchronous manner. In the case of a failure on the node where the slot was being consumed, the slot would be held until the consumer connected to a node in the cluster. This would then allow the slot to progress.
+CDC Failover support takes steps to ensure that the consumer receives all changes at least once. This is done by holding back slots until delivery has been confirmed, at which point the slot is then advanced on all nodes in an asynchronous manner. In the case of a failure on the node where the slot was being consumed, the slot is held until the consumer connects to a node in the cluster. This then allows the slot to progress.

 !!! Important
-If a consuming application disconnects and doesn’t reconnect the slot will remain on held back on every node in the cluster. As this consumes disk and memory, it is essential that this situation is avoided; applications which consume slots must return to consuming as soon as possible.
+If a consuming application disconnects and doesn’t reconnect, the slot will remain held back on every node in the cluster. As this consumes disk and memory, it's essential to avoid this situation. Applications that consume slots must return to consuming as soon as possible.
 !!!

 ### Exactly-once delivery

-Currently, there is no way to ensure exactly-once delivery, and we expect consuming applications to manage the discarding of previously completed transactions.
+Currently, there's no way to ensure exactly-once delivery, and we expect consuming applications to manage the discarding of previously completed transactions.

 ## Enabling CDC Failover support

@@ -52,7 +52,7 @@ select bdr.alter_node_group_option(<top-level group name>,

 ```

-Replacing `<top-level group name>` with the name of your cluster’s top level group.
+Replace `<top-level group name>` with the name of your cluster’s top-level group. If you don't know the name, it's the group with a node_group_parent_id equal to 0 in `[`bdr.node_group`](/pgd/5/reference/catalogs-visible#bdrnode_group)`.

 If you do not know the name, it is the group with a node_group_parent_id equal to 0 in `[`bdr.node_group`](/pgd/latest/reference/catalogs-visible#bdrnode_group)`. You can also use:

@@ -65,26 +65,24 @@ SELECT bdr.alter_node_group_option(
 where node_group_parent_id=0;
 ```

-To ensure you are setting the correct, top-level group’s option.
+This command ensures you're setting the correct top-level group’s option.

-Once enabled you can use:
+Once CDC Failover is enabled, to create a new globally replicated slot, you can use:

 ```sql
 SELECT pg_create_logical_replication_slot('myslot',
 'test_decoding');
 ```

-To create a new globally replicated slot.
-
-Note that logical replication slots created before the option is set to `global` will not be replicated. Only new slots will be replicated.
+Logical replication slots created before the option was set to `global` aren't replicated. Only new slots are replicated.

 Failover slots can also be created with the `CREATE_REPLICATION_SLOT` command on a replication connection.

 The status of failover slots is tracked in the [`bdr.failover_replication_slots`](/pgd/latest/reference/catalogs-visible#bdrfailover_replication_slots) table.

 ## CDC Failover support with Postgres 17+

-For Postgres 17 and later, support for failover was added to allow standbys to be resumed, through an option in `pg_create_logical_replication_slot` named `failover`. This new setting requires that, no matter what the setting of `failover_slot_scope`, you must also set `failover` to `true`.
+For Postgres 17 and later, support for failover was added to allow standbys to be resumed. Use an option in `pg_create_logical_replication_slot` named `failover` for this purpose. This new setting requires that, no matter what the setting of `failover_slot_scope`, you must also set `failover` to `true`.

 ```sql
 SELECT pg_create_logical_replication_slot('myslot',
@@ -94,12 +92,12 @@ SELECT pg_create_logical_replication_slot('myslot',

 ## Limitations

-The CDC Failover Slot support comes with certain limitations.
+The CDC Failover Slot support comes with certain limitations:

-* CDC Failover slot support requires the latest versions of PGD (5.7+) and the latest minor releases of Postgres-Extended or EPAS (available Feb 2025\).
-* CDC Failover support is a global option and cannot be set on a per-slot basis. It is possible though, because changing the enabled status of CDC Failover does not affect previously provisioned slots, to enable it (set to `global`), create a replicated slot, then disable it (set to `local`), to create a singular replicated slot.
-* CDC Failover support is not supported on temporary slots.
-* CDC Failover support is not supported on slots created with the `failover` option set to `false`.
-* CDC Failover support works with EDB Postgres Advanced Server and EDB Postgres Extended Server only. It is not supported on community Postgres installations.
-* Existing slots are not automatically converted into failover slots when the option is enabled.
-* While Postgres’s built-in functions such as `pg_logical_slot_get_changes()` can be used they won’t ensure that the slot is not being decoded anywhere else and can’t update replication progress accurately across the cluster. Therefore it’s recommended not to rely on the function to receive decoded changes.
+* CDC Failover slot support requires the latest versions of EDB Postgres Distributed (PGD) 5.7+ and the latest minor releases of Postgres Extended or EDB Postgres Advanced Server (available Feb 2025).
+* CDC Failover support is a global option and can't be set on a per-slot basis. Because changing the enabled status of CDC Failover doesn't affect previously provisioned slots, it's possible to enable it (set to `global`), create a replicated slot, then disable it (set to `local`) to create a singular replicated slot.
+* CDC Failover support isn't supported on temporary slots.
+* CDC Failover support isn't supported on slots created with the `failover` option set to `false`.
+* CDC Failover support works with EDB Postgres Advanced Server and EDB Postgres Extended Server only. It isn't supported on community Postgres installations.
+* Existing slots aren't converted into failover slots when the option is enabled.
+* While Postgres’s built-in functions such as `pg_logical_slot_get_changes()` can be used, they won’t ensure that the slot isn't being decoded anywhere else and can’t update replication progress accurately across the cluster. Therefore, we recommend that you don't rely on the function to receive decoded changes.
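
The "Exactly-once delivery" section in this diff expects consuming applications to discard previously completed transactions. A minimal Python sketch of that idea follows. It assumes each delivered transaction carries an identifier that stays the same no matter which node delivers it; the source doesn't say what identifier is available. The identifier, the `DedupingConsumer` class, and its `handle` method are hypothetical and aren't part of PGD or Postgres.

```python
class DedupingConsumer:
    """Hypothetical consumer that applies each transaction at most once.

    Assumption: tx_id is a stable identifier supplied with every
    delivered transaction, identical across redeliveries.
    """

    def __init__(self):
        self.seen_ids = set()   # identifiers of transactions already applied
        self.applied = []       # payloads applied, in delivery order

    def handle(self, tx_id, payload):
        """Apply the payload unless tx_id was already processed.

        Returns True if the payload was applied and False if it was
        discarded as a redelivery.
        """
        if tx_id in self.seen_ids:
            return False
        self.applied.append(payload)
        self.seen_ids.add(tx_id)
        return True


if __name__ == "__main__":
    consumer = DedupingConsumer()
    consumer.handle("tx-1", "insert row A")
    consumer.handle("tx-2", "insert row B")
    # After a failover, at-least-once delivery may resend tx-1.
    consumer.handle("tx-1", "insert row A")
    print(consumer.applied)
```

In a real consumer, the set of seen identifiers would need to be stored durably and in the same transaction as the applied change, and pruned over time. The in-memory set here only illustrates the discard step.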