product_docs/docs/pgd/5.7/cdc-failover.mdx
+23 -25
@@ -11,35 +11,35 @@ This is a PGD 5.7 and later feature. It is not supported on earlier versions of
## Background
-
Earlier versions of PGD have allowed the creation of logical replication slots on nodes that can provide a feed of the logical changes happening to the data within the database. These logical replication slots have been local to the node and not replicated. Apart from only replicating changes on the particular node, this has presented challenges when faced with node failover within the cluster. In that scenario, a consumer of the logical replication off a node which fails has no replica of the slot on another node to continue consuming from.
+
Earlier versions of PGD have allowed the creation of logical replication slots on nodes that can provide a feed of the logical changes happening to the data in the database. These logical replication slots have been local to the node and not replicated. Apart from only replicating changes on the particular node, this behavior has presented challenges when faced with node failover in the cluster. In that scenario, a consumer of the logical replication off a node that fails has no replica of the slot on another node to continue consuming from.
-
While solutions to this can be engineered using a subscriber-only node as an intermediary, it does significantly raise the cost of logical replication.
+
While solutions to this can be engineered using a subscriber-only node as an intermediary, it significantly raises the cost of logical replication.
## CDC Failover support
-
To address this need, PGD 5.7 introduces CDC Failover support. This is an optionally enabled feature which activates automatic logical slot replication across the cluster. This, in turn, allows a consumer of a logical slot’s replication to receive change data from any node when a failure occurs.
+
To address this need, PGD 5.7 introduces CDC Failover support. This is an optionally enabled feature that activates automatic logical slot replication across the cluster. This, in turn, allows a consumer of a logical slot’s replication to receive change data from any node when a failure occurs.
### How CDC Failover works
-
When a logical slot is created on a node with CDC Failover support enabled, the slot is replicated across the cluster. This means that the slot is available for consumption on any node in the cluster. When a node fails, the slot can be consumed from another node in the cluster. This allows for the continuation of the logical replication stream without interruption.
+
When a logical slot is created on a node with CDC Failover support enabled, the slot is replicated across the cluster. This means that the slot is available for consumption on any node in the cluster. When a node fails, the slot can be consumed from another node in the cluster. This allows for continuing the logical replication stream without interruption.
-
If, though, the consumer of the slot connects to a different node in the cluster, the previous connection the consumer had will be closed by PGD. This is to ensure that the slot is not being consumed from multiple nodes at the same time. In the background, PGD is using its Raft consensus protocol to ensure that the slot is only being consumed from one node at a time; this does mean that the guarantee of only one slot being consumed at a time does not hold in split-brain scenarios.
+
If, though, the consumer of the slot connects to a different node in the cluster, PGD closes the consumer's previous connection. This behavior ensures that the slot isn't being consumed from multiple nodes at the same time. In the background, PGD uses its Raft consensus protocol to ensure that the slot is being consumed from only one node at a time. Because this relies on consensus, the guarantee that a slot is consumed from only one node at a time doesn't hold in split-brain scenarios.
-
Currently CDC Failover support is a global option that is controlled by a top-group option. The `failover_slot_scope` top-group option can currently be set to (and defaults to) local which disables replication of logical slots or `global` which enables the replication of all non-temporary logical slots created in the PGD database.
+
Currently CDC Failover support is a global option that's controlled by a top-group option. The `failover_slot_scope` top-group option can currently be set to (and defaults to) `local`, which disables replication of logical slots, or `global`. The `global` setting enables the replication of all non-temporary logical slots created in the PGD database.
-
Temporary logical slots will not be replicated as they have a lifetime scoped to the session that created them and will go away when that session ends.
+
Temporary logical slots aren't replicated, as they have a lifetime scoped to the session that created them and will go away when that session ends.
### At-least-once delivery guarantees
-
CDC Failover support takes steps to ensure that the consumer receives all changes at least once. This is done by holding back slots until delivery has been confirmed, at which point the slot is then advanced on all nodes in an asynchronous manner. In the case of a failure on the node where the slot was being consumed, the slot would be held until the consumer connected to a node in the cluster. This would then allow the slot to progress.
+
CDC Failover support takes steps to ensure that the consumer receives all changes at least once. This is done by holding back slots until delivery has been confirmed, at which point the slot is then advanced on all nodes in an asynchronous manner. In the case of a failure on the node where the slot was being consumed, the slot is held until the consumer connects to a node in the cluster. This then allows the slot to progress.
!!! Important
-
If a consuming application disconnects and doesn’t reconnect the slot will remain on held back on every node in the cluster. As this consumes disk and memory, it is essential that this situation is avoided; applications which consume slots must return to consuming as soon as possible.
+
If a consuming application disconnects and doesn’t reconnect, the slot will remain held back on every node in the cluster. As this consumes disk and memory, it's essential to avoid this situation. Applications that consume slots must return to consuming as soon as possible.
!!!
### Exactly-once delivery
-
Currently, there is no way to ensure exactly-once delivery, and we expect consuming applications to manage the discarding of previously completed transactions.
+
Currently, there's no way to ensure exactly-once delivery, and we expect consuming applications to manage the discarding of previously completed transactions.
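One way a consuming application might discard previously completed transactions is to remember which ones it has already applied and skip any repeats. The following is a minimal Python sketch of that idea, not part of PGD. It assumes that each delivered transaction carries a stable, unique identifier (the hypothetical `txn_id` key) and that the set of applied identifiers is kept somewhere durable. Neither assumption comes from the PGD documentation.

```python
from typing import Callable, Iterable, Set


def apply_once(
    transactions: Iterable[dict],
    seen_ids: Set[str],
    apply: Callable[[dict], None],
) -> int:
    """Apply each transaction at most once; return how many were applied.

    `txn_id` is a hypothetical field name used for illustration only.
    `seen_ids` stands in for a durable store of applied identifiers.
    """
    applied = 0
    for txn in transactions:
        txn_id = txn["txn_id"]
        if txn_id in seen_ids:
            # Redelivered after a reconnect or failover: discard it.
            continue
        apply(txn)
        # Record the identifier only after the change has been applied.
        seen_ids.add(txn_id)
        applied += 1
    return applied
```

In a real consumer, recording the identifier and applying the change would ideally happen in the same transaction on the target system, so that a crash between the two steps can't cause a change to be lost or applied twice.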
## Enabling CDC Failover support
@@ -52,7 +52,7 @@ select bdr.alter_node_group_option(<top-level group name>,
```
-
Replacing`<top-level group name>` with the name of your cluster’s toplevel group.
+
Replace `<top-level group name>` with the name of your cluster’s top-level group. If you don't know the name, it's the group with a `node_group_parent_id` equal to 0 in [`bdr.node_group`](/pgd/5/reference/catalogs-visible#bdrnode_group).
If you do not know the name, it is the group with a node_group_parent_id equal to 0 in [`bdr.node_group`](/pgd/latest/reference/catalogs-visible#bdrnode_group). You can also use:
Note that logical replication slots created before the option is set to `global` will not be replicated. Only new slots will be replicated.
+
Logical replication slots created before the option was set to `global` aren't replicated. Only new slots are replicated.
Failover slots can also be created with the `CREATE_REPLICATION_SLOT` command on a replication connection.
The status of failover slots is tracked in the [`bdr.failover_replication_slots`](/pgd/latest/reference/catalogs-visible#bdrfailover_replication_slots) table.
## CDC Failover support with Postgres 17+
-
For Postgres 17 and later, support for failover was added to allow standbys to be resumed, through an option in `pg_create_logical_replication_slot` named `failover`. This new setting requires that, no matter what the setting of `failover_slot_scope`, you must also set `failover` to `true`.
+
For Postgres 17 and later, support for failover was added to allow standbys to be resumed. Use the `failover` option of `pg_create_logical_replication_slot` for this purpose. Regardless of the `failover_slot_scope` setting, you must also set `failover` to `true` when creating the slot.
The CDC Failover Slot support comes with certain limitations.
+
The CDC Failover Slot support comes with certain limitations:
-
* CDC Failover slot support requires the latest versions of PGD (5.7+) and the latest minor releases of Postgres-Extended or EPAS (available Feb 2025\).
-
* CDC Failover support is a global option and cannot be set on a per-slot basis. It is possible though, because changing the enabled status of CDC Failover does not affect previously provisioned slots, to enable it (set to `global`), create a replicated slot, then disable it (set to `local`), to create a singular replicated slot.
-
* CDC Failover support is not supported on temporary slots.
-
* CDC Failover support is not supported on slots created with the `failover` option set to `false`.
-
* CDC Failover support works with EDB Postgres Advanced Server and EDB Postgres Extended Server only. It is not supported on community Postgres installations.
-
* Existing slots are not automatically converted into failover slots when the option is enabled.
-
* While Postgres’s built-in functions such as `pg_logical_slot_get_changes()` can be used they won’t ensure that the slot is not being decoded anywhere else and can’t update replication progress accurately across the cluster. Therefore it’s recommended not to rely on the function to receive decoded changes.
+
* CDC Failover slot support requires the latest versions of EDB Postgres Distributed (PGD) 5.7+ and the latest minor releases of EDB Postgres Extended Server or EDB Postgres Advanced Server (available Feb 2025).
+
* CDC Failover support is a global option and can't be set on a per-slot basis. Because changing the enabled status of CDC Failover doesn't affect previously provisioned slots, it's possible to enable it (set to `global`), create a replicated slot, then disable it (set to `local`) to create a singular replicated slot.
+
* CDC Failover isn't supported on temporary slots.
+
* CDC Failover isn't supported on slots created with the `failover` option set to `false`.
+
* CDC Failover support works with EDB Postgres Advanced Server and EDB Postgres Extended Server only. It isn't supported on community Postgres installations.
+
* Existing slots aren't converted into failover slots when the option is enabled.
+
* While Postgres’s built-in functions such as `pg_logical_slot_get_changes()` can be used, they won’t ensure that the slot isn't being decoded anywhere else and can’t update replication progress accurately across the cluster. Therefore, we recommend that you don't rely on the function to receive decoded changes.