
Commit f3d8c02

Merge pull request #2095 from EnterpriseDB/content/bdr/3.7.13/import
Import BDR 3.7.13
2 parents d92daa4 + 8245f1c commit f3d8c02

16 files changed, +196 -229 lines changed

product_docs/docs/bdr/3.7/backup.mdx

Lines changed: 2 additions & 9 deletions

```diff
@@ -236,15 +236,8 @@ of a single BDR node, optionally plus WAL archives:
 
 The cleaning of leftover BDR metadata is achieved as follows:
 
-1. Drop the `bdr` extension with `CASCADE`.
-2. Drop all the replication origins previously created by BDR.
-3. Drop any replication slots left over from BDR.
-4. Fully stop and re-start PostgreSQL (important!).
-5. Create the `bdr` extension.
-
-The `DROP EXTENSION`/`CREATE EXTENSION` cycle guarantees that all the
-BDR metadata from the previous cluster is removed, and that the node
-can be used to grow a new BDR cluster from scratch.
+1. Drop the BDR node using `bdr.drop_node`.
+2. Fully stop and re-start PostgreSQL (important!).
 
 #### Cleanup of Replication Origins
 
```
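For clarity, the revised two-step cleanup looks roughly as follows; a minimal sketch, assuming `bdr.drop_node` accepts the local node name (check the function reference for the exact signature):

```sql
-- Step 1: drop the local BDR node metadata. The argument name is an
-- assumption; verify against the bdr.drop_node reference for your version.
SELECT bdr.drop_node(node_name := 'node1');

-- Step 2: fully stop and re-start PostgreSQL (important!), for example:
--   pg_ctl -D "$PGDATA" restart -m fast
```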

product_docs/docs/bdr/3.7/catalogs.mdx

Lines changed: 2 additions & 2 deletions

```diff
@@ -936,7 +936,7 @@ only one node and processed on different nodes.
 | Column             | Type   | Description |
 | ------------------ | ------ | ----------- |
 | ap_wq_workid       | bigint | The Unique ID of the work item |
-| ap_wq_ruleid       | int    | ID of the rule listed in autopartition_rules. Rules are specified using bdr.autoscale/autopartition commands |
+| ap_wq_ruleid       | int    | ID of the rule listed in autopartition_rules. Rules are specified using the bdr.autopartition command |
 | ap_wq_relname      | name   | Name of the relation being autopartitioned |
 | ap_wq_relnamespace | name   | Name of the tablespace specified in rule for this work item. |
 | ap_wq_partname     | name   | Name of the partition created by the workitem |
@@ -970,7 +970,7 @@ items, independent of other nodes in the cluster.
 | Column             | Type   | Description |
 | ------------------ | ------ | ----------- |
 | ap_wq_workid       | bigint | The Unique ID of the work item |
-| ap_wq_ruleid       | int    | ID of the rule listed in autopartition_rules. Rules are specified using bdr.autoscale/autopartition commands |
+| ap_wq_ruleid       | int    | ID of the rule listed in autopartition_rules. Rules are specified using the bdr.autopartition command |
 | ap_wq_relname      | name   | Name of the relation being autopartitioned |
 | ap_wq_relnamespace | name   | Name of the tablespace specified in rule for this work item. |
 | ap_wq_partname     | name   | Name of the partition created by the workitem |
```
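For context, rows in both work queues originate from rules registered with `bdr.autopartition`; a hedged sketch of such a registration (argument names and the table name are illustrative assumptions; consult the `bdr.autopartition` reference for the exact signature):

```sql
-- Hypothetical rule: partition "measurement" in one-day increments and
-- retain 30 days of data. Argument names are assumptions; check the
-- bdr.autopartition reference for the exact signature.
SELECT bdr.autopartition(
    relation := 'measurement',
    partition_increment := '1 day',
    data_retention_period := '30 days'
);
```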

product_docs/docs/bdr/3.7/ddl.mdx

Lines changed: 2 additions & 2 deletions

```diff
@@ -996,8 +996,8 @@ nodes should have applied the `ALTER TABLE .. ADD CONSTRAINT ... NOT VALID`
 command and made enough progress. BDR will wait for a consistent
 state to be reached before validating the constraint.
 
-Note that the new facility requires the cluster to run with RAFT protocol
-version 24 and beyond. If the RAFT protocol is not yet upgraded, the old
+Note that the new facility requires the cluster to run with Raft protocol
+version 24 and beyond. If the Raft protocol is not yet upgraded, the old
 mechanism will be used, resulting in a DML lock request.
 
 !!! Note
```
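The two-phase pattern the hunk refers to is ordinary SQL; a short sketch with illustrative table and constraint names:

```sql
-- Phase 1: add the constraint without validating existing rows.
ALTER TABLE orders
    ADD CONSTRAINT orders_qty_positive CHECK (qty > 0) NOT VALID;

-- Phase 2: validate separately. With Raft protocol version 24 and beyond,
-- BDR waits for a consistent state across nodes rather than requesting a
-- DML lock.
ALTER TABLE orders VALIDATE CONSTRAINT orders_qty_positive;
```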

product_docs/docs/bdr/3.7/durability.mdx

Lines changed: 17 additions & 4 deletions

```diff
@@ -20,11 +20,24 @@ can all be implemented individually:
 eventually be applied on all nodes without further conflicts, or get
 an abort directly informing the client of an error.
 
-PGLogical (PGL) integrates with the `synchronous_commit` option of
+BDR integrates with the `synchronous_commit` option of
 Postgres itself, providing a variant of synchronous replication,
-which can be used between BDR nodes. In addition, BDR offers
-[Eager All-Node Replication](eager) and
-[Commit At Most Once](camo).
+which can be used between BDR nodes. BDR also offers two additional
+replication modes:
+
+- Commit At Most Once (CAMO). This feature solves the problem of knowing
+  whether your transaction has COMMITed (and replicated) or not in case of
+  certain errors during COMMIT. Normally, it might be hard to know whether
+  or not the COMMIT was processed. With this feature, your application can
+  find out what happened, even if your new database connection is to a
+  different node than your previous connection. For more information about
+  this feature see the [Commit At Most Once](camo) chapter.
+- Eager Replication. This is an optional feature to avoid replication
+  conflicts. Every transaction is applied on *all nodes* simultaneously,
+  and commits only if no replication conflicts are detected. This feature
+  does reduce performance, but provides very strong consistency guarantees.
+  For more information about this feature see the
+  [Eager All-Node Replication](eager) chapter.
 
 Postgres itself provides [Physical Streaming
 Replication](https://www.postgresql.org/docs/11/warm-standby.html#SYNCHRONOUS-REPLICATION)
```
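To make the three modes concrete, a hedged per-session sketch; the `bdr.enable_camo` and `bdr.commit_scope` parameter names and values are assumptions drawn from the CAMO and Eager chapters, so verify them against your installed version:

```sql
-- Synchronous replication, via the standard Postgres option:
SET synchronous_commit = 'remote_apply';

-- Commit At Most Once (parameter name and value assumed from the CAMO
-- chapter):
SET bdr.enable_camo = 'remote_commit_flush';

-- Eager All-Node Replication (parameter name and value assumed from the
-- Eager chapter):
SET bdr.commit_scope = 'global';
```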

product_docs/docs/bdr/3.7/functions.mdx

Lines changed: 0 additions & 22 deletions

````diff
@@ -44,28 +44,6 @@ value:
 MAJOR_VERSION * 10000 + MINOR_VERSION * 100 + PATCH_RELEASE
 ```
 
-### bdr.wal_sender_stats
-
-If the [Decoding Worker](nodes#decoding-worker) is enabled, this
-view shows information about the decoder slot and current LCR
-(`Logical Change Record`) segment file being read by each WAL sender.
-
-#### Synopsis
-
-```sql
-bdr.wal_sender_stats() → setof record (pid integer, is_using_lcr boolean, decoder_slot_name TEXT, lcr_file_name TEXT)
-```
-
-#### Output columns
-
-- `pid` - PID of the WAL sender (corresponds to `pg_stat_replication`'s `pid` column)
-
-- `is_using_lcr` - Whether the WAL sender is sending LCR files. The next columns will be `NULL` if `is_using_lcr` is `FALSE`.
-
-- `decoder_slot_name` - The name of the decoder replication slot.
-
-- `lcr_file_name` - The name of the current LCR file.
-
 ## System and Progress Information Parameters
 
 BDR exposes some parameters that can be queried via `SHOW` in `psql`
````
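For example, such parameters are read with plain `SHOW`; the parameter name below is an assumption, see the list that follows in the file:

```sql
-- Read a BDR information parameter in psql (name is an assumption):
SHOW bdr.local_node_id;
```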

product_docs/docs/bdr/3.7/index.mdx

Lines changed: 2 additions & 0 deletions

```diff
@@ -114,3 +114,5 @@ Some features are only available on particular versions of Postgres server.
 
 Features that are currently available only with EDB Postgres Extended are
 expected to be available with EDB Postgres Advanced 14.
+
+This documentation is for the Enterprise Edition of BDR3.
```

product_docs/docs/bdr/3.7/known-issues.mdx

Lines changed: 3 additions & 4 deletions

```diff
@@ -55,11 +55,10 @@ unique identifier.
 
 - Decoding Worker works only with the default replication sets
 
-- When Decoding Worker is enabled in BDR node group and a BDR node is shutdown 
+- When Decoding Worker is enabled in BDR node group and a BDR node is shutdown
   in fast mode immediately after starting it, the shutdown may not complete
   because WAL sender does not exit. This happens because WAL sender waits for
-  the Decoding Worker process to start, but it may never start since the node is
+  the WAL decoder to start, and the WAL decoder may never start since the node is
   shutting down. The situation can be worked around by using an immediate
-  shutdown or waiting for the Decoding Worker to start. The Decoding Worker
-  process is
+  shutdown or waiting for the WAL decoder to start. The WAL decoder process is
   reported in `pglogical.workers` as well as `pg_stat_activity` catalogs.
```
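To check whether the WAL decoder has started before issuing a fast shutdown, it can be looked up in the catalogs the item mentions; a sketch in which the `application_name` filter is an assumption:

```sql
-- Look for the WAL decoder process (the name pattern is an assumption):
SELECT pid, application_name, backend_type
FROM pg_stat_activity
WHERE application_name ILIKE '%decoder%';
```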

product_docs/docs/bdr/3.7/monitoring.mdx

Lines changed: 29 additions & 21 deletions

````diff
@@ -293,6 +293,17 @@ If `is_using_lcr` is `FALSE`, `decoder_slot_name`/`lcr_file_name` will be `NULL`
 This will be the case if the Decoding Worker is not enabled, or the WAL sender is
 serving a [logical standby]\(nodes.md#Logical Standby Nodes).
 
+Additionally, information about the Decoding Worker can be monitored via the function
+[bdr.get_decoding_worker_stat](functions#bdr_get_decoding_worker_stat), e.g.:
+
+```
+postgres=# SELECT * FROM bdr.get_decoding_worker_stat();
+   pid   | decoded_upto_lsn | waiting | waiting_for_lsn
+---------+------------------+---------+-----------------
+ 1153091 | 0/1E5EEE8        | t       | 0/1E5EF00
+(1 row)
+```
+
 ## Monitoring BDR Replication Workers
 
 All BDR workers show up in the system view `bdr.stat_activity`,
@@ -701,30 +712,19 @@ Peer replication slots should be active on all nodes at all times.
 If a peer replication slot is not active, then it might mean:
 
 - The corresponding peer is shutdown or not accessible; or
-- BDR replication is broken. Grep the log file for `ERROR` or
-  `FATAL` and also check `bdr.worker_errors` on all nodes.
-  The root cause might be, for example, an incompatible DDL was
-  executed with DDL replication disabled on one of the nodes.
+- BDR replication is broken.
 
-The BDR group replication slot, on the other hand, is inactive most
-of the time. BDR keeps this slot and advances LSN, as all other peers
-have already consumed the corresponding transactions. So it is not
-possible to monitor the status (active or inactive) of the group slot.
+Grep the log file for `ERROR` or `FATAL` and also check `bdr.worker_errors` on
+all nodes. The root cause might be, for example, that an incompatible DDL was
+executed with DDL replication disabled on one of the nodes.
 
-We recommend the following monitoring alert levels:
+The BDR group replication slot is, however, inactive most of the time. BDR
+maintains this slot and advances its LSN when all other peers have already
+consumed the corresponding transactions. Consequently, it is not necessary to
+monitor the status of the group slot.
 
-- status=UNKNOWN, message=This node is not part of any BDR group
-- status=OK, message=All BDR replication slots are working correctly
-- status=CRITICAL, message=There is at least 1 BDR replication
-  slot which is inactive
-- status=CRITICAL, message=There is at least 1 BDR replication
-  slot which is missing
-
-The described behavior is implemented in the function
-`bdr.monitor_local_replslots()`, which uses replication slot status
-information returned from view `bdr.node_slots` (slot active or
-inactive) to provide a local check considering all BDR node replication
-slots, except the BDR group slot.
+The function `bdr.monitor_local_replslots()` provides a summary of whether all
+BDR node replication slots are working as expected, e.g.:
 
 ```sql
 bdrdb=# SELECT * FROM bdr.monitor_local_replslots();
@@ -733,6 +733,14 @@ bdrdb=# SELECT * FROM bdr.monitor_local_replslots();
  OK     | All BDR replication slots are working correctly
 ```
 
+One of the following status summaries will be returned:
+
+- `UNKNOWN`: `This node is not part of any BDR group`
+- `OK`: `All BDR replication slots are working correctly`
+- `OK`: `This node is part of a subscriber-only group`
+- `CRITICAL`: `There is at least 1 BDR replication slot which is inactive`
+- `CRITICAL`: `There is at least 1 BDR replication slot which is missing`
+
 ## Monitoring Transaction COMMITs
 
 By default, BDR transactions commit only on the local node. In that case,
````
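When a peer slot is inactive, the catalog named in the new text can be queried directly, for example:

```sql
-- Check for recorded replication worker errors (run this on all nodes):
SELECT * FROM bdr.worker_errors;
```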

product_docs/docs/bdr/3.7/nodes.mdx

Lines changed: 10 additions & 51 deletions
```diff
@@ -47,12 +47,6 @@ The `bdr_init_physical` utility replaces the functionality of the
 `bdr_init_copy` utility from BDR1 and BDR2. It is the BDR3 equivalent of the
 pglogical `pglogical_create_subscriber` utility.
 
-!!! Warning
-    Only one node at the time should join the BDR node group, or be
-    parted from it. If a new node is being joined while there is
-    another join or part operation in progress, the new node will
-    sometimes not have consistent data after the join has finished.
-
 When a new BDR node is joined to an existing BDR group or a node is subscribed
 to an upstream peer, before replication can begin, the system must copy the
 existing data from the peer node(s) to the local node. This copy must be
@@ -87,10 +81,12 @@ performs data sync doing `COPY` operations and will use multiple writers
 (parallel apply) if those are enabled.
 
 Node join can execute concurrently with other node joins for the majority of
-the time taken to join. Only one regular node at a time can be in either of
-the states PROMOTE or PROMOTING, which are typically fairly short.
-The subscriber-only nodes are an exception to this rule, and they can be
-cocurrently in PROMOTE and PROMOTING states as well.
+the time taken to join. However, only one regular node at a time can be in
+either of the states PROMOTE or PROMOTING, which are typically fairly short if
+all other nodes are up and running; otherwise the join will get serialized at
+this stage. The subscriber-only nodes are an exception to this rule, and they
+can be concurrently in PROMOTE and PROMOTING states as well, so their join
+process is fully concurrent.
 
 Note that the join process uses only one node as the source, so can be
 executed when nodes are down, if a majority of nodes are available.
```
```diff
@@ -871,43 +867,6 @@ as `STANDBY`.
 
 Only one node at a time can be in either of the states PROMOTE or PROMOTING.
 
-## Managing Shard Groups
-
-BDR clusters may contain an array of Shard Groups for the AutoScale feature.
-These are shown as a sub-node group that is composed of an array of
-sub-sub node groups known as Shard Groups.
-
-Operations that can be performed on the Shard Group are:
-
-- Create Shard Array
-- Drop Shard Array
-- Repair - add new nodes to replace failed nodes
-- Expand - add new Shard Groups
-- Re-Balance - re-distribute data across Shard Groups
-
-### Create/Drop
-
-### Expand
-
-e.g. expand from 4 Shard Groups to 8 Shard Groups
-
-This operation can occur without interfering with user operations.
-
-### Re-Balance
-
-e.g. move data from where it was in a 4-node array to how it would be ideally
-placed in an 8-node array.
-
-Some portion of the data is moved from one Shard Group to another,
-so this action can take an extended period, depending upon how
-much data is to be moved. The data is moved one partition at a
-time, so is restartable without too much wasted effort.
-
-Note that re-balancing is optional.
-
-This operation can occur without interfering with user operations,
-even when this includes write transactions.
-
 ## Node Management Interfaces
 
 Nodes can be added and removed dynamically using the SQL interfaces.
```
````diff
@@ -995,7 +954,7 @@ This function creates a BDR group with the local node as the only member of the
 
 ```sql
 bdr.create_node_group(node_group_name text,
-                      parent_group_name text,
+                      parent_group_name text DEFAULT NULL,
                       join_node_group boolean DEFAULT true,
                       node_group_type text DEFAULT NULL)
 ```
@@ -1017,9 +976,8 @@ bdr.create_node_group(node_group_name text,
 changes to other nodes. See [Subscriber-Only Nodes] for more details.
 Datanode implies that the group represents a shard, whereas the other
 values imply that the group represents respective coordinators.
-Except 'subscriber-only', the rest three values are reserved for use
-with a separate extension called autoscale. NULL implies a normal
-general purpose node group will be created.
+Except 'subscriber-only', the remaining three values are reserved for
+future use. NULL implies a normal general purpose node group will be created.
 
 #### Notes
 
````
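A usage sketch against the signature above; group names are illustrative, and whether a subgroup needs further arguments is not shown here:

```sql
-- Create a top-level BDR group on the first node:
SELECT bdr.create_node_group('mygroup');

-- Create a subscriber-only subgroup beneath it:
SELECT bdr.create_node_group('so_group',
                             parent_group_name := 'mygroup',
                             node_group_type := 'subscriber-only');
```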
```diff
@@ -1425,6 +1383,7 @@ bdr_init_physical [OPTION] ...
 
 - `--hba-conf` - path to the new pg_hba.conf
 - `--postgresql-conf` - path to the new postgresql.conf
+- `--postgresql-auto-conf` - path to the new postgresql.auto.conf
 
 #### Notes
 
```
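A hedged invocation sketch combining the listed options with commonly documented ones; `-D`, `--node-name`, `--remote-dsn`, and `--local-dsn` are assumptions here, so check `bdr_init_physical --help` for the exact flags:

```shell
# Sketch only; flags other than the three documented above are assumptions.
bdr_init_physical -D /var/lib/pgdata/node2 \
    --node-name node2 \
    --remote-dsn 'host=node1 dbname=bdrdb' \
    --local-dsn 'host=node2 dbname=bdrdb' \
    --hba-conf /etc/postgres/pg_hba.conf \
    --postgresql-conf /etc/postgres/postgresql.conf \
    --postgresql-auto-conf /etc/postgres/postgresql.auto.conf
```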