The role of backup and recovery in PGD is to provide for disaster
recovery (DR), such as in the following situations:

- Loss of all nodes in the cluster
- Significant, uncorrectable data corruption across multiple nodes
  as a result of application error or security breach

## Backup

## Logical backup and restore

You can normally use pg_dump, sometimes referred to as *logical
backup*, with PGD.

### Temporary postgresql.conf settings

First, temporarily set the following settings in `postgresql.conf`:

```text
# Increase from the default of `1GB` to something large, but still a
# fraction of your disk space since the non-WAL data must also fit.
# This decreases the frequency of checkpoints.
max_wal_size = 100GB

# Increase the amount of memory for building indexes. Default is
# 64MB. For example, 1GB assuming 128GB total RAM.
maintenance_work_mem = 1GB

# Increase the receiver and sender timeouts from 1 minute to 1 hour to
# allow large transactions through.
wal_receiver_timeout = 1h
wal_sender_timeout = 1h

# Increase the number of writers to make better use of parallel
# apply. Default is 2. Make sure this isn't overridden with a lower
# value by the node group's num_writers setting.
bdr.writers_per_subscription = 5

# Increase the Raft-related election timeouts, whose defaults are 6s
# and 3s.
bdr.raft_global_election_timeout = 20s
bdr.raft_group_election_timeout = 10s

# Increase the size of the shared memory queue used by the receiver to
# send data to the writer process from the default 1MB.
bdr.writer_input_queue_size = 32MB
```

Additionally:

- Make sure the default `bdr.streaming_mode = 'auto'` isn't overridden, so that transactions are streamed.
- Make sure none of the session or `postgresql.conf` settings listed above are overridden by node group-level settings.
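
As a quick sanity check, you can confirm the effective values from a session before starting the dump. This is a minimal sketch using the standard `pg_settings` view; adjust the filter to the settings you changed:

```sql
-- Show each setting's current value and where it came from
-- (configuration file, session, and so on). The bdr.* GUCs appear
-- here once the BDR extension is loaded.
SELECT name, setting, source
FROM pg_settings
WHERE name LIKE 'bdr.%'
   OR name IN ('max_wal_size', 'maintenance_work_mem',
               'wal_receiver_timeout', 'wal_sender_timeout');
```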

Now continue with pg_dump and pg_restore.

### pg_dump / pg_restore

To reduce the risk of global lock timeouts, we recommend
dumping pre-data, data, and post-data separately. For example:

```console
pg_dump -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB -v --exclude-schema='"bdr"' --exclude-extension='"bdr"' --section=pre-data -Fc -f pgd-pre-data.dump
pg_dump -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB -v --exclude-schema='"bdr"' --exclude-extension='"bdr"' --section=data -Fc -f pgd-data.dump
pg_dump -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB -v --exclude-schema='"bdr"' --exclude-extension='"bdr"' --section=post-data -Fc -f pgd-post-data.dump
```

Then restore by running pg_restore directly against a node. Don't
run these commands on the connection manager port:

```console
PGOPTIONS="-cbdr.commit_scope=local" pg_restore -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB --section=pre-data -f pgd-pre-data.dump
PGOPTIONS="-cbdr.commit_scope=local" pg_restore -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB --section=data -f pgd-data.dump
psql -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB -c 'SELECT bdr.wait_slot_confirm_lsn(NULL, NULL)'
PGOPTIONS="-cbdr.commit_scope=local" pg_restore -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB --section=post-data -f pgd-post-data.dump
psql -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB -c 'SELECT bdr.wait_slot_confirm_lsn(NULL, NULL)'
```

At this point, the dump has been restored and replicated to all nodes in the cluster.

In contrast, a naive pg_dump and pg_restore without the sections
split out will likely fail with a global lock timeout.

If you still get global lock timeouts with pg_restore,
add `-cbdr.ddl_locking=off` to `PGOPTIONS`.
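
For example, to retry the post-data phase with DDL locking disabled, here's a sketch combining both options:

```console
PGOPTIONS="-cbdr.commit_scope=local -cbdr.ddl_locking=off" pg_restore -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB --section=post-data pgd-post-data.dump
```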

If you choose to run pg_restore with `-j`/`--jobs`, you need to
increase `max_worker_processes` and `max_parallel_maintenance_workers`
by the same amount.
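
A minimal sketch, assuming 4 jobs on top of the defaults of `max_worker_processes = 8` and `max_parallel_maintenance_workers = 2`. The new `max_worker_processes` value takes effect only after a restart:

```console
psql -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB -c "ALTER SYSTEM SET max_worker_processes = 12"
psql -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB -c "ALTER SYSTEM SET max_parallel_maintenance_workers = 6"
# After restarting Postgres:
PGOPTIONS="-cbdr.commit_scope=local" pg_restore -h $PG_HOST -p $PG_PORT -U $PG_USER -d $PGD_DB -j 4 --section=data pgd-data.dump
```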

### Prefer restoring to a single-node cluster

Especially when initially setting up a cluster from a Postgres dump,
we recommend restoring to a cluster with a single PGD node. Then run
`pgd node setup` for each additional node you want in the cluster. That
command performs a physical join, which uses `bdr_init_physical` under
the hood.

### Sequences

pg_dump dumps both local and global sequences as if
they were local sequences. This behavior is intentional, to allow a PGD
schema to be dumped and ported to other PostgreSQL databases.
Technical Support recommends the use of physical backup techniques for
backup and recovery of PGD.

## Physical backup and restore

You can take physical backups of a node in an EDB Postgres Distributed cluster using
standard PostgreSQL software, such as Barman.
Consider these specific points when applying PostgreSQL backup
techniques to PGD (a sample backup command follows the list):

- PGD operates at the level of a single database, while a physical
backup includes all the databases in the instance. Plan
your databases to allow them to be easily backed up and restored.

- Backups make a copy of just one node. In the simplest case,
every node has a copy of all data, so you need to back up only
one node to capture all data. However, the goal of PGD isn't
met if the site containing that single copy goes down, so the
minimum is at least one node backup per site (with
many copies, and so on).

- However, each node might have unreplicated local data, or the
definition of replication sets might be complex so that all nodes don't
subscribe to all replication sets. In these cases, backup
planning must also include plans for how to back up any unreplicated
local data and a backup of at least one node that subscribes to each
replication set.
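
As a starting point, here's a minimal sketch of backing up one node with `pg_basebackup`. The connection parameters and target directory are placeholders:

```console
# Stream WAL during the backup so the copy is self-consistent,
# and report progress while it runs.
pg_basebackup -h $PG_HOST -p $PG_PORT -U $PG_USER -D /backups/pgd-node1 -X stream -P
```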

### Restore

While you can take a physical backup with the same procedure as a
standard PostgreSQL node, it's slightly more complex to
restore the physical backup of a PGD node.

#### EDB Postgres Distributed cluster failure or seeding a new cluster from a backup

The most common use case for restoring a physical backup involves the failure
or replacement of all the PGD nodes in a cluster, for instance in the event of
a data center failure.

You might also want to perform this procedure to clone the current contents of
an EDB Postgres Distributed cluster to seed a QA or development instance.

In that case, you can restore PGD capabilities based on a physical backup
of a single PGD node, optionally plus WAL archives:

- If you still have some PGD nodes live and running, fence off the host you
restored the PGD node to, so it can't connect to any surviving PGD nodes.
This practice ensures that the new node doesn't confuse the existing cluster.
- Restore a single PostgreSQL node from a physical backup of one of
the PGD nodes.
- If you have WAL archives associated with the backup, create a suitable
  `postgresql.conf`, and start PostgreSQL in recovery to replay up to the latest
  state. You can specify an alternative `recovery_target` here if needed (see
  the sketch after this list).
- Start the restored node, or promote it to read/write if it was in standby
recovery. Keep it fenced from any surviving nodes!
- Clean up any leftover PGD metadata that was included in the physical backup.
- Fully stop and restart the PostgreSQL instance.
- Add further PGD nodes with the standard procedure based on the
`bdr.join_node_group()` function call.
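
As an illustration, here's a minimal recovery configuration, assuming WAL archives under `/backups/wal` (the path and any recovery target are placeholders). Create an empty `recovery.signal` file in the data directory before starting the server:

```text
# postgresql.conf additions for archive recovery (sketch)
restore_command = 'cp /backups/wal/%f %p'
# Optional: stop at a specific point instead of the end of the archives:
# recovery_target_time = '2024-01-01 00:00:00'
```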

#### Cleanup of PGD metadata

To clean up leftover PGD metadata:

1. Drop the PGD node using [`bdr.drop_node`](/pgd/5.6/reference/functions-internal#bdrdrop_node).
2. Fully stop and restart PostgreSQL (important!).
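
For step 1, a sketch of the call. The node name is a placeholder for a node recorded in the old cluster's metadata, and the second argument (cascade), as assumed here, removes dependent metadata:

```sql
-- Run on the restored, fenced-off instance only, never on a live cluster.
SELECT bdr.drop_node('node1', true);
```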

#### Cleanup of replication origins

You must explicitly remove replication origins with a separate step
because they're recorded persistently in a system catalog. They're
therefore included in the backup and in the restored instance. They
aren't removed automatically when dropping the BDR extension because
they aren't explicitly recorded as its dependencies.

To track progress of incoming replication in a crash-safe way,
PGD creates one replication origin for each remote master node. Therefore,
for each node in the previous cluster, run this once:

```sql
SELECT pg_replication_origin_drop('bdr_dbname_grpname_nodename');
```

You can list replication origins as follows:

```sql
SELECT * FROM pg_replication_origin;
```

Those created by PGD are easily recognized by their name.
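
Alternatively, since PGD-created origins share the `bdr_` name prefix, you can drop them all in one pass. A sketch:

```sql
-- Drops every replication origin whose name marks it as PGD-created.
SELECT pg_replication_origin_drop(roname)
FROM pg_replication_origin
WHERE roname LIKE 'bdr_%';
```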

#### Cleanup of replication slots

If a physical backup was created with `pg_basebackup`, replication slots
are omitted from the backup.

Some other backup methods might preserve replication slots, likely in
outdated or invalid states. Once you restore the backup, use these
commands to drop all replication slots:

```sql
SELECT pg_drop_replication_slot(slot_name)
FROM pg_replication_slots;
```

If you have a reason to preserve some slots,
you can add a `WHERE slot_name LIKE 'bdr%'` clause, but this is rarely
useful.

!!! Warning
    Never use these commands to drop replication slots on a live PGD node.


## Eventual consistency

The nodes in an EDB Postgres Distributed cluster are *eventually consistent* but not
*entirely consistent*. A physical backup of a given node provides

With PostgreSQL PITR, you can use the standard syntax:

```text
recovery_target_time = T1
```

To request multi-origin PITR, use the standard syntax in
the `postgresql.conf` file:

```text
recovery_target_time = T1
```

You need to specify the list of replication origins that are restored to `T1` in one of two ways.
You can use a separate `multi_recovery.conf` file by way of
a new parameter, `recovery_target_origins`:

```text
recovery_target_origins = '*'
```

Or you can specify the origin subset as a list in `recovery_target_origins`:

```text
recovery_target_origins = '1,3'
```

This feature is available only with EDB Postgres Extended.
Barman doesn't create a `multi_recovery.conf` file.

## Monitoring

Use the following queries to check on the progress of the restore process.

```sql
SELECT pg_size_pretty(pg_database_size('bdrdb'));
```

The above query shows the database size on the restoring node. The
size should grow as the restore makes progress and approaches the size
of the original node. However, since a freshly restored node has no
bloat, it ends up a little smaller than the original.

```sql
SELECT * FROM bdr.node_replication_rates;
```

The above query shows the rate of replication. However, the progress
information can be misleading for big transactions: lag and progress
appear to advance in stair-steps.

```sql
SELECT
    application_name,
    state,
    wait_event_type,
    wait_event,
    now() - state_change AS state_change_ago
FROM
    pg_stat_activity
WHERE
    application_name LIKE '%pg_restore%';
```

The above query shows what pg_restore is doing and whether it's
working or blocked waiting (and, if so, on what). Its status changes
continuously.

Check the following views to spot issues with replication slots,
accumulated lag, broken replication, and so on:

- `pg_catalog.pg_stat_replication_slots`
- `pg_catalog.pg_replication_slots`
- `bdr.node_slots`

And use `bdr.stat_subscription` to see statistics for each
subscription, for example, to check on parallel apply or transaction
streaming.
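
A minimal sketch of checking both the slots and the subscription statistics:

```sql
-- Per-slot and per-subscription health on the restoring node.
SELECT * FROM bdr.node_slots;
SELECT * FROM bdr.stat_subscription;
```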