
Implement PostgreSQL datafile delta storage and optimize overlay performance #14

Merged: vbp1 merged 11 commits into main from 012-delta-storage on Dec 10, 2025

vbp1 commented Dec 9, 2025

Summary

This PR implements page-level delta storage for PostgreSQL datafiles, integrates it into the overlay, and tightens several performance-critical paths used by pbkfs when serving large relations (e.g. pgbench_accounts). It also removes the experimental PBKFS_MATERIALIZE_ON_READ path, which was unstable under real workloads, in favour of a single well-tested read path.

Changes

Overlay integration for PostgreSQL datafiles

  • Implement Overlay::read_datafile_delta and integrate it into read_block_data so that:
    • Datafile reads first consult delta storage (PATCH/FULL) and fall back to base layers.
    • Bitmap and per-block locks ensure consistent concurrent reads and writes.
  • Implement Overlay::write_datafile_delta to:
    • Compute v2 deltas against base pages using compute_delta.
    • Store small changes as PATCH slots (v2 payload) and large changes as FULLREF pages.
    • Maintain bitmap state, punch holes in .full, and shrink tails where possible.
  • Add truncate_pg_datafile, rename_pg_datafile, and unlink_pg_datafile to:
    • Keep .patch/.full and sparse diff files in sync with PostgreSQL truncates, renames, and unlinks.
    • Ensure that zero-length truncation shadows base contents with a zeroed diff and removes delta artifacts.
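The read/write split described above can be modelled with a small in-memory sketch. Everything here is hypothetical: `DeltaStore`, `SlotState`, and the toy one-record-per-changed-byte delta are illustrations, not pbkfs types, and the real code keeps payloads in `.patch`/`.full` files guarded by per-block locks, which this sketch omits. It only shows the decision logic: the write path picks PATCH when the delta fits the 504-byte payload budget and FULL otherwise, and the read path consults the bitmap before falling back to the base page.

```rust
use std::collections::HashMap;

const PAGE_SIZE: usize = 8192;
// Payload budget of one PATCH slot, as quoted in this PR.
const PATCH_SLOT_PAYLOAD: usize = 504;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SlotState { Empty, Patch, Full }

/// Hypothetical stand-in for the per-file bitmap plus .patch/.full contents.
#[derive(Default)]
pub struct DeltaStore {
    bitmap: HashMap<u32, SlotState>,
    patches: HashMap<u32, Vec<u8>>, // encoded deltas
    fulls: HashMap<u32, Vec<u8>>,   // whole pages
}

/// Toy delta (NOT the PR's v2 encoding): 3 bytes per changed byte,
/// a little-endian u16 offset followed by the new byte value.
fn toy_delta(base: &[u8], new: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    for (i, (b, n)) in base.iter().zip(new).enumerate() {
        if b != n {
            out.extend_from_slice(&(i as u16).to_le_bytes());
            out.push(*n);
        }
    }
    out
}

fn apply_toy_delta(base: &[u8], delta: &[u8]) -> Vec<u8> {
    let mut page = base.to_vec();
    for rec in delta.chunks_exact(3) {
        let off = u16::from_le_bytes([rec[0], rec[1]]) as usize;
        page[off] = rec[2];
    }
    page
}

impl DeltaStore {
    /// Write path: small deltas become PATCH slots, large ones FULL pages.
    pub fn write_block(&mut self, blk: u32, base: &[u8], new: &[u8]) {
        assert_eq!(base.len(), PAGE_SIZE);
        assert_eq!(new.len(), PAGE_SIZE);
        let delta = toy_delta(base, new);
        if delta.len() <= PATCH_SLOT_PAYLOAD {
            self.fulls.remove(&blk); // real code would punch a hole in .full
            self.patches.insert(blk, delta);
            self.bitmap.insert(blk, SlotState::Patch);
        } else {
            self.patches.remove(&blk);
            self.fulls.insert(blk, new.to_vec());
            self.bitmap.insert(blk, SlotState::Full);
        }
    }

    /// Read path: consult the bitmap first, fall back to the base page.
    pub fn read_block(&self, blk: u32, base: &[u8]) -> Vec<u8> {
        match self.state(blk) {
            SlotState::Empty => base.to_vec(),
            SlotState::Patch => apply_toy_delta(base, &self.patches[&blk]),
            SlotState::Full => self.fulls[&blk].clone(),
        }
    }

    pub fn state(&self, blk: u32) -> SlotState {
        self.bitmap.get(&blk).copied().unwrap_or(SlotState::Empty)
    }
}
```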

Delta v2 implementation and tooling

  • Add v2 byte-stream delta encoding for PostgreSQL pages (DeltaDiff::PatchV2) and corresponding decoding helpers.
  • Introduce .patch / .full sparse file layout with:
    • Fixed-size PATCH slots (512 bytes) and a v2 flag.
    • FULLREF support with a separate .full file for whole-page writes.
  • Add DeltaIndex and BlockBitmap for thread-safe, per-file bitmap caching of PATCH/FULL/EMPTY slots.
  • Provide utils/analyze_pg_delta.py to:
    • Reconstruct pages from base + .patch/.full.
    • Compute diff_bytes, v1-style patch length, and v2-encoded length.
    • Validate invariants for FULLREF pages and report percentiles.
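The PR does not spell out the v2 wire format, so the following is only a sketch of one plausible byte-stream page delta, assuming a run-based layout of `[offset: u16 LE][len: u16 LE][len bytes]` records. `encode_runs`, `decode_runs`, and `fits_patch_slot` are hypothetical names; `DeltaDiff::PatchV2` may use a different layout.

```rust
/// Payload budget of a PATCH slot, as quoted in this PR.
const PATCH_SLOT_PAYLOAD: usize = 504;

/// Encode every maximal run of differing bytes as one record.
pub fn encode_runs(base: &[u8], new: &[u8]) -> Vec<u8> {
    assert_eq!(base.len(), new.len());
    assert!(base.len() <= u16::MAX as usize, "offsets must fit in u16");
    let mut out = Vec::new();
    let mut i = 0;
    while i < base.len() {
        if base[i] == new[i] {
            i += 1;
            continue;
        }
        let start = i;
        while i < base.len() && base[i] != new[i] {
            i += 1;
        }
        out.extend_from_slice(&(start as u16).to_le_bytes());
        out.extend_from_slice(&((i - start) as u16).to_le_bytes());
        out.extend_from_slice(&new[start..i]);
    }
    out
}

/// Apply a delta to a copy of `base`; returns None on a malformed delta.
pub fn decode_runs(base: &[u8], delta: &[u8]) -> Option<Vec<u8>> {
    let mut page = base.to_vec();
    let mut p = 0;
    while p < delta.len() {
        let hdr = delta.get(p..p + 4)?;
        let off = u16::from_le_bytes([hdr[0], hdr[1]]) as usize;
        let len = u16::from_le_bytes([hdr[2], hdr[3]]) as usize;
        p += 4;
        let bytes = delta.get(p..p + len)?;
        page.get_mut(off..off + len)?.copy_from_slice(bytes);
        p += len;
    }
    Some(page)
}

/// A delta is stored as PATCH only if it fits the slot budget.
pub fn fits_patch_slot(delta: &[u8]) -> bool {
    delta.len() <= PATCH_SLOT_PAYLOAD
}
```

Each contiguous change costs a fixed 4-byte header plus its bytes, so a page with a few small, scattered edits (typical for tuple header or hint-bit updates) stays far below the 504-byte budget.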

Logical length and block index optimizations

  • Introduce BlockIndexEntry and Overlay::pg_block_index to:
    • Build and cache per-file block indexes for pg_probackup page streams.
    • Sort entries by block and use binary search for page lookups.
  • Cache PostgreSQL datafile logical length per relation in BlockCacheEntry::logical_len, with:
    • Explicit invalidation on writes (record_write) and truncates (truncate_pg_datafile).
    • Fast path in logical_len() for repeated calls against hot relations.
  • Ensure pg_probackup metadata (backup_content.control) is loaded once per layer and used to bound logical length and compression.
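A minimal sketch of the two caching ideas, assuming a simplified `BlockIndexEntry` with block number, stream offset, and record length (the real struct's fields are not listed in the PR). The index is sorted once and then searched with `binary_search_by_key`; the logical length is computed on first use and dropped by an explicit `invalidate` call, which stands in for the `record_write` and `truncate_pg_datafile` invalidation points. `LenCache`, `build_index`, and `lookup` are hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical index entry: where a block's record sits in a page stream.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct BlockIndexEntry {
    pub block: u32,
    pub offset: u64,
    pub len: u32,
}

/// Build once per file: sort by block so lookups can binary-search.
pub fn build_index(mut entries: Vec<BlockIndexEntry>) -> Vec<BlockIndexEntry> {
    entries.sort_by_key(|e| e.block);
    entries
}

/// O(log n) lookup instead of rescanning the stream for every page.
pub fn lookup(index: &[BlockIndexEntry], block: u32) -> Option<&BlockIndexEntry> {
    index
        .binary_search_by_key(&block, |e| e.block)
        .ok()
        .map(|i| &index[i])
}

/// Per-relation logical length cache with explicit invalidation.
#[derive(Default)]
pub struct LenCache {
    lens: HashMap<String, u64>,
    pub computations: u32, // counts slow-path evaluations
}

impl LenCache {
    /// Fast path returns the cached value; `compute` runs only on a miss.
    pub fn logical_len(&mut self, rel: &str, compute: impl Fn() -> u64) -> u64 {
        if let Some(&len) = self.lens.get(rel) {
            return len;
        }
        self.computations += 1;
        let len = compute();
        self.lens.insert(rel.to_string(), len);
        len
    }

    /// Would be called from the write and truncate paths.
    pub fn invalidate(&mut self, rel: &str) {
        self.lens.remove(rel);
    }
}
```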

Sparse diff and truncate behaviour

  • Fix handling of sparse diff files for datafiles:
    • When a diff file has holes, read_block_data now probes for sparse regions and falls back to base layers instead of returning zeros.
    • Preserve the special case where a zero-length or single-block diff created by truncate_pg_datafile(size == 0) shadows the base relation with zeroed pages.
  • Ensure truncate_pg_datafile:
    • Removes .patch and .full when truncating to zero.
    • Leaves a minimal shadow diff file and resets cached logical length so subsequent reads see an empty relation.
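The fallback rule can be sketched as a pure resolution function. The PR does not say how holes are detected (on Linux, `lseek` with `SEEK_DATA`/`SEEK_HOLE` is one option), so this sketch simply models the data-bearing blocks of a diff file as a map; `DiffFile`, `Source`, and `resolve_block` are hypothetical. A hole falls back to the base layers unless the diff file is the shadow left by a truncate to zero, in which case it reads as zeros.

```rust
use std::collections::BTreeMap;

/// Where the bytes for a requested block come from.
#[derive(Debug, PartialEq, Eq)]
pub enum Source {
    Diff(Vec<u8>),
    Base,
    Zeroed,
}

#[derive(Default)]
pub struct DiffFile {
    /// Blocks that actually hold data; any other block is a hole.
    data_blocks: BTreeMap<u32, Vec<u8>>,
    /// Set when the file is the shadow created by truncating to size 0.
    shadows_base: bool,
}

impl DiffFile {
    pub fn write(&mut self, blk: u32, page: Vec<u8>) {
        self.data_blocks.insert(blk, page);
    }

    /// Models truncate_pg_datafile(size == 0): drop data, hide the base.
    pub fn truncate_to_zero(&mut self) {
        self.data_blocks.clear();
        self.shadows_base = true;
    }
}

pub fn resolve_block(diff: Option<&DiffFile>, blk: u32) -> Source {
    let Some(d) = diff else { return Source::Base };
    match d.data_blocks.get(&blk) {
        Some(page) => Source::Diff(page.clone()),
        // A hole in a truncate shadow must not expose base contents.
        None if d.shadows_base => Source::Zeroed,
        // An ordinary hole: fall back instead of returning zeros.
        None => Source::Base,
    }
}
```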

Removal of PBKFS_MATERIALIZE_ON_READ

  • Remove the PBKFS_MATERIALIZE_ON_READ flag from OverlayInner and all call sites.
  • Simplify datafile read behaviour:
    • PostgreSQL datafiles are always served via block-wise reconstruction from base + delta, without full-file materialisation in the diff directory.
    • Non-data compressed files still use a single decompressed copy in diff for subsequent reads.
  • Clean up tests:
    • Drop tests that relied on the materialize-on-read=1 mode.
    • Adjust tests for pg_datafile logical length, sparse diff handling, and truncate behaviour to the simplified model.
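After the flag removal the read path reduces to a dispatch on file kind. A hypothetical sketch follows; `FileKind`, `ReadPlan`, and `plan_read` are illustrative names, and the `Plain` case for uncompressed non-data files is an assumption not covered by the PR text.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FileKind {
    PgDatafile,
    CompressedNonData,
    Plain, // assumption: uncompressed non-data files
}

#[derive(Debug, PartialEq, Eq)]
pub enum ReadPlan {
    ReconstructBlocks,   // base + delta, block by block
    UseDecompressedCopy, // reuse the copy already in the diff directory
    DecompressIntoDiff,  // first read: write one decompressed copy to diff
    ReadDirect,
}

pub fn plan_read(kind: FileKind, decompressed_copy_exists: bool) -> ReadPlan {
    match kind {
        // Datafiles are never materialised, whatever else exists on disk.
        FileKind::PgDatafile => ReadPlan::ReconstructBlocks,
        FileKind::CompressedNonData if decompressed_copy_exists => ReadPlan::UseDecompressedCopy,
        FileKind::CompressedNonData => ReadPlan::DecompressIntoDiff,
        FileKind::Plain => ReadPlan::ReadDirect,
    }
}
```

Having a single datafile branch is the point of the cleanup: there is no second, flag-dependent path to keep consistent with the delta storage.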

Performance impact

On a large pgbench_accounts table (~1 GiB) backed by pg_probackup:

  • The initial COUNT(*) on the pbkfs-mounted backup is dominated by:
    • Zlib decompression of pg_probackup page streams.
    • Expected memcpy / iterator overhead.
  • Repeated COUNT(*) runs benefit from:
    • Cached block indexes for pg_probackup streams (no rescan per page).
    • Cached logical length per relation (no repeated multi-layer scans).
  • Delta v2 keeps patch payloads compact:
    • For changed pages, typical v2-encoded lengths sit well below the 504-byte slot budget.
    • FULLREF pages are only used when v2 payloads genuinely exceed slot capacity, as confirmed by analyze_pg_delta.
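A back-of-the-envelope check of these numbers, assuming the default PostgreSQL block size of 8 KiB (`BLCKSZ` is a build-time setting, so other builds differ). A ~1 GiB relation is 131,072 blocks, so a binary search over a sorted per-file index needs at most 18 probes per page rather than a rescan, and a 512-byte PATCH slot is 1/16 of a page. The function names are hypothetical.

```rust
const PAGE_SIZE: u64 = 8192; // default PostgreSQL BLCKSZ
const PATCH_SLOT: u64 = 512; // PATCH slot size from this PR
const PATCH_BUDGET: u64 = 504; // v2 payload budget from this PR

/// Number of 8 KiB blocks in a relation of `bytes` bytes, rounded up.
pub fn blocks(bytes: u64) -> u64 {
    bytes.div_ceil(PAGE_SIZE)
}

/// Worst-case probes of a binary search over `n` sorted entries,
/// i.e. floor(log2 n) + 1, which is the bit length of `n`.
pub fn max_binary_search_probes(n: u64) -> u32 {
    if n == 0 { 0 } else { 64 - n.leading_zeros() }
}

/// Whether an encoded delta of `len` bytes fits in one PATCH slot.
pub fn fits_patch_slot(len: u64) -> bool {
    len <= PATCH_BUDGET
}
```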

vbp1 added 11 commits December 2, 2025 12:41
- Remove unused `read_incremental_block()` from overlay.rs (~65 lines)
- Remove unused `writable` field from OpenHandle in fuse.rs
- Move `has_pending()` to #[cfg(test)] as it's only used in tests
- Add `server_version` field to BackupMetadata
- Add POSTGRES_VERSION_MIN/MAX constants (14-18)
- Add UnsupportedPostgresVersion error variant
- Validate PostgreSQL version during backup store validation
- Remove unused `uncompressed_bytes`, `wal_bytes` fields from ShowBackupJson
- Fix chain_resolution_tests to include server_version field
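The version check added in this commit might look roughly like the sketch below. `ValidationError`, `validate_server_version`, and the assumption that `server_version` holds a string such as `"16"` or `"16.4"` are hypothetical; only the 14-18 range and the `UnsupportedPostgresVersion` variant name come from the commit message.

```rust
use std::fmt;

pub const POSTGRES_VERSION_MIN: u32 = 14;
pub const POSTGRES_VERSION_MAX: u32 = 18;

#[derive(Debug, PartialEq, Eq)]
pub enum ValidationError {
    UnsupportedPostgresVersion(String),
}

impl fmt::Display for ValidationError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ValidationError::UnsupportedPostgresVersion(v) => write!(
                f,
                "unsupported PostgreSQL version {} (supported: {}-{})",
                v, POSTGRES_VERSION_MIN, POSTGRES_VERSION_MAX
            ),
        }
    }
}

impl std::error::Error for ValidationError {}

/// Extract the major version and reject anything outside 14..=18.
pub fn validate_server_version(server_version: &str) -> Result<u32, ValidationError> {
    let unsupported = || ValidationError::UnsupportedPostgresVersion(server_version.to_string());
    let major = server_version
        .split('.')
        .next()
        .and_then(|s| s.trim().parse::<u32>().ok())
        .ok_or_else(unsupported)?;
    if (POSTGRES_VERSION_MIN..=POSTGRES_VERSION_MAX).contains(&major) {
        Ok(major)
    } else {
        Err(unsupported())
    }
}
```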
vbp1 merged commit c5ad7d6 into main on Dec 10, 2025
1 check passed