This commit addresses the "read only 0 of 8192 bytes" PostgreSQL error that occurred when reading unmaterialized blocks from sparse diff files.

Root cause: When a pg_datafile was opened with write intent, a file handle was created pointing to the diff file. Subsequent reads through this handle would return zeros for blocks that existed in backup but hadn't been written to diff yet (sparse file holes).

Key changes:

1. Never create file handles for pg_datafile
   - For pg_datafile, reads always go through overlay.read_range()
   - This ensures proper block materialization from backup layers
   - Writes still work correctly via on-demand file opening in worker

2. Remove attr_cache entirely
   - The cache was invalidated BEFORE async writes completed
   - Race condition: concurrent getattr could cache stale file sizes
   - Minor performance impact (~1-5μs per stat) not worth correctness risk

3. Use uncached logical_len for pg_datafile
   - logical_len_for() now calls logical_len() directly for datafiles
   - Prevents stale cached sizes from truncating reads

4. Add FIFO ordering for write operations (pending.rs)
   - Sequence-based barrier mechanism ensures operation ordering
   - READs see all preceding WRITEs, don't wait for concurrent ones
   - Uses dashmap + parking_lot for efficient per-inode synchronization

5. Parse n_blocks/full_size from backup_content.control
   - Enables accurate logical length computation for incremental backups
   - Fixes size reporting for relations with page-level backup streams

Tests:
- sparse_diff_reads_unmaterialized_blocks_from_backup: regression test that verifies reading unmaterialized blocks returns backup data
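The root cause described above can be illustrated with a small in-memory sketch. It is not the project's code: `OverlayFile`, `read_block_via_diff_handle`, and `read_block` are hypothetical names, and real reads go through `overlay.read_range()` against files on disk. The sketch only shows why a raw handle on the diff file returns zeros for a hole, while a block-aware read falls back to the backup.

```rust
/// PostgreSQL page size, matching the 8192 bytes in the error message.
const BLOCK_SIZE: usize = 8192;

/// Hypothetical in-memory stand-in for one overlaid datafile.
struct OverlayFile {
    /// Blocks already materialized in the diff; `None` marks a sparse hole.
    diff: Vec<Option<Vec<u8>>>,
    /// Blocks as stored in the backup layers.
    backup: Vec<Vec<u8>>,
}

impl OverlayFile {
    /// What a raw handle on the sparse diff file yields: zeros for holes.
    fn read_block_via_diff_handle(&self, idx: usize) -> Vec<u8> {
        match self.diff.get(idx) {
            Some(Some(data)) => data.clone(),
            _ => vec![0u8; BLOCK_SIZE],
        }
    }

    /// Block-aware read: prefer the diff, fall back to the backup for holes.
    /// Returns `None` when the block exists in neither layer.
    fn read_block(&self, idx: usize) -> Option<Vec<u8>> {
        if let Some(Some(data)) = self.diff.get(idx) {
            return Some(data.clone());
        }
        self.backup.get(idx).cloned()
    }
}
```

With block 0 materialized and block 1 left as a hole, `read_block(1)` returns the backup's bytes, whereas the handle-style read returns a page of zeros, which is the behaviour the regression test guards against.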
After PostgreSQL extends a file via writes, getattr must return the actual file size, not a stale cached value. The previous fix addressed sparse diff reads but still used cached diff_len in datafile_logical_len() and cached logical_len for non-datafile paths.

Changes:
- Always read diff file size from disk via fs::metadata() in logical_len()
- Remove unused caching infrastructure: cached_logical_len(), cached_diff_len(), update_cached_diff_len() functions
- Remove logical_len and diff_len fields from BlockCacheEntry
- Simplify record_write() to only track materialized blocks

The remaining cache tracks:
- materialized: which blocks are already copied to diff
- sources: which layer each block came from (for fallback/debugging)
- pg_block_index: pg_probackup page stream indexes (static, built once)
- file_meta: backup_content.control metadata (static, from backup)

These are safe to cache as they don't change after writes.
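A minimal sketch of the "always read the size from disk" idea follows. The signature is hypothetical and differs from the project's `logical_len()`. Combining the diff size with the backup length by taking the maximum is an assumption made for illustration; the commit only states that the diff size now comes from `fs::metadata()` on every call.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: logical length of an overlaid file.
/// `backup_len` stands for static metadata (for example derived from
/// backup_content.control); the diff size is read from disk on every call.
fn logical_len(diff_path: &Path, backup_len: u64) -> io::Result<u64> {
    let diff_len = match fs::metadata(diff_path) {
        Ok(meta) => meta.len(),
        // No diff file yet: nothing has been written on top of the backup.
        Err(e) if e.kind() == io::ErrorKind::NotFound => 0,
        Err(e) => return Err(e),
    };
    // Assumption: the larger of the two lengths is the logical size.
    Ok(diff_len.max(backup_len))
}
```

Because nothing is cached, a write that extends the diff file is visible to the next call without any invalidation step, at the cost of one extra `stat` per lookup.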
Triggered on version tags (v*), builds a fully static binary using musl:
- Uses Rust 1.80 with x86_64-unknown-linux-musl target
- Strips binary for minimal size (~3.3MB)
- Creates GitHub Release with binary and SHA256 checksums
- Auto-generates release notes from commits

Usage: git tag v0.1.0 && git push origin v0.1.0
- Remove unnecessary let binding in pending.rs test (let_and_return)
- Add explicit truncate(true) to OpenOptions in overlay test (suspicious_open_options)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
- Add forget() FUSE method that calls pending_ops.remove(ino)
- Clean up per-inode state when kernel forgets about an inode
- Add inode_count() method for testing
- Add tests for remove() and memory cleanup behavior

This prevents DashMap entries from accumulating during long mount sessions with many temporary files.
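The per-inode state that `forget()` cleans up can be sketched as follows. This is a simplified stand-in, not the code in pending.rs: it uses `std::sync::Mutex`, `Condvar`, and `HashMap` instead of dashmap and parking_lot so that it compiles without external crates. The method names `remove`, `inode_count`, `wait_barrier`, and `wait_for_preceding` come from the PR, but their signatures here are guesses, and `begin_write` and `complete_write` are hypothetical helpers.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Condvar, Mutex};

/// Per-inode counters: sequence numbers handed out and writes completed.
#[derive(Default)]
struct InodeState {
    next_seq: u64,
    completed: u64,
}

type Slot = Arc<(Mutex<InodeState>, Condvar)>;

/// Simplified stand-in for `PendingOps` (std types only).
#[derive(Default)]
struct PendingOps {
    inodes: Mutex<HashMap<u64, Slot>>,
}

impl PendingOps {
    /// Fetch (or lazily create) the state for one inode.
    fn slot(&self, ino: u64) -> Slot {
        self.inodes.lock().unwrap().entry(ino).or_default().clone()
    }

    /// Register a write and return its sequence number (its queue position).
    fn begin_write(&self, ino: u64) -> u64 {
        let slot = self.slot(ino);
        let mut st = slot.0.lock().unwrap();
        let seq = st.next_seq;
        st.next_seq += 1;
        seq
    }

    /// Block until every write with a smaller sequence number has completed.
    fn wait_for_preceding(&self, ino: u64, seq: u64) {
        let slot = self.slot(ino);
        let mut st = slot.0.lock().unwrap();
        while st.completed < seq {
            st = slot.1.wait(st).unwrap();
        }
    }

    /// Mark one write as done and wake any waiters.
    fn complete_write(&self, ino: u64) {
        let slot = self.slot(ino);
        slot.0.lock().unwrap().completed += 1;
        slot.1.notify_all();
    }

    /// Barrier for read/getattr: wait for writes registered so far,
    /// but not for writes that are registered later.
    fn wait_barrier(&self, ino: u64) {
        let slot = self.slot(ino);
        let mut st = slot.0.lock().unwrap();
        let target = st.next_seq;
        while st.completed < target {
            st = slot.1.wait(st).unwrap();
        }
    }

    /// Drop per-inode state, as a `forget()` handler would.
    fn remove(&self, ino: u64) {
        self.inodes.lock().unwrap().remove(&ino);
    }

    /// Number of inodes currently tracked (useful in tests).
    fn inode_count(&self) -> usize {
        self.inodes.lock().unwrap().len()
    }
}
```

In this sketch a write worker would call `wait_for_preceding` before touching the file and `complete_write` afterwards, so writes execute in registration order. A real implementation also has to decide what happens when an inode is removed while writes are still in flight; the sketch does not handle that case.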
Merged
Summary
Handle-Based I/O, FIFO operation ordering, and `materialize_on_read` policy.

- Handle-based `pread`/`pwrite` I/O to reduce repeated file opens
- `PendingOps` for FIFO write ordering
- `materialize_on_read` policy to keep diff small for read-heavy workloads
- `forget()` cleanup to prevent per-inode state accumulation during long sessions

Key Changes
1. Handle-Based I/O (`fuse.rs`)

- `open()` creates persistent file handles stored in `handles` HashMap
- `read()`/`write()` use `pread`/`pwrite` via handles instead of repeated `fs::read`/`fs::write`
- pg_datafile reads go through `overlay.read_range()` to correctly handle sparse diff files and block-level materialization

2. FIFO Operation Ordering (`pending.rs`)

- `PendingOps` module with sequence-based barrier mechanism
- `DashMap` + `parking_lot::Condvar` for efficient per-inode synchronization
- `wait_barrier()` in `open`/`read`/`getattr` ensures consistency
- `wait_for_preceding()` in write workers ensures FIFO execution order

3. Materialize-on-Read Policy (`overlay.rs`)

- `PBKFS_MATERIALIZE_ON_READ` env var controls block materialization behavior
- `false`: reads from pg_datafile do NOT copy blocks to diff
- `true`: eager materialization (legacy behavior, useful for debugging)
- `read_range_nonmaterializing()` path for non-materializing reads

4. Correctness Fixes

- Remove `attr_cache`: it was invalidated BEFORE async writes completed, causing stale sizes
- Remove `logical_len`/`diff_len` caching from `BlockCacheEntry`: file sizes are always read fresh
- Parse `n_blocks`/`full_size` from `backup_content.control` for accurate incremental backup lengths
- Add `forget()` to clean up `PendingOps` state when the kernel forgets an inode

5. Metrics & Observability (`logging/mod.rs`)

- `OverlayIoSnapshot` struct for cache/handle metrics
- `log_overlay_io_metrics()` emits periodic snapshots
- `handle_hits`/`handle_misses` and `cache_hits`/`cache_misses`

Files Changed
- `src/fs/pending.rs`: NEW, FIFO ordering mechanism
- `src/fs/fuse.rs`: handle management, barrier integration, `forget()`
- `src/fs/overlay.rs`: `materialize_on_read` policy, remove size caching
- `src/fs/mod.rs`: export `pending` module
- `src/cli/mount.rs`: debug logging for layer order
- `src/logging/mod.rs`: new metrics types
- `Cargo.toml`: add `dashmap`, `parking_lot` dependencies

Post-Implementation Notes
- Reading file sizes via `fs::metadata()` prevents "unexpected data beyond EOF"
- pg_datafile reads use `overlay.read_range()` for correct block sourcing
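For the Materialize-on-Read policy listed under Key Changes, the following is a rough sketch of how the `PBKFS_MATERIALIZE_ON_READ` toggle might be interpreted. The function names are hypothetical, and both the accepted spellings and the `false` default for an unset variable are assumptions; the PR only documents the `true` and `false` behaviours.

```rust
/// Hypothetical parser for the raw value of PBKFS_MATERIALIZE_ON_READ.
/// Assumption: unset or unrecognized values mean "do not materialize".
fn parse_materialize_on_read(raw: Option<&str>) -> bool {
    match raw.map(|v| v.trim().to_ascii_lowercase()) {
        Some(v) => matches!(v.as_str(), "1" | "true" | "yes" | "on"),
        None => false,
    }
}

/// Read the policy from the process environment.
fn materialize_on_read_from_env() -> bool {
    parse_materialize_on_read(std::env::var("PBKFS_MATERIALIZE_ON_READ").ok().as_deref())
}
```

Keeping the parsing separate from the environment lookup makes the policy easy to unit test without mutating process-wide state.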