-
Notifications
You must be signed in to change notification settings - Fork 44
Use private DSA for index builds to eliminate memory leaks #118
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
49ddcfa to
a0a8b60
Compare
tjgreen42
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have a test where the new code is exercised in a situation where the memtable already contains data? How is test coverage on new code generally (since it only kicks in for bulk-loads)?
|
Regarding test coverage for the new build mode code: Existing tests that exercise the new code paths:
For the memtable-already-contains-data case: When The key scenario is:
All the scoring tests (scoring1-6) verify that scores are correctly computed after this transition, validating that the data was properly spilled and the new memtable is functional. |
11e84c2 to
1740c2b
Compare
During CREATE INDEX, use a private DSA (Dynamic Shared Memory Area) that can be completely destroyed and recreated on each memtable spill. This provides perfect memory reclamation during large index builds. Key changes: - Add tp_create_build_index_state() for build mode with private DSA - Add tp_recreate_build_dsa() to destroy/recreate DSA on spills - Add tp_finalize_build_mode() to transition to runtime mode after build - Final spill at end of tp_build() persists memtable data to disk - tp_clear_memtable() uses DSA destruction in build mode The private DSA is only used during index builds. After build completes, the code transitions to using the global shared DSA for normal runtime operation with concurrent access.
The <@> operator now applies the same fieldnorm quantization as segments use for storing document lengths. This ensures that the operator produces identical BM25 scores to index scans, regardless of whether the data is in the memtable or segments. Previously, when data was in segments: - Index scans used quantized fieldnorm (Lucene's SmallFloat encoding) - Operator evaluation recomputed exact doc length from text This caused score mismatches between index scan NOTICEs and operator output columns. Now both paths use quantized lengths consistently. Co-Authored-By: Claude Opus 4.5 <[email protected]>
The test (18) job (PG18) has been hanging indefinitely while test (17) completes normally. Add: - 10-minute timeout to prevent indefinite hangs - Verbose PostgreSQL logging (log_min_messages = LOG) - Display server logs on failure to help debug the issue Tests pass locally on both PG17 and PG18.
When creating a local index state for an existing shared state (runtime path), the is_build_mode field was not initialized. This caused undefined behavior when tp_clear_memtable checked this flag during spill operations. If the uninitialized memory happened to contain a non-zero value, the code would incorrectly enter the build mode path and call dsa_detach on the global shared DSA, disconnecting the backend from shared memory and causing subsequent operations to hang. This bug was non-deterministic and only manifested when the uninitialized memory happened to contain garbage that evaluated to true. It was reliably reproduced in CI on PG18 during the segment.sh Test 3 which performs multiple spills in a loop. Co-Authored-By: Claude Opus 4.5 <[email protected]>
- Remove PG_TRY/PG_CATCH for ordinary control flow in scan.c - Change warning messages to proper ereport ERROR - Remove verbose CI logging (keep timeout as safety measure) - Change LOG messages to DEBUG1 for build mode diagnostics Co-Authored-By: Claude Opus 4.5 <[email protected]>
The index scoring uses Lucene's SmallFloat encoding (fieldnorm) to compress document lengths into single bytes. This causes small differences between pure BM25 formula and actual index scores. Update validation SQL to match by: - Adding fieldnorm_quantize() function with full 256-value decode table - Using quantized doc lengths in validate_bm25_scoring() - Using quantized doc lengths in debug_bm25_computation() - Using quantized doc lengths in compare_bm25_scores() This ensures the validation functions compute scores identically to how the index computes them, fixing false validation failures.
When a CREATE INDEX transaction is aborted before tp_finalize_build_mode is called, the private DSA would be leaked. Add cleanup logic: - tp_cleanup_build_mode_on_abort() iterates local state cache - For any index in build mode, detaches private DSA and cleans registry - Called from transaction callback on XACT_EVENT_ABORT Also removes leftover CI trigger comment.
With the new private DSA build mode, data is spilled to segments during CREATE INDEX, so the memtable is empty when bm25_spill_index is called afterward. Update test expectations to reflect this.
c042895 to
5631550
Compare
Summary
Eliminates memory leaks during CREATE INDEX by using a private DSA that is destroyed and recreated on each spill, providing perfect memory reclamation.
Problem
As identified in PR #117, index builds leak ~110-400MB per spill cycle due to DSA fragmentation. Even with the threshold reduction in #117, a 50M document build still leaks ~17GB cumulative memory.
Solution: Private DSA with Destroy/Recreate
Key Insight: During CREATE INDEX, only one backend is building. We don't need a shared DSA - we can use a private one and destroy it completely between spills.
Implementation:
Architecture:
Changes
New functions:
tp_create_build_index_state(): Creates private DSA instead of using globaltp_recreate_build_dsa(): Destroys old DSA and creates fresh onetp_clear_memtable(): Calls recreation in build modeModified files:
src/state/state.h: Addedis_build_modeflagsrc/state/state.c: Implemented private DSA lifecyclesrc/am/build.c: Use build mode during CREATE INDEXExpected Results
Memory profile with private DSA:
For any dataset size: Peak stays at ~430MB
Comparison
Testing Plan
Relationship to PR #117
PR #117 provides immediate mitigation and enables large-scale benchmarks.
This PR provides the complete architectural fix for unlimited scalability.
Both can be merged independently - #117 helps immediately, this PR eliminates the issue entirely.