Use private DSA for index builds to eliminate memory leaks #118

tjgreen42 · 2026-01-08T02:41:10Z

Summary

Eliminates memory leaks during CREATE INDEX by building into a private DSA that is destroyed and recreated on each spill, so every spill returns all memtable memory to the OS.

Problem

As identified in PR #117, index builds leak ~110-400MB per spill cycle due to DSA fragmentation. Even with the threshold reduction in #117, a 50M document build still leaks ~17GB cumulatively.

Solution: Private DSA with Destroy/Recreate

Key Insight: During CREATE INDEX, only one backend is building. We don't need a shared DSA - we can use a private one and destroy it completely between spills.

Implementation:

// Create a private DSA for the build (not registered in the global registry)
private_dsa = dsa_create(tranche_id);

// After each spill:
dsa_detach(private_dsa);               // Sole attachment, so this destroys the DSA and returns ALL of its memory to the OS
private_dsa = dsa_create(tranche_id);  // Fresh DSA for the next batch
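The effect of destroy/recreate on peak memory can be illustrated without PostgreSQL at all. The following is a minimal sketch, not the PR's code: `ToyArena` and the `toy_*` functions are hypothetical stand-ins for a private DSA, implemented as a bump allocator over one `malloc`'d region. It shows that when the arena is torn down at every spill, the peak is bounded by one batch regardless of how many postings are built.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a private DSA: a bump allocator over one region. */
typedef struct ToyArena {
    char  *base;      /* region obtained from the system allocator */
    size_t capacity;  /* size of the region in bytes */
    size_t used;      /* bytes handed out so far */
} ToyArena;

static ToyArena *toy_arena_create(size_t capacity)
{
    ToyArena *a = malloc(sizeof(*a));
    if (a == NULL)
        return NULL;
    a->base = malloc(capacity);
    if (a->base == NULL) {
        free(a);
        return NULL;
    }
    a->capacity = capacity;
    a->used = 0;
    return a;
}

static void *toy_arena_alloc(ToyArena *a, size_t n)
{
    if (n > a->capacity - a->used)
        return NULL;              /* arena exhausted */
    void *p = a->base + a->used;
    a->used += n;
    return p;
}

/* Destroying the arena releases every allocation at once; nothing can leak. */
static void toy_arena_destroy(ToyArena *a)
{
    free(a->base);
    free(a);
}

/*
 * Simulate a build of n_postings, spilling every spill_threshold postings.
 * Returns the largest number of arena bytes in use at any point.
 */
static size_t toy_build(size_t n_postings, size_t spill_threshold,
                        size_t posting_size, size_t *n_spills)
{
    size_t batch_bytes = spill_threshold * posting_size;
    ToyArena *arena = toy_arena_create(batch_bytes);
    size_t in_batch = 0, peak = 0;

    assert(arena != NULL);
    *n_spills = 0;
    for (size_t i = 0; i < n_postings; i++) {
        void *p = toy_arena_alloc(arena, posting_size);
        assert(p != NULL);
        in_batch++;
        if (arena->used > peak)
            peak = arena->used;
        if (in_batch == spill_threshold) {
            /* Spill: writing the batch to a segment is omitted here. */
            toy_arena_destroy(arena);
            arena = toy_arena_create(batch_bytes);
            assert(arena != NULL);
            in_batch = 0;
            (*n_spills)++;
        }
    }
    toy_arena_destroy(arena);
    return peak;
}
```

Calling `toy_build(10000, 1000, 16, &spills)` and `toy_build(100000, 1000, 16, &spills)` reports the same peak of 16000 bytes; only the spill count changes. The sizes are illustrative and are not the PR's measurements.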

Architecture:

BUILD MODE: Private DSA, destroyed/recreated per spill → 0% memory leak
RUNTIME MODE: Shared DSA for concurrent inserts (unchanged)
Same data structures: dshash, posting lists work identically in both modes
Minimal changes: ~100 lines of code

Changes

New functions:

tp_create_build_index_state(): Creates private DSA instead of using global
tp_recreate_build_dsa(): Destroys old DSA and creates fresh one
Updated tp_clear_memtable(): Calls recreation in build mode

Modified files:

src/state/state.h: Added is_build_mode flag
src/state/state.c: Implemented private DSA lifecycle
src/am/build.c: Use build mode during CREATE INDEX

Expected Results

Memory profile with private DSA:

BUILD_START: 23 MB
1M docs: 428 MB (8M postings)
BEFORE_SPILL: 428 MB
AFTER_SPILL: 25 MB  ← Perfect reclamation!
2M docs: 428 MB (no growth!)
BEFORE_SPILL: 428 MB
AFTER_SPILL: 25 MB  ← Still 25 MB!

For any dataset size: Peak stays at ~430MB

Comparison

Approach	Peak memory (50M docs)	Leak per spill	Code changes
Original (32M threshold)	26GB (OOM)	400MB	0 lines
PR #117 (8M threshold)	~18GB	110MB	~150 lines
This PR (Private DSA)	~430MB	0MB	~100 lines

Testing Plan

Build compiles successfully
Existing regression tests pass
1M document build with memory instrumentation shows perfect reclamation
50M document build completes with constant ~430MB peak
Concurrent inserts still work (runtime mode validation)

Relationship to PR #117

PR #117 provides immediate mitigation and enables large-scale benchmarks.
This PR provides the complete architectural fix for unlimited scalability.

Both can be merged independently - #117 helps immediately, this PR eliminates the issue entirely.

CLAassistant · 2026-01-08T02:41:19Z

All committers have signed the CLA.

tjgreen42

Do we have a test where the new code is exercised in a situation where the memtable already contains data? How is test coverage on new code generally (since it only kicks in for bulk-loads)?

tjgreen42 · 2026-01-09T00:54:33Z

Regarding test coverage for the new build mode code:

Existing tests that exercise the new code paths:

Bulk load tests: The segment.sh Test 3 specifically creates an empty index and then performs multiple spills in a loop - this directly exercises the new private DSA and tp_finalize_build_mode() code path.
Index build with data: All tests that do CREATE INDEX ... ON table where the table already has data exercise the private DSA during build. This includes aerodocs, scoring1-6, segment, merge, etc.
Spill during build: The memory test does a large bulk insert that can trigger auto-spill, which exercises tp_recreate_build_dsa().
Recovery tests: The recovery.sh test verifies that after a crash and restart, indexes can still be used, which exercises the transition from build mode to runtime mode.

For the memtable-already-contains-data case:

When tp_finalize_build_mode() is called, if the memtable contains data, it's first spilled to a segment via the final spill logic in tp_build() (lines 850-897). The tests that have data in the index at build completion (most of them) exercise this path.

The key scenario is:

Private DSA created during build
Data inserted into memtable
Final spill writes data to segment
tp_finalize_build_mode() destroys private DSA and creates fresh memtable in global DSA
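The ordering in this scenario (final spill first, then the switch to the global DSA) can be sketched as a small state transition. This is a toy model under stated assumptions: `ToyIndexState`, the arena identifiers, and the `toy_*` functions are hypothetical and only mirror the steps listed above, not the actual `tp_build()` or `tp_finalize_build_mode()` code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical arena identifiers standing in for the two DSAs. */
enum { TOY_GLOBAL_ARENA = 0, TOY_PRIVATE_ARENA = 1 };

typedef struct ToyIndexState {
    bool   is_build_mode;      /* true while CREATE INDEX is running */
    int    arena_id;           /* which arena backs the memtable */
    size_t memtable_entries;   /* postings not yet written to a segment */
    size_t segments_written;   /* segments produced so far */
} ToyIndexState;

/* Write the memtable to a new segment; a no-op when it is already empty. */
static void toy_spill(ToyIndexState *s)
{
    if (s->memtable_entries == 0)
        return;
    s->segments_written++;
    s->memtable_entries = 0;
}

/*
 * Leave build mode. The caller must already have spilled, because the
 * private arena (and anything still in it) is discarded here.
 */
static void toy_finalize_build_mode(ToyIndexState *s)
{
    if (!s->is_build_mode)
        return;                          /* already in runtime mode */
    assert(s->memtable_entries == 0);    /* nothing may be lost */
    s->arena_id = TOY_GLOBAL_ARENA;      /* fresh memtable in the global arena */
    s->is_build_mode = false;
}

/* End of the build: final spill, then the transition to runtime mode. */
static void toy_finish_build(ToyIndexState *s)
{
    toy_spill(s);
    toy_finalize_build_mode(s);
}
```

Starting from a build-mode state with 5 pending entries and 2 segments, `toy_finish_build()` leaves 3 segments, an empty memtable, and the state pointing at the global arena. Calling it again changes nothing.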

All the scoring tests (scoring1-6) verify that scores are correctly computed after this transition, validating that the data was properly spilled and the new memtable is functional.

During CREATE INDEX, use a private DSA (Dynamic Shared Memory Area) that can be completely destroyed and recreated on each memtable spill. This provides perfect memory reclamation during large index builds.

Key changes:
- Add tp_create_build_index_state() for build mode with private DSA
- Add tp_recreate_build_dsa() to destroy/recreate DSA on spills
- Add tp_finalize_build_mode() to transition to runtime mode after build
- Final spill at end of tp_build() persists memtable data to disk
- tp_clear_memtable() uses DSA destruction in build mode

The private DSA is only used during index builds. After build completes, the code transitions to using the global shared DSA for normal runtime operation with concurrent access.

The <@> operator now applies the same fieldnorm quantization as segments use for storing document lengths. This ensures that the operator produces identical BM25 scores to index scans, regardless of whether the data is in the memtable or segments.

Previously, when data was in segments:
- Index scans used quantized fieldnorm (Lucene's SmallFloat encoding)
- Operator evaluation recomputed exact doc length from text

This caused score mismatches between index scan NOTICEs and operator output columns. Now both paths use quantized lengths consistently.

Co-Authored-By: Claude Opus 4.5 <[email protected]>

The test (18) job (PG18) has been hanging indefinitely while test (17) completes normally.

Add:
- 10-minute timeout to prevent indefinite hangs
- Verbose PostgreSQL logging (log_min_messages = LOG)
- Display server logs on failure to help debug the issue

Tests pass locally on both PG17 and PG18.

When creating a local index state for an existing shared state (runtime path), the is_build_mode field was not initialized. This caused undefined behavior when tp_clear_memtable checked this flag during spill operations.

If the uninitialized memory happened to contain a non-zero value, the code would incorrectly enter the build mode path and call dsa_detach on the global shared DSA, disconnecting the backend from shared memory and causing subsequent operations to hang.

This bug was non-deterministic and only manifested when the uninitialized memory happened to contain garbage that evaluated to true. It was reliably reproduced in CI on PG18 during the segment.sh Test 3, which performs multiple spills in a loop.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
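The failure mode described in this commit comes down to a flag read before it was ever written. Here is a minimal sketch of the defensive pattern, assuming a simplified local state: `ToyLocalState`, `ToyClearAction`, and the `toy_*` functions are hypothetical, and the source does not say whether the real fix zeroed the struct or assigned the field explicitly. The sketch does both. In PostgreSQL code the analogous zeroing allocator would be `palloc0()`; plain `calloc()` is used here so the example stands alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical per-backend view of an index's shared state. */
typedef struct ToyLocalState {
    void *shared;          /* pointer to the shared state being attached to */
    bool  is_build_mode;   /* must be false on the runtime path */
} ToyLocalState;

/* What a memtable clear would do, depending on the mode flag. */
typedef enum {
    TOY_CLEAR_IN_PLACE,      /* runtime: keep the global arena attached */
    TOY_RECREATE_PRIVATE     /* build: destroy and recreate the private arena */
} ToyClearAction;

/*
 * Runtime path: attach to an existing shared state. calloc() zeroes the
 * whole struct, so any field added later also starts out as 0/false.
 */
static ToyLocalState *toy_attach_runtime_state(void *shared)
{
    assert(shared != NULL);
    ToyLocalState *s = calloc(1, sizeof(*s));
    if (s == NULL)
        return NULL;
    s->shared = shared;
    s->is_build_mode = false;   /* explicit, for readers of the code */
    return s;
}

/* The mode dispatch that misbehaved when the flag held garbage. */
static ToyClearAction toy_clear_memtable(const ToyLocalState *s)
{
    return s->is_build_mode ? TOY_RECREATE_PRIVATE : TOY_CLEAR_IN_PLACE;
}
```

With the struct zeroed at allocation, a runtime-path state can only ever select `TOY_CLEAR_IN_PLACE`, so the global arena is never detached by mistake.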

- Remove PG_TRY/PG_CATCH for ordinary control flow in scan.c
- Change warning messages to proper ereport ERROR
- Remove verbose CI logging (keep timeout as safety measure)
- Change LOG messages to DEBUG1 for build mode diagnostics

Co-Authored-By: Claude Opus 4.5 <[email protected]>

The index scoring uses Lucene's SmallFloat encoding (fieldnorm) to compress document lengths into single bytes. This causes small differences between pure BM25 formula and actual index scores.

Update validation SQL to match by:
- Adding fieldnorm_quantize() function with full 256-value decode table
- Using quantized doc lengths in validate_bm25_scoring()
- Using quantized doc lengths in debug_bm25_computation()
- Using quantized doc lengths in compare_bm25_scores()

This ensures the validation functions compute scores identically to how the index computes them, fixing false validation failures.
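To make the quantization concrete, here is a sketch of a one-byte length encoding modeled on my understanding of Lucene's SmallFloat `intToByte4`/`byte4ToInt` scheme: lengths below 24 are stored exactly, and larger lengths keep a 4-bit mantissa (with an implicit leading bit) plus a shift. The `toy_*` names are hypothetical. The project's own fieldnorm_quantize() and its decode table are not shown in this PR and may differ in detail, so treat this as an illustration of why the exact and quantized lengths diverge.

```c
#include <assert.h>
#include <stdint.h>

/* Lengths below this value are stored exactly (assumed, following Lucene). */
#define TOY_NUM_FREE_VALUES 24u

/* Number of bits needed to represent v (0 for v == 0). */
static int toy_bit_length(uint32_t v)
{
    int n = 0;
    while (v != 0) {
        n++;
        v >>= 1;
    }
    return n;
}

/* Encode v as 3 explicit mantissa bits plus a shift count. */
static uint32_t toy_to_int4(uint32_t v)
{
    int bits = toy_bit_length(v);
    if (bits < 4)
        return v;                              /* subnormal: stored as-is */
    int shift = bits - 4;
    uint32_t mantissa = (v >> shift) & 0x07u;  /* leading 1 bit is implicit */
    return mantissa | ((uint32_t)(shift + 1) << 3);
}

/* Inverse of toy_to_int4, with the dropped low bits restored as zeros. */
static uint32_t toy_from_int4(uint32_t e)
{
    uint32_t mantissa = e & 0x07u;
    int shift = (int)(e >> 3) - 1;
    if (shift < 0)
        return mantissa;                       /* subnormal */
    return (mantissa | 0x08u) << shift;
}

/* Compress a document length (at most INT32_MAX) into one byte. */
static uint8_t toy_length_to_byte(uint32_t len)
{
    assert(len <= (uint32_t)INT32_MAX);
    if (len < TOY_NUM_FREE_VALUES)
        return (uint8_t)len;
    return (uint8_t)(TOY_NUM_FREE_VALUES + toy_to_int4(len - TOY_NUM_FREE_VALUES));
}

/* Expand a stored byte back into the (rounded-down) document length. */
static uint32_t toy_byte_to_length(uint8_t b)
{
    if (b < TOY_NUM_FREE_VALUES)
        return b;
    return TOY_NUM_FREE_VALUES + toy_from_int4((uint32_t)b - TOY_NUM_FREE_VALUES);
}

/* The length a scorer would see after a round trip through one byte. */
static uint32_t toy_quantize_length(uint32_t len)
{
    return toy_byte_to_length(toy_length_to_byte(len));
}
```

Under this scheme a 100-token document is stored as byte 57 and read back as length 96, while a 20-token document round-trips exactly. A validation function that plugs 100 into the BM25 length normalization will therefore disagree slightly with an index that uses 96.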

When a CREATE INDEX transaction is aborted before tp_finalize_build_mode is called, the private DSA would be leaked.

Add cleanup logic:
- tp_cleanup_build_mode_on_abort() iterates local state cache
- For any index in build mode, detaches private DSA and cleans registry
- Called from transaction callback on XACT_EVENT_ABORT

Also removes leftover CI trigger comment.
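The cleanup described in this commit follows a common pattern: walk the per-backend cache on abort and release anything still in build mode. Below is a minimal standalone sketch of that pattern. `ToyCachedState`, `ToyXactEvent`, and the `toy_*` functions are hypothetical; in the extension itself the callback would presumably be registered through PostgreSQL's `RegisterXactCallback()` and react to `XACT_EVENT_ABORT`, and the private arena would be released with `dsa_detach()` rather than `free()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical entry in the per-backend local state cache. */
typedef struct ToyCachedState {
    bool  in_use;          /* slot holds a live index state */
    bool  is_build_mode;   /* index is mid-CREATE INDEX */
    void *private_arena;   /* stand-in for the private DSA handle */
} ToyCachedState;

/* Simplified stand-in for PostgreSQL's transaction events. */
typedef enum { TOY_XACT_COMMIT, TOY_XACT_ABORT } ToyXactEvent;

/*
 * Release the private arena of every cached index still in build mode and
 * drop it from the cache. Returns the number of entries cleaned up.
 */
static size_t toy_cleanup_build_mode_on_abort(ToyCachedState *cache, size_t n)
{
    size_t released = 0;

    assert(cache != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        ToyCachedState *s = &cache[i];
        if (!s->in_use || !s->is_build_mode)
            continue;                 /* runtime-mode entries are untouched */
        free(s->private_arena);       /* real code: detach the private DSA */
        s->private_arena = NULL;
        s->is_build_mode = false;
        s->in_use = false;            /* real code: also clean the registry */
        released++;
    }
    return released;
}

/* Transaction callback: only an abort triggers the cleanup. */
static size_t toy_xact_callback(ToyXactEvent event,
                                ToyCachedState *cache, size_t n)
{
    if (event != TOY_XACT_ABORT)
        return 0;
    return toy_cleanup_build_mode_on_abort(cache, n);
}
```

Because the cleanup clears `is_build_mode` and the arena pointer, running it twice is harmless: the second pass finds nothing left to release.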

With the new private DSA build mode, data is spilled to segments during CREATE INDEX, so the memtable is empty when bm25_spill_index is called afterward. Update test expectations to reflect this.

tjgreen42 force-pushed the feat/memorycontext-build-mode branch 6 times, most recently from 49ddcfa to a0a8b60 · January 8, 2026 23:54

tjgreen42 force-pushed the feat/memorycontext-build-mode branch from 11e84c2 to 1740c2b · January 9, 2026 00:55

tjgreen42 and others added 8 commits January 8, 2026 19:08

Update coverage test for build mode spill behavior

5631550

tjgreen42 force-pushed the feat/memorycontext-build-mode branch from c042895 to 5631550 · January 9, 2026 03:11

tjgreen42 merged commit 7bfb0f5 into main Jan 9, 2026
12 checks passed

tjgreen42 deleted the feat/memorycontext-build-mode branch January 9, 2026 03:58
