
Conversation


@tjgreen42 tjgreen42 commented Jan 1, 2026

Summary

  • Implement Block-Max WAND (BMW) for top-k retrieval using block-level upper bounds in V2 segments
  • Add a top-k min-heap with O(1) threshold access and O(log k) updates for efficient result tracking (sketched below)
  • Add a BMW fast path for single-term queries (skip blocks that can't contribute to the top-k)
  • Add a BMW fast path for multi-term queries with WAND-style doc-ID ordered traversal
  • Batch doc_freq lookups to reduce segment open/close overhead for multi-term queries
  • Add GUC variables pg_textsearch.enable_bmw and log_bmw_stats for debugging/benchmarking
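
For reference, the heap behind the O(1)/O(log k) claim can be pictured as a fixed-capacity min-heap keyed on score: the root holds the current k-th best score, so reading the pruning threshold is one array access and replacing the root is one sift-down. The sketch below is illustrative only (these are not the actual bmw.h names, and tie-breaking on CTID is omitted):

```c
/* Minimal top-k min-heap sketch (illustrative; not the actual bmw.h API).
 * The root holds the k-th best score seen so far, so the pruning
 * threshold is an O(1) read and each replacement is an O(log k) sift. */
typedef struct { double score; unsigned long doc_id; } TopKEntry;
typedef struct { TopKEntry *items; int size; int capacity; } TopKHeap;

static double topk_threshold(const TopKHeap *h)
{
    /* Until the heap is full, nothing can be pruned. */
    return h->size < h->capacity ? 0.0 : h->items[0].score;
}

static void topk_sift_down(TopKHeap *h, int i)
{
    for (;;)
    {
        int l = 2 * i + 1, r = 2 * i + 2, m = i;
        if (l < h->size && h->items[l].score < h->items[m].score) m = l;
        if (r < h->size && h->items[r].score < h->items[m].score) m = r;
        if (m == i) break;
        TopKEntry tmp = h->items[i]; h->items[i] = h->items[m]; h->items[m] = tmp;
        i = m;
    }
}

static void topk_insert(TopKHeap *h, double score, unsigned long doc_id)
{
    if (h->size < h->capacity)
    {
        /* Sift up while we are still filling the heap. */
        int i = h->size++;
        h->items[i] = (TopKEntry){score, doc_id};
        while (i > 0 && h->items[(i - 1) / 2].score > h->items[i].score)
        {
            TopKEntry tmp = h->items[i];
            h->items[i] = h->items[(i - 1) / 2];
            h->items[(i - 1) / 2] = tmp;
            i = (i - 1) / 2;
        }
    }
    else if (score > h->items[0].score)
    {
        /* Evict the current k-th best and restore the heap property. */
        h->items[0] = (TopKEntry){score, doc_id};
        topk_sift_down(h, 0);
    }
}
```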

Performance (MS MARCO 8.8M docs, p50 latency)

| Query Length | pg_textsearch | System X | Result |
|---|---|---|---|
| 1 token | 10.14 ms | 18.05 ms | 1.8x faster |
| 2 tokens | 12.54 ms | 17.24 ms | 1.4x faster |
| 3 tokens | 15.14 ms | 22.94 ms | 1.5x faster |
| 4 tokens | 19.72 ms | 24.01 ms | 1.2x faster |
| 5 tokens | 25.79 ms | 26.21 ms | ~same |
| 6 tokens | 32.88 ms | 33.47 ms | ~same |
| 7 tokens | 42.09 ms | 32.23 ms | 1.3x slower |
| 8+ tokens | 63.41 ms | 39.17 ms | 1.6x slower |

Cranfield: 225 queries in 57ms (0.26 ms/query avg)

Note: These benchmarks establish baselines for pg_textsearch rather than a head-to-head comparison. System X ships with different defaults and tuning options, so further iteration on configurations is still required.

Implementation Details

New files:

  • src/query/bmw.h - Top-k heap, BMW stats, and scoring function interfaces
  • src/query/bmw.c - Min-heap implementation, block max score computation, single-term and multi-term BMW scoring

Key algorithm (a rough sketch follows the list):

  1. Compute block max BM25 score from skip entry metadata (block_max_tf, block_max_norm)
  2. Only score blocks where block_max_score >= current_threshold
  3. Update threshold as better results are found
  4. Memtable scored exhaustively (no skip index)
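
As an illustration of steps 1-2, a block's BM25 upper bound can be computed directly from the skip-entry fields and compared against the heap threshold. This is a hedged sketch with assumed field semantics (in particular, block_max_norm is assumed to be the precomputed length-normalisation value that maximises the score), reusing the TopKHeap sketch above:

```c
/* Sketch of steps 1-2 (illustrative names; the real fields live in the
 * V2 skip entries and the real loop in src/query/bmw.c). */
typedef struct
{
    float block_max_tf;      /* highest term frequency in the block */
    float block_max_norm;    /* assumed: the most favourable precomputed
                              * (1 - b + b * dl/avgdl) value in the block */
} SkipEntry;

/* Step 1: upper-bound BM25 contribution for one block of one term. */
static float
block_max_bm25(const SkipEntry *e, float idf, float k1)
{
    return idf * (e->block_max_tf * (k1 + 1.0f))
               / (e->block_max_tf + k1 * e->block_max_norm);
}

/* Step 2: a block is only decoded and scored when its upper bound can
 * still displace the current k-th best result. */
static int
block_can_contribute(const SkipEntry *e, float idf, float k1,
                     const TopKHeap *topk)
{
    return block_max_bm25(e, idf, k1) >= topk_threshold(topk);
}
```

Step 3 then falls out of the heap itself: every topk_insert() can raise topk_threshold(), so later blocks face a stricter bound.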

Multi-term optimization (sketched below):

  • Sort terms by IDF (highest first) for faster threshold convergence
  • WAND-style doc-ID ordered traversal across terms' posting lists
  • Batch doc_freq lookups: opens each segment once instead of once per term
  • Reduces segment opens from O(terms × segments) to O(segments)
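
The doc-ID ordered traversal can be pictured with the self-contained toy below, reusing the heap sketch above. Names and types are illustrative: the real code drives segment posting iterators and seeks whole blocks via skip entries, rather than holding postings in arrays and advancing one entry at a time.

```c
/* Toy sketch of WAND-style doc-ID ordered traversal (illustrative names).
 * Each term exposes a doc_id-sorted posting list plus per-block score
 * upper bounds; the pivot is the minimum current doc_id across terms. */
#include <limits.h>

#define BLOCK_SIZE 128          /* postings per block, as in the V2 format */

typedef struct { unsigned long doc_id; float score; } Posting;

typedef struct
{
    const Posting *postings;    /* sorted by doc_id */
    int count;
    int pos;                    /* current position; pos == count means exhausted */
    const float *block_max;     /* upper bound per block of BLOCK_SIZE postings */
} TermIter;

static unsigned long term_cur(const TermIter *t)
{
    return t->pos < t->count ? t->postings[t->pos].doc_id : ULONG_MAX;
}

static float term_block_max(const TermIter *t)
{
    return t->pos < t->count ? t->block_max[t->pos / BLOCK_SIZE] : 0.0f;
}

static void
score_multi_term(TermIter *terms, int nterms, TopKHeap *topk)
{
    for (;;)
    {
        /* Pivot = smallest current doc_id across all term iterators. */
        unsigned long pivot = ULONG_MAX;
        for (int t = 0; t < nterms; t++)
            if (term_cur(&terms[t]) < pivot)
                pivot = term_cur(&terms[t]);
        if (pivot == ULONG_MAX)
            break;                                   /* all exhausted */

        /* Sum the block-level upper bounds of the terms positioned on the
         * pivot; if they cannot beat the k-th best, skip scoring it. */
        float ub = 0.0f;
        for (int t = 0; t < nterms; t++)
            if (term_cur(&terms[t]) == pivot)
                ub += term_block_max(&terms[t]);

        if (ub >= topk_threshold(topk))
        {
            /* Accumulate the complete score from every matching term. */
            float score = 0.0f;
            for (int t = 0; t < nterms; t++)
                if (term_cur(&terms[t]) == pivot)
                    score += terms[t].postings[terms[t].pos].score;
            topk_insert(topk, score, pivot);
        }

        /* Advance every iterator sitting on the pivot. */
        for (int t = 0; t < nterms; t++)
            if (term_cur(&terms[t]) == pivot)
                terms[t].pos++;
    }
}
```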

Testing

  • All regression tests pass
  • Shell-based tests pass (concurrency, recovery, segment)
  • Results match exhaustive scoring path with correct tie-breaking

@tjgreen42 tjgreen42 force-pushed the feat/block-max-wand branch from e047e80 to 64a0270 on January 1, 2026 at 21:56
@tjgreen42 tjgreen42 changed the title Add Block-Max WAND (BMW) optimization for single-term queries Add Block-Max WAND (BMW) optimization for top-k queries Jan 2, 2026
@tjgreen42 tjgreen42 requested a review from Copilot January 5, 2026 23:49
Copilot AI left a comment

Pull request overview

This PR implements Block-Max WAND (BMW) optimization for top-k BM25 queries, enabling significant performance improvements through intelligent block skipping. The implementation adds specialized fast paths for single-term and multi-term (2-8 terms) queries that compute block-level upper bounds and skip blocks that cannot contribute to results.

Key changes:

  • Single-term BMW: Pre-computes block max scores from skip entry metadata and skips blocks below the top-k threshold
  • Multi-term BMW: Uses WAND-style block skipping with batched doc_freq lookups to minimize segment I/O
  • Exhaustive fallback: Queries with >8 terms continue using the existing exhaustive scoring path

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 5 comments.

| File | Description |
|---|---|
| src/query/bmw.h | Top-k min-heap interface, BMW statistics, and scoring function declarations |
| src/query/bmw.c | Complete BMW implementation with heap operations and block-level scoring |
| src/query/score.c | Integration of BMW fast paths with fallback to exhaustive scoring |
| src/segment/segment.h | Exported segment iterator and skip entry reader for BMW block access |
| src/segment/scan.c | Batch doc_freq lookup and iterator function exports |
| test/sql/bmw.sql | Comprehensive BMW test suite covering edge cases and correctness validation |
| test/expected/bmw.out | Expected test outputs |
| test/expected/segment.out | Updated for deterministic CTID ordering |
| test/expected/merge.out | Updated for deterministic CTID ordering |
| benchmarks/datasets/msmarco/queries.sql | Enhanced benchmarks using real MS MARCO queries |
| benchmarks/datasets/msmarco/parallel_query.pgbench | pgbench script for parallel query testing |
| benchmarks/datasets/msmarco/parallel_bench.sh | Parallel benchmark harness with scaling tests |
| benchmarks/datasets/msmarco/load.sql | Fixed CSV loading and query set handling |
| Makefile | Added BMW object files to build |
| .github/workflows/benchmark.yml | Integrated parallel benchmark into CI |


@tjgreen42

Addressed PR feedback in fc3fce5:

  1. test/sql/bmw.sql:298 - Fixed misleading comment about 'previous version'
  2. src/query/bmw.c:450 - Added justification for BMW_MAX_TERMS=8 threshold
  3. src/query/bmw.c:480 - Clarified DOC_ACCUM_HASH_SIZE comment

Note: Comments on parallel_bench.sh are moot - that file was removed (benchmark changes moved to separate PR).

tjgreen42 and others added 14 commits January 5, 2026 17:41
Implement early termination for top-k retrieval using block-level upper
bounds stored in V2 segments. This skips entire blocks (128 docs) that
cannot contribute to top-k results, improving query performance.

Changes:
- Add bmw.h/bmw.c with top-k min-heap and BMW scoring functions
- Export segment iterator functions for BMW block access
- Integrate BMW fast path in tp_score_documents() for single-term queries
- Fix test expected files for deterministic tie-breaking order

The BMW path computes block max scores from skip entry metadata and only
scores blocks where block_max_score >= current threshold. Memtable is
scored exhaustively (no skip index). Multi-term WAND to follow.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Extend BMW to handle queries with 2-8 terms. For each segment:
- Pre-compute block max scores for all query terms
- Skip blocks where sum of block_max_scores < threshold
- Use hash table to accumulate scores per document within blocks

The multi-term path uses the same top-k heap as single-term BMW.
Memtable is scored exhaustively since it lacks skip index metadata.
Falls back to exhaustive scoring for queries with >8 terms.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
V1 segments are no longer supported - all segments use the V2 format
with skip index metadata for block-max scoring.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Refactor code organization:
- Move bmw.c, bmw.h from src/segment/ to src/query/
- Move score.c, score.h from src/types/ to src/query/
- Update include paths across codebase
- Add comprehensive BMW test suite (test/sql/bmw.sql)

The query/ directory is a more appropriate location for query-time
optimization code (BMW, scoring) rather than segment/ which is for
storage format.
The posting iterator's tp_segment_posting_iterator_next() auto-advances
to subsequent blocks when the current block is exhausted. This caused
BMW to process ALL blocks once the first non-skipped block was entered,
defeating the entire block-skipping optimization.

Fix by:
1. Resetting iter.finished before each block so subsequent blocks can be
   processed
2. Breaking out of the inner while loop when the iterator advances past
   the current block, allowing the outer for loop to apply threshold
   checks to decide whether to process subsequent blocks

This restores BMW's ability to skip blocks whose block_max_score is
below the current top-k threshold.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
For multi-term queries, the previous implementation called
tp_get_unified_doc_freq() in a loop, which opened/closed each segment
once per term. With 5 terms and 6 segments, that's 30 segment
open/close cycles per query.

Add tp_batch_get_segment_doc_freq() which opens each segment once and
looks up all terms, reducing segment opens from O(terms * segments)
to O(segments).

This addresses the ~32% time spent in tp_segment_get_doc_freq seen in
profiling.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
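
The loop inversion behind tp_batch_get_segment_doc_freq() can be illustrated with a small standalone sketch; the open/lookup functions are stubs and only the loop structure matters here.

```c
/* Standalone illustration of batching doc_freq lookups per segment
 * (stubbed functions; not the real segment reader API). */
#include <stdio.h>

#define NTERMS 5
#define NSEGMENTS 6

static int opens;                                   /* counts segment opens */

static void open_segment(int s)  { (void) s; opens++; }
static void close_segment(int s) { (void) s; }
static int  segment_doc_freq(int s, int term) { (void) s; (void) term; return 1; }

int main(void)
{
    int doc_freq[NTERMS] = {0};

    /* Before: per-term lookups reopen every segment,
     * O(terms x segments) opens (5 x 6 = 30 here). */
    for (int t = 0; t < NTERMS; t++)
        for (int s = 0; s < NSEGMENTS; s++)
        {
            open_segment(s);
            doc_freq[t] += segment_doc_freq(s, t);
            close_segment(s);
        }
    printf("naive: %d opens\n", opens);

    /* After: look up all terms per open, O(segments) opens (6 here). */
    opens = 0;
    for (int s = 0; s < NSEGMENTS; s++)
    {
        open_segment(s);
        for (int t = 0; t < NTERMS; t++)
            doc_freq[t] += segment_doc_freq(s, t);
        close_segment(s);
    }
    printf("batched: %d opens\n", opens);
    return 0;
}
```
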
Instead of synthetic queries like "what is the capital of france",
use the actual MS-MARCO dev query set (10K queries) for more
representative benchmarking.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
- Use CREATE TABLE/INDEX IF NOT EXISTS to skip if already present
- Avoids rebuilding index on re-runs (saves 30-60 min locally)
- TRUNCATE only if data missing, not on every run
My idempotent changes broke data loading (0 passages loaded).
Reverting to the simple DROP/CREATE approach that works.
The $1 and $2 need backslash escaping to prevent shell expansion
when used inside psql's \copy FROM PROGRAM.
Adds pgbench-based parallel throughput testing:
- parallel_query.pgbench: pgbench script for BM25 queries
- parallel_bench.sh: Runs scaling test with 1-N clients (N = CPUs)
- benchmark.yml: Runs parallel benchmark after sequential queries

Local results (8.8M passages, release build):
- 1 client: 53 QPS (18.8 ms)
- 4 clients: 163 QPS (24.6 ms)
- 8 clients: 210 QPS (38.2 ms)
- 12 clients: 219 QPS (54.9 ms)
- Remove WAND algorithm (score_segment_wand and related cursor code)
- Remove WAND iterator functions from segment/scan.c
- Revert benchmark/MSMARCO changes (moved to separate PR)
- Keep: GUC variables log_bmw_stats and enable_bmw
- Keep: Improved stats tracking (memtable_docs vs segment_docs_scored)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
- Fix misleading comment about "previous version" in bmw.sql test
- Add justification for BMW_MAX_TERMS=8 threshold
- Clarify DOC_ACCUM_HASH_SIZE comment (2x block size for hash efficiency)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
@tjgreen42 tjgreen42 force-pushed the feat/block-max-wand branch from 2a1a849 to a0eb255 on January 6, 2026 at 01:42
tjgreen42 and others added 5 commits January 5, 2026 18:24
Benchmarks show BMW outperforms exhaustive scoring even for 8+ term
queries. The previous assumption that "exhaustive wins beyond 8 terms"
was incorrect - bucket 8 queries were 2.7x slower than System X due
to falling back to exhaustive scanning.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
- Mark v0.3.0 BMW items as completed
- Document the doc-ID ordered traversal limitation for long queries
- Note: current BMW iterates by block index, not doc ID, which limits
  effectiveness for 8+ term queries

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Detail the two related limitations:
1. Block-index iteration vs doc-ID iteration for multi-term queries
2. Sequential single-block skipping vs binary search multi-block seeking

Include code example and explain where O(log n) seeking would help.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
- Fix score.c: change query_term_count <= 8 to >= 2 (use BMW for all
  multi-term queries)
- Add tp_topk_free() to properly clean up heap memory
- Fix hash table leak: destroy doc_accum inside block loop, not after
- Rename NYE banner to v0.2.0, add v0.3.0-dev banner

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Update image path after rename from nye_2026 to v0.3.0-dev.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@tjgreen42 tjgreen42 requested a review from svenklemm January 6, 2026 04:44
@tjgreen42 tjgreen42 marked this pull request as draft January 6, 2026 15:52
tjgreen42 and others added 3 commits January 6, 2026 09:29
Documents at different block positions across terms get partial scores
instead of complete scores. The wand.sql test documents this bug -
doc 201 should be #1 but BMW misses it entirely.

Expected output shows current buggy behavior. Will be updated when
WAND-style doc-ID traversal is implemented.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
The previous block-index iteration assumed blocks were aligned by doc ID
across terms' posting lists. They aren't - each term has its own block
structure. Documents at different block positions across terms got
partial scores instead of complete scores.

This implements WAND-style doc-ID ordered traversal:
- Iterate by minimum doc_id across all term iterators (pivot)
- Accumulate complete scores from all terms at the pivot
- Maintain block-max optimization for early threshold pruning

Added tp_segment_posting_iterator_seek() and current_doc_id() functions
to support efficient doc-ID based iteration with binary search on skip
entries.
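
The seek described above amounts to a lower-bound binary search over the skip entries. A sketch, assuming each skip entry records the last doc_id contained in its block (the actual field layout may differ):

```c
/* Sketch of doc-ID seek over skip entries (illustrative; assumes each
 * entry stores the last doc_id of its block). */
typedef struct { unsigned long last_doc_id; /* ... block offsets ... */ } SkipEntrySeek;

/* Return the index of the first block that could contain target, i.e. the
 * first block whose last_doc_id >= target; returns nblocks if none does. */
static int
seek_block(const SkipEntrySeek *skips, int nblocks, unsigned long target)
{
    int lo = 0, hi = nblocks;              /* half-open range [lo, hi) */
    while (lo < hi)
    {
        int mid = lo + (hi - lo) / 2;
        if (skips[mid].last_doc_id < target)
            lo = mid + 1;                  /* target lies in a later block */
        else
            hi = mid;
    }
    return lo;
}
```
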
DSA memory size varies by environment, causing CI failures.
Bare bm25query casts don't work reliably when BMW is disabled.
@tjgreen42 tjgreen42 marked this pull request as ready for review January 6, 2026 21:25
tjgreen42 and others added 6 commits January 6, 2026 13:27
- Move v0.2.0 to Released (Dec 2025)
- Update v0.3.0 to reflect completed BMW with doc-ID traversal
- Set Jan 2026 dates for v0.3.0 and v0.4.0
- Update benchmark results (4.3x faster than exhaustive)
The zero-copy buffer management in tp_segment_get_direct() was sharing
buffer pins with reader->current_buffer. When multiple term iterators
were active and a different term's initialization read the dictionary
from a different page, tp_segment_read() would release the shared
buffer, invalidating the first iterator's block_postings pointer.

Fix: tp_segment_get_direct() now creates its own independent buffer pin
rather than sharing with reader->current_buffer. The release function
now properly releases the pin it owns.

Also adds a 3-term query test to wand.sql to prevent regression.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
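
A hedged sketch of what an independent pin looks like using the core buffer-manager calls; this is not the actual tp_segment_get_direct(), and the struct, function names, and offset handling are illustrative only.

```c
/* Illustrative sketch of taking an independent buffer pin instead of
 * sharing reader->current_buffer (not the real tp_segment_get_direct()). */
#include "postgres.h"
#include "storage/bufmgr.h"

typedef struct TpDirectRefSketch { Buffer buf; char *data; } TpDirectRefSketch;

static TpDirectRefSketch
segment_get_direct_sketch(Relation index, BlockNumber blkno, Size off)
{
    TpDirectRefSketch ref;

    /* Own pin: a later tp_segment_read() on a different page cannot
     * release this buffer out from under the caller's pointer. */
    ref.buf = ReadBuffer(index, blkno);
    LockBuffer(ref.buf, BUFFER_LOCK_SHARE);
    ref.data = (char *) BufferGetPage(ref.buf) + off;
    /* Segment pages are assumed immutable once written, so the content
     * lock can be dropped while the pin keeps the page resident. */
    LockBuffer(ref.buf, BUFFER_LOCK_UNLOCK);
    return ref;
}

static void
segment_release_direct_sketch(TpDirectRefSketch *ref)
{
    ReleaseBuffer(ref->buf);       /* release the pin this reference owns */
    ref->buf = InvalidBuffer;
    ref->data = NULL;
}
```
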
New test sections:
- Section 18: Partial last block (150 docs = 128 + 22)
- Section 19: Term only in memtable (not in segments)
- Section 20: Sparse posting lists (few postings per block)
- Section 21: Multi-term with asymmetric block structures
- Section 22: Threshold exactly equals block_max
- Section 23: All iterators exhaust simultaneously

These tests cover edge cases in the BMW algorithm that were
identified as potential gaps during code review.
- Extract 7 helpers from score_segment_multi_term_bmw (228→51 lines):
  - advance_term_iterator
  - init_segment_term_states / cleanup_segment_term_states
  - find_pivot_doc_id / compute_pivot_max_score
  - score_pivot_document / skip_pivot_document

- Extract 2 helpers from tp_score_all_terms_in_segment_chain (161→28 lines):
  - score_term_postings
  - score_segment_for_all_terms

- Eliminate goto statement by using return value from init function
- Remove duplicated binary search code, reuse tp_segment_posting_iterator_init
- Rewrite wand.sql header as regression test (remove "critical bug" verbiage)
- Remove non-verified "Verify:" comments from bmw.sql tests
- Update expected output files
@tjgreen42 tjgreen42 left a comment

Addressed review comments in e87f443:

wand.out:3 - Rewrote header to be a regression test description, removed "critical bug" verbiage.

bmw.out:40, 116 - Removed non-verified "Verify:" comments since they weren't being checked programmatically.

bmw.out:374 (WHERE clauses) - The WHERE clause `content <@> 'term'::bm25query < 0` is necessary: without it, non-matching documents would appear with score 0. The clause ensures we only get documents that actually match the query term.

bmw.out:374 (score stability) - Scores are deterministic based on BM25 formula with fixed k1/b parameters. The full precision is useful for detecting subtle scoring regressions.

bmw.out:390 (validation.sql) - The SECTION 10 approach compares BMW-optimized vs exhaustive paths using EXCEPT, which directly validates that both paths produce identical results. The validation.sql functions compare against a reference SQL implementation, which is a different (complementary) form of validation. Both are valuable; the current approach specifically tests the BMW optimization.

@tjgreen42

Correction on WHERE clauses:

After investigation, the WHERE clauses serve two different purposes:

  1. Non-existent query terms: When the term doesn't exist in the index, the planner falls back to Seq Scan and returns all docs with score 0. WHERE is necessary to get empty results for these tests (e.g., SECTION 2 'elephant' test).

  2. Existing query terms: Index Scan is used and only returns matching documents. WHERE is redundant but harmless.

So for correctness tests with matching queries, WHERE can be removed. For 'empty results' tests, WHERE must stay.

Given that removing WHERE from matching queries would be a large change with no functional benefit (and WHERE with Index Scan adds only a trivial Filter step), I'd suggest keeping them for consistency unless you feel strongly about removing them.

WHERE clause `content <@> query < 0` is only needed for small tables
where the planner uses seq scan. For large tables (100+ rows), the
index scan is used and only returns matching documents, making WHERE
redundant.

Removed WHERE from tests with large tables:
- SECTION 9: bmw_hybrid (1002 rows)
- SECTION 10: bmw_validate (1000 rows)
- SECTION 11: bmw_monotonic (500 rows)
- SECTION 12: bmw_blocks (643 rows)
- SECTION 15-16: bmw_scattered, bmw_multi_scattered (1000 rows)
- SECTION 17: bmw_multiseg (700 rows)
- SECTION 18-23: bmw_partial through bmw_simul (100-500 rows)

Kept WHERE for small table tests (< 50 rows) where seq scan is used:
- SECTIONS 1-8, 13-14: small functional tests
- Empty result tests: 'elephant', 'xxx yyy zzz'
@tjgreen42

> Correction on WHERE clauses:
>
> After investigation, the WHERE clauses serve two different purposes:
>
>   1. Non-existent query terms: When the term doesn't exist in the index, the planner falls back to Seq Scan and returns all docs with score 0. WHERE is necessary to get empty results for these tests (e.g., SECTION 2 'elephant' test).
>   2. Existing query terms: Index Scan is used and only returns matching documents. WHERE is redundant but harmless.
>
> So for correctness tests with matching queries, WHERE can be removed. For 'empty results' tests, WHERE must stay.
>
> Given that removing WHERE from matching queries would be a large change with no functional benefit (and WHERE with Index Scan adds only a trivial Filter step), I'd suggest keeping them for consistency unless you feel strongly about removing them.

Please remove any new ones, they just muddy the waters.

@tjgreen42 tjgreen42 merged commit 87f6ae4 into main Jan 7, 2026
11 checks passed
@tjgreen42 tjgreen42 deleted the feat/block-max-wand branch January 7, 2026 04:07
Collaborator

Just wanna say I love this Stranger Things-style banner 💯

Collaborator Author

Aww, I was hoping someone would notice!

Collaborator Author

BTW, the "not yet optimized" tagline is not entirely true anymore, as can be seen from the numbers in the PR description. This little extension is starting to kick some butt.
