feat(retrieval): add manual BM25 query expansion and diagnostics by rwmjhb · Pull Request #438 · CortexReach/memory-lancedb-pro

rwmjhb · 2026-04-01T07:39:57Z

Summary

This PR replaces #292 with a maintainer-owned branch.

It keeps the core retrieval ergonomics changes from the original PR while addressing the review feedback that was still open:

BM25 query expansion for manual / CLI retrieval only
Retrieval diagnostics on the retriever itself
CLI debug output for retrieval diagnostics, including failure paths from the ephemeral search retriever

Why

Users often search with colloquial phrases such as 挂了, 卡住, or 报错, while stored memories often contain more technical wording like crash, timeout, error, or exception.

The vector leg already helps semantically, but the BM25 leg still matters for exact-term boosting and mixed-language memory bases. This improves the explicit/manual lookup path while deliberately leaving auto-recall unchanged.

Scope and safety

no change to auto-recall query behavior
no change to vector query text
query expansion stays limited to manual / CLI retrieval
replacement branch drops the unrelated vllm rerank provider changes from feat(retrieval): add manual BM25 query expansion and diagnostics #292
CLI failure diagnostics now come from the actual search retriever instance, not just context.retriever

Validation

node --test test/query-expander.test.mjs
node test/cli-smoke.mjs
node test/retriever-rerank-regression.mjs

Passed locally.

AliceLJY

LGTM — scope is clean, tests are thorough (774 lines), query expansion logic is solid.

One finding to confirm:

vectorOnlyRetrieval: recencyBoost no longer gated by decayEngine

In the original code, when decayEngine is present, applyRecencyBoost is skipped in the vector-only path. In the new code, applyRecencyBoost always runs regardless of decayEngine. This could cause double time-weighting (decay + recency) in the vector-only + decayEngine scenario.

bm25OnlyRetrieval and hybridRetrieval still gate recencyBoost behind !decayEngine, so this is an inconsistency in vectorOnlyRetrieval only.

If intentional, a one-line comment explaining why would be helpful. If not, restore the if (!this.decayEngine) guard.

Minor: formatRetrievalDiagnosticsLines in cli.ts uses an inline type that's a subset of RetrievalDiagnostics — consider importing the interface directly to avoid future type drift.

feat: replace PR 292 retrieval query expansion

6ca7705

github-actions bot assigned rwmjhb Apr 1, 2026

rwmjhb mentioned this pull request Apr 1, 2026

feat(retrieval): add manual BM25 query expansion and diagnostics #292

Closed

AliceLJY approved these changes Apr 1, 2026

View reviewed changes

rwmjhb merged commit c25801e into master Apr 3, 2026
6 of 8 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(retrieval): add manual BM25 query expansion and diagnostics#438

feat(retrieval): add manual BM25 query expansion and diagnostics#438
rwmjhb merged 1 commit intomasterfrom
codex/pr292-replacement

rwmjhb commented Apr 1, 2026

Uh oh!

AliceLJY left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

rwmjhb commented Apr 1, 2026

Summary

Why

Scope and safety

Validation

Uh oh!

AliceLJY left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants