feat(retrieval): add manual BM25 query expansion and diagnostics by fryeggs · Pull Request #292 · CortexReach/memory-lancedb-pro

fryeggs · 2026-03-20T15:54:09Z

Summary

This is PR 1/4 in a series that extends memory-lancedb-pro in layered steps.

This PR is intentionally the smallest and lowest-risk step. It improves manual retrieval ergonomics without changing auto-recall behavior.

It adds:

BM25 query expansion for manual / CLI retrieval only
Retrieval diagnostics for manual / CLI search
CLI debug output for retrieval diagnostics

Why

Users often search with colloquial phrases such as 挂了 ("it crashed"), 卡住 ("it's stuck"), or 报错 ("it reports an error"), while stored memories usually contain more technical wording like crash, timeout, error, or exception.

The vector leg already helps semantically, but the BM25 leg still matters for exact-term boosting and mixed-language memory bases. This PR improves that explicit/manual lookup path while deliberately leaving auto-recall unchanged.
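
For illustration, the expansion idea can be sketched roughly as below. This is a minimal sketch and not the code in this PR: the table contents, `EXPANSIONS`, and `expandForBm25` are hypothetical names chosen for the example, and the real expander may match and weight terms differently.

```typescript
// Hypothetical mapping table: colloquial trigger -> technical terms.
// The entries mirror the examples in this description; the real table may differ.
const EXPANSIONS: Array<[string, string[]]> = [
  ["挂了", ["crash", "crashed"]],
  ["卡住", ["timeout", "hang"]],
  ["报错", ["error", "exception"]],
];

// Builds the text for the BM25 leg: the original query plus any technical
// terms whose trigger appears in it. Terms already present are not repeated.
function expandForBm25(query: string): string {
  const lowered = query.toLowerCase();
  const extra: string[] = [];
  for (const [trigger, terms] of EXPANSIONS) {
    if (!query.includes(trigger)) continue;
    for (const term of terms) {
      if (!lowered.includes(term) && extra.indexOf(term) === -1) {
        extra.push(term);
      }
    }
  }
  return extra.length === 0 ? query : `${query} ${extra.join(" ")}`;
}
```

A query with no colloquial trigger passes through unchanged, so the BM25 leg behaves as before for ordinary searches.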

Scope and safety

no change to auto-recall query behavior
no change to vector query text
query expansion is limited to manual / CLI retrieval
focused tests cover colloquial expansion, false-positive protection, gating, and debug output
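
The gating rules above could look roughly like this. It is a sketch under assumptions: `RetrievalSource`, `LegQueries`, and `buildLegQueries` are hypothetical names, and the expander is passed in as a plain function so the example stays self-contained.

```typescript
// All names here are hypothetical; they only illustrate the gating rule.
type RetrievalSource = "manual" | "cli" | "auto-recall";

interface LegQueries {
  vectorQuery: string; // always the caller's original text
  bm25Query: string; // expanded only for manual / CLI retrieval
}

function buildLegQueries(
  query: string,
  source: RetrievalSource,
  queryExpansion: boolean,
  expand: (q: string) => string,
): LegQueries {
  // Expansion needs both the config flag and a non-auto-recall caller.
  const allowed = queryExpansion && source !== "auto-recall";
  return {
    vectorQuery: query,
    bm25Query: allowed ? expand(query) : query,
  };
}
```

Checking the call source in addition to the config flag means a shared default of `queryExpansion: true` cannot leak into auto-recall on its own.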

Companion PRs in this series

PR 2/4: scored capture pipeline and ingestion safeguards
PR 3/4: standalone runtime and MCP surface
PR 4/4: Claude / Codex host integrations

Validation

node --test test/query-expander.test.mjs

Passed locally.

rwmjhb · 2026-03-23T02:58:15Z

This error path is reading diagnostics from the wrong retriever instance.

When context.embedder is present, runSearch() executes against a fresh retriever created
by getSearchRetriever(), not context.retriever. But the outer catch block still reads
diagnostics from context.retriever.getLastDiagnostics?.().

In the normal runtime path, CLI registration does pass embedder, so on memory-pro search --debug / --json --debug failures we can lose the diagnostics payload entirely and only
return the error.

I verified this with a minimal repro against the PR branch. The current tests don’t catch it
because they stub context.retriever directly and don’t exercise the embedder -> createRetriever() path.

Can we thread the last-used search retriever (or its diagnostics) through the failure path so
debug output comes from the instance that actually executed the search?
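
One possible shape for that fix, as a rough sketch only: the interfaces below are simplified stand-ins, `createSearchRetriever` is a hypothetical substitute for getSearchRetriever(), and `runSearchWithDiagnostics` is not a function in the PR. Only `context.retriever`, `context.embedder`, and `getLastDiagnostics` are names taken from the code under review.

```typescript
// Simplified, hypothetical shapes for the example.
interface Diagnostics {
  stage: string;
}
interface Retriever {
  retrieve(query: string): Promise<string[]>;
  getLastDiagnostics?(): Diagnostics | undefined;
}
interface SearchContext {
  retriever: Retriever;
  embedder?: unknown;
  createSearchRetriever?: () => Retriever; // stand-in for getSearchRetriever()
}
interface SearchOutcome {
  results?: string[];
  error?: string;
  diagnostics?: Diagnostics;
}

async function runSearchWithDiagnostics(
  context: SearchContext,
  query: string,
): Promise<SearchOutcome> {
  // Remember which instance actually executes the search.
  const active =
    context.embedder !== undefined && context.createSearchRetriever !== undefined
      ? context.createSearchRetriever()
      : context.retriever;
  try {
    const results = await active.retrieve(query);
    return { results, diagnostics: active.getLastDiagnostics?.() };
  } catch (err) {
    // Read diagnostics from the same instance, not from context.retriever.
    return {
      error: err instanceof Error ? err.message : String(err),
      diagnostics: active.getLastDiagnostics?.(),
    };
  }
}
```

A regression test along these lines would pass an embedder plus a failing fresh retriever and assert that the diagnostics in the error output come from that fresh instance.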

AliceLJY

Thanks for the BM25 query expansion work! One thing to confirm before merging:

queryExpansion defaults to true in DEFAULT_RETRIEVAL_CONFIG — but the PR description says "no change to auto-recall query behavior". If DEFAULT_RETRIEVAL_CONFIG is shared with the auto-recall path, this default would enable expansion there too. Could you confirm whether queryExpansion: true only affects the manual/CLI path, or also applies to auto-recall?

Also, the vllm rerank provider is added in retriever.ts but isn't mentioned in the PR title/description — could you either note it explicitly or split it into a separate feat: PR?

Happy to approve once these are clarified!

feat: improve manual retrieval diagnostics and query expansion

23874e9

fryeggs changed the title ~~feat: improve manual retrieval diagnostics and BM25 query expansion~~ feat(retrieval): add manual BM25 query expansion and diagnostics Mar 20, 2026

andychu666 mentioned this pull request Mar 21, 2026

feat: configurable mapping table for cross-script BM25 query expansion (generalizes #292) #297

Closed

This was referenced Mar 22, 2026

feat(integrations): add Claude and Codex host bridges #295

Closed

feat(runtime): add standalone MCP server and shared host runtime #294

Closed

AliceLJY reviewed Mar 31, 2026

fryeggs wants to merge 1 commit into CortexReach:master from fryeggs:codex/query-expander