feat(retrieval): add manual BM25 query expansion and diagnostics #438
AliceLJY left a comment
LGTM — scope is clean, tests are thorough (774 lines), query expansion logic is solid.
One finding to confirm:
vectorOnlyRetrieval: recencyBoost no longer gated by decayEngine
In the original code, when decayEngine is present, applyRecencyBoost is skipped in the vector-only path. In the new code, applyRecencyBoost always runs regardless of decayEngine. This could cause double time-weighting (decay + recency) in the vector-only + decayEngine scenario.
bm25OnlyRetrieval and hybridRetrieval still gate recencyBoost behind !decayEngine, so this is an inconsistency in vectorOnlyRetrieval only.
If intentional, a one-line comment explaining why would be helpful. If not, restore the if (!this.decayEngine) guard.
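To make the concern concrete, here is a minimal sketch of the guarded behaviour. It is not the PR's code: ScoredResult, the DecayEngine shape, and the boost formula are hypothetical stand-ins, and only the names vectorOnlyRetrieval, applyRecencyBoost, and decayEngine come from the review.

```typescript
// Hypothetical shapes; the real types in the PR may differ.
interface ScoredResult {
  id: string;
  score: number;
  ageDays: number;
}

interface DecayEngine {
  apply(results: ScoredResult[]): ScoredResult[];
}

// Hypothetical recency boost: a small bonus that fades with age.
function applyRecencyBoost(results: ScoredResult[]): ScoredResult[] {
  return results.map((r) => ({ ...r, score: r.score + 0.1 / (1 + r.ageDays) }));
}

function vectorOnlyRetrieval(
  results: ScoredResult[],
  decayEngine?: DecayEngine,
): ScoredResult[] {
  // The guard under discussion: skip the recency boost when a decay engine
  // already applies time weighting, so age is only counted once.
  const boosted = decayEngine ? results : applyRecencyBoost(results);
  return decayEngine ? decayEngine.apply(boosted) : boosted;
}
```

Without the guard, a result would pass through both applyRecencyBoost and decayEngine.apply, which is the double time-weighting described above.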
Minor: formatRetrievalDiagnosticsLines in cli.ts uses an inline type that's a subset of RetrievalDiagnostics — consider importing the interface directly to avoid future type drift.
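One possible way to avoid the drift, sketched under assumptions: the fields of RetrievalDiagnostics below are hypothetical (the real interface is not shown on this page), and in cli.ts the interface would be imported from wherever it is declared rather than redeclared. Deriving the parameter type with Pick keeps it tied to the interface, so a renamed or retyped field fails at compile time instead of drifting silently.

```typescript
// Hypothetical shape; the real RetrievalDiagnostics fields may differ.
interface RetrievalDiagnostics {
  mode: string;
  originalQuery: string;
  expandedQuery: string;
  resultCount: number;
}

// Narrow the interface instead of restating field types inline.
type DiagnosticsSummary = Pick<
  RetrievalDiagnostics,
  "mode" | "originalQuery" | "expandedQuery"
>;

function formatRetrievalDiagnosticsLines(d: DiagnosticsSummary): string[] {
  const lines = [`mode: ${d.mode}`, `query: ${d.originalQuery}`];
  // Only show the expanded query when expansion changed something.
  if (d.expandedQuery !== d.originalQuery) {
    lines.push(`expanded: ${d.expandedQuery}`);
  }
  return lines;
}
```

Accepting the full RetrievalDiagnostics interface as the parameter type would work equally well if the formatter does not need to be callable with partial data.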
Summary
This PR replaces #292 with a maintainer-owned branch.
It keeps the core retrieval ergonomics changes from the original PR while addressing the review feedback that was still open:
Why
Users often search with colloquial phrases such as 挂了 ("it died"), 卡住 ("stuck"), or 报错 ("it throws an error"), while stored memories often contain more technical wording like crash, timeout, error, or exception.
The vector leg already helps semantically, but the BM25 leg still matters for exact-term boosting and mixed-language memory bases. This improves the explicit/manual lookup path while deliberately leaving auto-recall unchanged.
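As an illustration of the idea, here is a minimal sketch of phrase-based expansion for the BM25 leg. The synonym table and the function name expandQueryForBm25 are hypothetical; the PR's actual mapping and matching rules are not shown on this page.

```typescript
// Hypothetical synonym table pairing colloquial phrases with technical terms.
const EXPANSIONS: Record<string, string[]> = {
  "挂了": ["crash"],
  "卡住": ["timeout"],
  "报错": ["error", "exception"],
};

function expandQueryForBm25(query: string): string {
  const lower = query.toLowerCase();
  const additions: string[] = [];
  for (const phrase of Object.keys(EXPANSIONS)) {
    if (query.indexOf(phrase) === -1) continue;
    for (const term of EXPANSIONS[phrase]) {
      // Skip terms the user already typed or that were already added.
      const alreadyPresent =
        lower.indexOf(term) !== -1 || additions.indexOf(term) !== -1;
      if (!alreadyPresent) additions.push(term);
    }
  }
  // Append rather than replace, so the original wording still matches.
  return additions.length > 0 ? `${query} ${additions.join(" ")}` : query;
}
```

Appending terms keeps the user's original wording in the query, so memories written in the same colloquial style remain exact-term matches.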
Scope and safety
vllm rerank provider changes from #292
context.retriever
Validation
Passed locally.