Description
Classification: Feature Request (Enhancement), not a bug.
The reflection pipeline works as designed. This issue proposes adding a missing capability: the ability to mark reflection items as resolved once their underlying problem is solved, preventing stale items from continuing to be injected into new sessions.
Summary
Reflection items (both `invariant` and `derived` kinds) stored in LanceDB have no way to be marked as resolved or superseded. Once a problem is solved in a later session, the reflection system keeps injecting stale derived lessons into every new session's context until the item ages out past `maxAgeDays`. As a result, the agent repeatedly suggests "next steps" for problems that have already been resolved.
Impact
- User experience: The reflection system keeps injecting stale `derived-focus` and `inherited-rules` entries about problems that are no longer relevant, which confuses the agent and annoys the user.
- Context pollution: Six or more stale derived items can fill the entire injection budget (`topK=6` by default) and block relevant items from being injected.
- Trust erosion: When the agent repeatedly advises running a "contrastive retrieval test" after the user has already confirmed that rerank is working, it signals to the user that the memory system doesn't understand state changes.
Current Behavior
The reflection pipeline is unidirectional: extract → store → decay → inject. There is no resolve → invalidate → suppress path.
Detailed pipeline walkthrough
- Storage (`reflection-item-store.ts`): Reflection items are written to LanceDB with metadata type `memory-reflection-item`. Fields include `itemKind`, `decayMidpointDays`, `decayK`, `baseWeight`, `quality`, `storedAt`, and `sessionId`. No `status` field exists.
- Scoring (`reflection-ranking.ts`): Items are scored using a logistic decay function: `score = logistic(ageDays, midpointDays, k) * baseWeight * quality`. For `derived` items, midpoint = 7 days, k = 0.65, and quality = 0.95, so items retain more than 50% of their score for the first 7 days.
- Filtering (`reflection-recall.ts`):
  - `filterByMaxAge`: Removes items older than `maxAgeDays` (default 45 for invariant, configurable for derived), so items are kept alive for up to 45 days by default.
  - `keepMostRecentPerNormalizedKey`: Caps the number of items per `strictKey` within the age window.
  - No resolution check and no cross-reference with the current context.
- Aggregation (`reflection-aggregation.ts`): Groups items by `strictKey`, computes support/freshness/stability/quality scores, and picks a representative. There is no check for whether the underlying problem has been resolved.
- Selection (`reflection-selection.ts`): Diversity-aware selection with soft-key deduplication. No resolution filtering.
- Injection (`index.ts`, `before_prompt_build`): Two independent paths inject reflection content:
  - `derived-focus` block, from `buildReflectionDerivedFocusBlock()`
  - `inherited-rules` block, from `orchestrateDynamicRecall()`

  Neither path checks whether items are stale because the underlying problem has been resolved.
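To make the decay numbers concrete, here is a minimal TypeScript sketch of the scoring and age-filtering steps described above. It is not the plugin's code: the exact `logistic` form (1 / (1 + e^(k(age - midpoint))), which equals 0.5 at the midpoint), a `baseWeight` of 1, millisecond `storedAt` timestamps, the 45-day window applied to a derived item, and the `ReflectionItemMeta` interface are all assumptions made for illustration.

```typescript
// Hypothetical shape of a stored reflection item, based on the fields listed above.
// Note that nothing here can express "resolved".
interface ReflectionItemMeta {
  itemKind: "invariant" | "derived";
  decayMidpointDays: number;
  decayK: number;
  baseWeight: number;
  quality: number;
  storedAt: number; // assumed: epoch milliseconds
  sessionId: string;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Assumed logistic form: close to 1 for new items, exactly 0.5 at the midpoint.
function logistic(ageDays: number, midpointDays: number, k: number): number {
  return 1 / (1 + Math.exp(k * (ageDays - midpointDays)));
}

// score = logistic(ageDays, midpointDays, k) * baseWeight * quality
function scoreItem(item: ReflectionItemMeta, nowMs: number): number {
  const ageDays = (nowMs - item.storedAt) / DAY_MS;
  return logistic(ageDays, item.decayMidpointDays, item.decayK) * item.baseWeight * item.quality;
}

// Age is the only hard cutoff: items older than maxAgeDays are dropped, everything else survives.
function filterByMaxAge(
  items: ReflectionItemMeta[],
  nowMs: number,
  maxAgeDays: number,
): ReflectionItemMeta[] {
  return items.filter((item) => (nowMs - item.storedAt) / DAY_MS <= maxAgeDays);
}

// A derived lesson stored at t = 0 with the defaults quoted above (baseWeight assumed to be 1).
const lesson: ReflectionItemMeta = {
  itemKind: "derived",
  decayMidpointDays: 7,
  decayK: 0.65,
  baseWeight: 1,
  quality: 0.95,
  storedAt: 0,
  sessionId: "session-A",
};

for (const day of [0, 7, 14, 30, 45, 46]) {
  const now = day * DAY_MS;
  const kept = filterByMaxAge([lesson], now, 45).length === 1;
  console.log(`day ${day}: score=${scoreItem(lesson, now).toFixed(4)}, passes age filter=${kept}`);
}
```

Under these assumptions the lesson's score roughly halves by day 7 and falls to about 1% of its initial value by day 14, yet it still passes the age filter through day 45. Unless something downstream applies a minimum-score cutoff (none is described above), a low score only demotes the item relative to others; when few fresher items exist, it can still take one of the top-k slots.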
Existing safeguards (and why they don't help)
| Mechanism | What it does | Why it doesn't solve this |
|---|---|---|
| Logistic decay | Item scores decrease over time | `derived` midpoint is 7 days, so items stay high-scoring for about a week; `maxAgeDays` defaults to 45. |
| Repeated-injection guard (`recall-engine.ts`) | Prevents re-injecting the same item within N turns | Only works within a single session, not across sessions. |
| `autoRecallExcludeReflection` (default: `true`) | Keeps reflection items out of the auto-recall path | Confirms the two paths are independent; no cross-path suppression is possible. |
| Same-key penalty in final selection (`final-topk-setwise-selection.ts`) | Penalizes duplicate keys within the same turn | Only applies to items within the same path and the same turn. |
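The cross-session gap in the repeated-injection guard can be illustrated with a hypothetical model. This is not the code in `recall-engine.ts`; it only assumes, as the table states, that the guard's record of recent injections is scoped to a single session. `RepeatedInjectionGuard`, `windowTurns`, and `shouldInject` are illustrative names.

```typescript
// Hypothetical model of a per-session repeated-injection guard; not the actual recall-engine.ts code.
class RepeatedInjectionGuard {
  // Turn number at which each item id was last injected. Lives only as long as the session.
  private readonly lastInjectedTurn = new Map<string, number>();
  private readonly windowTurns: number;

  constructor(windowTurns: number) {
    this.windowTurns = windowTurns;
  }

  // Returns true if the item may be injected on this turn, and records the injection.
  shouldInject(itemId: string, turn: number): boolean {
    const last = this.lastInjectedTurn.get(itemId);
    if (last !== undefined && turn - last < this.windowTurns) {
      return false;
    }
    this.lastInjectedTurn.set(itemId, turn);
    return true;
  }
}

const sessionA = new RepeatedInjectionGuard(3);
const a1 = sessionA.shouldInject("stale-lesson", 1); // true: first injection
const a2 = sessionA.shouldInject("stale-lesson", 2); // false: suppressed within the window

// A new session starts with a fresh guard, so the same stale item is injected again.
const sessionB = new RepeatedInjectionGuard(3);
const b1 = sessionB.shouldInject("stale-lesson", 1); // true
console.log({ a1, a2, b1 });
```

Because the guard carries no state between sessions, it cannot tell that an item injected in Session C is the same stale lesson already rejected as irrelevant in Session B.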
Cross-module analysis
I checked every module in the pipeline for resolution mechanisms:
| Module | Has resolution/invalidation? |
|---|---|
| `reflection-item-store.ts` | ❌ No status field |
| `reflection-ranking.ts` | ❌ Only computes decay score |
| `reflection-recall.ts` | ❌ Only filters by age |
| `reflection-aggregation.ts` | ❌ Only groups by key |
| `reflection-selection.ts` | ❌ Only diversity selection |
| `recall-engine.ts` | ❌ Only per-session dedup |
| `adaptive-retrieval.ts` | ❌ Only skips greetings/commands |
| `noise-filter.ts` | ❌ Only filters refusals/meta-questions |
| `final-topk-setwise-selection.ts` | ❌ Only intra-path dedup |
| `index.ts` (`before_prompt_build`) | ❌ No cross-path suppression |
No module in the entire pipeline provides a mechanism to invalidate or suppress resolved reflection items.
Reproduction Steps
- Session A: Encounter a problem (e.g., rerank misconfiguration). Plugin reflection extracts 6 derived lessons about the problem.
- Session B: Solve the problem. Confirm it's working.
- Sessions C, D, E, ...: The 6 stale derived lessons are still injected as `<derived-focus>`. The agent is misled into suggesting "the next useful action is a contrastive retrieval test" even though rerank is already fixed.
- Stale items persist for up to `maxAgeDays` (default 45 for invariant, configurable for derived).
Proposed Solutions (in order of implementation effort)
Option A (Minimal): `self_improvement_reflection_resolve` tool
- Add a new agent tool: `self_improvement_reflection_resolve(query | id)`
- Mark matching reflection items as `resolved` (add a `resolvedAt` timestamp to their metadata)
- `reflection-recall.ts` skips items where `resolvedAt` is set
- Effort: Low. Touches `tools.ts`, `reflection-item-store.ts`, and `reflection-recall.ts`.
- Tradeoff: Requires the agent or user to know to call the tool. Not automatic.
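A minimal sketch of Option A, using an in-memory array in place of LanceDB. The `ReflectionItem` shape, the `resolveReflection` and `recallActive` names, the sample items, and the case-insensitive substring match standing in for the tool's `query` lookup are all hypothetical; a real implementation would presumably reuse the store's own search.

```typescript
// Hypothetical in-memory stand-in for a LanceDB-backed reflection item.
interface ReflectionItem {
  id: string;
  text: string;
  storedAt: number;
  resolvedAt?: number; // proposed new metadata field
}

// The tool accepts either an exact id or a free-text query.
type ResolveTarget = { id: string } | { query: string };

// Sketch of self_improvement_reflection_resolve: stamps resolvedAt on matching items
// and returns how many were newly resolved. Substring matching is a placeholder for real search.
function resolveReflection(items: ReflectionItem[], target: ResolveTarget, nowMs: number): number {
  let resolved = 0;
  for (const item of items) {
    const matches =
      "id" in target
        ? item.id === target.id
        : item.text.toLowerCase().includes(target.query.toLowerCase());
    if (matches && item.resolvedAt === undefined) {
      item.resolvedAt = nowMs;
      resolved++;
    }
  }
  return resolved;
}

// The proposed change in reflection recall: skip any item whose resolvedAt is set.
function recallActive(items: ReflectionItem[]): ReflectionItem[] {
  return items.filter((item) => item.resolvedAt === undefined);
}

const items: ReflectionItem[] = [
  { id: "r1", text: "Run a contrastive retrieval test to verify rerank", storedAt: 0 },
  { id: "r2", text: "Rerank may be misconfigured; check the provider settings", storedAt: 0 },
  { id: "r3", text: "Summarize long tool output before storing it", storedAt: 0 },
];

const count = resolveReflection(items, { query: "rerank" }, Date.now());
console.log(count, recallActive(items).map((item) => item.id));
```

Resolution is idempotent in this sketch: already-resolved items are not re-stamped, so `resolvedAt` keeps the time of the first resolution.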
Option B (Medium): Cross-pipeline suppression via memory signals
- When a new memory entry is stored (via `memory_store`) that semantically contradicts a reflection item, automatically discount that reflection item.
- Implementation: During reflection recall scoring, check whether any stored memory (from the auto-recall pipeline) has high semantic similarity to a reflection item but the opposite intent (e.g., "rerank is working" vs. "rerank needs contrastive test"). If so, apply an additional decay multiplier.
- Effort: Medium. Requires cross-referencing between the two pipelines.
- Tradeoff: Needs good classification of "contradictory" vs "supporting" signals.
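The core of Option B is an extra multiplier applied during reflection recall scoring. The sketch below assumes embeddings are already available for both the reflection item and the stored memories, and that some classifier has labeled each text with an `Intent`; producing that label reliably is the hard part named in the tradeoff. The `Signal` type, the toy 3-dimensional vectors, the threshold (0.8), and the discount (0.1) are illustrative, not recommendations.

```typescript
// Hypothetical intent label; classifying texts into these buckets is the open problem.
type Intent = "problem-open" | "problem-resolved";

interface Signal {
  embedding: number[];
  intent: Intent;
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Returns `discount` if any stored memory is semantically close to the reflection item
// but carries the opposite intent; otherwise returns 1 (no change to the score).
function contradictionMultiplier(
  reflection: Signal,
  memories: Signal[],
  simThreshold = 0.8,
  discount = 0.1,
): number {
  const contradicted = memories.some(
    (memory) =>
      memory.intent !== reflection.intent &&
      cosine(memory.embedding, reflection.embedding) >= simThreshold,
  );
  return contradicted ? discount : 1;
}

// Toy embeddings, for illustration only.
const staleLesson: Signal = { embedding: [1, 0, 0.2], intent: "problem-open" }; // "rerank needs contrastive test"
const fixedMemory: Signal = { embedding: [0.9, 0.1, 0.25], intent: "problem-resolved" }; // "rerank is working"
const unrelated: Signal = { embedding: [0, 1, 0], intent: "problem-resolved" };

console.log(
  contradictionMultiplier(staleLesson, [fixedMemory]),
  contradictionMultiplier(staleLesson, [unrelated]),
);
```

The returned value would be folded into the existing score, for example `logistic(ageDays, midpointDays, k) * baseWeight * quality * multiplier`, so a contradicted item sinks in ranking without being deleted.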
Option C (Full): superseded status + lifecycle management
- Add a `status` field to reflection items: `active | resolved | superseded`
- Add a `self_improvement_reflection_supersede(strictKey, reason)` tool
- New reflection items with the same `strictKey` automatically mark older items as `superseded`
- Reflection recall only returns items with `status === 'active'`
- Effort: Medium-High. Touches store, scoring, injection, and tools.
- Tradeoff: Most robust solution, but adds complexity to the metadata schema.
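A sketch of the Option C lifecycle as an in-memory class. The `ReflectionLifecycleStore` class, its method names, the sample ids and keys, and the `reason` field recording why an item left the `active` state are hypothetical; only the three statuses, the supersede-on-same-`strictKey` rule, and the `status === 'active'` recall filter come from the proposal.

```typescript
type ReflectionStatus = "active" | "resolved" | "superseded";

interface LifecycleItem {
  id: string;
  strictKey: string;
  status: ReflectionStatus;
  reason?: string; // hypothetical: why the item left the active state
}

class ReflectionLifecycleStore {
  private items: LifecycleItem[] = [];

  // Moves every active item with this strictKey to "superseded"; returns how many changed.
  private markSuperseded(strictKey: string, reason: string): number {
    let changed = 0;
    for (const item of this.items) {
      if (item.strictKey === strictKey && item.status === "active") {
        item.status = "superseded";
        item.reason = reason;
        changed++;
      }
    }
    return changed;
  }

  // Storing a new item automatically supersedes older items with the same strictKey.
  add(id: string, strictKey: string): void {
    this.markSuperseded(strictKey, `superseded by ${id}`);
    this.items.push({ id, strictKey, status: "active" });
  }

  // Sketch of self_improvement_reflection_supersede(strictKey, reason).
  supersede(strictKey: string, reason: string): number {
    return this.markSuperseded(strictKey, reason);
  }

  // Explicit resolution of a single active item.
  resolve(id: string, reason: string): boolean {
    const item = this.items.find((c) => c.id === id && c.status === "active");
    if (item === undefined) return false;
    item.status = "resolved";
    item.reason = reason;
    return true;
  }

  // Reflection recall only returns active items.
  recall(): LifecycleItem[] {
    return this.items.filter((item) => item.status === "active");
  }
}

const store = new ReflectionLifecycleStore();
store.add("r1", "rerank-verification");
store.add("r2", "rerank-verification"); // r1 becomes superseded
store.add("r3", "embedding-dimension");
store.add("r4", "tool-output-size");
store.supersede("embedding-dimension", "model switched");
store.resolve("r4", "fixed in a later session");
console.log(store.recall().map((item) => item.id));
```

Because superseding happens at write time, recall stays a simple status filter and needs no knowledge of why an item was retired.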
Environment
- memory-lancedb-pro version: 1.1.0-beta.6
- OpenClaw version: 2026.3.22+
- sessionStrategy: `memoryReflection`
- memoryReflection.injectMode: `inheritance+derived`
- memoryReflection.recall.mode: `dynamic`
Source Files Referenced
- `src/reflection-item-store.ts`: item metadata and decay defaults
- `src/reflection-ranking.ts`: logistic decay scoring
- `src/reflection-recall.ts`: dynamic recall ranking
- `src/reflection-aggregation.ts`: group aggregation and scoring
- `src/reflection-selection.ts`: diversity-aware selection
- `src/recall-engine.ts`: repeated-injection guard and age filtering
- `src/adaptive-retrieval.ts`: query skip logic
- `src/noise-filter.ts`: content quality filtering
- `src/final-topk-setwise-selection.ts`: final top-k selection (shared by both paths)
- `index.ts`: `before_prompt_build` hook and injection orchestration