
[Feature Request / Enhancement] Reflection items lack resolution/invalidation mechanism after underlying problem is solved #395

@superfat1988


Classification: Feature Request (Enhancement), not a bug.
The reflection pipeline works as designed. This issue proposes adding a missing capability: the ability to mark reflection items as resolved once their underlying problem is solved, preventing stale items from continuing to be injected into new sessions.


Summary

Reflection items (both invariant and derived kinds) stored in LanceDB have no mechanism to be marked as resolved or superseded. Once a problem is solved in a later session, the reflection system keeps injecting stale derived lessons into every new session's context until the item ages past maxAgeDays. As a result, the agent repeatedly suggests "next steps" for problems that have already been resolved.

Impact

  • User experience: The plugin keeps injecting stale derived-focus and inherited-rules entries about problems that are no longer relevant, which confuses the agent and annoys the user.
  • Context pollution: Six or more stale derived items can fill the entire injection budget (topK defaults to 6) and crowd out relevant items.
  • Trust erosion: When the agent repeatedly advises to "run a contrastive retrieval test" after the user has already confirmed rerank is working, it signals to the user that the memory system doesn't understand state changes.

Current Behavior

The reflection pipeline is unidirectional: extract → store → decay → inject. There is no resolve → invalidate → suppress path.

Detailed pipeline walkthrough

  1. Storage (reflection-item-store.ts): Reflection items are written to LanceDB with metadata type memory-reflection-item. Fields include itemKind, decayMidpointDays, decayK, baseWeight, quality, storedAt, sessionId. No status field exists.

  2. Scoring (reflection-ranking.ts): Items are scored using a logistic decay function:

    score = logistic(ageDays, midpointDays, k) * baseWeight * quality
    

    For derived items: midpoint=7 days, k=0.65, quality=0.95. This means items retain >50% of their score for the first 7 days.

  3. Filtering (reflection-recall.ts):

    • filterByMaxAge: Removes items older than maxAgeDays (default 45 for invariant, configurable for derived). With the defaults, invariant items stay in the candidate pool for up to 45 days.
    • keepMostRecentPerNormalizedKey: Caps items per strictKey within the age window.
    • No resolution check. No cross-reference with current context.
  4. Aggregation (reflection-aggregation.ts): Groups items by strictKey, computes support/freshness/stability/quality scores, picks representative. No check for whether the underlying problem has been resolved.

  5. Selection (reflection-selection.ts): Diversity-aware selection with soft-key deduplication. No resolution filtering.

  6. Injection (index.ts, before_prompt_build): Two independent paths inject reflection content:

    • derived-focus block: from buildReflectionDerivedFocusBlock()
    • inherited-rules block: from orchestrateDynamicRecall()

    Neither path checks if items are stale due to problem resolution.
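A minimal TypeScript sketch of the score-then-filter path walked through above, showing that only age can remove an item. It assumes a decreasing logistic of the form 1 / (1 + e^(k * (ageDays - midpointDays))), which matches the "above 0.5 until the midpoint" behaviour described for derived items, but the exact formula in reflection-ranking.ts may differ. The ReflectionItem shape, the recall() helper and the baseWeight value used below are illustrative assumptions, not the plugin's API.

```typescript
// Hypothetical sketch of the score-then-filter path; not the plugin's real code.
// Assumption: reflection-ranking.ts uses a decreasing logistic like the one below.

interface ReflectionItem {
  id: string;
  itemKind: "invariant" | "derived";
  strictKey: string;
  storedAt: number; // epoch milliseconds
  decayMidpointDays: number;
  decayK: number;
  baseWeight: number;
  quality: number;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// ~1 for fresh items, exactly 0.5 at the midpoint, tending to 0 afterwards.
function logistic(ageDays: number, midpointDays: number, k: number): number {
  return 1 / (1 + Math.exp(k * (ageDays - midpointDays)));
}

// score = logistic(ageDays, midpointDays, k) * baseWeight * quality
function scoreItem(item: ReflectionItem, now: number): number {
  const ageDays = (now - item.storedAt) / DAY_MS;
  return logistic(ageDays, item.decayMidpointDays, item.decayK) * item.baseWeight * item.quality;
}

// Mirrors filterByMaxAge: only age is checked, never whether the problem was resolved.
function filterByMaxAge(items: ReflectionItem[], now: number, maxAgeDays: number): ReflectionItem[] {
  return items.filter((item) => (now - item.storedAt) / DAY_MS <= maxAgeDays);
}

// Filter by age, rank by decayed score, keep the top K.
function recall(items: ReflectionItem[], now: number, maxAgeDays: number, topK: number): ReflectionItem[] {
  return filterByMaxAge(items, now, maxAgeDays)
    .map((item) => ({ item, score: scoreItem(item, now) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((entry) => entry.item);
}
```

With the derived defaults (midpoint 7, k 0.65, quality 0.95) and an assumed baseWeight of 1, an item exactly 7 days old scores 0.5 * 0.95 = 0.475. A 30-day-old item is still returned by recall() whenever maxAgeDays is 45 and nothing fresher competes for the topK slots, because a low score lowers its rank but never removes it.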

Existing safeguards (and why they don't help)

| Mechanism | What it does | Why it doesn't solve this |
|---|---|---|
| Logistic decay | Item scores decrease over time | The derived midpoint is 7 days, so items stay high-scoring for about a week, and maxAgeDays defaults to 45. |
| Repeated-injection guard (recall-engine.ts) | Prevents re-injecting the same item within N turns | Only works within a single session, not across sessions. |
| autoRecallExcludeReflection (default: true) | Keeps reflection items out of the auto-recall path | Confirms the two paths are independent, so no cross-path suppression is possible. |
| Same-key penalty in final selection (final-topk-setwise-selection.ts) | Penalizes duplicate keys within the same turn | Only applies to items within the same path and the same turn. |
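To illustrate why the repeated-injection guard listed above cannot help across sessions, here is a hypothetical model of such a guard. It is not the recall-engine.ts implementation; the InjectionGuard class, the minTurnsBetween parameter and the map layout are assumptions. The point is structural: if the guard's state is keyed by session, a new session starts with an empty map and re-injects the stale item on its first turn.

```typescript
// Hypothetical per-session repeated-injection guard (names are illustrative).
class InjectionGuard {
  private readonly minTurnsBetween: number;
  // sessionId -> (itemId -> turn index of the last injection)
  private readonly lastInjected = new Map<string, Map<string, number>>();

  constructor(minTurnsBetween: number) {
    this.minTurnsBetween = minTurnsBetween;
  }

  // Returns true (and records the turn) if the item may be injected now.
  shouldInject(sessionId: string, itemId: string, turn: number): boolean {
    let perSession = this.lastInjected.get(sessionId);
    if (!perSession) {
      // A new session gets fresh state: nothing carries over from earlier sessions.
      perSession = new Map<string, number>();
      this.lastInjected.set(sessionId, perSession);
    }
    const last = perSession.get(itemId);
    if (last !== undefined && turn - last < this.minTurnsBetween) {
      return false; // injected too recently in this session
    }
    perSession.set(itemId, turn);
    return true;
  }
}
```

In session A the same item is suppressed for the next few turns, but session C asks the guard about the same item id and gets `true` immediately.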

Cross-module analysis

I checked every module in the pipeline for resolution mechanisms:

| Module | Has resolution/invalidation? |
|---|---|
| reflection-item-store.ts | ❌ No status field |
| reflection-ranking.ts | ❌ Only computes decay score |
| reflection-recall.ts | ❌ Only filters by age |
| reflection-aggregation.ts | ❌ Only groups by key |
| reflection-selection.ts | ❌ Only diversity selection |
| recall-engine.ts | ❌ Only per-session dedup |
| adaptive-retrieval.ts | ❌ Only skips greetings/commands |
| noise-filter.ts | ❌ Only filters refusals/meta-questions |
| final-topk-setwise-selection.ts | ❌ Only intra-path dedup |
| index.ts (before_prompt_build) | ❌ No cross-path suppression |

No module in the entire pipeline provides a mechanism to invalidate or suppress resolved reflection items.

Reproduction Steps

  1. Session A: Encounter a problem (e.g., rerank misconfiguration). Plugin reflection extracts 6 derived lessons about the problem.
  2. Session B: Solve the problem. Confirm it's working.
  3. Session C, D, E...: The 6 stale derived lessons are still injected as <derived-focus>. The agent is misled into suggesting "the next useful action is a contrastive retrieval test" even though the rerank problem is already resolved.
  4. Stale items persist for up to maxAgeDays (default 45 for invariant, configurable for derived).

Proposed Solutions (in order of implementation effort)

Option A (Minimal): self_improvement_reflection_resolve tool

  • Add a new agent tool: self_improvement_reflection_resolve(query | id)
  • Marks matching reflection items as resolved (adds a resolvedAt timestamp to metadata)
  • reflection-recall.ts skips items where resolvedAt is set
  • Effort: Low. Touches tools.ts, reflection-item-store.ts, reflection-recall.ts.
  • Tradeoff: Requires the agent/user to know to call the tool. Not automatic.
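A minimal sketch of Option A under stated assumptions: an in-memory Map stands in for the LanceDB table, and query matching is a case-insensitive substring check, whereas a real tool would more likely reuse vector search. Only the tool name and the resolvedAt field come from the proposal; InMemoryReflectionStore, ResolveTarget and recallCandidates are hypothetical names.

```typescript
// Hypothetical sketch of Option A; the Map stands in for LanceDB metadata updates.

interface ReflectionMetadata {
  type: "memory-reflection-item";
  strictKey: string;
  text: string;
  storedAt: number;
  resolvedAt?: number; // new optional field proposed by Option A
}

// self_improvement_reflection_resolve(query | id)
type ResolveTarget = { id: string } | { query: string };

class InMemoryReflectionStore {
  private readonly items = new Map<string, ReflectionMetadata>();

  put(id: string, meta: ReflectionMetadata): void {
    this.items.set(id, meta);
  }

  // Marks matching, not-yet-resolved items and returns their ids.
  resolve(target: ResolveTarget, now: number): string[] {
    const marked: string[] = [];
    this.items.forEach((meta, id) => {
      const matches = "id" in target
        ? id === target.id
        : meta.text.toLowerCase().includes(target.query.toLowerCase());
      if (matches && meta.resolvedAt === undefined) {
        meta.resolvedAt = now;
        marked.push(id);
      }
    });
    return marked;
  }

  // The reflection-recall.ts change: skip anything with resolvedAt set.
  recallCandidates(): ReflectionMetadata[] {
    return Array.from(this.items.values()).filter((meta) => meta.resolvedAt === undefined);
  }
}
```

Resolving is idempotent in this sketch: an item that already has resolvedAt is not re-marked, so repeated tool calls do not move the timestamp. Returning the marked ids lets the agent confirm to the user exactly which lessons were retired.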

Option B (Medium): Cross-pipeline suppression via memory signals

  • When a new memory entry is stored (via memory_store) that semantically contradicts a reflection item, automatically discount that reflection item.
  • Implementation: During reflection recall scoring, check if any stored memory (from the auto-recall pipeline) has high semantic similarity to a reflection item but with opposite intent (e.g., "rerank is working" vs "rerank needs contrastive test"). If so, apply an additional decay multiplier.
  • Effort: Medium. Requires cross-referencing between the two pipelines.
  • Tradeoff: Needs good classification of "contradictory" vs "supporting" signals.
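Option B could be expressed as one more factor on the existing score, i.e. score = logistic(ageDays, midpointDays, k) * baseWeight * quality * multiplier. The sketch below is hypothetical: the 0.8 similarity threshold, the 0.1 discount and the ContradictionClassifier callback are placeholders, and the callback stands for the "contradictory vs supporting" classification that the tradeoff above says still needs designing.

```typescript
// Hypothetical sketch of Option B's extra decay multiplier.

interface Embedded {
  text: string;
  vector: number[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vector length mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Placeholder for an intent classifier (e.g. an LLM or NLI call); assumed, not existing.
type ContradictionClassifier = (reflection: string, memory: string) => boolean;

// Returns `discount` if any stored memory is both similar and contradictory, else 1.
function contradictionMultiplier(
  reflection: Embedded,
  memories: Embedded[],
  isContradiction: ContradictionClassifier,
  similarityThreshold = 0.8,
  discount = 0.1,
): number {
  for (const memory of memories) {
    // Similarity alone is not enough: "rerank is working" and "rerank needs a
    // contrastive test" are close in embedding space but opposite in intent.
    if (
      cosineSimilarity(reflection.vector, memory.vector) >= similarityThreshold &&
      isContradiction(reflection.text, memory.text)
    ) {
      return discount;
    }
  }
  return 1;
}
```

Requiring both conditions matters because a similarity-only rule would also discount memories that support the reflection item on the same topic.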

Option C (Full): superseded status + lifecycle management

  • Add a status field to reflection items: active | resolved | superseded
  • Add self_improvement_reflection_supersede(strictKey, reason) tool
  • New reflection items with the same strictKey automatically mark older items as superseded
  • Reflection recall only returns items with status === 'active'
  • Effort: Medium-High. Touches store, scoring, injection, tools.
  • Tradeoff: Most robust solution, but adds complexity to the metadata schema.
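A sketch of Option C's lifecycle, assuming an in-memory array in place of LanceDB. The status values, strictKey-based superseding and the self_improvement_reflection_supersede(strictKey, reason) signature come from the proposal; LifecycleItem, statusReason, statusChangedAt and ReflectionLifecycleStore are hypothetical names. A resolve path would set status to "resolved" in the same way.

```typescript
// Hypothetical sketch of Option C's status lifecycle.

type ReflectionStatus = "active" | "resolved" | "superseded";

interface LifecycleItem {
  id: string;
  strictKey: string;
  text: string;
  storedAt: number;
  status: ReflectionStatus;
  statusReason?: string;
  statusChangedAt?: number;
}

class ReflectionLifecycleStore {
  private readonly items: LifecycleItem[] = [];

  // Storing a new item supersedes older active items with the same strictKey.
  add(item: Omit<LifecycleItem, "status">): void {
    for (const existing of this.items) {
      if (existing.strictKey === item.strictKey && existing.status === "active") {
        existing.status = "superseded";
        existing.statusReason = `replaced by ${item.id}`;
        existing.statusChangedAt = item.storedAt;
      }
    }
    this.items.push({ ...item, status: "active" });
  }

  // Backing logic for self_improvement_reflection_supersede(strictKey, reason).
  // Returns how many active items were superseded.
  supersede(strictKey: string, reason: string, now: number): number {
    let count = 0;
    for (const existing of this.items) {
      if (existing.strictKey === strictKey && existing.status === "active") {
        existing.status = "superseded";
        existing.statusReason = reason;
        existing.statusChangedAt = now;
        count++;
      }
    }
    return count;
  }

  // Reflection recall only returns items with status === "active".
  recallActive(): LifecycleItem[] {
    return this.items.filter((item) => item.status === "active");
  }
}
```

Recording a reason and timestamp on each status change keeps superseded items auditable rather than deleting them, at the cost of the extra metadata fields the tradeoff above mentions.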

Environment

  • memory-lancedb-pro version: 1.1.0-beta.6
  • OpenClaw version: 2026.3.22+
  • sessionStrategy: memoryReflection
  • memoryReflection.injectMode: inheritance+derived
  • memoryReflection.recall.mode: dynamic

Source Files Referenced

  • src/reflection-item-store.ts — item metadata and decay defaults
  • src/reflection-ranking.ts — logistic decay scoring
  • src/reflection-recall.ts — dynamic recall ranking
  • src/reflection-aggregation.ts — group aggregation and scoring
  • src/reflection-selection.ts — diversity-aware selection
  • src/recall-engine.ts — repeated-injection guard and age filtering
  • src/adaptive-retrieval.ts — query skip logic
  • src/noise-filter.ts — content quality filtering
  • src/final-topk-setwise-selection.ts — final top-k selection (shared by both paths)
  • index.ts — before_prompt_build hook and injection orchestration
