Summary
In real OpenClaw channel traffic (especially Feishu / message-wrapped sessions), memory-lancedb-pro can still persist long-term memories containing channel/system envelope text such as:
- System: [2026-...] ...
- Conversation info (untrusted metadata)
- Sender (untrusted metadata)
- [Queued messages while agent was busy]
This leads to low-quality memories being auto-recalled repeatedly, sometimes with high access_count, which reinforces the pollution over time.
Why this matters
The plugin already has:
- inbound metadata stripping logic in index.ts
- noise filtering in src/noise-filter.ts
But in practice, some envelope text still leaks through and gets classified by smart extraction as events / entities / decisions.
Once these entries are recalled multiple times, lifecycle reinforcement can make them harder to naturally decay.
Observed behavior
Durable memory entries were stored with text containing full blocks like:
- System: [timestamp] Feishu[default] DM ...
- fenced JSON under Conversation info (untrusted metadata)
- fenced JSON under Sender (untrusted metadata)
These entries were later auto-recalled as if they were meaningful long-term facts.
Likely cause
Two things seem to combine here:
1. stripLeadingInboundMetadata() only strips a fairly strict leading block format
   - metadata must appear at the start
   - metadata must match expected sentinel + fenced JSON structure
2. isNoise() does not currently treat system/envelope text as noise
   - System: [
   - Conversation info (untrusted metadata)
   - Sender (untrusted metadata)
   - queued-message wrappers
So if prefix stripping misses even slightly, the text can still pass into smart extraction and/or regex fallback.
Minimal fix proposal
Add a second hard-stop layer:
1. In src/noise-filter.ts
Treat system/envelope markers as noise:
- ^System:\s*\[
- Conversation info \(untrusted metadata\)
- Sender \(untrusted metadata\)
- Thread starter \(untrusted, for context\)
- Replied message \(untrusted, for context\)
- Forwarded message context \(untrusted metadata\)
- Chat history since last reply \(untrusted, for context\)
- \[Queued messages while agent was busy\]
2. In normalizeAutoCaptureText()
After prefix stripping, if any of the above markers still remain in the normalized text, return null directly.
This creates a dual-layer defense:
- strip when possible
- drop if any envelope markers remain
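A minimal sketch of the two layers, to make the proposal concrete. It is not the plugin's actual code: ENVELOPE_MARKERS, containsEnvelopeMarker, PrefixStripper and normalizeWithHardStop are hypothetical names, and the real signature of normalizeAutoCaptureText() may differ. The patterns are transcribed from the list above; applying the `m` flag to the System: pattern (so it matches at the start of any line, not only the start of the text) is my assumption.

```typescript
// Hypothetical marker list, transcribed from the proposal in this issue.
// No `g` flag is used, so RegExp.test() stays stateless across calls.
const ENVELOPE_MARKERS: RegExp[] = [
  /^System:\s*\[/m, // assumption: `m` lets this match at any line start
  /Conversation info \(untrusted metadata\)/,
  /Sender \(untrusted metadata\)/,
  /Thread starter \(untrusted, for context\)/,
  /Replied message \(untrusted, for context\)/,
  /Forwarded message context \(untrusted metadata\)/,
  /Chat history since last reply \(untrusted, for context\)/,
  /\[Queued messages while agent was busy\]/,
];

// Layer 2 check: true if any envelope marker is present anywhere in the text.
function containsEnvelopeMarker(text: string): boolean {
  return ENVELOPE_MARKERS.some((re) => re.test(text));
}

// Stand-in type for the existing prefix stripper
// (e.g. stripLeadingInboundMetadata); its real signature is assumed here.
type PrefixStripper = (text: string) => string;

// Dual-layer normalization: strip when possible, then drop the whole
// candidate if any envelope marker survived the strip.
function normalizeWithHardStop(
  raw: string,
  stripPrefix: PrefixStripper,
): string | null {
  const stripped = stripPrefix(raw).trim();
  if (stripped.length === 0) return null; // nothing left worth storing
  return containsEnvelopeMarker(stripped) ? null : stripped;
}
```

With a stripper that only handles a well-formed leading block, a message that starts with a System: [...] line is left unstripped by layer 1 and is then dropped by layer 2 instead of reaching smart extraction. The trade-off is that a legitimate message quoting one of these markers verbatim is also dropped, which seems acceptable for auto-capture.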
Notes
Also worth checking whether the current published package actually matches the README claim that auto-recall uses before_prompt_build. On my local installed copy, auto-recall is still registered on before_agent_start.
That may be unrelated to the pollution bug itself, but it suggests runtime/package drift between docs and installed code.
Local workaround used
I applied the minimal dual-layer filter above locally, and it is a safe, low-risk patch because it only blocks obvious channel/system envelope artifacts from becoming durable memory.