
System / untrusted metadata envelope can still leak into durable memory and get auto-recalled #394

@luoyejiaoe-source

Summary

In real OpenClaw channel traffic (especially Feishu / message-wrapped sessions), memory-lancedb-pro can still persist long-term memories containing channel/system envelope text such as:

  • System: [2026-...] ...
  • Conversation info (untrusted metadata)
  • Sender (untrusted metadata)
  • [Queued messages while agent was busy]

These low-quality memories are then auto-recalled repeatedly, sometimes accumulating a high access_count, which reinforces the pollution over time.

Why this matters

The plugin already has:

  • inbound metadata stripping logic in index.ts
  • noise filtering in src/noise-filter.ts

But in practice, some envelope text still leaks through and gets classified by smart extraction as events / entities / decisions.

Once these entries have been recalled several times, lifecycle reinforcement can make them less likely to decay naturally.

Observed behavior

Durable memory entries were stored whose text contained full envelope blocks such as:

  • System: [timestamp] Feishu[default] DM ...
  • fenced JSON under Conversation info (untrusted metadata)
  • fenced JSON under Sender (untrusted metadata)

These entries were later auto-recalled as if they were meaningful long-term facts.

Likely cause

Two things seem to combine here:

  1. stripLeadingInboundMetadata() only strips metadata in a fairly strict leading-block format:
    • the metadata must appear at the very start of the text
    • the metadata must match the expected sentinel + fenced JSON structure
  2. isNoise() does not currently treat system/envelope text as noise, for example:
    • System: [
    • Conversation info (untrusted metadata)
    • Sender (untrusted metadata)
    • queued-message wrappers

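So if prefix stripping misses the envelope even slightly (for example, because something precedes the metadata block or its structure differs from the expected format), the text can still pass into smart extraction and/or the regex fallback.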
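To illustrate the first point, here is a minimal sketch of how a start-anchored stripper behaves. The regex below is a hypothetical approximation written for this example, not the plugin's actual stripLeadingInboundMetadata(); the JSON payload is a placeholder. It removes a well-formed leading block, but passes the same block through untouched as soon as a System: line arrives first.

```typescript
// Hypothetical approximation of a strict, start-anchored stripper, for
// illustration only; this is NOT the plugin's stripLeadingInboundMetadata().
const FENCE = "`".repeat(3); // built at runtime to avoid a literal fence here

// Sentinel line + fenced JSON block, anchored at the very start of the text.
const LEADING_BLOCK = new RegExp(
  "^Conversation info \\(untrusted metadata\\):?\\s*" +
    FENCE + "json\\n[\\s\\S]*?\\n" + FENCE + "\\s*",
);

function strictStripLeading(text: string): string {
  // Removes the block only when it is the first thing in the text.
  return text.replace(LEADING_BLOCK, "");
}

// Placeholder JSON payload; real envelope fields are not reproduced here.
const block =
  "Conversation info (untrusted metadata):\n" +
  FENCE + 'json\n{"placeholder":true}\n' + FENCE + "\n";
const clean = block + "hello";
// Same block, but a System: line arrives first (as in the observed entries).
const shifted = "System: [timestamp] Feishu[default] DM\n" + block + "hello";

console.log(strictStripLeading(clean)); // hello
console.log(strictStripLeading(shifted) === shifted); // true: nothing stripped
```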

Minimal fix proposal

Add a second hard-stop layer:

1. In src/noise-filter.ts

Treat system/envelope markers as noise:

  • ^System:\s*\[
  • Conversation info \(untrusted metadata\)
  • Sender \(untrusted metadata\)
  • Thread starter \(untrusted, for context\)
  • Replied message \(untrusted, for context\)
  • Forwarded message context \(untrusted metadata\)
  • Chat history since last reply \(untrusted, for context\)
  • \[Queued messages while agent was busy\]
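A minimal sketch of these markers as a TypeScript check, assuming they are simply tested against the candidate text. ENVELOPE_MARKERS and isEnvelopeNoise are hypothetical names, not existing exports of src/noise-filter.ts; how they would be merged into the real isNoise() depends on code not shown in this issue. Using the "m" flag on the System: pattern is also an assumption: it catches a System: line anywhere at a line start, not only at the very beginning of the text.

```typescript
// Hypothetical names: ENVELOPE_MARKERS and isEnvelopeNoise are not existing
// exports of src/noise-filter.ts; they would need to be folded into isNoise().
const ENVELOPE_MARKERS: readonly RegExp[] = [
  // Assumption: the "m" flag lets ^ match a System: line anywhere in the
  // text, not only at the start of the whole string.
  /^System:\s*\[/m,
  /Conversation info \(untrusted metadata\)/,
  /Sender \(untrusted metadata\)/,
  /Thread starter \(untrusted, for context\)/,
  /Replied message \(untrusted, for context\)/,
  /Forwarded message context \(untrusted metadata\)/,
  /Chat history since last reply \(untrusted, for context\)/,
  /\[Queued messages while agent was busy\]/,
];

// Returns true if the text contains any channel/system envelope marker.
// No "g" flag is used, so RegExp.test() keeps no lastIndex state between calls.
function isEnvelopeNoise(text: string): boolean {
  return ENVELOPE_MARKERS.some((re) => re.test(text));
}

console.log(isEnvelopeNoise("Sender (untrusted metadata):")); // true
console.log(isEnvelopeNoise("User prefers concise answers")); // false
```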

2. In normalizeAutoCaptureText()

After prefix stripping, if any of the above markers still appear in the normalized text, return null immediately so the text is never captured.

This creates a dual-layer defense:

  • strip when possible
  • drop if any envelope markers remain
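A rough sketch of the dual-layer flow, under stated assumptions: the real signature of normalizeAutoCaptureText() is not shown here, so the one below (string in, string or null out) is assumed, and stripLeadingInboundMetadataStandIn is a hypothetical stand-in for layer 1, not the plugin's stripper. Layer 2 reuses the marker list proposed above as a single combined regex.

```typescript
// Sketch only: signature and the stand-in stripper are assumptions; the
// plugin's real normalizeAutoCaptureText() / stripLeadingInboundMetadata() may differ.
const FENCE = "`".repeat(3);

// Layer 1 stand-in (hypothetical): strips one or more leading
// "<label> (untrusted metadata):" lines, each followed by a fenced JSON block.
const LEADING_META = new RegExp(
  "^(?:[^\\n]*\\(untrusted metadata\\):?\\s*" +
    FENCE + "json\\n[\\s\\S]*?\\n" + FENCE + "\\s*)+",
);

function stripLeadingInboundMetadataStandIn(text: string): string {
  return text.replace(LEADING_META, "");
}

// Layer 2: the envelope markers proposed for src/noise-filter.ts, combined.
// Only the first alternative is anchored; "m" lets ^ match at any line start.
const ENVELOPE_MARKER = new RegExp(
  [
    "^System:\\s*\\[",
    "Conversation info \\(untrusted metadata\\)",
    "Sender \\(untrusted metadata\\)",
    "Thread starter \\(untrusted, for context\\)",
    "Replied message \\(untrusted, for context\\)",
    "Forwarded message context \\(untrusted metadata\\)",
    "Chat history since last reply \\(untrusted, for context\\)",
    "\\[Queued messages while agent was busy\\]",
  ].join("|"),
  "m",
);

function normalizeAutoCaptureText(raw: string): string | null {
  // Layer 1: strip envelope blocks when they have the expected shape.
  const stripped = stripLeadingInboundMetadataStandIn(raw).trim();
  if (stripped.length === 0) return null;
  // Layer 2: if any envelope marker survived stripping, drop the text entirely.
  if (ENVELOPE_MARKER.test(stripped)) return null;
  return stripped;
}

const meta = "Sender (untrusted metadata):\n" + FENCE + "json\n{}\n" + FENCE + "\n";
console.log(normalizeAutoCaptureText(meta + "I prefer tea.")); // I prefer tea.
console.log(normalizeAutoCaptureText("System: [timestamp] Feishu[default] DM\n" + meta + "I prefer tea.")); // null
```

Dropping the whole text in layer 2, rather than attempting a second, looser cleanup, means an occasional genuine message wrapped in a malformed envelope is not captured; the trade-off is that envelope text can no longer reach smart extraction or the regex fallback.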

Notes

It is also worth checking whether the currently published package matches the README's claim that auto-recall uses before_prompt_build. In my locally installed copy, auto-recall is still registered on before_agent_start.

That may be unrelated to the pollution bug itself, but it suggests drift between the docs and the installed runtime package.

Local workaround used

I applied the minimal dual-layer filter above locally. It is a safe, low-risk patch, since it only prevents obvious channel/system envelope artifacts from becoming durable memory.
