System / untrusted metadata envelope can still leak into durable memory and get auto-recalled

## Summary
In real OpenClaw channel traffic (especially Feishu / message-wrapped sessions), `memory-lancedb-pro` can still persist long-term memories containing channel/system envelope text such as:

- `System: [2026-...] ...`
- `Conversation info (untrusted metadata)`
- `Sender (untrusted metadata)`
- `[Queued messages while agent was busy]`

This leads to low-quality memories being auto-recalled repeatedly, sometimes with high `access_count`, which reinforces the pollution over time.

## Why this matters
The plugin already has:
- inbound metadata stripping logic in `index.ts`
- noise filtering in `src/noise-filter.ts`

But in practice, some envelope text still leaks through and gets classified by smart extraction as `events / entities / decisions`.

Once these entries have been recalled several times, lifecycle reinforcement makes them less likely to decay naturally.

## Observed behavior
Durable memory entries were stored with text containing full blocks like:

- `System: [timestamp] Feishu[default] DM ...`
- fenced JSON under `Conversation info (untrusted metadata)`
- fenced JSON under `Sender (untrusted metadata)`

These entries were later auto-recalled as if they were meaningful long-term facts.

## Likely cause
Two things seem to combine here:

1. `stripLeadingInboundMetadata()` only strips a fairly strict *leading block* format:
   - metadata must appear at the start
   - metadata must match the expected sentinel + fenced JSON structure

2. `isNoise()` does not currently treat system/envelope text as noise, for example:
   - `System: [`
   - `Conversation info (untrusted metadata)`
   - `Sender (untrusted metadata)`
   - queued-message wrappers

So if the input deviates even slightly from the format the prefix stripper expects, the envelope text survives and can still pass into smart extraction and/or the regex fallback.
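
To illustrate the failure mode, here is a minimal sketch of a start-anchored stripper. `strictStripSketch` and `STRICT_LEADING_BLOCK` are hypothetical stand-ins written for this issue, not the plugin's actual `stripLeadingInboundMetadata()`; they only mimic the two constraints listed above, and the exact sentinel format (trailing colon, `json` fence) is an assumption.

```typescript
// Hypothetical stand-in for a strict leading-block stripper. NOT the plugin's
// real stripLeadingInboundMetadata(); it only mimics the two constraints above.
const FENCE = "`".repeat(3); // built at runtime to avoid a literal fence here

// Assumed shape: sentinel line ending in ":", then a fenced JSON block,
// all anchored to the very start of the text.
const STRICT_LEADING_BLOCK = new RegExp(
  "^(?:Conversation info|Sender) \\(untrusted metadata\\):\\n" +
    FENCE + "json\\n[\\s\\S]*?\\n" + FENCE + "\\n?",
);

function strictStripSketch(text: string): string {
  return text.replace(STRICT_LEADING_BLOCK, "");
}

const block =
  `Sender (untrusted metadata):\n${FENCE}json\n{"name":"alice"}\n${FENCE}\n`;
const userText = "Remember that I prefer tea.";

// Case 1: well-formed leading block, so it is removed.
const clean = strictStripSketch(block + userText);

// Case 2: a System line precedes the block, so the start anchor fails
// and the whole envelope is left in place.
const leakedPrefix = strictStripSketch(
  "System: [ts] Feishu[default] DM\n" + block + userText,
);

// Case 3: the sentinel is missing its colon, so the structure check fails.
const leakedShape = strictStripSketch(block.replace("):", ")") + userText);

console.log({ clean, leakedPrefix, leakedShape });
```

In cases 2 and 3 the returned text still contains `Sender (untrusted metadata)`, which is the kind of residue that later reaches extraction.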

## Minimal fix proposal
Add a second hard-stop layer:

### 1. In `src/noise-filter.ts`
Treat system/envelope markers as noise:

- `^System:\s*\[`
- `Conversation info \(untrusted metadata\)`
- `Sender \(untrusted metadata\)`
- `Thread starter \(untrusted, for context\)`
- `Replied message \(untrusted, for context\)`
- `Forwarded message context \(untrusted metadata\)`
- `Chat history since last reply \(untrusted, for context\)`
- `\[Queued messages while agent was busy\]`
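
The marker list above could be expressed roughly as follows. This is a sketch only: `ENVELOPE_MARKER_PATTERNS` and `containsEnvelopeMarker` are hypothetical names, and how they would be wired into the existing `isNoise()` is not shown because I have not confirmed its signature. The `m` flag on the `System:` pattern is my assumption, so that a `System: [` line is also caught when it is not the first line of the text.

```typescript
// Hypothetical names; not the plugin's actual exports.
// No "g" flag is used, so RegExp.test() stays stateless across calls.
const ENVELOPE_MARKER_PATTERNS: RegExp[] = [
  /^System:\s*\[/m, // "m" is an assumption: match at the start of any line
  /Conversation info \(untrusted metadata\)/,
  /Sender \(untrusted metadata\)/,
  /Thread starter \(untrusted, for context\)/,
  /Replied message \(untrusted, for context\)/,
  /Forwarded message context \(untrusted metadata\)/,
  /Chat history since last reply \(untrusted, for context\)/,
  /\[Queued messages while agent was busy\]/,
];

// Returns true if any envelope marker appears anywhere in the text.
function containsEnvelopeMarker(text: string): boolean {
  return ENVELOPE_MARKER_PATTERNS.some((pattern) => pattern.test(text));
}
```

A call such as `containsEnvelopeMarker(candidateText)` could then be used as one more early-return condition in the noise check.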

### 2. In `normalizeAutoCaptureText()`
After prefix stripping, if any of the above markers still remain in the normalized text, return `null` directly.

This creates a dual-layer defense:
- strip when possible
- drop if any envelope markers remain
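
A minimal sketch of that strip-then-drop flow, under stated assumptions: `normalizeAutoCaptureTextSketch` and `stripLeadingBlockSketch` are hypothetical stand-ins, the real `normalizeAutoCaptureText()` may take different arguments or do more normalization, and the marker list is a subset of the one in section 1.

```typescript
// Subset of the marker list from section 1, kept short for the sketch.
const REMAINING_ENVELOPE_MARKERS: RegExp[] = [
  /^System:\s*\[/m,
  /Conversation info \(untrusted metadata\)/,
  /Sender \(untrusted metadata\)/,
  /\[Queued messages while agent was busy\]/,
];

// Hypothetical stand-in for the existing prefix stripper: removes one
// sentinel + fenced block, but only at the very start of the text.
function stripLeadingBlockSketch(text: string): string {
  return text.replace(
    /^Conversation info \(untrusted metadata\):?\s*`{3}(?:json)?[\s\S]*?`{3}\s*/,
    "",
  );
}

// Layer 1: strip when possible. Layer 2: drop if any marker remains.
function normalizeAutoCaptureTextSketch(raw: string): string | null {
  const stripped = stripLeadingBlockSketch(raw).trim();
  if (stripped.length === 0) return null;
  if (REMAINING_ENVELOPE_MARKERS.some((p) => p.test(stripped))) return null;
  return stripped;
}
```

The second layer is deliberately conservative: it discards the whole candidate rather than trying to salvage the user text around a marker, on the assumption that losing one capture is cheaper than storing a polluted memory.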

## Notes
Also worth checking whether the current published package actually matches the README claim that auto-recall uses `before_prompt_build`. On my local installed copy, auto-recall is still registered on `before_agent_start`.

That may be unrelated to the pollution bug itself, but it suggests runtime/package drift between docs and installed code.

## Local workaround used
I applied the minimal dual-layer filter above locally. It is a low-risk patch because it only blocks obvious channel/system envelope artifacts from becoming durable memory.

