memory: provenance tagging + injection-scanner gate (prompt-injection defence) #599

Open
TinkerOfThings wants to merge 2 commits into open-jarvis:main from TinkerOfThings:memory-injection-defence

@TinkerOfThings

Summary

Hardens the automatic long-term-memory path against prompt injection. The fact extractor distils facts from raw chat exchanges that can contain hostile input (scraped pages, tool output, pasted content); those facts are stored and surfaced unfiltered. This adds provenance, an injection gate, and quarantine-on-surfacing — and revives the injection scanner in Rust-less environments.

Changes

  • Provenance: memory.store.Fact gains a trust tier, round-tripped through JSONL (legacy facts default to "" = trusted). FactStore.add/add_many accept trust.
  • Injection gate: MemoryService scans each exchange with InjectionScanner before extraction; an overt injection attempt is suppressed (it never reaches the extraction model or the store). Scanning fails open: a scanner error can't block memory, since provenance still applies. build_memory_service wires a guarded default scanner.
  • Untrusted tagging: all auto-extracted facts are stored with trust="untrusted".
  • Quarantine on surfacing: jarvis memory list shows a Trust column (⚠ untrusted) plus a data-not-instructions warning.
  • Scanner Python fallback: InjectionScanner hard-required the openjarvis_rust extension and crashed on construction when it wasn't built. Added a pure-Python fallback using its own _INJECTION_PATTERNS, mirroring the RUST_AVAILABLE fallback pattern used elsewhere (e.g. security.ssrf). This fixes 10 previously-failing scanner tests in Rust-less environments.
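
For reviewers who want a concrete picture of the provenance change, here is a minimal sketch of a trust field that round-trips through JSONL and defaults to "" (trusted) for legacy records. The dataclass layout and the dump_fact/load_fact helpers are assumptions made for illustration; they are not the actual memory.store code.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Fact:
    # Illustrative stand-in for memory.store.Fact; the real class has its own fields.
    text: str
    trust: str = ""  # "" = trusted (legacy default), "untrusted" = auto-extracted


def dump_fact(fact: Fact) -> str:
    """Serialise one fact as a single JSONL line (hypothetical helper)."""
    return json.dumps(asdict(fact))


def load_fact(line: str) -> Fact:
    """Parse one JSONL line; records written before the change have no trust key."""
    raw = json.loads(line)
    return Fact(text=raw["text"], trust=raw.get("trust", ""))
```

Reading a legacy line such as {"text": "likes tea"} yields trust == "", so existing stores would stay valid without a migration under this assumption.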

Testing

  • Test-first throughout; 6 new tests covering provenance round-trip, untrusted tagging, skip-on-injection, fail-open, and the CLI quarantine marker.
  • No regressions: affected suites went 114→104 failing (10 fixed). Remaining failures are all pre-existing openjarvis_rust-extension dependencies, unrelated to this change.

🤖 Generated with Claude Code

… defence)

The automatic memory extractor distils facts from raw chat exchanges that may
contain hostile input. Harden that path:

- store.Fact gains a `trust` tier, round-tripped through JSONL (legacy facts
  default to "" = trusted). FactStore.add/add_many accept `trust`.
- MemoryService stores all auto-extracted facts as trust="untrusted", and now
  scans each exchange with InjectionScanner BEFORE extraction — an overt
  injection attempt is suppressed (never reaches the extraction model or store).
  Scanning fails open (a scanner error can't block memory; provenance still
  applies). build_memory_service wires a guarded default scanner.
- `jarvis memory list` surfaces a Trust column (⚠ untrusted) + a
  data-not-instructions warning.
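
The gate described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the MemoryService implementation: remember_exchange, toy_extract, toy_scan, and broken_scan are hypothetical names, and the scanner is modelled as a plain callable that returns True when it flags an exchange.

```python
from typing import Callable, Dict, List, Optional


def remember_exchange(
    exchange: str,
    extract: Callable[[str], List[str]],
    scan: Optional[Callable[[str], bool]] = None,
) -> List[Dict[str, str]]:
    """Scan first, then extract; every extracted fact is tagged untrusted."""
    if scan is not None:
        try:
            flagged = scan(exchange)
        except Exception:
            # Fail open: a scanner error must not block memory.
            flagged = False
        if flagged:
            # Suppressed: the exchange never reaches the extractor or the store.
            return []
    return [{"text": text, "trust": "untrusted"} for text in extract(exchange)]


def toy_extract(exchange: str) -> List[str]:
    # Stand-in for the extraction model: treats the whole exchange as one fact.
    return [exchange.strip()]


def toy_scan(exchange: str) -> bool:
    # Stand-in scanner with a single hard-coded phrase.
    return "ignore previous instructions" in exchange.lower()


def broken_scan(exchange: str) -> bool:
    # Simulates a scanner that fails at runtime.
    raise RuntimeError("scanner unavailable")
```

With broken_scan the exchange is still extracted and stored as untrusted, which is the trade-off the commit message describes: provenance tagging remains the backstop when scanning is unavailable.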

Also gives InjectionScanner a pure-Python fallback (its own _INJECTION_PATTERNS)
when the openjarvis_rust extension isn't built — mirrors the RUST_AVAILABLE
fallback pattern used elsewhere. This fixes 10 previously-failing scanner tests
in rust-less environments.
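
A rough sketch of the optional-extension pattern referred to above. Only the openjarvis_rust module name and the RUST_AVAILABLE and _INJECTION_PATTERNS identifiers come from this PR; the FallbackScanner class and the two regexes are illustrative assumptions. The extension's API is not shown in the PR, so the sketch records which backend would be chosen and always scans with the Python patterns.

```python
import re

try:
    import openjarvis_rust  # noqa: F401  (compiled extension; may not be built)
    RUST_AVAILABLE = True
except ImportError:
    RUST_AVAILABLE = False

# Illustrative patterns only; the real _INJECTION_PATTERNS list is not shown in the PR.
_INJECTION_PATTERNS = [
    re.compile(r"ignore\s+(?:all\s+|any\s+)?(?:previous|prior)\s+instructions", re.IGNORECASE),
    re.compile(r"disregard\s+(?:the\s+)?system\s+prompt", re.IGNORECASE),
]


class FallbackScanner:
    """Hypothetical scanner that constructs cleanly with or without the extension."""

    def __init__(self) -> None:
        # Construction never raises when the extension is missing.
        self.backend = "rust" if RUST_AVAILABLE else "python"

    def scan(self, text: str) -> bool:
        # The Rust call is omitted on purpose (its signature is unknown here),
        # so this sketch uses the pure-Python patterns on both backends.
        return any(pattern.search(text) for pattern in _INJECTION_PATTERNS)
```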

No regressions: affected suites 114→104 failing (10 fixed), +6 new tests; the
remaining failures are all pre-existing openjarvis_rust dependencies.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ening)

The JSON-array path is unchanged; the line fallback now accepts ONLY genuine
list items (bullets/numbered), not arbitrary prose lines. Prevents a model from
being steered into minting facts out of narrative or injected text. The bullet
fallback (test_line_fallback_for_bullets) still works.
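
One way to picture the tightened parser, as a sketch under assumptions: parse_facts and _LIST_ITEM are hypothetical names, and the set of accepted markers (-, *, •, and "1." or "1)" numbering) is a guess at what "genuine list items" means in this commit.

```python
import json
import re
from typing import List

# A line counts as a list item only if it starts with a bullet or a number marker.
_LIST_ITEM = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s+(.+?)\s*$")


def parse_facts(reply: str) -> List[str]:
    """Prefer a JSON array; otherwise keep only bulleted or numbered lines."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        data = None
    if isinstance(data, list):
        return [str(item).strip() for item in data if str(item).strip()]

    facts = []
    for line in reply.splitlines():
        match = _LIST_ITEM.match(line)
        if match:
            # Prose lines never match, so they cannot become facts.
            facts.append(match.group(1))
    return facts
```

Under this sketch a reply that mixes bullets with narrative or injected sentences yields only the bulleted items; the surrounding prose is dropped rather than stored.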

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>