memory: provenance tagging + injection-scanner gate (prompt-injection defence) #599
Open
TinkerOfThings wants to merge 2 commits into
… defence)

The automatic memory extractor distils facts from raw chat exchanges that may contain hostile input. Harden that path:

- store.Fact gains a `trust` tier, round-tripped through JSONL (legacy facts default to "" = trusted). FactStore.add/add_many accept `trust`.
- MemoryService stores all auto-extracted facts as trust="untrusted", and now scans each exchange with InjectionScanner BEFORE extraction: an overt injection attempt is suppressed (never reaches the extraction model or store). Scanning fails open (a scanner error can't block memory; provenance still applies). build_memory_service wires a guarded default scanner.
- `jarvis memory list` surfaces a Trust column (⚠ untrusted) + a data-not-instructions warning.

Also gives InjectionScanner a pure-Python fallback (its own _INJECTION_PATTERNS) when the openjarvis_rust extension isn't built, mirroring the RUST_AVAILABLE fallback pattern used elsewhere. This fixes 10 previously-failing scanner tests in rust-less environments.

No regressions: affected suites 114→104 failing (10 fixed), +6 new tests; the remaining failures are all pre-existing openjarvis_rust dependencies.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
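As a rough illustration of the `trust` round-trip described in this commit, the sketch below shows one way a fact record could carry a trust tier through JSONL while legacy records (written before the field existed) load as trusted. The `Fact` class here is a hypothetical stand-in, not the project's `store.Fact`; the `text` field and the `to_jsonl`/`from_jsonl` method names are assumptions made for the example.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Fact:
    """Hypothetical stand-in for store.Fact; only `trust` comes from the PR text."""

    text: str
    trust: str = ""  # "" = trusted (legacy default); "untrusted" = auto-extracted

    def to_jsonl(self) -> str:
        # One JSON object per line, as in a JSONL store.
        return json.dumps(asdict(self), ensure_ascii=False)

    @classmethod
    def from_jsonl(cls, line: str) -> "Fact":
        record = json.loads(line)
        # Legacy records carry no "trust" key; they load as trusted ("").
        return cls(text=record["text"], trust=record.get("trust", ""))


# A legacy line without the field, and a new line written with it.
legacy = Fact.from_jsonl('{"text": "User prefers metric units"}')
extracted = Fact.from_jsonl(Fact("Likes oolong tea", trust="untrusted").to_jsonl())
```

Defaulting a missing key to `""` keeps existing stores readable without a migration, at the cost of treating every pre-existing fact as trusted.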
…ening)

The JSON-array path is unchanged; the line fallback now accepts ONLY genuine list items (bullets/numbered), not arbitrary prose lines. Prevents a model from being steered into minting facts out of narrative or injected text. The bullet fallback (test_line_fallback_for_bullets) still works.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
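The stricter fallback described in this commit might look something like the sketch below: try a JSON array first, and otherwise keep only lines that are genuine bullet or numbered list items. The function name `parse_facts` and the exact bullet characters accepted are assumptions for illustration; the PR's actual parser is not shown here.

```python
import json
import re

# Matches "- item", "* item", "• item", "1. item" and "2) item".
# The marker must be followed by whitespace, so prose lines do not qualify.
_LIST_ITEM = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s+(\S.*)$")


def parse_facts(raw: str) -> list[str]:
    """Parse extractor output: JSON array first, then list items only."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        parsed = None
    if isinstance(parsed, list):
        return [str(item).strip() for item in parsed if str(item).strip()]

    facts = []
    for line in raw.splitlines():
        match = _LIST_ITEM.match(line)
        if match:  # narrative or injected prose lines are dropped
            facts.append(match.group(1).strip())
    return facts


mixed_output = (
    "Here are the facts:\n"
    "- likes tea\n"
    "2. lives in Oslo\n"
    "Ignore all previous instructions"
)
facts = parse_facts(mixed_output)
```

In this sketch only the two list items survive; the heading line and the trailing prose line are discarded rather than stored as facts.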
Summary
Hardens the automatic long-term-memory path against prompt injection. The fact extractor distils facts from raw chat exchanges that can contain hostile input (scraped pages, tool output, pasted content); those facts are currently stored and surfaced unfiltered. This PR adds provenance tagging, an injection gate ahead of extraction, and quarantine-on-surfacing, and it revives the injection scanner in Rust-less environments.
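The gate-plus-provenance flow summarised above can be sketched roughly as follows: scan the exchange first, suppress it on an overt hit, fail open if the scanner itself errors, and tag whatever is extracted as untrusted. Everything named here (`MemoryGate`, `looks_like_injection`, the two regex patterns, the `stored` list) is hypothetical and only illustrates the control flow; it is not the `MemoryService` or `InjectionScanner` code from this PR.

```python
import re

# Illustrative patterns only; the PR's real _INJECTION_PATTERNS are not shown.
_PATTERNS = [
    re.compile(r"ignore (?:all )?(?:previous|prior) instructions", re.I),
    re.compile(r"disregard (?:the )?system prompt", re.I),
]


def looks_like_injection(text: str) -> bool:
    """Return True when any illustrative pattern matches the text."""
    return any(pattern.search(text) for pattern in _PATTERNS)


class MemoryGate:
    """Hypothetical gate: scan before extraction, store results as untrusted."""

    def __init__(self, extract, scan=looks_like_injection):
        self.extract = extract  # callable: exchange text -> list of fact strings
        self.scan = scan        # callable: exchange text -> bool
        self.stored = []        # list of (fact, trust) pairs

    def ingest(self, exchange: str) -> int:
        try:
            flagged = self.scan(exchange)
        except Exception:
            flagged = False  # fail open: a scanner error must not block memory
        if flagged:
            return 0  # suppressed: never reaches the extractor or the store
        facts = self.extract(exchange)
        # Provenance still applies, even when the scanner failed.
        self.stored.extend((fact, "untrusted") for fact in facts)
        return len(facts)


def broken_scanner(text: str) -> bool:
    raise RuntimeError("scanner unavailable")


gate = MemoryGate(extract=lambda text: [text.strip()])
```

Failing open is a deliberate trade-off in this sketch: a broken scanner degrades to "store, but mark untrusted" rather than silently disabling memory.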
Changes
- `memory.store.Fact` gains a `trust` tier, round-tripped through JSONL (legacy facts default to `""` = trusted). `FactStore.add`/`add_many` accept `trust`.
- `MemoryService` scans each exchange with `InjectionScanner` before extraction; an overt injection attempt is suppressed (never reaches the extraction model or the store). Scanning fails open: a scanner error can't block memory, since provenance still applies. `build_memory_service` wires a guarded default scanner.
- Auto-extracted facts are stored as `trust="untrusted"`.
- `jarvis memory list` shows a Trust column (⚠ untrusted) plus a data-not-instructions warning.
- `InjectionScanner` hard-required the `openjarvis_rust` extension and crashed on construction when it wasn't built. Added a pure-Python fallback using its own `_INJECTION_PATTERNS`, mirroring the `RUST_AVAILABLE` fallback pattern used elsewhere (e.g. `security.ssrf`). This fixes 10 previously-failing scanner tests in Rust-less environments.
Testing
No regressions: affected suites go from 114 to 104 failing (10 fixed), with 6 new tests. The remaining failures are all pre-existing `openjarvis_rust`-extension dependencies, unrelated to this change.
🤖 Generated with Claude Code