feat(i18n): German language support + fix fire-and-forget race in agent_end #406

Open
Banger455 wants to merge 3 commits into CortexReach:master from Banger455:fix/german-i18n-and-autocapture

@Banger455

Summary

Fixes #393: the memory store reports STORE-OK, but recall in a new session returns RECALLED: NONE for German-speaking users.

Root cause

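Three separate problems compound into a silent data-loss scenario. The two bugs below are fixed in this PR; the third, the extractMinMessages default, is discussed in the note further down.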

  1. Missing German trigger patterns: MEMORY_TRIGGERS, AUTO_CAPTURE_EXPLICIT_REMEMBER_RE, and FORCE_RETRIEVE_PATTERNS cover English, Czech, and Chinese but have no German entries. Phrases like "Merke dir: X" ("Remember: X") fall through shouldCapture(), so nothing is stored.
  2. Fire-and-forget void backgroundRun in the agent_end hook: this works in gateway mode (persistent process) but races against process.exit() in --local / embedded CLI mode, where the capture Promise is killed before it flushes to LanceDB.

Changes

Commit 1 — feat(i18n): add German language triggers

| Array | File | Added patterns |
|---|---|---|
| MEMORY_TRIGGERS | index.ts | 5 regexes: "merke dir", "vergiss nicht", "ich bevorzuge", "wir haben entschieden", "ab jetzt/sofort/in Zukunft", personal info |
| AUTO_CAPTURE_EXPLICIT_REMEMBER_RE | index.ts | German + English explicit remember commands |
| FORCE_RETRIEVE_PATTERNS | src/adaptive-retrieval.ts | 2 regexes: "erinnerst du dich", "weißt du noch", "gestern", "habe ich erwähnt", etc. |
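To illustrate how the retrieval-side additions might look, here is a minimal sketch of German force-retrieve regexes plus a matcher. GERMAN_FORCE_RETRIEVE_SKETCH and matchesForceRetrieve are hypothetical names, and the regexes are assumptions built from the phrases in the table; the PR's actual patterns in src/adaptive-retrieval.ts may differ.

```typescript
// Hypothetical German force-retrieve patterns built from the phrases in the table.
// Illustrative only; not the regexes shipped in src/adaptive-retrieval.ts.
const GERMAN_FORCE_RETRIEVE_SKETCH: readonly RegExp[] = [
  // "Do you remember ...?" phrasings
  /\b(?:erinnerst\s+du\s+dich|weißt\s+du\s+noch)\b/i,
  // References to earlier sessions: "yesterday", "did I mention/say (to you)"
  /\b(?:gestern|habe\s+ich\s+(?:dir\s+)?(?:erwähnt|gesagt))\b/i,
];

// True if any pattern matches. The regexes carry no /g flag, so
// RegExp.prototype.test() keeps no lastIndex state between calls.
function matchesForceRetrieve(query: string, patterns: readonly RegExp[]): boolean {
  return patterns.some((re) => re.test(query));
}

// "Do you remember what my dog is called?" / "What time is it?"
console.log(matchesForceRetrieve("Weißt du noch, wie mein Hund heißt?", GERMAN_FORCE_RETRIEVE_SKETCH)); // true
console.log(matchesForceRetrieve("Wie spät ist es?", GERMAN_FORCE_RETRIEVE_SKETCH)); // false
```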

Commit 2 — fix(auto-capture): await backgroundRun with timeout

  • Hook becomes async; void backgroundRun replaced by await Promise.race([backgroundRun, 15s timeout])
  • AgentEndAutoCaptureHook type widened to Promise<void> | void
  • Session-lock / channel-delivery safety (issue #260, "autoCapture + smartExtraction agent_end hook causes Telegram messages to be silently dropped"): OpenClaw core calls runAgentEnd() fire-and-forget with .catch(), so the new await does not block session locks or downstream deliveries in gateway mode. The timeout is a safety net for hung API calls only.
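The await-with-timeout change can be sketched as a small generic helper. This is a minimal illustration under assumptions, not the PR's code: withTimeout and TIMED_OUT are hypothetical names. It already includes the clearTimeout cleanup that came up later in review, so the safety timer cannot keep Node alive once the task settles.

```typescript
// Sentinel returned when the safety timeout wins the race (hypothetical name).
const TIMED_OUT = Symbol("timed out");

// Awaits `task` but stops waiting after `ms` milliseconds.
// Rejections from `task` propagate to the caller; the timer is always cleared.
async function withTimeout<T>(task: Promise<T>, ms: number) {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<typeof TIMED_OUT>((resolve) => {
    timer = setTimeout(() => resolve(TIMED_OUT), ms);
  });
  try {
    return await Promise.race([task, timeout]);
  } finally {
    // Without this, a pending 15 s timer would keep the process alive.
    clearTimeout(timer);
  }
}
```

In the hook, the call would be roughly await withTimeout(backgroundRun, 15_000). A TIMED_OUT result only means the hook stopped waiting; Promise.race does not cancel the underlying capture.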

Note on extractMinMessages

With the default extractMinMessages of 4, smart extraction never fires for CLI one-shot commands, which produce only 1 eligible text. We worked around this with a config override (extractMinMessages: 1), but the default might be worth revisiting upstream; we leave that decision to the maintainers.
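A minimal sketch of why a threshold of 4 skips one-shot CLI commands, assuming the gate is a simple count comparison. ExtractionConfig and shouldRunSmartExtraction are hypothetical; only the extractMinMessages key, its default of 4, and the override of 1 come from the note above.

```typescript
// Hypothetical config shape; only the key name and its default come from the PR.
interface ExtractionConfig {
  extractMinMessages: number;
}

const DEFAULT_EXTRACTION_CONFIG: ExtractionConfig = { extractMinMessages: 4 };

// Assumed gate: smart extraction runs only when enough eligible texts exist.
function shouldRunSmartExtraction(
  eligibleTexts: readonly string[],
  config: ExtractionConfig = DEFAULT_EXTRACTION_CONFIG,
): boolean {
  return eligibleTexts.length >= config.extractMinMessages;
}

// A CLI one-shot command yields a single eligible text
// ("Remember: my favourite editor is Helix").
const oneShot = ["Merke dir: mein Lieblingseditor ist Helix"];
console.log(shouldRunSmartExtraction(oneShot)); // false (default of 4)
console.log(shouldRunSmartExtraction(oneShot, { extractMinMessages: 1 })); // true (override)
```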

Verification (on live OpenClaw instance)

| Test | Result |
|---|---|
| Gateway log | Shows "auto-capture regex fallback found 1 capturable text(s)", then "auto-captured 1 memories" |
| openclaw memory-pro search "Emerald Bear" | ✅ Found with 75% score |
| Recall in fresh session: "Welche Testtokens kennst du?" ("Which test tokens do you know?") | ✅ All 6 test tokens correctly recalled |
| Memory stats: 0 garbage | ✅ 32 clean memories |

Collaborator

@rwmjhb rwmjhb left a comment


Thanks for the PR @Banger455 — both the German i18n support and the agent_end race fix address real user pain points (issue #260). A few things need attention before this can merge.

Must Fix

Build failure: test/plugin-manifest-regression.mjs fails because the test expects api.hooks["command:new"] === undefined, but appendSelfImprovementNote already registers on command:new upstream. This is likely a pre-existing main/test mismatch rather than something your diff introduced, but the branch still needs to be green. A rebase onto latest main and re-verification should clarify this.

German regex issues (MEDIUM)

Three related problems with the new German trigger patterns:

  1. False positives from substring matching: /immer|niemals|wichtig/i at index.ts:1266 matches inside longer words such as Zimmermann, Schwimmerin, Flimmern, and Wichtigkeit. Consider using word-boundary assertions (\b) or a more targeted pattern.

  2. Ungrouped alternation in fact triggers: /mein\s+\w+\s+ist|heißt|wohne|arbeite/i at index.ts:1265 causes heißt, wohne, and arbeite to match anywhere, not just after mein. Sentences like "Er heißt Peter" ("His name is Peter") or "Die Stadt heißt Berlin" ("The city is called Berlin") will spuriously trigger capture. Tie each alternative to its prefix, e.g. /mein\s+\w+\s+ist|mein\s+\w+\s+heißt|ich\s+wohne|ich\s+arbeite/i or similar.

  3. Inconsistency between explicit-remember detection and capture triggers: isExplicitRememberCommand at index.ts:741 uses merke?\s+dir, while shouldCapture at index.ts:1262 uses merk es dir. As a result, "merk es dir" bypasses the explicit path but hits the capture path, and "vergiss das nicht" does the opposite. These should be aligned.
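The first two problems are easy to reproduce. This sketch contrasts the two flagged patterns (copied from the review) with a word-boundary version and a prefix-scoped version that match the fixes adopted later in this thread; the variable names and sample sentences are illustrative only.

```typescript
// Flagged: no word boundaries, so "immer" matches inside "Zimmermann".
const emphasisFlagged = /immer|niemals|wichtig/i;
// \b requires a word boundary on both sides of the keyword.
const emphasisBounded = /\b(immer|niemals|wichtig)\b/i;

// Flagged: `|` has the lowest precedence, so this parses as
// (mein\s+\w+\s+ist) | (heißt) | (wohne) | (arbeite).
const factFlagged = /mein\s+\w+\s+ist|heißt|wohne|arbeite/i;
// Each verb is tied to its required prefix.
const factScoped = /mein\s+\w+\s+(?:ist|heißt)|ich\s+(?:wohne|arbeite)\b/i;

// "Mr Zimmermann is coming tomorrow", "The city is called Berlin", "My name is Anna"
const samples = ["Herr Zimmermann kommt morgen", "Die Stadt heißt Berlin", "Mein Name ist Anna"];

for (const s of samples) {
  console.log(s, {
    emphasisFlagged: emphasisFlagged.test(s),
    emphasisBounded: emphasisBounded.test(s),
    factFlagged: factFlagged.test(s),
    factScoped: factScoped.test(s),
  });
}
```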

Nice to have

  • Timer cleanup (index.ts:2895-2904): The setTimeout in the Promise.race keeps Node alive up to 15s after backgroundRun resolves. Consider clearTimeout in the .then() handler. Minor, not blocking.
  • Missing tests: The new German regex patterns and async hook timeout have no direct test coverage. Given the regex issues above, adding a few positive/negative test cases would catch future regressions.
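The suggested positive/negative cases fit a table-driven check. A minimal sketch that assumes nothing about the project's test runner: RegexCase and failingCases are hypothetical helpers, and the sentences are illustrative.

```typescript
// One expectation: should `re` match `input`?
interface RegexCase {
  input: string;
  expected: boolean;
}

// Returns the cases whose actual match result differs from the expectation.
function failingCases(re: RegExp, cases: readonly RegexCase[]): RegexCase[] {
  return cases.filter((c) => re.test(c.input) !== c.expected);
}

const emphasisCases: readonly RegexCase[] = [
  { input: "Das ist wichtig für mich", expected: true },       // "That is important to me"
  { input: "Niemals wieder Montagsmeetings", expected: true }, // "Never again Monday meetings"
  { input: "Frau Zimmermann ruft an", expected: false },       // "Ms Zimmermann is calling"
  { input: "Die Wichtigkeit ist unklar", expected: false },    // "The importance is unclear"
];

console.log(failingCases(/\b(immer|niemals|wichtig)\b/i, emphasisCases).length); // 0
console.log(failingCases(/immer|niemals|wichtig/i, emphasisCases).length);       // 2
```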

…eval

Add German patterns to three trigger arrays that previously only supported
English, Czech, and Chinese:

- MEMORY_TRIGGERS: detect "merke dir", "vergiss nicht", "ich bevorzuge",
  "wir haben entschieden", "ab jetzt", personal info patterns, etc.
- AUTO_CAPTURE_EXPLICIT_REMEMBER_RE: match "merke dir" / "vergiss nicht"
  as explicit remember commands (also adds English "remember this")
- FORCE_RETRIEVE_PATTERNS (adaptive-retrieval.ts): match "erinnerst du
  dich", "weißt du noch", "gestern", "habe ich erwähnt", etc.

Without these patterns, German-speaking users' store requests silently
fall through shouldCapture() and retrieval skips force-retrieve, causing
RECALLED: NONE in new sessions.

Fixes CortexReach#393
The agent_end hook used `void backgroundRun` (fire-and-forget), which
works in gateway mode (persistent process) but races against process.exit()
in `--local` / embedded CLI mode — the capture Promise gets killed before
it can flush to LanceDB.

Changes:
- Make the hook async and await backgroundRun with a 15 s timeout
- Widen the AgentEndAutoCaptureHook type to allow Promise<void>

The timeout is a safety net only; in gateway mode the process stays
alive anyway, so awaiting backgroundRun changes nothing observable there.
Session-lock and channel-delivery timing is unaffected because OpenClaw
core already calls runAgentEnd() as fire-and-forget with .catch()
(cf. Issue CortexReach#260).

Fixes CortexReach#393
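The argument that the new await cannot block session locks rests on how the caller invokes the hooks. A minimal sketch of that caller pattern under simplified assumptions: AgentEndHook, runAgentEnd, and onAgentFinished here are hypothetical stand-ins, not OpenClaw core's real signatures.

```typescript
// Hypothetical stand-ins for OpenClaw core types and functions.
type AgentEndHook = (event: unknown) => Promise<void> | void;

// Runs hooks in order; awaiting a sync hook's undefined return is harmless.
async function runAgentEnd(hooks: readonly AgentEndHook[], event: unknown): Promise<void> {
  for (const hook of hooks) {
    await hook(event);
  }
}

// Fire-and-forget call site: runAgentEnd() is started but not awaited, and
// .catch() turns a rejected hook into a log line instead of an unhandled rejection.
function onAgentFinished(hooks: readonly AgentEndHook[], event: unknown, log: string[]): void {
  runAgentEnd(hooks, event).catch((err: unknown) => {
    log.push(`agent_end hook failed: ${String(err)}`);
  });
  log.push("session lock released"); // does not wait for async hooks to finish
}
```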
…erage

Addresses all review feedback from @rwmjhb:

Regex precision (MUST FIX):
- Add \b word boundaries to /immer|niemals|wichtig/ to prevent false
  positives on compound words like Zimmermann, Schwimmerin, Wichtigkeit
- Fix ungrouped alternation in personal info trigger: heißt/wohne/arbeite
  now require proper context (mein X heißt / ich wohne / ich arbeite)
- Align isExplicitRememberCommand and MEMORY_TRIGGERS to cover all
  German remember variants consistently (merk dir, merke dir, merk es
  dir, vergiss nicht, vergiss das nicht, nicht vergessen)

Timer cleanup (NICE TO HAVE):
- Store setTimeout handle and clearTimeout in finally block to prevent
  the 15s safety timer from keeping Node alive after backgroundRun resolves

Test coverage (NICE TO HAVE):
- german-i18n-triggers.test.mjs: 54 test cases covering shouldCapture,
  shouldSkipRetrieval, and explicit-remember consistency (24 positive,
  9 false-positive prevention, 15 retrieval, 6 consistency)
- agent-end-async-capture.test.mjs: 4 test cases covering async hook
  behavior (Promise return, error swallowing, __lastRun, early return)

Note on build failure: plugin-manifest-regression.mjs L155 expects
command:new === undefined, but selfImprovement now defaults to enabled
(26abb04). This is a pre-existing upstream test mismatch unrelated to
this PR.
@Banger455 force-pushed the fix/german-i18n-and-autocapture branch from 5572d15 to e773153 on March 30, 2026 20:44
@Banger455
Author

Thanks for the thorough review @rwmjhb! All points addressed in e773153:

Regex fixes (MUST FIX) ✅

  1. Word boundaries: /immer|niemals|wichtig/ now uses \b(immer|niemals|wichtig)\b to prevent false positives on Zimmermann, Schwimmerin, Flimmern, Wichtigkeit, etc.

  2. Ungrouped alternation: rewritten to mein\s+\w+\s+(?:ist|heißt)|ich\s+(?:wohne|arbeite)\b so heißt/wohne/arbeite require proper context. "Die Stadt heißt Berlin" no longer triggers.

  3. Explicit-remember consistency: both AUTO_CAPTURE_EXPLICIT_REMEMBER_RE and MEMORY_TRIGGERS now cover the same German variants: merk dir, merke dir, merk es dir, vergiss nicht, vergiss das nicht, nicht vergessen, erinnere dich, erinner dich.
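One way to keep the explicit-remember path and the capture path from drifting apart again is to derive both from a single regex. A minimal sketch, assuming the variant list above is the whole requirement; GERMAN_REMEMBER_RE, isGermanRememberCommand, and hitsGermanCaptureTrigger are hypothetical names, not the identifiers in index.ts. As written, the regex also accepts "merke es dir", a small superset of the listed variants.

```typescript
// Single source of truth for German "remember this" phrasings.
const GERMAN_REMEMBER_RE =
  /\b(?:merke?\s+(?:es\s+)?dir|vergiss\s+(?:das\s+)?nicht|nicht\s+vergessen|erinnere?\s+dich)\b/i;

// Hypothetical capture-trigger list that reuses the same regex object.
const germanCaptureTriggers: readonly RegExp[] = [GERMAN_REMEMBER_RE, /\b(immer|niemals|wichtig)\b/i];

function isGermanRememberCommand(text: string): boolean {
  return GERMAN_REMEMBER_RE.test(text);
}

function hitsGermanCaptureTrigger(text: string): boolean {
  return germanCaptureTriggers.some((re) => re.test(text));
}

const variants = [
  "merk dir", "merke dir", "merk es dir", "vergiss nicht",
  "vergiss das nicht", "nicht vergessen", "erinnere dich", "erinner dich",
];

for (const v of variants) {
  // "Please <variant>: the server is called atlas-01"
  const text = `Bitte ${v}: der Server heißt atlas-01`;
  console.log(v, isGermanRememberCommand(text), hitsGermanCaptureTrigger(text));
}
```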

Timer cleanup (NICE TO HAVE) ✅

setTimeout handle is now stored and clearTimeout'd in a finally block so the 15s safety timer doesn't keep Node alive after backgroundRun resolves.

Test coverage (NICE TO HAVE) ✅

Two new test files, 58 test cases total:

  • test/german-i18n-triggers.test.mjs (54 cases)

    • shouldCapture positive: 24 cases (remember commands, preferences, decisions, personal info, emphasis)
    • shouldCapture false-positive prevention: 9 cases (compound words, missing context prefix, length bounds)
    • shouldSkipRetrieval force-retrieve: 15 cases (German retrieval triggers + short-command skip verification)
    • Explicit remember consistency: 6 cases (all German variants padded above length threshold)
  • test/agent-end-async-capture.test.mjs (4 cases)

    • Hook returns Promise (async signature verification)
    • Error swallowing (unreachable endpoint doesn't throw)
    • __lastRun stores a Promise for downstream consumers
    • Early return for empty events (no 15s timer leak)
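The four behaviours checked by the async-hook tests can be seen together in one small sketch. It assumes a simplified hook factory: makeCaptureHook, AgentEndEvent, and the store callback are hypothetical; only the __lastRun property name and the 15 s timeout come from this thread.

```typescript
interface AgentEndEvent {
  messages: string[];
}

// Async hook plus an optional __lastRun handle for downstream consumers.
type CaptureHook = ((event: AgentEndEvent) => Promise<void>) & { __lastRun?: Promise<void> };

// Hypothetical factory; `store` stands in for the real capture/flush logic.
function makeCaptureHook(store: (texts: string[]) => Promise<void>, timeoutMs = 15_000): CaptureHook {
  const hook: CaptureHook = async (event) => {
    // Early return: with nothing to capture, no timer is ever created.
    if (event.messages.length === 0) return;

    const run = (async () => {
      let timer: ReturnType<typeof setTimeout> | undefined;
      try {
        await Promise.race([
          store(event.messages),
          new Promise<void>((resolve) => {
            timer = setTimeout(resolve, timeoutMs);
          }),
        ]);
      } catch {
        // Swallow capture errors (e.g. an unreachable endpoint): capture is best-effort.
      } finally {
        clearTimeout(timer);
      }
    })();

    hook.__lastRun = run;
    await run;
  };
  return hook;
}
```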

Build failure note

The plugin-manifest-regression.mjs L155 failure (command:new === undefined) is a pre-existing upstream mismatch: 26abb04 changed selfImprovement.enabled to default true, which registers on command:new, but the test wasn't updated. This is unrelated to our diff — happy to fix it in a separate PR if helpful.

Rebase

Branch rebased onto latest master (7fe2ae0).

Collaborator

@AliceLJY AliceLJY left a comment


Thanks for the well-structured PR! Two things to confirm before merging:

  1. async hook safety: Making agentEndAutoCaptureHook async is the right direction for --local embedded mode, but could you confirm the OpenClaw core runAgentEnd() call path handles unhandled rejections from async hooks correctly? If core has any special handling for sync-only hooks, the async conversion could silently swallow errors in edge cases.

  2. German immer trigger precision: in /\b(immer|niemals|wichtig)\b/i, immer is a very common German adverb ("always", "still") and could produce false-positive captures on ordinary sentences like "Das ist immer so" ("That's always the way"). Would you consider scoping it to explicit memory-intent contexts like /\b(immer\s+(?:wenn|daran|denken)|niemals|wichtig)\b/i, or moving it to a lower-confidence tier? The merke dir / vergiss nicht patterns look solid.
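The difference between the current pattern and the suggested scoping shows up clearly side by side. A minimal sketch: both regexes are copied from the comment above, while classifyEmphasis and its "high"/"low" tiers are a hypothetical rendering of the lower-confidence-tier idea, not existing code.

```typescript
// Pattern currently in the PR: any standalone "immer" counts.
const broadEmphasis = /\b(immer|niemals|wichtig)\b/i;
// Suggested scoping: "immer" only in memory-intent phrases like "immer wenn" / "immer daran".
const scopedEmphasis = /\b(immer\s+(?:wenn|daran|denken)|niemals|wichtig)\b/i;

type TriggerTier = "high" | "low" | "none";

// Hypothetical tiering: scoped matches are high confidence, bare "immer" is low.
function classifyEmphasis(text: string): TriggerTier {
  if (scopedEmphasis.test(text)) return "high";
  if (broadEmphasis.test(text)) return "low";
  return "none";
}

// Inputs: "That's always the way", "Always remember to run the tests before deploying",
// "Good morning"
console.log(classifyEmphasis("Das ist immer so")); // "low"
console.log(classifyEmphasis("Denk immer daran, vor dem Deploy die Tests laufen zu lassen")); // "high"
console.log(classifyEmphasis("Guten Morgen")); // "none"
```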

Happy to approve once these two points are addressed!

Development

Successfully merging this pull request may close these issues.

Auto-capture silently fails: missing German/i18n triggers + extractMinMessages too high for short conversations

3 participants