feat: add import-markdown CLI command #426

Open
jlin53882 wants to merge 9 commits into CortexReach:master from jlin53882:feat/import-markdown-cli

@jlin53882
Contributor

@jlin53882 jlin53882 commented Mar 31, 2026

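PR #426 analysis and improvements: feat: add import-markdown CLI command

📌 Summary

This PR is a full analysis and improvement of the import-markdown CLI subcommand. It includes unit tests, validation of the real-world benefit, and fixes for 3 code gaps.


✅ Implementation improvements (relative to the original PR #426)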

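New CLI options

| Option | Description | Default |
| --- | --- | --- |
| --dedup | Enable scope-aware exact-match deduplication | false |
| --min-text-length <n> | Minimum text length threshold | 5 |
| --importance <n> | Importance value assigned to imported memories | 0.7 |

Bug fixes

  • UTF-8 BOM handling: strip the \uFEFF prefix after reading the file (files produced by Windows Notepad)
  • CRLF normalization: use split(/\r?\n/) so both CRLF and LF are supported
  • Bullet format extension: from supporting only - to supporting the three standard Markdown bullets -, * and +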

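🧪 Test items (12 total, all passing)

| # | Test item | Result |
| --- | --- | --- |
| 1 | File path resolution (MEMORY.md + daily notes) | Pass |
| 2 | Error handling (missing directory, no embedder, empty directory) | Pass |
| 3 | Duplicate detection (scope-aware exact match) | Pass |
| 4 | Scope handling and metadata.sourceScope | Pass |
| 5 | Batch processing (500 items, OOM test) | Pass |
| 6 | Dry-run log output | Pass |
| 7 | Consistency between dry-run and actual import | Pass |
| 8 | Test coverage (skip logic, importance/category defaults) | Pass |
| 9 | Other Markdown bullet formats (* and +) | Pass |
| 10 | UTF-8 BOM handling | Pass |
| 11 | Partial failure + continueOnError | Pass |
| 12 | Real memory files + dedup benefit analysis | Pass |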

📊 Real-world benefit validation (real data)

Test data:

  • ~/.openclaw/workspace-dc-channel--1476866394556465252/
  • MEMORY.md: 20 records
  • memory/: 30 daily notes, 633 records in total
  • Total: 653 records

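Scenario A: no dedup (current behavior)

First import: 644 records
Second import: +644 records (all duplicates!)
Waste ratio: 50%

Scenario B: with dedup (behavior after this feature)

First import: 644 records
Second import: all skipped → saves 644 embedder API calls
Savings ratio: 50%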

Keyword comparison (LanceDB vs Markdown)

"cache_manger"      LanceDB ❌  Markdown ✅ → the value of import-markdown
"PR43"              LanceDB ❌  Markdown ✅ → the value of import-markdown
"import-markdown"   LanceDB ❌  Markdown ✅ → the value of import-markdown
"git merge"         LanceDB ❌  Markdown ✅ → the value of import-markdown
"f8ae80d"           LanceDB ❌  Markdown ✅ → the value of import-markdown
"記憶庫治理"         LanceDB ❌  Markdown ✅ → the value of import-markdown
"dedup"             LanceDB ❌  Markdown ✅ → the value of import-markdown

Test keywords found in LanceDB: 0/8
Test keywords found in Markdown: 7/8
→ 7 keywords exist in Markdown but cannot be found in LanceDB
→ After import-markdown, these memories can be found by recall

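🔧 Code gap fixes (3)

Gap 1: other Markdown bullet formats are not supported

Root cause: only line.startsWith("- ") was checked

Fix: /^[-*+]\s/.test(line)

Gap 2: a UTF-8 BOM breaks parsing of the first line

Root cause: files produced by Windows editors carry a BOM (\uFEFF)

Fix: content.replace(/^\uFEFF/, "")

Gap 3: a trailing \r is left over from CRLF line endings

Root cause: Windows line endings are \r\n

Fix: content.split(/\r?\n/)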
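
The three fixes above compose naturally into one parsing step. The following is a minimal sketch under stated assumptions, not the PR's actual code: the helper name `extractBullets` and its `minTextLength` parameter are hypothetical, and only the three expressions (`replace(/^\uFEFF/, "")`, `split(/\r?\n/)`, `/^[-*+]\s/`) come from the description.

```typescript
// Hypothetical helper combining the three fixes; not taken from cli.ts.
function extractBullets(content: string, minTextLength = 5): string[] {
  const withoutBom = content.replace(/^\uFEFF/, ""); // drop a leading UTF-8 BOM
  const lines = withoutBom.split(/\r?\n/); // accept both CRLF and LF
  const items: string[] = [];
  for (const line of lines) {
    if (!/^[-*+]\s/.test(line)) continue; // only "-", "*" and "+" bullets
    const text = line.slice(2).trim(); // strip the bullet and its separator
    if (text.length >= minTextLength) items.push(text);
  }
  return items;
}

const sample =
  "\uFEFF- first memory\r\n* second memory\r\n+ third memory\r\nplain text\r\n- hi";
console.log(extractBullets(sample));
// [ 'first memory', 'second memory', 'third memory' ]
```

Without the BOM strip, the first line would start with \uFEFF rather than a bullet character and would be skipped silently, which is the failure mode Gap 2 describes.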


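📋 Proposed new config fields (5 total)

All defaults equal the current hardcoded values, so the change is backward compatible and existing users are unaffected.

| Setting | Type | Default | Description |
| --- | --- | --- | --- |
| importMarkdown.dedup | boolean | false | Enable scope-aware exact-match deduplication |
| importMarkdown.defaultScope | string | "global" | Default scope when --scope is not given |
| importMarkdown.minTextLength | number | 5 | Minimum text length threshold |
| importMarkdown.importanceDefault | number | 0.7 | Default importance for imported records |
| importMarkdown.workspaceFilter | string[] | [] (scan all) | Import only the named workspaces |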

📁 New files

  • test/import-markdown/import-markdown.test.mjs: full unit tests
  • test/import-markdown/ANALYSIS.md: full analysis report
  • test/import-markdown/recall-benchmark.py: script comparing real LanceDB queries

🔗 Related links

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ab501f5c18

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

cli.ts Outdated
}

try {
const vector = await context.embedder!.embedQuery(text);

P1: Store passage vectors for imported markdown entries

The importer persists markdown bullets using embedQuery, but retrieval also embeds incoming searches with embedQuery (src/retriever.ts), so migrated rows are stored in the wrong embedding role. For task-aware models (for example providers that distinguish query vs document embeddings), this causes substantial recall degradation after migration because comparisons become query-query instead of query-document. Use embedPassage when writing memory content.
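
To make the role mismatch concrete, here is a minimal sketch under stated assumptions. Only the method names `embedQuery` and `embedPassage` come from the review; the `EmbedderLike` interface, the `Embedded` type, and the fake embedder are hypothetical stand-ins for a task-aware provider.

```typescript
// Hypothetical types; a real embedder returns a vector only.
type Embedded = { role: "query" | "passage"; vector: number[] };

interface EmbedderLike {
  embedQuery(text: string): Promise<Embedded>;
  embedPassage(text: string): Promise<Embedded>;
}

// Fake provider that records which role was requested.
const fakeEmbedder: EmbedderLike = {
  embedQuery: async (text) => ({ role: "query", vector: [text.length] }),
  embedPassage: async (text) => ({ role: "passage", vector: [text.length] }),
};

// Write path: stored memory content is a document, so it takes the passage role.
async function embedForStorage(embedder: EmbedderLike, text: string): Promise<Embedded> {
  return embedder.embedPassage(text);
}

// Read path: the incoming search text takes the query role.
async function embedForSearch(embedder: EmbedderLike, text: string): Promise<Embedded> {
  return embedder.embedQuery(text);
}

embedForStorage(fakeEmbedder, "imported bullet").then((e) => console.log(e.role)); // passage
embedForSearch(fakeEmbedder, "what did I note?").then((e) => console.log(e.role)); // query
```

With a symmetric model the two roles may produce the same vectors, so the bug would only show up with providers that distinguish them.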


cli.ts Outdated
importance: 0.7,
category: "other",
scope: targetScope,
metadata: { importedFrom: filePath, sourceScope: scope },

P2: Serialize metadata before storing imported entries

MemoryStore.store expects metadata to be a JSON string (MemoryEntry.metadata in src/store.ts), but this command passes a plain object. With a typed LanceDB schema this can cause table.add to fail for each imported line (and the command will silently count them as skipped), and even if coerced, downstream metadata parsing assumes string JSON and will drop these fields. Serialize this value before calling store.
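
A minimal sketch of the suggested serialization, assuming a simplified entry shape. The field names (`importance`, `category`, `scope`, `metadata`, `importedFrom`, `sourceScope`) appear in the quoted diff; the `MemoryEntryLike` interface and the `buildImportedEntry` helper are hypothetical.

```typescript
// Hypothetical, simplified entry shape; the real MemoryEntry lives in src/store.ts.
interface MemoryEntryLike {
  text: string;
  importance: number;
  category: string;
  scope: string;
  metadata: string; // JSON-encoded string, not a plain object
}

function buildImportedEntry(
  text: string,
  filePath: string,
  sourceScope: string,
  targetScope: string,
): MemoryEntryLike {
  return {
    text,
    importance: 0.7,
    category: "other",
    scope: targetScope,
    // Serialize before storing so downstream JSON.parse calls keep working.
    metadata: JSON.stringify({ importedFrom: filePath, sourceScope }),
  };
}

const entry = buildImportedEntry("Prefers tabs", "/tmp/MEMORY.md", "dev", "global");
const parsed = JSON.parse(entry.metadata) as { importedFrom: string; sourceScope: string };
console.log(typeof entry.metadata, parsed.sourceScope); // string dev
```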


@jlin53882
Contributor Author

PR Update

This PR was split from #367; the import-markdown CLI is now standalone.

What this PR does

Adds memory-pro import-markdown command to migrate existing Markdown memories (MEMORY.md, memory/YYYY-MM-DD.md) into the plugin LanceDB store for semantic recall.

Review checklist

The following items were flagged during the original PR #367 review and should be verified here:

  • File path resolution — does the command correctly resolve MEMORY.md and memory/YYYY-MM-DD.md paths across different workspace layouts?
  • Error handling — graceful handling when files are missing, permissions denied, or content is malformed
  • Duplicate detection — if a memory already exists in LanceDB, is it skipped or overwritten?
  • Scope handling — imported memories should have appropriate scope assignment
  • Batch processing — large imports (many daily notes) should process without OOM
  • Progress/logging — user-visible progress for long imports
  • Dry-run mode — is there a --dry-run flag to preview what would be imported?
  • Test coverage — are there tests for the import logic?

Related

  • #344 — original dual-memory confusion issue
  • #367 — documentation + startup warning (merged separately)

jlin53882 pushed a commit to jlin53882/memory-lancedb-pro that referenced this pull request Mar 31, 2026
…fig options + tests

## Implementation improvements (relative to the original PR CortexReach#426)

### New CLI options
- --dedup: enable scope-aware exact-match deduplication (avoids duplicate imports)
- --min-text-length <n>: minimum text length threshold (default 5)
- --importance <n>: importance value for imported memories (default 0.7)

### Bug fixes
- UTF-8 BOM handling: strip the \uFEFF prefix after reading the file
- CRLF normalization: use split(/\r?\n/) so both CRLF and LF are supported
- Bullet format extension: from supporting only '- ' to supporting '- ', '* ' and '+ '

### New tests
- test/import-markdown/import-markdown.test.mjs: full unit tests
  - BOM handling
  - CRLF normalization
  - Extended bullet formats (dash/star/plus)
  - minTextLength parameter
  - importance parameter
  - Dedup logic (scope-aware exact match)
  - Dry-run mode
  - Continue on error

### Analysis documents
- test/import-markdown/ANALYSIS.md: full analysis report
  - Benefit analysis (measured on real files, 653 records)
  - Analysis of 3 code gaps
  - 5 proposed new config fields
  - Itemized feature description
- test/import-markdown/recall-benchmark.py: script comparing real LanceDB queries
  - Measured result: 7/8 keywords exist in Markdown but cannot be found in LanceDB
  - Demonstrates the practical value of import-markdown

## Measured results (real memory files)
- James's workspace: MEMORY.md (20 records) + 30 daily notes (633 records) = 653 records
- Without dedup: every run wastes 50% (duplicate imports)
- With dedup: the second run skips 100%, saving 644 embedder API calls
- Keyword comparison: 7/8 test keywords exist in Markdown but not in LanceDB

## Proposed new config (5 fields, defaults = current behavior, backward compatible)
- importMarkdown.dedup: boolean = false
- importMarkdown.defaultScope: string = global
- importMarkdown.minTextLength: number = 5
- importMarkdown.importanceDefault: number = 0.7
- importMarkdown.workspaceFilter: string[] = []

Closes: PR CortexReach#426 (CortexReach/memory-lancedb-pro)
@AliceLJY
Collaborator

Hey @jlin53882, thanks for the thorough write-up and the review checklist — really helpful context, especially the split from #367 and the real-world data showing the dual-memory gap.

The file path resolution, error handling with continue-on-error, scope handling, dry-run mode, and dedup logic all look good. Nice work on the BOM/CRLF/multi-bullet fixes too.

Two things need fixing before this can merge:

  1. embedQuery → embedPassage (the embedder.embedQuery(text) call): Imported memory content is a passage/document, not a query. Using embedQuery here means providers with asymmetric embeddings (like Jina) will get query-query comparisons at recall time, which hurts retrieval quality. This is the exact scenario import-markdown is meant to improve, so getting the embedding role right is important.

  2. Serialize metadata to JSON string (the metadata: { importedFrom: ... } object): MemoryEntry.metadata is typed as string in src/store.ts. Passing a plain object will either silently fail on table.add or produce unparseable metadata. Quick fix: wrap it in JSON.stringify(...).

A couple of smaller things while you're in there:

  • The await import("node:fs/promises") calls are repeated inside loops — hoisting a single import to the top of the action handler would be cleaner
  • workspaceEntries is typed as string[] but readdir({ withFileTypes: true }) returns Dirent[] — worth fixing the type annotation

Happy to re-review once those are addressed. The feature itself is valuable and the test coverage is solid! 🙏

jlin53882 added a commit to jlin53882/memory-lancedb-pro that referenced this pull request Apr 1, 2026
…own reference, restore removed README sections

- Restore cli.ts (was accidentally deleted, all CLI commands preserved)
- Remove import-markdown command reference from dual-memory section (lives in PR CortexReach#426)
- Restore beta.10 version banner and OpenClaw 2026.3+ badge
- Restore Auto-recall timeout tuning FAQ section

Ref: CortexReach#367
jlin53882 added a commit to jlin53882/memory-lancedb-pro that referenced this pull request Apr 1, 2026
P1 fixes:
- embedQuery -> embedPassage (lines 1001, 1171): imported memory content
  is passage/document, not a query. Using embedQuery with asymmetric
  providers (e.g. Jina) causes query-query comparison at recall time,
  degrading retrieval quality.

- metadata: JSON.stringify the importedFrom object (line 1178):
  MemoryEntry.metadata is typed as string in store.ts; passing a plain
  object silently fails or produces unparseable data.

Minor fixes:
- workspaceEntries type: string[] -> Dirent[] (matches readdir withFileTypes)
- Hoist await import('node:fs/promises') out of loops: single import at
  handler level replaces repeated per-iteration dynamic imports

Ref: CortexReach/pull/426
@jlin53882
Contributor Author

Hi @AliceLJY — all review items addressed in the latest push:

P1 fixes:

  • embedQuery → embedPassage (lines 1001 + 1171): imported memory is passage/document, not query. Using embedQuery with asymmetric providers (Jina) causes query-query comparison at recall, degrading quality.
  • metadata: JSON.stringify(...) (line 1178): MemoryEntry.metadata is typed as string in store.ts; plain object silently fails.

Minor fixes:

  • workspaceEntries: string[] → Dirent[] (matches readdir { withFileTypes: true })
  • Hoisted await import('node:fs/promises') out of loops: single import at handler level replaces per-iteration dynamic imports

Ready for re-review 🙏

@jlin53882
Contributor Author

Additionally, during local testing I found and fixed two extra issues beyond your original review:

Extra fix 1 — fsPromises scope bug
The const fsPromises = await import(...) was declared inside the try block, making it block-scoped. The subsequent MEMORY.md and memory/ scan code called fsPromises.stat() / fsPromises.readdir() without access to the variable, causing silent failures. Moved declaration to handler scope.
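
The scoping issue can be sketched as follows, under stated assumptions: `loadFs` is a hypothetical stand-in for `await import("node:fs/promises")` so that the example does not touch the real file system, and `handler` is an illustrative name, not the CLI's actual action handler.

```typescript
// Hypothetical stand-in for the dynamic import of node:fs/promises.
async function loadFs(): Promise<{ stat(path: string): Promise<{ size: number }> }> {
  return { stat: async (path) => ({ size: path.length }) };
}

async function handler(): Promise<number> {
  // Declared at handler scope, so every later block can see it.
  const fsPromises = await loadFs();

  try {
    await fsPromises.stat("workspace");
  } catch {
    return -1;
  }

  // If `const fsPromises` were declared inside the `try` block above, it would
  // be block-scoped and this line would refer to a name that is not in scope.
  const info = await fsPromises.stat("MEMORY.md");
  return info.size;
}

handler().then((size) => console.log(size)); // 9 with the fake stat above
```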

Extra fix 2 — workspace scope inference for flat memory/
Added openclaw.json agents list lookup to infer the correct workspace scope for flat workspace/memory/ entries. Before: hardcoded scope="memory" (no context). After: reads agents.list[].workspace to match workspaceDir and uses the agent's id as scope. Falls back to scope="shared" for shared workspace flat memory directories.
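
A minimal sketch of that lookup, assuming a simplified config shape. Only `agents.list[].workspace`, the agent `id`, and the `"shared"` fallback are taken from the comment; the `OpenClawConfigLike` interface and the `inferFlatMemoryScope` helper are hypothetical, and real code would likely need to normalize paths before comparing them.

```typescript
// Assumed shape of the relevant part of openclaw.json.
interface OpenClawConfigLike {
  agents?: { list?: Array<{ id: string; workspace?: string }> };
}

function inferFlatMemoryScope(config: OpenClawConfigLike, workspaceDir: string): string {
  const match = config.agents?.list?.find((agent) => agent.workspace === workspaceDir);
  // Fall back to "shared" when no agent owns this workspace directory.
  return match ? match.id : "shared";
}

const config: OpenClawConfigLike = {
  agents: { list: [{ id: "theia", workspace: "/home/u/.openclaw/workspace" }] },
};
console.log(inferFlatMemoryScope(config, "/home/u/.openclaw/workspace")); // theia
console.log(inferFlatMemoryScope(config, "/somewhere/else")); // shared
```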

Both fixes are included in the latest push. Please re-review 🙏

@rwmjhb
Collaborator

rwmjhb commented Apr 1, 2026

Review: REQUEST-CHANGES

The feature addresses a real gap — Markdown memories aren't in the LanceDB store so they're invisible to semantic recall. A few issues need fixing before this is mergeable.

Must fix:

  1. Flat memory scan is unreachable — When no workspace subdirectories contain .md files, mdFiles.length === 0 returns early before the flat workspace/memory/ scan ever runs. This is the exact layout it was added to support.

  2. Tests don't test actual code: runImportMarkdown() reimplements the import logic instead of calling the real CLI handler. Two critical divergences: it uses embedQuery while production uses embedPassage, and stores metadata as an object while production uses JSON.stringify(). Tests pass against their own copy, not the shipped code.

  3. Test directory layout is wrong: setupWorkspace(name) creates files directly under testWorkspaceDir, but runImportMarkdown() looks for path.join(openclawHome, "workspace"). The committed tests would hit the "Failed to read workspace directory" path, not the import logic.

Worth considering (not blocking):

  • --dry-run skips the dedup check entirely (if (options.dryRun) { imported++; continue } runs before the BM25 lookup), so --dry-run --dedup overstates what would be imported.
  • The flat workspace/memory/ scan ignores the [workspace-glob] filter — a user importing one workspace can accidentally import root flat memory files.
  • [workspace-glob] is actually a substring match (entry.name.includes(workspaceGlob)), not a glob — could match unintended workspaces.
  • ANALYSIS.md and recall-benchmark.py (hardcoded C:\Users\admin\... paths) look like personal dev artifacts rather than repo-committed files.
  • Branch is behind main — please rebase.
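
The substring-versus-glob point in the list above can be illustrated with a small sketch. `matchesSubstring` mirrors the `entry.name.includes(workspaceGlob)` behavior described in the review; `matchesGlob` is a hypothetical stricter alternative that only understands `*`, not a proposal for what the PR should ship.

```typescript
// Substring match: any name that contains the filter text matches.
function matchesSubstring(name: string, filter: string): boolean {
  return name.includes(filter);
}

// Hypothetical alternative: "*" is a wildcard and the whole name must match.
function matchesGlob(name: string, pattern: string): boolean {
  const escaped = pattern
    .split("*")
    .map((part) => part.replace(/[.*+?^${}()|[\]\\]/g, "\\$&")) // escape regex metacharacters
    .join(".*");
  return new RegExp(`^${escaped}$`).test(name);
}

console.log(matchesSubstring("workspace-dev-old", "dev")); // true
console.log(matchesGlob("workspace-dev-old", "dev")); // false
console.log(matchesGlob("workspace-dev-old", "*dev*")); // true
```

With the substring form, a filter such as `dev` also selects `workspace-dev-old`, which is the unintended-match risk the review points out.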

@jlin53882
Contributor Author

Hi @AliceLJY — addressed all must-fix items and worth-considering items in the latest push:

Must fix:

  1. Flat memory scan unreachable: moved the flat scan BEFORE the mdFiles.length === 0 early return, so it is always reachable regardless of whether nested workspaces found files.
  2. Tests use wrong embedder + metadata: runImportMarkdown now calls embedPassage (not embedQuery) and stores JSON.stringify(metadata) to match production. Added embedPassage mock and mockClear().
  3. Test directory layout wrong: setupWorkspace now creates files at workspace/<name>/ (matching what runImportMarkdown expects) instead of directly under testWorkspaceDir/.

Worth considering (all addressed):

  4. --dry-run skips dedup: the dedup check now runs regardless of dry-run mode. --dry-run --dedup now correctly counts duplicates as skipped, not imported. Dry-run log message restored.
  5. Flat scan ignores workspace filter: the flat memory scan is now skipped when workspaceGlob is set, avoiding accidental import of root flat memory when the user specifies --workspace.
  6. Removed dev artifacts: ANALYSIS.md and recall-benchmark.py deleted (contained personal absolute paths, not suitable for the repo).

Please re-review 🙏

jlin53882 and others added 8 commits April 1, 2026 14:22
Add `memory-pro import-markdown` command to migrate existing Markdown memories
(MEMORY.md, memory/YYYY-MM-DD.md) into the plugin LanceDB store for semantic recall.

This addresses Issue CortexReach#344 by providing a migration path from the Markdown layer
to the plugin memory layer.
The const fsPromises declaration was inside the try block, making it
scoped to that block only. Subsequent fsPromises.stat() calls in
MEMORY.md and memory/ processing code were failing with
'fsPromises is not defined'. Move declaration to handler scope.

Scans the flat `workspace/memory/` directory (directly under
workspace root, not inside any workspace subdirectory) and imports
entries with scope='memory'. This supports the actual OpenClaw
structure where memory files live directly in workspace/memory/.

Before scanning, read openclaw.json agents list to find the agent
whose workspace path matches the current workspaceDir. Use that agent's
id as workspaceScope for flat memory/ entries instead of defaulting to
'memory'. Falls back to 'shared' when no matching agent is found
(e.g. shared workspace with no dedicated agent).
Must fix:
- Flat memory scan: move before the mdFiles.length===0 early return so it
  is always reachable (not just when nested workspaces are empty)
- Tests: runImportMarkdown now uses embedPassage (not embedQuery) and
  JSON.stringify(metadata) to match production. Added embedPassage mock.
- Tests: setupWorkspace now creates files at workspace/<name>/ to match
  the actual path structure runImportMarkdown expects

Worth considering:
- Flat memory scan now skips when workspaceGlob is set, avoiding accidental
  root flat memory import when user specifies --workspace
- Removed dev artifacts: ANALYSIS.md and recall-benchmark.py contained
  personal absolute paths and are not suitable for repo commit
Before: --dry-run skipped dedup check entirely, so --dry-run --dedup
would overcount imports (items counted as imported even if dedup
would skip them).

After: dedup check runs regardless of dry-run mode. In dry-run,
items that would be skipped by dedup are counted as skipped,
not imported. Restores the dry-run console log message.
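
The before/after ordering in the commit message above can be sketched as follows. This is a minimal illustration under stated assumptions: `planImport`, `dedupKey`, and the in-memory `existing` set are hypothetical stand-ins for the CLI handler and the store lookup, not the PR's code.

```typescript
interface Item { scope: string; text: string }
interface ImportOptions { dryRun: boolean; dedup: boolean }
interface ImportCounts { imported: number; skipped: number }

// Hypothetical scope-aware exact-match key.
function dedupKey(item: Item): string {
  return `${item.scope}\n${item.text}`;
}

function planImport(
  items: Item[],
  existing: Set<string>,
  options: ImportOptions,
  write: (item: Item) => void,
): ImportCounts {
  const counts: ImportCounts = { imported: 0, skipped: 0 };
  for (const item of items) {
    // The dedup check runs first, regardless of dry-run mode.
    if (options.dedup && existing.has(dedupKey(item))) {
      counts.skipped++;
      continue;
    }
    // Dry-run suppresses only the write, not the counting.
    if (!options.dryRun) write(item);
    counts.imported++;
  }
  return counts;
}

const alreadyStored = new Set([dedupKey({ scope: "dev", text: "known fact" })]);
const preview = planImport(
  [{ scope: "dev", text: "known fact" }, { scope: "dev", text: "new fact" }],
  alreadyStored,
  { dryRun: true, dedup: true },
  () => { throw new Error("dry-run must not write"); },
);
console.log(preview); // { imported: 1, skipped: 1 }
```

If the dry-run branch came before the dedup check, the same input would report two imports, which is the overcount described as the old behavior.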
@jlin53882 jlin53882 force-pushed the feat/import-markdown-cli branch from 054ae85 to 9c39480 Compare April 1, 2026 06:22
@jlin53882
Contributor Author

Branch rebased onto latest upstream/master (8 commits replayed cleanly, no conflicts). Ready for re-review 🙏

@jlin53882
Contributor Author

CI Failure Analysis

The cli-smoke test failure is not caused by PR #426 — it is a pre-existing bug in upstream/master.

What failed

test/plugin-manifest-regression.mjs:155
AssertionError: sessionMemory should stay disabled by default
  actual:   [AsyncFunction: appendSelfImprovementNote]
  expected: undefined

Root cause

In index.ts upstream/master (line 2948), the command:new hook guard only checks beforeResetNote:

if (config.selfImprovement?.beforeResetNote !== false) {
  api.registerHook("command:new", appendSelfImprovementNote, {...});
}

When selfImprovement config block is absent/undefined:

  • undefined !== false → true → hook is registered unconditionally

This causes command:new to be registered even when selfImprovement is not configured at all.
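
The evaluation can be checked in isolation. The sketch below extracts the quoted guard expression into a function; the `PluginConfigLike` interface is an assumed, simplified config shape, and the function name is illustrative.

```typescript
// Assumed, simplified config shape.
interface PluginConfigLike {
  selfImprovement?: { beforeResetNote?: boolean };
}

// The guard expression quoted above, extracted into a function.
function currentGuard(config: PluginConfigLike): boolean {
  return config.selfImprovement?.beforeResetNote !== false;
}

console.log(currentGuard({})); // true: block absent, hook still registers
console.log(currentGuard({ selfImprovement: {} })); // true
console.log(currentGuard({ selfImprovement: { beforeResetNote: false } })); // false
```

Optional chaining yields `undefined` when the block is missing, and `undefined !== false` is `true`, so only an explicit `false` disables the hook.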

PR #426 is not responsible

PR #426 only modifies cli.ts and test/import-markdown/. test/plugin-manifest-regression.mjs is unchanged by this PR. The failure exists in the upstream/master baseline.

Fix

Author jlin53882's branch fix/selfImprovement-hook-guard contains the correct fix (adding enabled !== false to the guard). PR #418 tracks this issue.

See also: #405

Collaborator

@AliceLJY AliceLJY left a comment


Re-reviewed after 8-commit update. All feedback addressed:

  • ✅ embedQuery → embedPassage (production code + test mock split correctly)
  • ✅ metadata serialized via JSON.stringify
  • ✅ Flat workspace/memory/ scan moved before early return
  • ✅ fsPromises hoisted out of loop
  • ✅ --dry-run now runs dedup check first
  • ✅ Rebased onto latest main

@rwmjhb your REQUEST_CHANGES items also look addressed — please verify and merge when ready.

@rwmjhb
Collaborator

rwmjhb commented Apr 1, 2026

Thanks for the quick turnaround — flat scan, embedPassage, metadata stringify, dry-run dedup, filter, and dev artifact cleanup all confirmed fixed.

A few new issues surfaced in the updated code:

Must fix:

  1. Source scopes are discovered but discarded — the scanner collects per-file scope from workspace directory names and flat openclaw.json agent lookup, but store.store() always writes scope: options.scope || "global". Running without --scope collapses all workspaces into one scope, causing cross-workspace leakage and dedup operating across unrelated workspaces.

  2. Scanner only descends one level under workspace/ — it checks workspace/<entry>/MEMORY.md but not workspace/agents/<id>/MEMORY.md. Agent workspaces stored under workspace/agents/theia/ (as shown in test/session-recovery-paths.test.mjs) are missed entirely.

  3. Tests still reimplement the import logic: runImportMarkdown() is a standalone copy, not a call to the actual CLI handler. The dry-run/dedup ordering in the helper may diverge from production. Also, import("../cli.ts") from test/import-markdown/ resolves to test/cli.ts, not the repo-root cli.ts.
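
The import-path point in item 3 follows from how relative specifiers resolve against the importing file's URL. A small sketch, assuming a placeholder repository root of `/repo`:

```typescript
// "/repo" is a placeholder root; the file names come from the review.
const testFile = "file:///repo/test/import-markdown/import-markdown.test.mjs";

// One level up from test/import-markdown/ is test/, not the repo root.
const oneUp = new URL("../cli.ts", testFile).pathname;
// Two levels up reaches the repo root.
const twoUp = new URL("../../cli.ts", testFile).pathname;

console.log(oneUp); // /repo/test/cli.ts
console.log(twoUp); // /repo/cli.ts
```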

Minor:

  • --min-text-length and --importance parsed with parseInt/parseFloat but no Number.isFinite check — NaN writes through silently.
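
A minimal sketch of such a guard, under stated assumptions: the helper names, the fallback defaults (5 and 0.7, taken from the option table earlier in the thread), and the clamping bounds are illustrative and are not the repository's own helpers.

```typescript
// Hypothetical parsers; bounds are illustrative assumptions.
function parseMinTextLength(raw: string | undefined, fallback = 5): number {
  const value = Number.parseInt(raw ?? "", 10);
  if (!Number.isFinite(value)) return fallback; // NaN falls back to the default
  return Math.min(10000, Math.max(1, value));
}

function parseImportance(raw: string | undefined, fallback = 0.7): number {
  const value = Number.parseFloat(raw ?? "");
  if (!Number.isFinite(value)) return fallback; // NaN or Infinity falls back
  return Math.min(1, Math.max(0, value));
}

console.log(parseMinTextLength("abc")); // 5
console.log(parseMinTextLength("12")); // 12
console.log(parseImportance("not-a-number")); // 0.7
console.log(parseImportance("5")); // 1
```

Without the `Number.isFinite` check, `parseInt("abc", 10)` returns `NaN`, and a comparison such as `text.length >= NaN` is always false, so every line would be skipped without an error.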

Must fix:
- Source scopes discovered but discarded: scanner now falls back to per-file
  discovered scope instead of collapsing all workspaces into "global".
  Prevents cross-workspace leakage and incorrect dedup across workspaces.
- Scanner only descended one level: now also scans workspace/agents/<id>/
  for nested agent workspaces (e.g. workspace/agents/theia/MEMORY.md).

Minor fixes:
- NaN guardrails: --min-text-length and --importance now use clampInt
  and Number.isFinite to prevent invalid values from silently passing.
- Tests reimplement import logic: runImportMarkdown is now exported from
  cli.ts and tests call the production handler directly instead of a
  standalone copy. Prevents logic drift between tests and production.

Refs: PR CortexReach#426 review feedback
@jlin53882
Copy link
Copy Markdown
Contributor Author

All 4 issues from your review have been addressed in commit 9658f53:

Must fix:

  • Scope fallback: effectiveScope = options.scope || discoveredScope — without --scope, each workspace file now writes to its own scope instead of collapsing into "global". Cross-workspace leakage fixed.
  • Recursive agents scan: Added workspace/agents/<id>/ scan — now finds MEMORY.md and memory/ date files under nested agent workspaces (e.g. workspace/agents/theia/MEMORY.md).

Minor fixes:

  • NaN guardrails: --min-text-length now uses clampInt(parseInt(...), 1, 10000). --importance now uses Number.isFinite(parseFloat(...)) with bounds clamping. Invalid inputs fall back to defaults instead of silently passing.
  • Tests reimplement logic: runImportMarkdown is now exported from cli.ts. Tests' runImportMarkdown wrapper delegates to the production handler — no more duplicate logic drift.
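
The scope fallback from the first bullet can be sketched as below. The expression `options.scope || discoveredScope` is quoted from the comment; the `DiscoveredFile` type, the function wrapper, and the sample paths are assumptions for illustration.

```typescript
// Hypothetical record produced by the scanner for each Markdown file.
interface DiscoveredFile { path: string; discoveredScope: string }

// An explicit --scope wins; otherwise each file keeps its discovered scope.
function effectiveScope(options: { scope?: string }, file: DiscoveredFile): string {
  return options.scope || file.discoveredScope;
}

const files: DiscoveredFile[] = [
  { path: "workspace/dev/MEMORY.md", discoveredScope: "dev" },
  { path: "workspace/agents/theia/MEMORY.md", discoveredScope: "theia" },
];

console.log(files.map((f) => effectiveScope({}, f))); // [ 'dev', 'theia' ]
console.log(files.map((f) => effectiveScope({ scope: "team" }, f))); // [ 'team', 'team' ]
```

Because `||` is used rather than `??`, an empty string passed as the scope also falls through to the discovered scope.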

Please take another look when you have time.

rwmjhb pushed a commit that referenced this pull request Apr 2, 2026
* docs: clarify dual-memory architecture (fixes #344)

Add two improvements addressing Issue #344:

1. README: add Dual-Memory Architecture section explaining
   Plugin Memory (LanceDB) vs Markdown Memory distinction
2. index.ts: log dual-memory warning on plugin startup

Refs: #344

* Address AliceLJY review comments: restore cli.ts, remove import-markdown reference, restore removed README sections

- Restore cli.ts (was accidentally deleted, all CLI commands preserved)
- Remove import-markdown command reference from dual-memory section (lives in PR #426)
- Restore beta.10 version banner and OpenClaw 2026.3+ badge
- Restore Auto-recall timeout tuning FAQ section

Ref: #367