fix(import-markdown): P0 missing return + register test in CI by jlin53882 · Pull Request #482 · CortexReach/memory-lancedb-pro

jlin53882 · 2026-04-03T07:00:30Z

Summary

Fixes two critical issues identified in PR #426 review:

P0: Missing return in runImportMarkdown — cli.ts non-empty path was missing return { imported, skipped, foundFiles }, causing caller to receive undefined
P1: Test not registered in CI — test/import-markdown/import-markdown.test.mjs was not included in scripts.test
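
To make the P0 item concrete, here is a minimal sketch of how a missing `return` on one code path surfaces as `undefined` at the call site. It is not the actual `cli.ts` code: `importBuggy`, `importFixed`, and the file-counting logic are hypothetical stand-ins, and only the `{ imported, skipped, foundFiles }` result shape is taken from the summary above.

```typescript
// Hypothetical stand-in for the import handler: the empty path returns early,
// but the non-empty path falls off the end of the function.
function importBuggy(files: string[]) {
  const foundFiles = files.length;
  if (foundFiles === 0) {
    return { imported: 0, skipped: 0, foundFiles };
  }
  let imported = 0;
  let skipped = 0;
  for (const file of files) {
    if (file.endsWith(".md")) imported++;
    else skipped++;
  }
  console.log(`imported ${imported}, skipped ${skipped}`);
  // Bug: no return statement here, so callers of this path get undefined.
}

// Same logic with the missing return restored.
function importFixed(files: string[]) {
  const foundFiles = files.length;
  if (foundFiles === 0) {
    return { imported: 0, skipped: 0, foundFiles };
  }
  let imported = 0;
  let skipped = 0;
  for (const file of files) {
    if (file.endsWith(".md")) imported++;
    else skipped++;
  }
  return { imported, skipped, foundFiles };
}
```

Because the buggy variant still prints its summary, the CLI output can look healthy while any caller that reads the result (such as a test) receives `undefined`.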

Additional fixes discovered during implementation

Jest → Node built-in test runner: Test file used @jest/globals but project uses node --test; all jest.fn() replaced with node:test's mock.fn()
Jiti TypeScript compilation: Direct import("../../cli.ts") fails with ERR_MODULE_NOT_FOUND; replaced with jiti factory pattern consistent with other tests in the repo
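
For readers unfamiliar with the Node built-in runner, the sketch below shows the `mock.fn()` half of the migration only; the jiti loading pattern is not reproduced here because it depends on the repo's setup. The `embedPassage` stub and its return value are assumptions made for illustration, not the project's real embedder.

```typescript
import { mock } from "node:test";

// Hypothetical embedder stub: returns a one-element vector based on text length.
const embedPassage = mock.fn((text: string): number[] => [text.length]);

// Call the mock the way production code would call the real method.
const vector = embedPassage("remember the milk");

// node:test records every call; each record exposes its arguments and result.
const firstCall = embedPassage.mock.calls[0];
console.log(embedPassage.mock.calls.length, firstCall.arguments[0], vector);
```

Where Jest reads `fn.mock.calls[0][0]`, the Node runner exposes the same information as `fn.mock.calls[0].arguments[0]`, which is the main mechanical change when porting assertions.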

Changes

| File | Change |
| --- | --- |
| `cli.ts` | Add `return { imported, skipped, foundFiles }` in non-empty import path |
| `package.json` | Register test in `scripts.test` |
| `test/import-markdown/import-markdown.test.mjs` | Jest→Node test runner + jiti import fix |
| `package-lock.json` | Updated dependencies |

Testing

All 9 tests pass:

node --test test/import-markdown/import-markdown.test.mjs → pass 9 / fail 0

Add `memory-pro import-markdown` command to migrate existing Markdown memories (MEMORY.md, memory/YYYY-MM-DD.md) into the plugin LanceDB store for semantic recall. This addresses Issue CortexReach#344 by providing a migration path from the Markdown layer to the plugin memory layer.

…fig options + tests

## Implementation improvements (relative to the original PR CortexReach#426)

### New CLI options

- --dedup: enable scope-aware exact-match deduplication (avoids duplicate imports)
- --min-text-length <n>: set the minimum text length threshold (default 5)
- --importance <n>: set the importance value of imported memories (default 0.7)

### Bug fixes

- UTF-8 BOM handling: strip the \uFEFF prefix after reading the file
- CRLF normalization: use split(/\r?\n/) to support both CRLF and LF
- Extended bullet formats: from only '- ' to '- ', '* ', and '+ '

### New tests

- test/import-markdown/import-markdown.test.mjs: full unit tests
  - BOM handling
  - CRLF normalization
  - Extended bullet formats (dash/star/plus)
  - minTextLength parameter
  - importance parameter
  - Dedup logic (scope-aware exact match)
  - Dry-run mode
  - Continue on error

### Analysis documents

- test/import-markdown/ANALYSIS.md: full analysis report
  - Benefit analysis (measured on real files, 655 records)
  - Analysis of 3 code gaps
  - 5 suggested new config fields
  - Itemized feature description
- test/import-markdown/recall-benchmark.py: script comparing against real LanceDB queries
  - Measured result: 7 of 8 keywords are present in Markdown but cannot be found in LanceDB
  - Demonstrates the practical value of import-markdown

## Measured results (real memory files)

- James's workspace: MEMORY.md (20 entries) + 30 daily notes (633 entries) = 653 records
- Without dedup: every run wastes 50% (duplicate imports)
- With dedup: the second run skips 100%, saving 644 embedder API calls
- Keyword comparison: 7 of 8 test keywords are present in Markdown but absent from LanceDB

## Suggested new config (5 items; defaults equal current behavior, backward compatible)

- importMarkdown.dedup: boolean = false
- importMarkdown.defaultScope: string = global
- importMarkdown.minTextLength: number = 5
- importMarkdown.importanceDefault: number = 0.7
- importMarkdown.workspaceFilter: string[] = []

Closes: PR CortexReach#426 (CortexReach/memory-lancedb-pro)
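
The three parsing fixes in this commit (BOM stripping, CRLF-tolerant splitting, and the wider bullet prefix set) can be sketched together as one small function. This is an illustrative reconstruction under stated assumptions, not the code in `cli.ts`: the name `extractBullets` is hypothetical, and the real parser may treat indentation, headings, or nested bullets differently.

```typescript
// Hypothetical helper: pull bullet entries out of raw Markdown text.
function extractBullets(raw: string, minTextLength = 5): string[] {
  // Strip a leading UTF-8 byte order mark, which decodes to U+FEFF.
  const content = raw.charCodeAt(0) === 0xfeff ? raw.slice(1) : raw;
  const entries: string[] = [];
  // Split on LF or CRLF so Windows-authored files do not leave stray "\r".
  for (const line of content.split(/\r?\n/)) {
    // Accept "- ", "* ", and "+ " as bullet prefixes.
    const match = /^[-*+] (.*)$/.exec(line);
    if (!match) continue;
    const text = match[1].trim();
    // Drop entries shorter than the configured threshold.
    if (text.length >= minTextLength) entries.push(text);
  }
  return entries;
}
```

Without the BOM check, the first line of a BOM-prefixed file starts with U+FEFF rather than a bullet character, so its first entry would be silently dropped.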

P1 fixes:

- embedQuery -> embedPassage (lines 1001, 1171): imported memory content is a passage/document, not a query. Using embedQuery with asymmetric providers (e.g. Jina) causes query-query comparison at recall time, degrading retrieval quality.
- metadata: JSON.stringify the importedFrom object (line 1178): MemoryEntry.metadata is typed as string in store.ts; passing a plain object silently fails or produces unparseable data.

Minor fixes:

- workspaceEntries type: string[] -> Dirent[] (matches readdir withFileTypes)
- Hoist await import('node:fs/promises') out of loops: a single import at handler level replaces repeated per-iteration dynamic imports

Ref: CortexReach/pull/426
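
The metadata fix can be illustrated with a small round trip. In this sketch only the string-typed `metadata` field and the `importedFrom` key come from the commit message; `MemoryEntryLike`, the `ImportedFrom` fields, and both helper functions are hypothetical.

```typescript
// Hypothetical entry shape: metadata is a string, as the commit says of store.ts.
interface MemoryEntryLike {
  text: string;
  scope: string;
  metadata: string;
}

// Hypothetical provenance record; the real importedFrom fields are not shown in the PR.
interface ImportedFrom {
  file: string;
  line: number;
}

// Serialize once at the boundary so the store always receives a string.
function toEntry(text: string, scope: string, importedFrom: ImportedFrom): MemoryEntryLike {
  return { text, scope, metadata: JSON.stringify({ importedFrom }) };
}

// Reading the provenance back requires metadata to be valid JSON.
function readImportedFrom(entry: MemoryEntryLike): ImportedFrom {
  const parsed = JSON.parse(entry.metadata) as { importedFrom: ImportedFrom };
  return parsed.importedFrom;
}
```

For comparison, coercing a plain object to a string without `JSON.stringify` yields `"[object Object]"`, which cannot be parsed back into the original fields.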

The const fsPromises declaration was inside the try block, making it scoped to that block only. Subsequent fsPromises.stat() calls in MEMORY.md and memory/ processing code were failing with 'fsPromises is not defined'. Move declaration to handler scope.

Scans the flat `workspace/memory/` directory (directly under workspace root, not inside any workspace subdirectory) and imports entries with scope='memory'. This supports the actual OpenClaw structure where memory files live directly in workspace/memory/.

Before scanning, read openclaw.json agents list to find the agent whose workspace path matches the current workspaceDir. Use that agent's id as workspaceScope for flat memory/ entries instead of defaulting to 'memory'. Falls back to 'shared' when no matching agent is found (e.g. shared workspace with no dedicated agent).
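
The lookup described here might look roughly like the sketch below. The `id` and `workspace` fields follow the commit text; the function name `resolveFlatMemoryScope`, the `AgentConfig` interface, and passing the fallback as a parameter are assumptions for illustration.

```typescript
import * as path from "node:path";

// Hypothetical slice of an openclaw.json agent entry.
interface AgentConfig {
  id: string;
  workspace?: string;
}

// Return the id of the agent whose workspace matches workspaceDir, else the fallback.
function resolveFlatMemoryScope(
  agents: AgentConfig[],
  workspaceDir: string,
  fallback: string,
): string {
  // Normalize both sides so doubled separators or "." segments do not block a match.
  const target = path.normalize(workspaceDir);
  const match = agents.find(
    (agent) => agent.workspace !== undefined && path.normalize(agent.workspace) === target,
  );
  return match ? match.id : fallback;
}
```

Note that the comparison is exact after normalization, so an agent whose workspace is a subdirectory of `workspaceDir` does not match and the fallback applies.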

Must fix:

- Flat memory scan: move before the mdFiles.length===0 early return so it is always reachable (not just when nested workspaces are empty)
- Tests: runImportMarkdown now uses embedPassage (not embedQuery) and JSON.stringify(metadata) to match production. Added embedPassage mock.
- Tests: setupWorkspace now creates files at workspace/<name>/ to match the actual path structure runImportMarkdown expects

Worth considering:

- Flat memory scan now skips when workspaceGlob is set, avoiding accidental root flat memory import when user specifies --workspace
- Removed dev artifacts: ANALYSIS.md and recall-benchmark.py contained personal absolute paths and are not suitable for repo commit

Before: --dry-run skipped dedup check entirely, so --dry-run --dedup would overcount imports (items counted as imported even if dedup would skip them). After: dedup check runs regardless of dry-run mode. In dry-run, items that would be skipped by dedup are counted as skipped, not imported. Restores the dry-run console log message.
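
The counting rule after this change can be sketched as follows. This is a simplified model, assuming an in-memory `Set` in place of the real store and exact-text matching in place of the real dedup lookup; `simulateImport` and `ImportOptions` are hypothetical names.

```typescript
interface ImportOptions {
  dryRun: boolean;
  dedup: boolean;
}

// Hypothetical model of the import loop: dedup is checked first, writes happen
// only outside dry-run, and the counters reflect what a real run would do.
function simulateImport(
  candidates: string[],
  store: Set<string>,
  options: ImportOptions,
): { imported: number; skipped: number } {
  let imported = 0;
  let skipped = 0;
  for (const text of candidates) {
    // The dedup check runs regardless of dry-run mode.
    if (options.dedup && store.has(text)) {
      skipped++;
      continue;
    }
    // Only a real run mutates the store.
    if (!options.dryRun) store.add(text);
    imported++;
  }
  return { imported, skipped };
}
```

With `--dry-run --dedup`, an entry already in the store is now reported as skipped, so the dry-run totals match what the subsequent real run would produce for those entries.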

Must fix:

- Source scopes discovered but discarded: scanner now falls back to per-file discovered scope instead of collapsing all workspaces into "global". Prevents cross-workspace leakage and incorrect dedup across workspaces.
- Scanner only descended one level: now also scans workspace/agents/<id>/ for nested agent workspaces (e.g. workspace/agents/theia/MEMORY.md).

Minor fixes:

- NaN guardrails: --min-text-length and --importance now use clampInt and Number.isFinite to prevent invalid values from silently passing.
- Tests reimplement import logic: runImportMarkdown is now exported from cli.ts and tests call the production handler directly instead of a standalone copy. Prevents logic drift between tests and production.

Refs: PR CortexReach#426 review feedback
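
As an illustration of the NaN guardrails, the sketch below parses the two numeric options defensively. The defaults (5 and 0.7) come from the earlier commit message; the helper names, the clamping bounds, and the signature of `clampToIntRange` are assumptions, and the repo's own `clampInt` may behave differently.

```typescript
// Hypothetical stand-in for an integer clamp helper.
function clampToIntRange(value: number, min: number, max: number): number {
  return Math.min(max, Math.max(min, Math.trunc(value)));
}

// Parse --min-text-length; non-numeric input falls back to the default.
// The 1..10000 bounds are illustrative assumptions.
function parseMinTextLength(raw: string | undefined, fallback = 5): number {
  const parsed = Number.parseInt(raw ?? "", 10);
  return Number.isFinite(parsed) ? clampToIntRange(parsed, 1, 10000) : fallback;
}

// Parse --importance; non-numeric input falls back to the default.
// Clamping to 0..1 is an illustrative assumption.
function parseImportance(raw: string | undefined, fallback = 0.7): number {
  const parsed = Number.parseFloat(raw ?? "");
  return Number.isFinite(parsed) ? Math.min(1, Math.max(0, parsed)) : fallback;
}
```

The `Number.isFinite` check matters because `Number.parseInt("abc", 10)` returns `NaN`, and any comparison against `NaN` is false, so an unchecked threshold would quietly filter nothing or everything.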

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c1bc08c3c0

chatgpt-codex-connector · 2026-04-03T07:05:15Z

cli.ts

+      // This handles the structure used by session-recovery and other OpenClaw
+      // components: workspace/agents/<id>/MEMORY.md and workspace/agents/<id>/memory/.
+      // We scan one additional level deeper than the top-level workspace scan.
+      if (!workspaceGlob) {


Apply workspace-glob when scanning nested agent workspaces

The nested workspace/agents/<id> scan is gated behind if (!workspaceGlob), so passing a workspace-glob skips agent workspaces entirely. Because the earlier top-level loop only filters direct children of workspace/ (e.g., agents), a command like import-markdown <agent-id> cannot find workspace/agents/<agent-id>/MEMORY.md or memory/*.md, causing targeted imports to silently return no files in common agent-based layouts.

jlin53882 · 2026-04-03T07:21:06Z

Opened an issue to track this: #484

jlin53882 · 2026-04-03T07:29:24Z

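✅ Fixed. The problem was that if (!workspaceGlob) gated the entire nested agent scan, so when import-markdown <agent-id> supplied a workspaceGlob, the agents directory was skipped outright.

Change: refactored the nested scan logic. When an explicit workspaceGlob is given, only workspace/agents/<workspaceGlob>/ is scanned; when none is given, all agents are scanned.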
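
The selection rule described in this reply can be sketched as a pure function over directory entries. `DirEntryLike` mirrors the two `Dirent` members the logic needs; `selectAgentDirs` is a hypothetical name, and the real code in `cli.ts` works on actual `readdir` results rather than this simplified shape.

```typescript
// Minimal subset of fs.Dirent used by the selection logic.
interface DirEntryLike {
  name: string;
  isDirectory(): boolean;
}

// Decide which workspace/agents/<id>/ directories to scan.
function selectAgentDirs(agentEntries: DirEntryLike[], workspaceGlob: string | null): string[] {
  const dirs = agentEntries.filter((entry) => entry.isDirectory());
  if (workspaceGlob) {
    // Targeted import: scan only the agent whose directory name matches.
    const match = dirs.find((entry) => entry.name === workspaceGlob);
    return match ? [match.name] : [];
  }
  // No filter: scan every agent workspace.
  return dirs.map((entry) => entry.name);
}
```

Before the fix, the equivalent of the first branch returned nothing at all, which is why a targeted import reported no files.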

Commit: 212e96a

rwmjhb · 2026-04-03T11:10:08Z

Review: fix(import-markdown): P0 missing return + register test in CI

Verdict: request-changes | Confidence: 0.95 | Value: 45%

The missing return fix and test registration are both needed. A few issues to address:

Must Fix

1. Build failure — rebase needed

Build is red (stale_base=true). Rebase onto current main to confirm the PR's changes don't interact with the failure.

2. Import-markdown test not confirmed running

Verification only ran cli-smoke.mjs. The new test/import-markdown/import-markdown.test.mjs was not executed — can't confirm the 9 test cases pass in the full pipeline.

Should Fix

3. dedupEnabled reads wrong option in action handler

// current (action handler summary)
const dedupEnabled = !!options.dryRun;  // ← should be options.dedup

The actual dedup logic inside runImportMarkdown correctly uses options.dedup, so stored data is fine — but the CLI summary line lies to the user about whether dedup was active.

4. Dedup only checks top-1 BM25 hit — can miss exact duplicates

--dedup does bm25Search(text, 1, [scope]) and checks if the top result matches exactly. BM25 is a ranked lexical search, not exact-text lookup. If a similar-but-different entry ranks above the actual duplicate, the duplicate is missed and a second copy gets imported.

Suggestion: either increase the limit (e.g. bm25Search(text, 5, ...) and check all results), or use a hash-based dedup.
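
A minimal sketch of the first suggestion follows: fetch several ranked hits and look for an exact text match among all of them. The `LexicalSearch` type is a synchronous stand-in that returns hit texts directly; the real `bm25Search` is not shown in this thread, so its return shape and any async behavior are assumptions here.

```typescript
// Hypothetical search signature: (query, limit, scopes) -> ranked hit texts.
type LexicalSearch = (query: string, limit: number, scopes: string[]) => string[];

// Treat an entry as a duplicate if any of the top `limit` hits matches exactly.
function isDuplicate(text: string, scope: string, search: LexicalSearch, limit = 5): boolean {
  return search(text, limit, [scope]).some((hit) => hit === text);
}
```

A larger limit narrows the gap but does not close it: an exact duplicate ranked below the cutoff is still missed, which is the case a hash-keyed lookup would cover.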

5. Scope inference misses flat root-memory files

agentsList.find(a => path.normalize(a.workspace) === workspaceDir) compares the root workspace dir against per-agent paths like .openclaw/workspace/agents/theia. This never matches, so flat workspace/memory/*.md files fall back to scope: "shared" instead of the intended workspace scope.

6. CLI help text says --scope defaults to global, but code uses discoveredScope

--scope <scope> description says "default: global", but the implementation falls through to the discovered agent/workspace scope. Users omitting --scope get unexpected scope assignment.

Nit

Duplicate summary logging (both runImportMarkdown and the action handler print a summary)
tsx added as devDependency but test uses jiti — unused dep?
Inconsistent indentation in runImportMarkdown body

Auto-reviewed by auto-pr-review-orchestrator | 7 rounds | Claude + Codex adversarial

Should Fix from PR CortexReach#482 review:

3. Fix dedupEnabled option read in CLI summary line
   - cli.ts:655: !!options.dryRun -> !!options.dedup
4. Expand dedup BM25 search from top-1 to top-5
   - cli.ts:682: bm25Search(text, 1, ...) -> bm25Search(text, 5, ...)
   - BM25 is lexical ranking, not exact match; top-1 can miss duplicates
5. Fallback scope for flat root-memory files
   - cli.ts:637: "shared" -> "global"
   - "shared" is not a valid scope (not in scopes.ts definitions)
   - flat workspace/memory/*.md are at workspace root level, semantics imply "global" (all agents accessible)
6. Fix CLI --scope help text to match actual behavior
   - cli.ts:1415: "default: global" -> "default: auto-discovered from workspace"
   - actual behavior: options.scope || discoveredScope (not global)

jlin53882 · 2026-04-03T12:30:01Z

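Review reply: Fixes 3/4/5/6 are all addressed

Fix 3 ✅: dedupEnabled read the wrong option

!!options.dryRun → !!options.dedup (cli.ts:655)

Fix 4 ✅: Dedup top-1 → top-5

bm25Search(text, 1, ...) → bm25Search(text, 5, ...) (cli.ts:682)

Fix 5 ✅: Scope fallback changed from "shared" to "global"

Confirming the direction:

Checking the official definitions in src/scopes.ts:

"global": "Shared knowledge across all agents", the system default scope
"shared": does not exist; it is not an official scope

Flat root-memory (workspace/memory/*.md) sits at the workspace root level, not under any specific agent directory. Semantically it should be usable by all agents, which maps to "global".

The original "shared" was probably an intuitive name picked by the developer, but "shared" has no real meaning in the scope system (and no agent can read data stored with scope: "shared").

If you think the fallback should instead be a specific agent scope (for example, when there is only one agent), please let me know and I can change direction.

Fix 6 ✅: --scope help text corrected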

"default: global" → "default: auto-discovered from workspace"

jlin53882 · 2026-04-03T12:33:04Z

Must Fix reply

1. Build failure, rebase needed ✅ Addressed

Branch fix/import-markdown-rebase has been rebased onto current origin/master. The conflict (package.json) is resolved, and the test files from both master and the PR are kept.

2. Import-markdown test not confirmed running in CI ✅ Addressed

Test results:

node --test test/import-markdown/import-markdown.test.mjs → 9/9 PASS
The full npm run test has 2 pre-existing failures (strip-envelope-metadata.test.mjs), unrelated to import-markdown

The Should Fix replies (3/4/5/6) are covered in a separate comment.

That is everything; please confirm.

AliceLJY

Review: PR #482 — fix(import-markdown): P0 missing return + register test in CI

Fixes two real issues from PR #426 review feedback. The extracted runImportMarkdown function is testable and the test suite covers the important edge cases.

Codex P2 assessment — "Apply workspace-glob when scanning nested agent workspaces"

Valid concern, and already addressed in the current code. The Codex comment noted that workspace-glob would skip agent workspaces because the nested scan was gated behind if (!workspaceGlob). The fix (lines ~137-153 in the diff) correctly handles both cases:

When workspaceGlob is set: scans only the matching agent via agentEntries.find(e => e.isDirectory() && e.name === workspaceGlob)
When not set: scans all agent workspaces

This was also filed as Issue #484, which this PR resolves.

Code review findings

1. P0 fix verified: return { imported, skipped, foundFiles } is now present at the end of the non-empty path (line 257 in diff). Without this, callers received undefined.
2. Test registration: The import-markdown.test.mjs is appended to the scripts.test chain in package.json. Correct.
3. Jest to Node test runner migration: Clean conversion from @jest/globals to node:test built-ins (describe, it, mock.fn()). Consistent with the rest of the repo.
4. Indentation inconsistency: The function body of runImportMarkdown starts with extra indentation (the first const openclawHome line at line 30 has ~8 spaces of leading whitespace, while the function signature is at the normal level). This looks like the function body was copy-pasted from a nested context (the original inline action handler). Not a blocker but worth a cosmetic cleanup.
5. Test calls runImportMarkdown(ctx, options) with 2 args but function signature takes 3: The function signature is runImportMarkdown(ctx, workspaceGlob, options), but the tests call it as runImportMarkdown(ctx, { openclawHome, workspaceGlob, ... }), passing options as the second argument. This means workspaceGlob receives the entire options object (which is truthy), and options is undefined. The tests pass because the function internally reads workspaceGlob from the options object too (the .action handler passes it separately). Verify that the tests are actually exercising the exported function correctly; the 2-arg call signature suggests the tests may be testing a different code path than the CLI invocation.
6. process.exit(1) in a library function: runImportMarkdown calls process.exit(1) when the embedder is missing or the workspace directory is unreadable. For a function that's now exported and testable, throwing an error would be more appropriate than killing the process. The CLI action handler can catch and exit. Minor, follow-up material.
7. Missing trailing newline: package.json lost its trailing newline (\ No newline at end of file). Trivial but worth fixing.

Overall: the P0 fix and test registration are correct. Item 5 (call signature mismatch) is the main concern — please verify the tests call the function with the right arity.

LGTM with the above note.

…view

## Fixes applied

### P1: process.exit(1) in library function (critical)

- `runImportMarkdown` now throws `Error` instead of calling `process.exit(1)` when embedder is missing or workspace directory is unreadable.
- CLI handler (`registerMemoryCLI`) wraps the call with try/catch and handles the error gracefully (still exits with code 1, but intentionally).

### P1: CLI handler missing error boundary

- Added try/catch wrapper to the `import-markdown` action handler so that errors thrown by `runImportMarkdown` are caught and reported cleanly instead of bubbling up as unhandled promise rejections.

### P2: Dedup check silent failure

- The `bm25Search` catch block now logs a `console.warn` instead of silently continuing, making dedup failures visible in output.

### P3: Function body indentation

- Normalized indentation: body lines were at column 0 / inconsistent 6-space indent; now consistently 2-space indented (matching the function signature).

### P3: Duplicate CLI output removed

- Removed redundant summary console.log from CLI handler; the summary is printed only once inside `runImportMarkdown`.

### P3: package.json trailing newline

- Added missing trailing newline to package.json (was ending with `}`).

## Regarding Codex Review P5 (test arity mismatch)

Codex noted: "Test calls `runImportMarkdown(ctx, {options})` with 2 args but function signature takes 3 args."

Clarification: the test file has a **module-level adapter** at the bottom (`async function runImportMarkdown(context, options = {})`) that correctly reconstitutes the 3-argument call: `return importMarkdown(context, options.workspaceGlob ?? null, {...options});`

This adapter is the intentional API design: the outer test helper accepts an options-object style call and normalizes it to the production 3-arg signature. The tests ARE exercising the production code path correctly.

No functional issue; the Codex concern was a false positive from reading only the top-level call signature without following the adapter.

jlin53882 · 2026-04-03T19:46:08Z

PR #482 — P5 Codex Review Clarification + P1/P2/P3 Fixes Committed

P5 — Test arity mismatch (Codex concern): NOT a real bug

Codex noted that tests call runImportMarkdown(ctx, {options}) with 2 args but the production function takes 3 args. This is a false positive.

The test file has an intentional adapter layer at module level:

// Bottom of test file — adapter that normalizes call signature
async function runImportMarkdown(context, options = {}) {
  return importMarkdown(
    context,
    options.workspaceGlob ?? null,  // ← correctly passes as 2nd argument
    { dryRun: !!options.dryRun, scope: options.scope, ... }
  );
}

The outer test helper accepts an options-object style call and internally converts it to the correct 3-argument production signature. The adapter is specifically designed to bridge the simpler test API and the full production API. Tests are exercising the real production code path correctly. No change needed for this concern.

P1/P2/P3 fixes — now committed in `c1127ed`

The following code quality fixes have been pushed to the PR branch:

| Severity | Issue | Fix |
| --- | --- | --- |
| P1 | `process.exit(1)` in `runImportMarkdown` | → throw `Error` instead; CLI handler catches |
| P1 | No error boundary in CLI handler | → added `try/catch` wrapper |
| P2 | `bm25Search` catch was silently swallowed | → added `console.warn` logging |
| P3 | Function body indentation inconsistent | → normalized to 2-space |
| P3 | CLI handler duplicated summary output | → removed duplicate |
| P3 | `package.json` missing trailing newline | → added `\n` |

Commit: c1127ed (fix/import-markdown-rebase branch)
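
The two P1 rows describe a common split between library code and CLI glue, sketched below under simplifying assumptions. `runImport` and `cliAction` are hypothetical stand-ins for `runImportMarkdown` and the action handler, the embedder is reduced to a nullable object, and the handler returns an exit code rather than calling `process.exit` so the sketch stays side-effect free.

```typescript
// Library-style function: signals failure by throwing, never by exiting.
function runImport(embedder: object | null): { imported: number } {
  if (!embedder) {
    throw new Error("import-markdown requires a configured embedder");
  }
  return { imported: 0 };
}

// CLI-style wrapper: the single place that turns errors into an exit code.
function cliAction(embedder: object | null): number {
  try {
    runImport(embedder);
    return 0;
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    console.error(`import-markdown failed: ${message}`);
    // A real handler would pass this value to process.exit or process.exitCode.
    return 1;
  }
}
```

Keeping the exit decision in the wrapper lets tests call the library function directly and assert on the thrown error without the test process being terminated.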

Revert all changes except the isOwnedByAgent fix (src/reflection-store.ts):

- Remove import-markdown CLI (cli.ts): tracked separately in PR CortexReach#426/CortexReach#482
- Remove autoRecallExcludeAgents config: tracked separately in PR CortexReach#516/CortexReach#521
- Remove idempotent register guard: separate feature request needed
- Remove recallMode parsing: unrelated to CortexReach#448
- Remove dual-memory docs (README.md): already merged in PR CortexReach#367
- Remove script mode changes: unrelated
- Remove embedder/llm-client changes: unrelated
- Restore deleted nvidia test file: unrelated to CortexReach#448

Only src/reflection-store.ts isOwnedByAgent fix remains.

jlin53882 and others added 13 commits April 1, 2026 14:22

fix(import-markdown): add missing return + register test in CI

458bbff

fix(import-markdown): replace Jest with Node built-in test runner

21a5447

fix(import-markdown): use jiti for TypeScript import in test (resolves ERR_MODULE_NOT_FOUND)

fc9c2de

Merge fix/import-markdown-p0-return into master (resolve conflicts)

c1bc08c

jlin53882 mentioned this pull request Apr 3, 2026

fix(import-markdown): P0 missing return + register test in CI #479

Closed

chatgpt-codex-connector bot reviewed Apr 3, 2026

jlin53882 mentioned this pull request Apr 3, 2026

[import-markdown] workspace-glob 導致 nested agent workspace 被略過 #484

Open

fix(import-markdown): respect workspace-glob in nested agent scan

212e96a

JiwaniZakir mentioned this pull request Apr 3, 2026

feat: separate internal schema dimensions from embedding API request dimensions #377

Open

AliceLJY approved these changes Apr 3, 2026

AliceLJY assigned rwmjhb Apr 3, 2026

jlin53882 wants to merge 16 commits into CortexReach:master from jlin53882:fix/import-markdown-rebase
