fix(import-markdown): P0 missing return + register test in CI#482
fix(import-markdown): P0 missing return + register test in CI#482jlin53882 wants to merge 16 commits intoCortexReach:masterfrom
Conversation
Add `memory-pro import-markdown` command to migrate existing Markdown memories (MEMORY.md, memory/YYYY-MM-DD.md) into the plugin LanceDB store for semantic recall. This addresses Issue CortexReach#344 by providing a migration path from the Markdown layer to the plugin memory layer.
…fig options + tests ## 實作改善(相對於原本的 PR CortexReach#426) ### 新增 CLI 選項 - --dedup:啟用 scope-aware exact match 去重(避免重複匯入) - --min-text-length <n>:設定最短文字長度門檻(預設 5) - --importance <n>:設定匯入記憶的 importance 值(預設 0.7) ### Bug 修復 - UTF-8 BOM 處理:讀檔後主動移除 \ufeFF prefix - CRLF 正規化:改用 split(/\r?\n/) 同時支援 CRLF 和 LF - Bullet 格式擴展:從只支援 '- ' 擴展到支援 '- '、'* '、'+ ' 三種 ### 新增測試 - test/import-markdown/import-markdown.test.mjs:完整單元測試 - BOM handling - CRLF normalization - Extended bullet formats (dash/star/plus) - minTextLength 參數 - importance 參數 - Dedup logic(scope-aware exact match) - Dry-run mode - Continue on error ### 分析文件 - test/import-markdown/ANALYSIS.md:完整分析報告 - 效益分析(真實檔案 655 筆記錄實測) - 3 個程式碼缺口分析 - 建議的 5 個新 config 欄位 - 功能條列式說明 - test/import-markdown/recall-benchmark.py:實際 LanceDB 查詢對比腳本 - 實測結果:7/8 個關鍵字在 Markdown 有但 LanceDB 找不到 - 證明 import-markdown 的實際價值 ## 實測效果(真實記憶檔案) - James 的 workspace:MEMORY.md(20 筆)+ 30 個 daily notes(633 筆)= 653 筆記錄 - 無 dedup:每次執行浪費 50%(重複匯入) - 有 dedup:第二次執行 100% skip,節省 644 次 embedder API 呼叫 - 關鍵字對比:7/8 個測試關鍵字在 Markdown 有、LanceDB 無 ## 建議新增的 Config(共 5 項,預設值 = 現在行為,向下相容) - importMarkdown.dedup: boolean = false - importMarkdown.defaultScope: string = global - importMarkdown.minTextLength: number = 5 - importMarkdown.importanceDefault: number = 0.7 - importMarkdown.workspaceFilter: string[] = [] Closes: PR CortexReach#426 (CortexReach/memory-lancedb-pro)
P1 fixes:
- embedQuery -> embedPassage (lines 1001, 1171): imported memory content
is passage/document, not a query. Using embedQuery with asymmetric
providers (e.g. Jina) causes query-query comparison at recall time,
degrading retrieval quality.
- metadata: JSON.stringify the importedFrom object (line 1178):
MemoryEntry.metadata is typed as string in store.ts; passing a plain
object silently fails or produces unparseable data.
Minor fixes:
- workspaceEntries type: string[] -> Dirent[] (matches readdir withFileTypes)
- Hoist await import('node:fs/promises') out of loops: single import at
handler level replaces repeated per-iteration dynamic imports
Ref: CortexReach/pull/426
The const fsPromises declaration was inside the try block, making it scoped to that block only. Subsequent fsPromises.stat() calls in MEMORY.md and memory/ processing code were failing with 'fsPromises is not defined'. Move declaration to handler scope.
Scans the flat \workspace/memory/\ directory (directly under workspace root, not inside any workspace subdirectory) and imports entries with scope='memory'. This supports the actual OpenClaw structure where memory files live directly in workspace/memory/.
Before scanning, read openclaw.json agents list to find the agent whose workspace path matches the current workspaceDir. Use that agent's id as workspaceScope for flat memory/ entries instead of defaulting to 'memory'. Falls back to 'shared' when no matching agent is found (e.g. shared workspace with no dedicated agent).
Must fix: - Flat memory scan: move before the mdFiles.length===0 early return so it is always reachable (not just when nested workspaces are empty) - Tests: runImportMarkdown now uses embedPassage (not embedQuery) and JSON.stringify(metadata) to match production. Added embedPassage mock. - Tests: setupWorkspace now creates files at workspace/<name>/ to match the actual path structure runImportMarkdown expects Worth considering: - Flat memory scan now skips when workspaceGlob is set, avoiding accidental root flat memory import when user specifies --workspace - Removed dev artifacts: ANALYSIS.md and recall-benchmark.py contained personal absolute paths and are not suitable for repo commit
Before: --dry-run skipped dedup check entirely, so --dry-run --dedup would overcount imports (items counted as imported even if dedup would skip them). After: dedup check runs regardless of dry-run mode. In dry-run, items that would be skipped by dedup are counted as skipped, not imported. Restores the dry-run console log message.
Must fix: - Source scopes discovered but discarded: scanner now falls back to per-file discovered scope instead of collapsing all workspaces into "global". Prevents cross-workspace leakage and incorrect dedup across workspaces. - Scanner only descended one level: now also scans workspace/agents/<id>/ for nested agent workspaces (e.g. workspace/agents/theia/MEMORY.md). Minor fixes: - NaN guardrails: --min-text-length and --importance now use clampInt and Number.isFinite to prevent invalid values from silently passing. - Tests reimplement import logic: runImportMarkdown is now exported from cli.ts and tests call the production handler directly instead of a standalone copy. Prevents logic drift between tests and production. Refs: PR CortexReach#426 review feedback
…s ERR_MODULE_NOT_FOUND)
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c1bc08c3c0
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
cli.ts
Outdated
| // This handles the structure used by session-recovery and other OpenClaw | ||
| // components: workspace/agents/<id>/MEMORY.md and workspace/agents/<id>/memory/. | ||
| // We scan one additional level deeper than the top-level workspace scan. | ||
| if (!workspaceGlob) { |
There was a problem hiding this comment.
Apply workspace-glob when scanning nested agent workspaces
The nested workspace/agents/<id> scan is gated behind if (!workspaceGlob), so passing a workspace-glob skips agent workspaces entirely. Because the earlier top-level loop only filters direct children of workspace/ (e.g., agents), a command like import-markdown <agent-id> cannot find workspace/agents/<agent-id>/MEMORY.md or memory/*.md, causing targeted imports to silently return no files in common agent-based layouts.
Useful? React with 👍 / 👎.
|
已開 Issue 追蹤:#484 |
|
✅ 已修復。問題出在 修改:將 nested scan 邏輯重構,現在當有明確 Commit: |
Review: fix(import-markdown): P0 missing return + register test in CIVerdict: request-changes | Confidence: 0.95 | Value: 45% The missing return fix and test registration are both needed. A few issues to address: Must Fix1. Build failure — rebase needed Build is red ( 2. Import-markdown test not confirmed running Verification only ran Should Fix3. // current (action handler summary)
const dedupEnabled = !!options.dryRun; // ← should be options.dedupThe actual dedup logic inside 4. Dedup only checks top-1 BM25 hit — can miss exact duplicates
Suggestion: either increase the limit (e.g. 5. Scope inference misses flat root-memory files
6. CLI help text says
Nit
Auto-reviewed by auto-pr-review-orchestrator | 7 rounds | Claude + Codex adversarial |
Should Fix from PR CortexReach#482 review: 3. Fix dedupEnabled option read in CLI summary line - cli.ts:655: !!options.dryRun -> !!options.dedup 4. Expand dedup BM25 search from top-1 to top-5 - cli.ts:682: bm25Search(text, 1, ...) -> bm25Search(text, 5, ...) - BM25 is lexical ranking, not exact match; top-1 can miss duplicates 5. Fallback scope for flat root-memory files - cli.ts:637: "shared" -> "global" - "shared" is not a valid scope (not in scopes.ts definitions) - flat workspace/memory/*.md are at workspace root level, semantics imply "global" (all agents accessible) 6. Fix CLI --scope help text to match actual behavior - cli.ts:1415: "default: global" -> "default: auto-discovered from workspace" - actual behavior: options.scope || discoveredScope (not global)
Review 回覆:Fix 3/4/5/6 全部已處理Fix 3 ✅ — dedupEnabled 讀錯選項
Fix 4 ✅ — Dedup top-1 → top-5
Fix 5 ✅ — Scope fallback 從 "shared" → "global"方向確認: 查看
flat root-memory ( 原本用 如果你認為應該 fallback 到某個特定 agent scope(例如只有一個 agent 的情況),請讓我知道,我可以調整方向。 Fix 6 ✅ — --scope help text 修正
|
Must Fix 回覆1. Build failure — rebase needed ✅ 已處理Branch 2. Import-markdown test 未確認在 CI 執行 ✅ 已處理測試結果:
Should Fix 回覆(3/4/5/6)也已在另一個 comment 說明以上,請確認。 |
AliceLJY
left a comment
There was a problem hiding this comment.
Review: PR #482 — fix(import-markdown): P0 missing return + register test in CI
Fixes two real issues from PR #426 review feedback. The extracted runImportMarkdown function is testable and the test suite covers the important edge cases.
Codex P2 assessment — "Apply workspace-glob when scanning nested agent workspaces"
Valid concern, and already addressed in the current code. The Codex comment noted that workspace-glob would skip agent workspaces because the nested scan was gated behind if (!workspaceGlob). The fix (lines ~137-153 in the diff) correctly handles both cases:
- When
workspaceGlobis set: scans only the matching agent viaagentEntries.find(e => e.isDirectory() && e.name === workspaceGlob) - When not set: scans all agent workspaces
This was also filed as Issue #484, which this PR resolves.
Code review findings
-
P0 fix verified —
return { imported, skipped, foundFiles }is now present at the end of the non-empty path (line 257 in diff). Without this, callers receivedundefined. -
Test registration — The
import-markdown.test.mjsis appended to thescripts.testchain inpackage.json. Correct. -
Jest to Node test runner migration — Clean conversion from
@jest/globalstonode:testbuilt-ins (describe,it,mock.fn()). Consistent with the rest of the repo. -
Indentation inconsistency — The function body of
runImportMarkdownstarts with extra indentation (the firstconst openclawHomeline at line 30 has ~8 spaces of leading whitespace, while the function signature is at the normal level). This looks like the function body was copy-pasted from a nested context (the original inlineactionhandler). Not a blocker but worth a cosmetic cleanup. -
Test calls
runImportMarkdown(ctx, options)with 2 args but function signature takes 3 — The function signature isrunImportMarkdown(ctx, workspaceGlob, options), but the tests call it asrunImportMarkdown(ctx, { openclawHome, workspaceGlob, ... }), passing options as the second argument. This meansworkspaceGlobreceives the entire options object (which is truthy), andoptionsisundefined. The tests pass because the function internally readsworkspaceGlobfrom the options object too (the.actionhandler passes it separately). Verify that the tests are actually exercising the exported function correctly — the 2-arg call signature suggests the tests may be testing a different code path than the CLI invocation. -
process.exit(1)in a library function —runImportMarkdowncallsprocess.exit(1)when the embedder is missing or the workspace directory is unreadable. For a function that's now exported and testable, throwing an error would be more appropriate than killing the process. The CLI action handler can catch and exit. Minor, follow-up material. -
Missing trailing newline —
package.jsonlost its trailing newline (\ No newline at end of file). Trivial but worth fixing.
Overall: the P0 fix and test registration are correct. Item 5 (call signature mismatch) is the main concern — please verify the tests call the function with the right arity.
LGTM with the above note.
…view
## Fixes applied
### P1 — process.exit(1) in library function (critical)
- `runImportMarkdown` now throws `Error` instead of calling `process.exit(1)`
when embedder is missing or workspace directory is unreadable.
- CLI handler (`registerMemoryCLI`) wraps the call with try/catch and
handles the error gracefully (still exits with code 1, but intentionally).
### P1 — CLI handler missing error boundary
- Added try/catch wrapper to the `import-markdown` action handler so that
errors thrown by `runImportMarkdown` are caught and reported cleanly
instead of bubbling up as unhandled promise rejections.
### P2 — Dedup check silent failure
- The `bm25Search` catch block now logs a `console.warn` instead of
silently continuing, making dedup failures visible in output.
### P3 — Function body indentation
- Normalized indentation: body lines were at column 0 / inconsistent 6-space
indent; now consistently 2-space indented (matching the function signature).
### P3 — Duplicate CLI output removed
- Removed redundant summary console.log from CLI handler; the summary is
printed only once inside `runImportMarkdown`.
### P3 — package.json trailing newline
- Added missing trailing newline to package.json (was ending with `}`).
## Regarding Codex Review P5 (test arity mismatch)
Codex noted: "Test calls `runImportMarkdown(ctx, {options})` with 2 args
but function signature takes 3 args."
Clarification: the test file has a **module-level adapter** at the bottom
(`async function runImportMarkdown(context, options = {})`) that correctly
reconstitutes the 3-argument call:
return importMarkdown(context, options.workspaceGlob ?? null, {...options});
This adapter is the intentional API design — the outer test helper accepts
an options-object style call and normalizes it to the production 3-arg
signature. The tests ARE exercising the production code path correctly.
No functional issue; the Codex concern was a false positive from reading
only the top-level call signature without following the adapter.
PR #482 — P5 Codex Review Clarification + P1/P2/P3 Fixes CommittedP5 — Test arity mismatch (Codex concern): NOT a real bugCodex noted that tests call The test file has an intentional adapter layer at module level: // Bottom of test file — adapter that normalizes call signature
async function runImportMarkdown(context, options = {}) {
return importMarkdown(
context,
options.workspaceGlob ?? null, // ← correctly passes as 2nd argument
{ dryRun: !!options.dryRun, scope: options.scope, ... }
);
}The outer test helper accepts an options-object style call and internally converts it to the correct 3-argument production signature. The adapter is specifically designed to bridge the simpler test API and the full production API. Tests are exercising the real production code path correctly. No change needed for this concern. P1/P2/P3 fixes — now committed in
|
| Severity | Issue | Fix |
|---|---|---|
| P1 | process.exit(1) in runImportMarkdown |
→ throw Error instead; CLI handler catches |
| P1 | No error boundary in CLI handler | → added try/catch wrapper |
| P2 | bm25Search catch was silently swallowed |
→ added console.warn logging |
| P3 | Function body indentation inconsistent | → normalized to 2-space |
| P3 | CLI handler duplicated summary output | → removed duplicate |
| P3 | package.json missing trailing newline |
→ added \n |
Commit: c1127ed (fix/import-markdown-rebase branch)
Revert all changes except the isOwnedByAgent fix (src/reflection-store.ts): - Remove import-markdown CLI (cli.ts) — tracked separately in PR CortexReach#426/CortexReach#482 - Remove autoRecallExcludeAgents config — tracked separately in PR CortexReach#516/CortexReach#521 - Remove idempotent register guard — separate feature request needed - Remove recallMode parsing — unrelated to CortexReach#448 - Remove dual-memory docs (README.md) — already merged in PR CortexReach#367 - Remove script mode changes — unrelated - Remove embedder/llm-client changes — unrelated - Restore deleted nvidia test file — unrelated to CortexReach#448 Only src/reflection-store.ts isOwnedByAgent fix remains.
Summary
Fixes two critical issues identified in PR #426 review:
returninrunImportMarkdown—cli.tsnon-empty path was missingreturn { imported, skipped, foundFiles }, causing caller to receiveundefinedtest/import-markdown/import-markdown.test.mjswas not included inscripts.testAdditional fixes discovered during implementation
@jest/globalsbut project usesnode --test; alljest.fn()replaced withnode:test'smock.fn()import("../../cli.ts")fails withERR_MODULE_NOT_FOUND; replaced with jiti factory pattern consistent with other tests in the repoChanges
cli.tsreturn { imported, skipped, foundFiles }in non-empty import pathpackage.jsonscripts.testtest/import-markdown/import-markdown.test.mjspackage-lock.jsonTesting
All 9 tests pass:
node --test test/import-markdown/import-markdown.test.mjs→ pass 9 / fail 0