Skip to content

fix(import-markdown): P0 missing return + register test in CI#479

Closed
jlin53882 wants to merge 12 commits intoCortexReach:masterfrom
jlin53882:fix/import-markdown-p0-return
Closed

fix(import-markdown): P0 missing return + register test in CI#479
jlin53882 wants to merge 12 commits intoCortexReach:masterfrom
jlin53882:fix/import-markdown-p0-return

Conversation

@jlin53882
Copy link
Copy Markdown
Contributor

Summary

Two P0 fixes for the import-markdown CLI command:

Fix 1: Missing return statement (cli.ts)

Added the missing
eturn { imported, skipped, foundFiles } after the non-dry-run console.log block. Without this return, callers would always receive undefined instead of the import result object.

Fix 2: Register import-markdown test in CI (package.json)

Added
ode --test test/import-markdown/import-markdown.test.mjs to the est script so it runs in CI.

Changes

  • cli.ts: Added return statement after import complete log
  • package.json: Registered est/import-markdown/import-markdown.test.mjs in test script

Testing

ode --test test/import-markdown/import-markdown.test.mjs — note: test file has pre-existing @jest/globals dependency issue (unrelated to this PR)

jlin53882 and others added 10 commits April 1, 2026 14:22
Add `memory-pro import-markdown` command to migrate existing Markdown memories
(MEMORY.md, memory/YYYY-MM-DD.md) into the plugin LanceDB store for semantic recall.

This addresses Issue CortexReach#344 by providing a migration path from the Markdown layer
to the plugin memory layer.
…fig options + tests

## 實作改善(相對於原本的 PR CortexReach#426)

### 新增 CLI 選項
- --dedup:啟用 scope-aware exact match 去重(避免重複匯入)
- --min-text-length <n>:設定最短文字長度門檻(預設 5)
- --importance <n>:設定匯入記憶的 importance 值(預設 0.7)

### Bug 修復
- UTF-8 BOM 處理:讀檔後主動移除 \ufeFF prefix
- CRLF 正規化:改用 split(/\r?\n/) 同時支援 CRLF 和 LF
- Bullet 格式擴展:從只支援 '- ' 擴展到支援 '- '、'* '、'+ ' 三種

### 新增測試
- test/import-markdown/import-markdown.test.mjs:完整單元測試
  - BOM handling
  - CRLF normalization
  - Extended bullet formats (dash/star/plus)
  - minTextLength 參數
  - importance 參數
  - Dedup logic(scope-aware exact match)
  - Dry-run mode
  - Continue on error

### 分析文件
- test/import-markdown/ANALYSIS.md:完整分析報告
  - 效益分析(真實檔案 655 筆記錄實測)
  - 3 個程式碼缺口分析
  - 建議的 5 個新 config 欄位
  - 功能條列式說明
- test/import-markdown/recall-benchmark.py:實際 LanceDB 查詢對比腳本
  - 實測結果:7/8 個關鍵字在 Markdown 有但 LanceDB 找不到
  - 證明 import-markdown 的實際價值

## 實測效果(真實記憶檔案)
- James 的 workspace:MEMORY.md(20 筆)+ 30 個 daily notes(633 筆)= 653 筆記錄
- 無 dedup:每次執行浪費 50%(重複匯入)
- 有 dedup:第二次執行 100% skip,節省 644 次 embedder API 呼叫
- 關鍵字對比:7/8 個測試關鍵字在 Markdown 有、LanceDB 無

## 建議新增的 Config(共 5 項,預設值 = 現在行為,向下相容)
- importMarkdown.dedup: boolean = false
- importMarkdown.defaultScope: string = global
- importMarkdown.minTextLength: number = 5
- importMarkdown.importanceDefault: number = 0.7
- importMarkdown.workspaceFilter: string[] = []

Closes: PR CortexReach#426 (CortexReach/memory-lancedb-pro)
P1 fixes:
- embedQuery -> embedPassage (lines 1001, 1171): imported memory content
  is passage/document, not a query. Using embedQuery with asymmetric
  providers (e.g. Jina) causes query-query comparison at recall time,
  degrading retrieval quality.

- metadata: JSON.stringify the importedFrom object (line 1178):
  MemoryEntry.metadata is typed as string in store.ts; passing a plain
  object silently fails or produces unparseable data.

Minor fixes:
- workspaceEntries type: string[] -> Dirent[] (matches readdir withFileTypes)
- Hoist await import('node:fs/promises') out of loops: single import at
  handler level replaces repeated per-iteration dynamic imports

Ref: CortexReach/pull/426
The const fsPromises declaration was inside the try block, making it
scoped to that block only. Subsequent fsPromises.stat() calls in
MEMORY.md and memory/ processing code were failing with
'fsPromises is not defined'. Move declaration to handler scope.
Scans the flat \workspace/memory/\ directory (directly under
workspace root, not inside any workspace subdirectory) and imports
entries with scope='memory'. This supports the actual OpenClaw
structure where memory files live directly in workspace/memory/.
Before scanning, read openclaw.json agents list to find the agent
whose workspace path matches the current workspaceDir. Use that agent's
id as workspaceScope for flat memory/ entries instead of defaulting to
'memory'. Falls back to 'shared' when no matching agent is found
(e.g. shared workspace with no dedicated agent).
Must fix:
- Flat memory scan: move before the mdFiles.length===0 early return so it
  is always reachable (not just when nested workspaces are empty)
- Tests: runImportMarkdown now uses embedPassage (not embedQuery) and
  JSON.stringify(metadata) to match production. Added embedPassage mock.
- Tests: setupWorkspace now creates files at workspace/<name>/ to match
  the actual path structure runImportMarkdown expects

Worth considering:
- Flat memory scan now skips when workspaceGlob is set, avoiding accidental
  root flat memory import when user specifies --workspace
- Removed dev artifacts: ANALYSIS.md and recall-benchmark.py contained
  personal absolute paths and are not suitable for repo commit
Before: --dry-run skipped dedup check entirely, so --dry-run --dedup
would overcount imports (items counted as imported even if dedup
would skip them).

After: dedup check runs regardless of dry-run mode. In dry-run,
items that would be skipped by dedup are counted as skipped,
not imported. Restores the dry-run console log message.
Must fix:
- Source scopes discovered but discarded: scanner now falls back to per-file
  discovered scope instead of collapsing all workspaces into "global".
  Prevents cross-workspace leakage and incorrect dedup across workspaces.
- Scanner only descended one level: now also scans workspace/agents/<id>/
  for nested agent workspaces (e.g. workspace/agents/theia/MEMORY.md).

Minor fixes:
- NaN guardrails: --min-text-length and --importance now use clampInt
  and Number.isFinite to prevent invalid values from silently passing.
- Tests reimplement import logic: runImportMarkdown is now exported from
  cli.ts and tests call the production handler directly instead of a
  standalone copy. Prevents logic drift between tests and production.

Refs: PR CortexReach#426 review feedback
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 458bbff537

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

*
* Run: node --experimental-vm-modules node_modules/.bin/jest test/import-markdown.test.mjs
*/
import { jest } from "@jest/globals";
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Remove Jest-only import from node --test suite

The new test is wired into package.json via node --test, but this file imports @jest/globals and uses Jest APIs. This repository does not declare Jest in devDependencies, so the test run fails immediately with ERR_MODULE_NOT_FOUND before any assertions execute (I verified with node --test test/import-markdown/import-markdown.test.mjs). Registering this file in CI therefore breaks npm test until the test is rewritten to node:test APIs or Jest is added and invoked consistently.

Useful? React with 👍 / 👎.


beforeAll(async () => {
// We test the core logic directly instead of via CLI to avoid complex setup
const mod = await import("../cli.ts");
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Correct CLI module path used by import-markdown test

The test loads the production CLI with import("../cli.ts"), but from test/import-markdown/ that resolves to test/cli.ts (which does not exist), not the repository root cli.ts. Once the runner issue above is fixed, this path will throw ERR_MODULE_NOT_FOUND and prevent the suite from exercising runImportMarkdown at all.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Collaborator

@AliceLJY AliceLJY left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这版还不能合:你把 test/import-markdown/import-markdown.test.mjs 接进了 npm test,但该文件当前直接 import { jest } from "@jest/globals",仓库里并没有这个依赖。我本地按 PR 描述执行了 node --test test/import-markdown/import-markdown.test.mjs,会直接报 ERR_MODULE_NOT_FOUND: Cannot find package "@jest/globals",所以现在这条 PR 会把主测试入口直接搞红。先把测试改成当前仓库实际使用的 runner / 依赖体系,或者补齐依赖并说明为什么要引入 Jest。

@jlin53882 jlin53882 closed this Apr 3, 2026
@jlin53882 jlin53882 reopened this Apr 3, 2026
@jlin53882
Copy link
Copy Markdown
Contributor Author

PR #482 是同一分支 rebase 到当前 master 的版本,已无冲突。请确认后关闭本 PR,或评论说明保留哪一个。

@rwmjhb
Copy link
Copy Markdown
Collaborator

rwmjhb commented Apr 3, 2026

Closing — superseded by PR #482 per author's request.

@rwmjhb rwmjhb closed this Apr 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants