Merge upstream v3.3.5 into kostadis-dev by kostadis · Pull Request #15 · kostadis/mempalace

kostadis · 2026-05-18T01:42:58Z

Summary

Brings in the 41 commits between v3.3.4 and v3.3.5 (KG temporal fixes, SQLite integrity preflight on repair, MineAlreadyRunning surfaces, gitignore-aware drawer prune, Windows lock-sentinel, embedding-function passing on the MCP path, etc.) while preserving the 39 local commits on kostadis-dev (palace isolation, .mempalaceignore, parallel mine pipeline, hierarchical search, closet-LLM prose mode, remote embedding providers, Cursor harness).
17 files had conflicts that needed manual resolution; see the merge commit body for per-file notes.
Real impl bug fixed in palace.py: mine_palace_lock's holder set is now process-wide (was threading.local) so worker threads in the parallel mine pipeline can re-enter the outer lock the orchestrator already holds. Without this, v3.3.5's ChromaCollection._write_lock() self-deadlocked inside the parallel writer.

Tests

2002 pass, 14 skipped, 1 deselected (pytest tests/ --ignore=tests/benchmarks).
The 15 not-passing tests target v3.3.5 features the local design either replaced (_metadata_cache global → per-palace cache, process_file → parallel _prepare_file), restructured (single-list search_memories → two-list primary/themes split), or routed through a different layer (bash mempal_save_hook.sh → Python hooks_cli). All skips have inline @pytest.mark.skip(reason=…) annotations.

Test plan

All tests outside benchmarks/ pass with documented skips
ruff check mempalace/ tests/ — no new errors from the merge
All mempalace.* modules import cleanly
Manual smoke: mempalace status, mempalace search, hook auto-mine
Repair flow exercised on a real palace before relying on mempalace repair

Safety notes

Pre-merge tip preserved at kostadis-dev-backup-pre-3.3.5-rebase locally.
Targets kostadis-dev, not main.

🤖 Generated with Claude Code

Chroma 1.5.x can return ``None`` inside the ``metadatas`` / ``documents`` lists of a query/get result for partially-flushed rows. The codebase already has a systemic None-guard pattern (merged #999, #1013, #1019) but three call sites were still unguarded: * ``mcp_server.tool_check_duplicate`` (``mcp_server.py:487-488``) — ``meta = results["metadatas"][0][i]`` followed by ``meta.get(...)`` raises ``AttributeError: 'NoneType' object has no attribute 'get'``. The broad ``except Exception`` wrapper (line 504) swallows it and returns an uninformative ``"Duplicate check failed"``. * ``layers.Layer1.generate`` (``layers.py:126``) — iterates ``zip(docs, metas)`` and calls ``meta.get(key)`` in the importance loop. A single None metadata blows up the entire wake-up render. * ``layers.Layer2.retrieve`` (``layers.py:224``) — same pattern, same crash path for the on-demand render. Apply the same ``meta = meta or {}`` / ``doc = doc or ""`` idiom used by the merged guards in the search path. Three-line additions, no behaviour change on well-formed results. Tests added: * ``test_check_duplicate_handles_none_metadata`` — mocks the collection query to return ``None`` for one metadata and document, asserts the call does not crash and the sentinel-rendered entry has wing/room "?" and empty content. * ``test_layer1_handles_none_metadata`` / ``_handles_none_document`` * ``test_layer2_handles_none_metadata`` Relationship to other open PRs: * **#1019** guarded ``searcher.py`` loops. This PR extends the same guard to the three call sites #1019 did not touch. * **#979** fixed ``tool_check_duplicate`` negative similarity but left the None-metadata path unguarded. * Does not overlap **#1013** (``Layer3.search_raw``) or **#999**.

On Windows, Python defaults sys.stdin/sys.stdout to the system codepage (e.g. cp1251 on Russian locales, cp1252 on Western European), while MCP JSON-RPC is always UTF-8. Non-ASCII payloads (Cyrillic, CJK, accented European) get mis-decoded before reaching handlers, causing json.loads to fail or tool handlers to receive garbled strings. Both surface to the client as a generic MCP error -32000. Reproduction: 1. On Windows with a non-Latin locale, call mempalace_add_drawer or mempalace_kg_add with Cyrillic/CJK in content or KG object. 2. Server returns: MCP error -32000: Internal tool error. 3. Calling the handler directly from Python works fine -- the bug is purely in the stdio transport. Fix: Reconfigure stdin/stdout to UTF-8 at the start of main(), after _restore_stdout(). Uses errors="replace" defensively so a lone bad byte cannot take down the server. Guarded by hasattr(reconfigure) for exotic stream replacements. This matches the behaviour of PYTHONUTF8=1 / python -X utf8 without requiring users to set an env var.

The list_drawers response only included count (current page size) with no total field, making it impossible for callers to know when pagination is exhausted. A page returning count == limit is ambiguous — it could be the last exact-fit page or there could be more results. Add a total field that reports the full number of matching drawers. For unfiltered requests this uses col.count(); for filtered requests (wing/room) it uses a lightweight col.get(include=[]) to count matching IDs without fetching documents.

A triple with valid_to < valid_from satisfies neither of the temporal filter clauses in query_entity(): valid_from <= as_of AND valid_to >= as_of so the triple is invisible to every query — silently corrupt. Reject at write time with a clear error instead of letting bad data pile up in the SQLite store. The guard only fires when both bounds are present; open intervals (only valid_from or only valid_to) are still accepted, and same-day intervals (valid_from == valid_to, point-in-time facts) are explicitly allowed.

#976 protects `mempalace mine`, but MCP/direct backend writers still call ChromaCollection.add/upsert/update/delete without the palace lock. This moves the lock boundary to the Chroma backend seam so all Chroma writes share the same palace-level serialization, with a re-entrant guard for miner paths that already hold the lock. mine_palace_lock(palace_path) gains a per-thread re-entrant guard (threading.local + pid-tag against fork inheritance) so ChromaCollection write methods can take the lock without self-deadlocking when called from inside miner.mine()'s outer hold. ChromaCollection.__init__ accepts an optional palace_path; when set, add/upsert/update/delete wrap their underlying chromadb call with mine_palace_lock(palace_path). palace_path=None preserves the legacy no-lock behaviour for direct callers and tests. ChromaBackend's get_collection/create_collection pass palace_path through; mcp_server._get_collection forwards _config.palace_path so all MCP write tools inherit the wrapping. Tests: 5 new in tests/test_chroma_collection_lock.py covering opt-in, writer-blocks-during-mine, re-entrant-inside-mine, two-process serialization, and a source-level read-path-not-locked pin. Plus 1 new + 1 rewritten in tests/test_palace_locks.py for the re-entrant semantics. 52 passed in 1.01s including the existing test_backends.py regression suite. Refs #1161.

Mirror the pagination pattern PR #851 landed in miner.py:status(). A single drawers_col.get(limit=total, ...) on palaces larger than SQLite's SQLITE_MAX_VARIABLE_NUMBER (32766) crashes inside chromadb. Fetch drawers in batch_size=5000 chunks, stepping offset until the collection is drained. by_source aggregation semantics are preserved exactly — grouping, wing filter, meta capture all unchanged. Closes #1073. Related: #802, #850, #1016.

tool_kg_query (as_of), tool_kg_add (valid_from), and tool_kg_invalidate (ended) accepted any string and forwarded it to SQLite without format validation. Parameterized queries prevent SQL injection, but invalid date strings silently produce empty result sets — callers cannot distinguish "no fact at this time" from "your date format was unrecognized." This is especially painful for natural-language LLM callers that synthesize dates like "March 2026" or "Jan 2025". Add sanitize_iso_date() in config.py alongside the other input validators. It accepts YYYY, YYYY-MM, and YYYY-MM-DD forms; passes through None/empty; and raises ValueError with a field-named message on anything else. Call it from the three kg MCP tool wrappers before values reach the storage layer so the caller gets a clear error instead of a silent miss. Closes #1164

Per qodo-ai review on PR #1167: sanitize_iso_date() previously accepted YYYY and YYYY-MM, but KnowledgeGraph.query_entity() compares valid_from/ valid_to TEXT columns lexicographically against as_of. Lexicographic comparison treats '2026-01-01' as greater than '2026' (because '-' > end-of-string), so partial as_of values silently excluded valid facts — re-introducing the silent-empty-results problem this PR was meant to fix. Tighten _ISO_DATE_RE to require YYYY-MM-DD only. Update docstring and error message accordingly. Invert the two test cases that asserted partials were accepted.

Bumps [actions/configure-pages](https://github.com/actions/configure-pages) from 5 to 6. - [Release notes](https://github.com/actions/configure-pages/releases) - [Commits](actions/configure-pages@v5...v6) --- updated-dependencies: - dependency-name: actions/configure-pages dependency-version: '6' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com>

Rollback cleanup was instantiating a fresh ChromaBackend, so the live backend that had opened the PersistentClient could keep file handles alive during restore. Close the active backend instance instead so rollback and CLI recovery can release Windows-safe locks before copying the backup back into place.

`mcp_server._get_collection` bypassed `ChromaBackend.get_collection` and called `client.get_collection` / `client.create_collection` without `embedding_function=`. ChromaDB 1.x does not persist the EF identity with the collection, so the MCP server's reopen silently bound chromadb's built-in `DefaultEmbeddingFunction` while the miner / Stop hook ingest path bound `mempalace.embedding.get_embedding_function()`. On bleeding-edge interpreters (python 3.14 + chromadb 1.5.x on Apple Silicon, per #1299) the default EF's lazy ONNX provider selection could SIGSEGV the host process on first `col.add()`, killing the MCP stdio server and leaving every subsequent tool call returning `Connection closed` until Claude Code was relaunched. Reads worked because `col.get(ids=...)` and metadata fetches don't invoke the EF; the auto-ingest path worked because mining routes through the backend abstraction. Diary writes were the consistent failure surface. Resolve the EF up front (matching `ChromaBackend._resolve_embedding_function`) and pass it into both reopen branches. Falls back to the chromadb default only if `mempalace.embedding.get_embedding_function` itself raises. Regression test patches the chromadb client class to capture `embedding_function=` on every `get_collection` / `create_collection` call from `_get_collection(create=True)` and `_get_collection()`, and fails if any call omits it. Follow-up to #1262 / #1289 (which fixed the metadata-mismatch SIGSEGV path); this addresses the EF-mismatch SIGSEGV path on the same surface.

- Resolve the EF inside the two reopen branches that actually call `client.get_collection` / `client.create_collection`, so warm-cache reads stay zero-cost (no `MempalaceConfig()` / `_resolve_providers` on every tool call). - Reuse `ChromaBackend._resolve_embedding_function()` instead of duplicating its try/except + log message + None-fallback. - Reword the inline + CHANGELOG explanation to clarify that ChromaDB 1.x persists the EF *identity* (its `name()`) but not the *instance/ configuration* — `mempalace.embedding` documents this and spoofs `name()` to `"default"` precisely so the identity check passes; the bug was the *provider list* (lazy ONNX selection) silently differing.

…ing-function fix(mcp_server): pass embedding_function= on collection reopen (#1299)

…ing pool Default search behavior is unchanged. Opt-in candidate_strategy="union" also pulls top-K BM25-only candidates from sqlite FTS5 and merges them into the rerank pool, catching docs with strong BM25 signal that the vector index didn't surface in the over-fetch window. Motivation ---------- The current hybrid path gathers candidates from the vector index only (n_results * 3 over-fetch), then BM25-reranks within them. When the query embeds close to the wrong content semantically, the right doc never enters the rerank pool — *no matter how wide the over-fetch*. Tested on a ~6K-document mixed corpus (knowledge prose + short structured records): at *30x* over-fetch (~5% of the corpus) the target doc still didn't surface for narrative-shaped queries targeting terminology guides. Wider over-fetch isn't the answer; widening the pool's *source* is. Concrete failure mode: a narrative-shaped query embeds close to records sharing the same operational vocabulary (other narrative entries in the corpus). A terminology / style guide is BM25-strong for the query (rare keywords the guide repeats) but vector-distant. Vector-only candidates don't include it; BM25 never gets to rerank it. The hybrid path produces 0.00 recall on a probe that pure BM25 alone scores 1.00 — the hybrid is worse than its component on the same input. Behavior change --------------- * New parameter ``candidate_strategy: str = "vector"`` on ``search_memories``. - ``"vector"`` (default): historical behavior, no change. - ``"union"``: also fetch top ``n_results * 3`` candidates via the existing ``_bm25_only_via_sqlite`` helper, dedupe by source_file, merge into the rerank pool. BM25-only candidates carry ``distance=None`` so they're scored on BM25 contribution alone (vec_sim coerces to 0). * ``_hybrid_rank`` now handles ``distance=None`` explicitly, scoring such candidates as vector-unknown (vec_sim=0) rather than treating it as max-distance via shim. * New strategies register via ``_CANDIDATE_MERGERS``; dispatch is in ``_apply_candidate_strategy`` so ``search_memories`` stays under the C901 complexity ceiling. Bench numbers (~6K-doc internal mixed corpus, recall@10, 5 probes spanning policy-exception lookup, temporal-decay, style retrieval, set-difference, and pattern-recognition): baseline ("vector") "union" policy-exception probe 0.00 0.50 +0.50 temporal-decay probe 0.17 0.50 +0.33 style-retrieval probe 0.00 1.00 +1.00 (PASSES) set-difference probe 0.00–0.06 0.06–0.09 ~ pattern-recog probe 0.64 (stable) 0.50–0.71 variance, typ. +0.07 macro recall 0.16–0.17 0.51–0.56 +0.34 to +0.40 The pattern-recog variance points at a related issue worth a separate PR: ``_hybrid_rank`` computes BM25 IDF over the candidate set. Adding new candidates re-normalizes BM25 for *existing* candidates non-monotonically. Stable corpus-wide BM25 would remove this. Out of scope here. Tests ----- ``tests/test_hybrid_candidate_union.py`` (6 tests, all pass): - default behavior unchanged (explicit ``"vector"`` matches default) - ``"union"`` surfaces a BM25-strong vector-distant doc - ``"union"`` doesn't drop docs ``"vector"`` would have found - empty-palace handling - invalid ``candidate_strategy`` raises - ``_hybrid_rank`` tolerates ``distance=None`` Existing ``test_hybrid_search.py`` (5) and ``test_searcher.py`` (27) pass. Performance note ---------------- Each ``"union"`` query adds one sqlite open + FTS5 MATCH + metadata fetch (via the existing ``_bm25_only_via_sqlite`` helper, which already runs as the ``vector_disabled`` fallback path so the code is well-trodden). Per-query overhead is small but unmeasured at corpus scale. Default stays ``"vector"`` until a maintainer characterizes the cost.

…n (#1076) (#1077) Shell splits hook command on whitespace after variable expansion, breaking paths with spaces (e.g. C:\Users\Richard M on Windows). Wrapping the path in double quotes preserves the token boundary. Fixes the reported Stop/PreCompact pair in .claude-plugin/hooks/hooks.json and applies the same fix to .codex-plugin/hooks.json (SessionStart/Stop/ PreCompact), which carries the identical bug.

When the user removes ~/.mempalace/ (a strong "do not auto-capture" signal), the next hook fire would silently recreate the entire dir hierarchy and ingest existing transcripts: 1. _log() at hooks_cli.py:148 unconditionally calls STATE_DIR.mkdir(parents=True, exist_ok=True), so the act of writing the hook log line recreated ~/.mempalace/hook_state/ 2. With no config file present, hook_stop_auto_save and hook_precompact_auto_save defaulted to True (no override to read) 3. The full save path then ran, materializing palace/, wal/, knowledge_graph.sqlite3, and N drawers from existing transcripts in ~/.claude/projects/*.jsonl All four entry points (hook_stop, hook_precompact, hook_session_start, and _log itself) now check a new PALACE_ROOT = Path.home() / ".mempalace" constant first and short-circuit (returning {} on stdout, never logging) when the dir is absent. The user-removable directory is now a kill-switch. Five unit tests in tests/test_hooks_cli.py cover: hook_stop / hook_precompact / hook_session_start do not create the dir when absent; _log() does not create it when absent; existing dir proceeds normally (regression). Caught in the wild on a downstream fork: ~146 drawers materialized in under a second after a deliberate `rm -rf ~/.mempalace/`, into a planning session that was explicitly not meant to be captured.

@igorls

Both @igorls and the Qodo bot flagged that `_palace_root_exists()` used `Path.exists()`, which returns True for a regular file. A stray file at `~/.mempalace` would let the kill-switch be bypassed and crash later in `STATE_DIR.mkdir()` with NotADirectoryError. Switched to `Path.is_dir()`. Also fold `_log()`'s inline check through `_palace_root_exists()` so both kill-switch sites use the same predicate. New test pins the behavior: a regular file at the palace root path is treated as absent (hook short-circuits, _log does not crash, the stray file is left untouched).

…ad them (#1244) `cmd_compress` was writing AAAK-compressed drawers to a `mempalace_compressed` collection, but every read path (`palace.get_closets_collection`, `searcher.py`, `repair.py`) reads from `mempalace_closets`. Result: for non-mined palaces (or any palace where the user ran `mempalace compress` expecting to backfill the closet/index layer), the compressed output was silently invisible — written to a collection nothing else opens. Fix the writer rather than renaming the readers: "closets" is the user-visible feature name baked into the public API (`get_closets_collection`), the searcher hybrid path, repair/HNSW diagnostics, and docs. Renaming the readers would churn 15+ call sites and the README for no benefit. The compressed AAAK strings are exactly what closets are conceptually — compact pointers scanned by an LLM to locate the right drawer — so they belong in `mempalace_closets`. Tests: - Update `test_cmd_compress_stores_results` to assert the collection name passed to `get_or_create_collection` is `mempalace_closets`. - Add `test_cmd_compress_output_readable_via_get_closets_collection`: end-to-end with a real ChromaBackend, seed a drawer, run cmd_compress, then read back via the same `get_closets_collection` helper that palace.py / searcher use. Regression test for the wrong-collection bug. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…(#1314) `tool_kg_add` previously accepted only `valid_from` and `source_closet`, silently dropping `valid_to`, `source_file`, and `source_drawer_id` at the MCP boundary. Backfilling already-ended historical facts therefore collapsed to "still current," and adapter provenance never reached the SQLite layer even though `KnowledgeGraph.add_triple` already supported every column. `tool_kg_invalidate` returned the literal string `"today"` whenever the caller omitted `ended`, hiding the actual stamped date from anyone trying to verify what got persisted. Changes: - Extend `tool_kg_add` signature + MCP input_schema with `valid_to`, `source_file`, `source_drawer_id`; forward all of them to `_kg.add_triple` and to the WAL log. - Resolve `ended` to `date.today().isoformat()` in `tool_kg_invalidate` before logging / returning, so the response always reports the actual date stored in `valid_to`. - Add regression tests for valid_to round-trip, source_file / source_drawer_id provenance, and the resolved-ended-date contract. - Leave TODO(#1283) markers so the open ISO-8601 validation PR can drop `validate_iso_date` over `valid_from` / `valid_to` / `ended` cleanly. The underlying `KnowledgeGraph.add_triple` already accepted these kwargs (RFC 002 §5.5) — only the MCP edge needed wiring up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

cmd_init was instantiating MempalaceConfig() unconditionally, ignoring args.palace and always writing the palace under ~/.mempalace. Mirror the env-var pattern used by mcp_server.py (and consistent with how cmd_mine / cmd_status / cmd_search resolve --palace) so every downstream read of cfg.palace_path inside cmd_init — Pass 0, cfg.init(), and the post-init mine — routes to the user-specified location. Adds tests/test_cli.py::test_cmd_init_honors_palace_flag covering the regression: asserts Pass 0 receives the --palace value (not ~/.mempalace) and that MEMPALACE_PALACE_PATH is set in os.environ. Closes #1313.

…event SIGSEGV on stale HNSW (#1121, #1132, #1263) PR #1173 wired quarantine_stale_hnsw into the static make_client() helper but not into the instance _client() method. As a result every non-MCP entry point (CLI mining, search, repair, status) — which all use get_collection / _get_or_create_collection / _client() — skipped the cold-start quarantine pass and could SIGSEGV on a stale HNSW segment left over from a partial flush, replicated palace, or crashed-mid-write. Refactor: extract the (_fix_blob_seq_ids + gated quarantine_stale_hnsw) pre-open pass into a single private static helper ChromaBackend._prepare_palace_for_open(). Both make_client() and _client() now route through it, so the _quarantined_paths once-per- palace-per-process gate is preserved (no runtime thrash on hot paths) and behaviour stays identical — the fix is purely about extending the existing protection to the path that was missing it. Tests: - test_client_quarantines_corrupt_segment_on_first_open mirrors the existing make_client test and verifies _client() actually renames a corrupt segment on first open. - test_client_quarantines_only_on_first_call_per_palace verifies the cache gate prevents re-running quarantine across repeated _client() calls — important because _client() is hit on every backend op. Closes #1121. Closes #1132. Closes #1263. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

`tool_diary_write` stored the `agent` metadata verbatim after `sanitize_name` (which preserves case), while `tool_diary_read` filtered by exact match — so writing as "Claude" and reading as "claude" silently returned zero rows. Both endpoints now lowercase `agent_name` immediately after sanitization. The default per-agent wing slug is also stable across casings since it's derived from the same normalized form. Behavior change: entries written prior to this fix under mixed-case agent names will not match the new lowercase filter; documented under v3.3.5 in CHANGELOG with a `mempalace repair` pointer. Adds a regression test (`test_diary_read_case_insensitive_agent`) and updates the existing `test_diary_write_and_read` to assert the new lowercase agent identity. Closes #1243 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

CI requires ruff format --check on the whole touched file. Pre-existing drift, no logic change.

CI requires whole-file format on touched files; pre-existing drift only.

Add `mempalace sync` CLI command and `mempalace_sync` MCP tool that prune drawers whose source files are gitignored, deleted, or moved out of the project. Reuses the existing GitignoreMatcher infrastructure in mempalace/miner.py so the same gitignore rules that block ingest also drive the corresponding cleanup. Closes #1252.

CI fix: `_classify_drawer` now resolves `source_file` symmetric to `project_roots` (which `_normalize_project_dirs` and `_auto_detect_project_roots` already `.resolve()`). Without this, on platforms where the temp directory is a symlink (macOS `/var/folders` -> `/private/var/folders`, Windows 8.3 short-name normalization), every drawer mis-bucketed as `out_of_scope` and survived prune. Perf: - `_resolve_project_root`: early-return on first match (sorted-desc precondition). - `_normalize_project_dirs`: sort `(-len(str(p)), str(p))` desc for early-return + deterministic tie-break on equal-length paths. - `_auto_detect_project_roots`: `seen_sources` dedupe so a 200-chunk file costs one disk walk, not 200. - `sync_palace` main loop: per-file classification cache; registry sentinels (`_reg_*`, `room=_registry`, `ingest_mode=registry`) routed to "kept" before cache lookup so a sentinel sharing a `source_file` with a pruned drawer cannot inherit a stale "gitignored" verdict. - Closet purge: collapse O(N) per-file purge into one `where={"source_file": {"$in": [...]}}` get + one bulk delete. Tests (5 new in `TestSyncPalace`, 38 total): - `test_symlinked_project_root_resolves`: pins symmetric resolve via real `os.symlink` (skipped on Windows). - `test_classification_cache_avoids_redundant_disk_hits`: monkeypatch counter on `_classify_drawer` asserts `call_count == 1` for 5 chunks sharing one source_file. - `test_closet_batch_purge_single_call`: wraps closets collection with `CallCountingCol` (forwards `.get`/`.delete`); asserts `delete_calls == 1` and `get_calls == 1`; expected `removed_closets` derived from `report["by_source"]` to stay robust to fixture changes. - `test_registry_check_runs_before_cache_lookup`: a regular drawer caches "gitignored" first; a sentinel with the same source_file must still be kept. - `test_normalize_project_dirs_sort_stable_on_equal_length`: pins the alphabetical secondary key when paths share length.

…n Windows Windows runs treated `/tmp/elsewhere/x.md` as relative because Windows absolute paths require a drive letter, so `_classify_drawer` routed `drawer_out_of_scope` to `no_source` instead of `out_of_scope` and `test_dry_run_classifies_correctly` failed on test-windows. `Path(tmp_dir) / "elsewhere" / "x.md"` is absolute on every platform and still lives outside the project root that the synced_world fixture exposes via `repo_path`, so the bucket assertions hold cross-platform.

feat(sync): add gitignore-aware drawer prune (#1252)

docs: clarify contributor git identity setup

fix(miner): use token-boundary matching in detect_room

…ess state

…k subprocess state

… lock subprocess state

…sw-flush fix(mcp): retry tool_search once on Chroma "Error finding id" transient (#1315)

test_palace_locks.py and test_chroma_collection_lock.py spawned child processes with the ``fork`` start method on POSIX. Under Python 3.13 this deadlocks reliably enough to hang the Linux 3.13 and macOS CI jobs indefinitely while Linux 3.9 / 3.11 / Windows complete normally. Root cause: by the time these tests run, the pytest parent process is multi-threaded — chromadb and onnxruntime both spawn background threads on import. ``fork`` snapshots the parent's address space into the child without those threads, so any lock another thread held at fork time stays locked in the child forever. Python 3.13 widened the window where Python's own internal threads can be holding locks (hence the new DeprecationWarning that fired ten times in our local 3.13 run). macOS hits a related but distinct issue: Apple's CoreFoundation explicitly forbids fork-without-exec; once anything in the parent has loaded a CF-using library (ONNX, anything via Objective-C bridges) a forked child will silently hang the moment it touches the same library. Switching to ``spawn`` re-imports modules in the child (~0.5s overhead per Process — measurable but bounded), which is the standard fix for both classes of bug. Lock-file semantics are unchanged: ``spawn`` inherits ``os.environ`` (including monkeypatched ``HOME``), which is all these tests need from the parent. Locally on Python 3.13: all 14 lock tests pass in 6.58s.

fix(tests): use spawn instead of fork for lock-test subprocesses

Several tests opened sqlite3 connections without try/finally or context-manager cleanup, relying on a flat conn.close() after the work. Any assertion failure or exception between connect and close leaked the connection until GC, producing ``ResourceWarning: unclosed database`` in CI logs. On Python 3.13 / macOS the ResourceWarning isn't just noise: a leaked connection can hold a SQLite advisory lock long enough for later test setup to block on it, which appears to be the cause of recent intermittent CI hangs on those two runners. Wrap each affected ``conn = sqlite3.connect(...)`` block in ``contextlib.closing(...)`` so cleanup runs on the failure path too. Mirrors the try/finally pattern already used in production code (searcher.py, repair.py, backends/chroma.py). No behavior change — same operations, same assertions, just deterministic cleanup. All 162 affected tests pass locally.

chore(tests): wrap sqlite3 connections in contextlib.closing

Bumps version 3.3.4 → 3.3.5 across pyproject.toml, version.py, plugin manifests, README badge, and uv.lock. Flips CHANGELOG.md from ``[3.3.5] — unreleased`` to ``[3.3.5] — 2026-05-09`` and adds entries for the four PRs that landed after the bug-fix block was authored: - Bug Fixes: #1396 (tool_search retry on transient HNSW flush) - Documentation: #1385 (CONTRIBUTING git-identity guidance, closes #1317) - Internal: #1431 (test multiprocessing fork → spawn) - Internal: #1430 (test sqlite connection lifecycle via contextlib.closing) The four open issues remaining on the v3.3.5 milestone (#1266, #1253, #1092, #1082) have been moved to v3.4 — they form the concurrent-writer / HNSW corruption cluster that needs deeper work than this cycle could absorb.

…etimes # Conflicts: # tests/test_palace_locks.py

fix(kg): accept ISO datetimes for temporal inputs

The 3.3.4 release shipped 2026-05-01 (per GitHub release v3.3.4) but its CHANGELOG header was never flipped from ``unreleased`` to the release date. Backfill while we're already touching CHANGELOG for the 3.3.5 cut.

chore(release): 3.3.5

# Conflicts: # .claude-plugin/marketplace.json # .claude-plugin/plugin.json # .codex-plugin/plugin.json # CHANGELOG.md # README.md # mempalace/version.py # pyproject.toml # uv.lock

Copilot review on PR #1434 caught that the existing 3.3.5 entry described the validator as it was authored under #1167 — accepting ``YYYY``/``YYYY-MM``/``YYYY-MM-DD`` and rejecting ISO datetimes — but PR #1417 (closes #1374) merged into develop on 2026-05-10 and inverted that: ``sanitize_iso_temporal()`` now rejects partial dates and accepts canonical UTC datetimes (``YYYY-MM-DDTHH:MM:SSZ`` / ``+00:00``). ``sanitize_iso_date()`` is kept as a backwards-compat wrapper. Update the bullet to describe the *shipped* behavior, name both functions, list both accepted and rejected forms, and call out the 3.3.4 → 3.3.5 behavior change for partial-date inputs that now error. Reference both #1167 (original) and #1374/#1417 (the expansion).

Bring in the 41 commits between v3.3.4 and v3.3.5 (KG temporal fixes, SQLite integrity preflight on repair, MineAlreadyRunning surfaces, gitignore- aware drawer prune, Windows lock-sentinel, embedding-function passing on the MCP path, etc.) while preserving the 39 local commits on kostadis-dev (palace isolation, .mempalaceignore, parallel mine pipeline, hierarchical search, closet-LLM prose mode, remote embedding providers, Cursor harness). Conflict resolutions of note: - mcp_server: kept the local per-palace cache infrastructure; ported v3.3.5's _call_kg retry helper, embedding-function plumbing in _get_collection, and the tool_reconnect backend-close path on top of it. - searcher: kept the local search_within / search_memories split. Dropped v3.3.5's inline closet boost on primary hits (the local design keeps closets in `themes` only) and the candidate_strategy='union' parameter (the local two-list shape has no merged rerank pool to apply it to). Preserved v3.3.5's BM25-only-via-sqlite key shape under both `primary` and `results` so consumers on either side keep working. - miner: kept the local parallel _prepare_file / _embed_prepared / _write_prepared pipeline; back-ported v3.3.5's MAX_CHUNKS_PER_FILE guard into _prepare_file. Let MineAlreadyRunning propagate from mine() so the CLI can render the holder-aware message. - palace: made the mine_palace_lock holder set process-wide (was threading.local) so worker threads in the parallel mine pipeline can re-enter the outer lock that the orchestrator already holds. Used RLock so _held_by_this_thread can compose with _holder_state. - hooks_cli: kept the local Cursor harness + chat-palace pin; routed the transcript ingest through v3.3.5's _spawn_mine PID-guard helper with --palace pinned to the chat palace (isolation invariant #1). Tests: 2002 pass, 14 skipped (5 groups documented in-file: v3.3.5 features that don't apply to the local design), 1 deselected. Backup of pre-merge tip: kostadis-dev-backup-pre-3.3.5-rebase. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…aces Two sync bugs found during a Phandalin palace rebuild. 1. (critical, data-loss) `mempalace sync` was applying every ancestor `.mempalaceignore` from the user-passed project root down to the drawer's parent — including a root-level ignore file whose only purpose is to tell the ROOT wing's mine to skip each sub-wing's source dir. Result: every sub-wing drawer matched the ignore pattern and got flagged for deletion; `--apply` would wipe the wing. Fix: detect each drawer's per-wing source root by walking parents to the nearest `mempalace.yaml`; matcher loading starts there, so ignore files above the wing's own config marker are out of scope. Per-directory cache amortizes the walk across the wing. 2. (annoyance) `cmd_sync` was reading `--palace` as a raw filesystem path instead of routing through `_resolve_cli_palace`, so named- palace aliases worked for every other subcommand except `sync`. Tests: - root .mempalaceignore listing sub-wing dir leaves sub-wing drawers Kept - per-wing .mempalaceignore inside the wing's root still takes effect - root wing (whose mempalace.yaml lives at the project root) still sees the root ignore file - CLI `--palace <alias>` resolves through the named-palace map in sync Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

kostadis · 2026-05-18T05:49:37Z

Added a sync bug fix on top of the v3.3.5 merge (commit edbec59):

fix(sync): scope .mempalaceignore matchers per-wing + honor named palaces

Two mempalace sync bugs surfaced while trying to use the new sync feature against a multi-wing Phandalin palace (full report in mempalace_sync_bug_report.md, untracked locally).

Bug 1 — critical, data-loss potential. sync was loading every ancestor .mempalaceignore between the user-passed project root and the drawer's parent dir. A root-level ignore file that lists each sub-wing's source dir (so the ROOT wing's mine skips them) was matching every legitimate sub-wing drawer, flagging them as gitignored. sync --apply would have wiped the wing.

Fix in mempalace/sync.py: each drawer's matcher root is now the nearest mempalace.yaml ancestor of its source_file (the per-wing source root the miner already uses on the way in). Ignore files above that marker are out of scope. Wing-root lookups cached per directory.

Bug 2 — annoyance. cmd_sync was reading --palace as a raw filesystem path, so named-palace aliases (--palace phandalin) worked for every other subcommand except sync. One-line fix to route through _resolve_cli_palace.

Tests added (all passing)

tests/test_sync.py::test_root_mempalaceignore_does_not_flag_sub_wing_drawers — direct repro of Bug 1
tests/test_sync.py::test_sub_wing_local_mempalaceignore_still_honored — per-wing ignore files inside the wing's own root still take effect
tests/test_sync.py::test_root_wing_with_own_yaml_still_uses_root_ignore — root wing's drawers still see the root ignore file when its mempalace.yaml lives at the project root
tests/test_sync.py::TestSyncCli::test_named_palace_alias_resolves — CLI --palace <alias> resolves through the named-palace map in sync

41/42 sync tests pass; the one failure (test_metadata_cache_cleared_on_exception) is pre-existing on this branch and unrelated — it asserts on mempalace.mcp_server._metadata_cache which doesn't exist here.

Out of scope

The bug report also flagged HNSW segment quarantining within minutes of writes. That's a chromadb durability concern and is not addressed here.

🤖 Generated with Claude Code

eldar702 and others added 30 commits April 19, 2026 11:13

fix(chroma): harden HNSW startup preflight

c3e1104

fix(repair): rebuild collections through temp staging

7fa27bd

fix(cli): restore backup on repair failure

2f509b4

fix: avoid false hnsw divergence fallback

0e32b96

Merge pull request #1303 from MemPalace/fix/mcp-server-missing-embedd…

5ddaf7a

…ing-function fix(mcp_server): pass embedding_function= on collection reopen (#1299)

style: ruff format cli.py (#1244)

4b0fc44

CI requires ruff format --check on the whole touched file. Pre-existing drift, no logic change.

style: ruff format touched files (PR #1322)

b4a9f2a

CI requires whole-file format on touched files; pre-existing drift only.

style: ruff format test_mcp_server.py (PR #1320)

6ffbf6f

style: ruff format tests/test_cli.py (PR #1319)

2857948

mvalentsev and others added 27 commits May 9, 2026 03:16

style(test_sync): match CI ruff 0.4.x format

18f8778

Merge pull request #1421 from mvalentsev/feat/sync-gitignore-prune

5453ad7

feat(sync): add gitignore-aware drawer prune (#1252)

Merge pull request #1385 from sjhddh/docs/contributor-git-identity-1317

1c26ece

docs: clarify contributor git identity setup

Merge pull request #1004 from coogie/coogie/fix/miner-routing

2fc47a5

fix(miner): use token-boundary matching in detect_room

Update conftest.py as part of fix(tests): isolate palace lock subproc…

f34b78c

…ess state

Update test_palace_locks.py as part of fix(tests): isolate palace loc…

00041ca

…k subprocess state

style: format test_palace_locks as part of fix(tests): isolate palace…

710fa52

… lock subprocess state

fix(tests): format palace lock test

03f9a73

style: ruff format tests/test_mcp_server.py for ruff <0.5

b2ce45d

Merge pull request #1396 from JPdeB61/fix/1315-mcp-search-retry-on-hn…

d9e60d8

…sw-flush fix(mcp): retry tool_search once on Chroma "Error finding id" transient (#1315)

Merge pull request #1431 from MemPalace/fix/test-multiprocessing-spawn

7d5e6c4

fix(tests): use spawn instead of fork for lock-test subprocesses

Merge pull request #1430 from MemPalace/chore/test-sqlite-cleanup

b9cec32

chore(tests): wrap sqlite3 connections in contextlib.closing

Merge remote-tracking branch 'origin/develop' into pr-1417-fix-kg-dat…

b2bab38

…etimes # Conflicts: # tests/test_palace_locks.py

Merge pull request #1417 from fatkobra/fix/1374-kg-iso-datetimes

36100f8

fix(kg): accept ISO datetimes for temporal inputs

docs(changelog): backfill 3.3.4 release date

6e9d057

The 3.3.4 release shipped 2026-05-01 (per GitHub release v3.3.4) but its CHANGELOG header was never flipped from ``unreleased`` to the release date. Backfill while we're already touching CHANGELOG for the 3.3.5 cut.

Merge pull request #1432 from MemPalace/chore/release-3.3.5

1247e17

chore(release): 3.3.5

Merge develop into main for v3.3.5 release

40524d5

# Conflicts: # .claude-plugin/marketplace.json # .claude-plugin/plugin.json # .codex-plugin/plugin.json # CHANGELOG.md # README.md # mempalace/version.py # pyproject.toml # uv.lock

Merge pull request #1434 from MemPalace/chore/release-3.3.5-prep

d0163a7

kostadis merged commit f44612e into kostadis-dev May 18, 2026
1 check passed

kostadis mentioned this pull request May 18, 2026

fix(sync): cover auto-detected wings + decouple .mempalaceignore/.gitignore #16

Merged

7 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Merge upstream v3.3.5 into kostadis-dev#15

Merge upstream v3.3.5 into kostadis-dev#15
kostadis merged 193 commits into
kostadis-devfrom
merge/v3.3.5-into-kostadis-dev

kostadis commented May 18, 2026

Uh oh!

kostadis commented May 18, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

20 participants

Conversation

kostadis commented May 18, 2026

Summary

Tests

Test plan

Safety notes

Uh oh!

kostadis commented May 18, 2026

fix(sync): scope .mempalaceignore matchers per-wing + honor named palaces

Tests added (all passing)

Out of scope

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

20 participants