fix: handle lone surrogates in MCP write tools #1235

Open
YuanYiZheXue wants to merge 1 commit into MemPalace:develop from YuanYiZheXue:develop

@YuanYiZheXue

Problem

MCP write tools (mempalace_add_drawer, mempalace_diary_write) fail with UnicodeEncodeError when processing strings containing lone surrogates injected by MCP clients like WorkBuddy.
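
To make the failure mode concrete, here is a minimal sketch of why a lone surrogate breaks a strict UTF-8 encode. The helper name `encodes_as_utf8` is hypothetical, and the `surrogateescape` line shows only one way such a value can arise; how WorkBuddy actually produces them is not stated in this PR.

```python
def encodes_as_utf8(text: str) -> bool:
    """Return True if text can be encoded with the strict UTF-8 codec."""
    try:
        text.encode("utf-8")
    except UnicodeEncodeError:
        return False
    return True


# One way a lone surrogate can appear: an invalid byte decoded with the
# surrogateescape handler is smuggled into the str as U+DC80..U+DCFF.
smuggled = b"\x95".decode("utf-8", errors="surrogateescape")

# A real emoji is a single astral code point in a Python str, not a
# surrogate pair, so it encodes without trouble.
emoji = "\U0001f600"
```

Any code path that later calls `.encode("utf-8")` on a string like `smuggled` without an error handler raises `UnicodeEncodeError`, which matches the symptom described above.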

Solution

Add a _clean() function that removes lone surrogates via an encode/decode round trip (surrogatepass on encode, ignore on decode) before writing to ChromaDB.

Changes

  • tool_add_drawer: Clean content, wing, room, added_by before ChromaDB upsert
  • tool_diary_write: Clean entry, topic, agent_name, wing before ChromaDB add
  • SHA256 hash: Use surrogatepass encoding to avoid errors
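
The changes listed above could be wired together roughly as follows. This is a sketch under stated assumptions: `strip_surrogates`, `sanitize_fields`, and `content_hash` are hypothetical names, and the real tool functions and their ChromaDB calls are not reproduced here.

```python
import hashlib


def strip_surrogates(text: str) -> str:
    """Drop lone surrogates; the "ignore" handler skips unencodable code points."""
    return text.encode("utf-8", errors="ignore").decode("utf-8")


def sanitize_fields(**fields):
    """Clean every string field (e.g. content, wing, room, added_by)."""
    return {
        name: strip_surrogates(value) if isinstance(value, str) else value
        for name, value in fields.items()
    }


def content_hash(text: str) -> str:
    """SHA-256 of text; surrogatepass keeps the encode step from raising."""
    return hashlib.sha256(text.encode("utf-8", errors="surrogatepass")).hexdigest()
```

Hashing with `surrogatepass` means a string and its cleaned form produce different digests, so whether to hash before or after cleaning is a decision worth making explicitly.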

Testing

mcp__mempalace__mempalace_add_drawer --wing test --room unicode --content "????"
mcp__mempalace__mempalace_diary_write --agent_name test --entry "SESSION:2026-04-27"

See docs/fix-lone-surrogate.md for full details.

igorls added the area/mcp (MCP server and tools) and bug (Something isn't working) labels on May 2, 2026
igorls added this to the v3.3.5 milestone on May 2, 2026
Member

igorls commented May 9, 2026

Thanks for the surrogate fix — \udc95-style lone surrogates from MCP clients are a real problem and the _clean() approach is the right shape. Unfortunately I can't merge this PR as-is; flagging the blockers so we can land the fix properly in a fresh PR.

Destructive rebase against an old base

The diff (+181 / -213 across 2 files) is much larger than the description suggests. Most of the deletions remove recent safety fixes that landed on develop after this branch forked:

These were intentional fixes for known crash modes; merging this PR would re-open all of them.

Hardcoded contributor paths

Two changes can't apply to any other user:

  • mcp_server.py adds logging.FileHandler(r"C:\Users\SJC\mempalace_mcp.log", encoding="utf-8"), which would crash on every non-SJC machine on Windows, and on Linux/macOS entirely
  • docs/fix-lone-surrogate.md references D:\ProgramData\Python312\Lib\site-packages\mempalace\mcp_server.py and D:\gitfork\mcp_server.py — local-machine paths that don't belong in repo docs
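
For comparison, a portable alternative to a hardcoded log path might look like the sketch below. The environment variable name `MEMPALACE_MCP_LOG` and both function names are hypothetical; nothing in this PR indicates the project defines them.

```python
import logging
import os
import tempfile
from pathlib import Path


def default_log_path(env=None) -> Path:
    """Pick a log location that exists on any machine and any OS."""
    env = os.environ if env is None else env
    override = env.get("MEMPALACE_MCP_LOG")  # hypothetical variable name
    if override:
        return Path(override)
    return Path(tempfile.gettempdir()) / "mempalace_mcp.log"


def make_file_handler(path: Path) -> logging.FileHandler:
    """Create the parent directory, then build a UTF-8 file handler."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # delay=True defers opening the file until the first record is emitted.
    return logging.FileHandler(path, encoding="utf-8", delay=True)
```

Resolving the path at runtime keeps contributor-specific directories out of the repository while still letting a user redirect the log if they want to.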

Path forward

The actual fix is small (~10 lines: _clean() helper + use it in tool_add_drawer and tool_diary_write + surrogatepass on the SHA256 hashes). Would you be willing to open a fresh PR against current develop with just those changes? Happy to review quickly.

Removing from the v3.3.5 milestone for now.

igorls removed this from the v3.3.5 milestone on May 9, 2026
Author

YuanYiZheXue commented May 9, 2026 via email

igorls added this to the v3.3.6 milestone on May 15, 2026
YuanYiZheXue pushed a commit to YuanYiZheXue/mempalace that referenced this pull request May 18, 2026
Add tests/test_clean_lone_surrogates.py covering:

Unit tests (TestCleanLoneSurrogates, 11 cases):
- _clean() passes normal ASCII and CJK strings unchanged
- lone surrogates (high/low, single/multiple) are replaced with U+FFFD
- real emoji (\U0001f600 astral code points) pass through unchanged
- empty string, all-surrogate string, SHA-256-hash-after-clean
- the specific \udcad surrogate observed in WorkBuddy production logs
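
A few of the unit cases listed above could be sketched like this. The contents of tests/test_clean_lone_surrogates.py are not shown here, so `clean_replace` is a hypothetical stand-in that substitutes U+FFFD for each lone surrogate, and the test names are illustrative.

```python
import hashlib
import re

# Matches any code point in the surrogate range U+D800..U+DFFF.
_LONE_SURROGATE = re.compile(r"[\ud800-\udfff]")


def clean_replace(text: str) -> str:
    """Replace each lone surrogate with U+FFFD (hypothetical stand-in)."""
    return _LONE_SURROGATE.sub("\ufffd", text)


def test_plain_text_unchanged():
    assert clean_replace("hello") == "hello"
    assert clean_replace("日本語") == "日本語"


def test_lone_surrogates_replaced():
    assert clean_replace("a\ud800b") == "a\ufffdb"
    assert clean_replace("\udc95\udcad") == "\ufffd\ufffd"


def test_real_emoji_unchanged():
    # An astral code point is one character in a Python str, not a pair.
    assert clean_replace("\U0001f600") == "\U0001f600"


def test_empty_and_hash_after_clean():
    assert clean_replace("") == ""
    digest = hashlib.sha256(clean_replace("x\udcad").encode("utf-8")).hexdigest()
    assert len(digest) == 64
```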

Integration tests (TestLoneSurrogateCleaning, 6 cases):
- tool_add_drawer: surrogate in content and in metadata fields
- tool_check_duplicate: surrogate in query
- tool_search: surrogate in search query
- tool_update_drawer: surrogate in updated content
- tool_diary_write: surrogate in diary entry

Fix test environment issue:
- conftest.py redirects HOME to a temp dir, causing chromadb's
  ONNXMiniLM_L6_V2 to look for its ONNX model in the wrong location
  and trigger a 79 MB network download on every run.
- Fix: at module import time, recover the real USERPROFILE from
  conftest._original_env and patch ONNXMiniLM_L6_V2.DOWNLOAD_PATH
  before any ChromaDB collection fixture is invoked.