What happened?
I ran mempalace repair --yes on a palace with ~196K drawers. The command crashed mid-rebuild and left the palace with only 235 embeddings, no usable backup, and no way to recover except from drawers_export.json.
Two separate failures occurred in sequence.
Failure 1: FTS5 table not registered
Before repair would even start, it hit:
chromadb.errors.InternalError: Database error: error returned from database:
(code: 1) no such table: embedding_fulltext_search
The FTS5 shadow tables were present in sqlite_master but embedding_fulltext_search itself was not registered. Had to fix it manually:
DROP TABLE IF EXISTS embedding_fulltext_search_config;
DROP TABLE IF EXISTS embedding_fulltext_search_content;
DROP TABLE IF EXISTS embedding_fulltext_search_data;
DROP TABLE IF EXISTS embedding_fulltext_search_docsize;
DROP TABLE IF EXISTS embedding_fulltext_search_idx;
CREATE VIRTUAL TABLE embedding_fulltext_search USING fts5(string_value, tokenize='trigram');
INSERT INTO embedding_fulltext_search (rowid, string_value)
SELECT rowid, string_value FROM embedding_metadata;
(~2M rows, took a few minutes.)
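For anyone hitting the same thing, the manual fix above can be applied with Python's sqlite3 module. Minimal sketch under two assumptions: the ChromaDB store lives at ~/.mempalace/palace/chroma.sqlite3 (adjust to wherever your palace's chroma.sqlite3 actually is), and embedding_metadata has a string_value column, as in the SQL above.
# fix_fts5.py -- apply the FTS5 rebuild above via Python's sqlite3 module.
# The DB path below is an assumption; point it at your palace's chroma.sqlite3.
import sqlite3
from pathlib import Path

DB_PATH = Path.home() / ".mempalace" / "palace" / "chroma.sqlite3"

conn = sqlite3.connect(DB_PATH)
cur = conn.cursor()

# Drop the orphaned FTS5 shadow tables left behind in sqlite_master.
for suffix in ("config", "content", "data", "docsize", "idx"):
    cur.execute(f"DROP TABLE IF EXISTS embedding_fulltext_search_{suffix}")

# Recreate the virtual table and repopulate it from embedding_metadata
# (~2M rows here, so expect a few minutes).
cur.execute(
    "CREATE VIRTUAL TABLE embedding_fulltext_search "
    "USING fts5(string_value, tokenize='trigram')"
)
cur.execute(
    "INSERT INTO embedding_fulltext_search (rowid, string_value) "
    "SELECT rowid, string_value FROM embedding_metadata"
)

conn.commit()
conn.close()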
Failure 2: Compaction crash mid-rebuild
After the FTS5 fix, repair extracted all drawers fine, then crashed:
chromadb.errors.InternalError: Error in compaction: Failed to apply logs to the metadata segment
State after the crash:
- The original collection had already been deleted from SQLite before the crash hit.
- Only 235 embeddings had been re-inserted when it died.
- palace.backup was overwritten at the start of the repair run, so it held the broken 235-embedding state, not the original.
- The only copy of the full 196K drawers was drawers_export.json (560 MB).
What did you expect?
Either the repair succeeds, or it fails without destroying the original data. Deleting the SQLite collection before the new inserts are done means a mid-rebuild crash leaves nothing to fall back to.
How to reproduce:
- Palace with ~196K+ drawers on ChromaDB 1.5.8
- Run mempalace repair --yes
- Observe crash with
Error in compaction: Failed to apply logs to the metadata segment
I suspect this is more likely to trigger on larger palaces due to how the Rust compaction backend handles big collections, but I haven't confirmed a minimum size threshold.
What actually fixed it:
# 1. Delete only the HNSW segment directory (leaves SQLite untouched)
rm -rf ~/.mempalace/palace/<uuid>/
# 2. Trigger HNSW recreation from SQLite
mempalace status
# 3. Re-import missing drawers using mempalace's ChromaBackend
# (so _HNSW_BLOAT_GUARD gets applied automatically)
python3 import_missing.py
The SQLite data was intact the whole time. Only the HNSW index was broken. Repair didn't need to touch SQLite at all.
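For reference, here is a rough stand-in for the re-import step (step 3 above). The real import_missing.py went through mempalace's ChromaBackend so _HNSW_BLOAT_GUARD is applied; this sketch uses plain chromadb instead, and it assumes drawers_export.json is a list of {"id", "document", "metadata"} objects and that the collection is named "drawers" — both assumptions, adjust to the actual export format and collection name.
# import_missing.py -- rough sketch only; the real script used mempalace's ChromaBackend.
# Paths, the collection name, and the export's JSON structure are assumptions.
import json
from pathlib import Path

import chromadb

PALACE_DIR = Path.home() / ".mempalace" / "palace"   # assumed ChromaDB persist dir
EXPORT = Path("drawers_export.json")                  # the 560 MB export
BATCH = 500

client = chromadb.PersistentClient(path=str(PALACE_DIR))
collection = client.get_or_create_collection("drawers")  # assumed collection name

# Drawers that survived the crash (235 here), so only the missing ones are re-inserted.
existing = set(collection.get()["ids"])

drawers = json.loads(EXPORT.read_text())
missing = [d for d in drawers if d["id"] not in existing]
print(f"{len(missing)} drawers to re-import")

# Upserting documents without embeddings makes chromadb re-embed them; if the export
# also carries vectors, pass them via embeddings=... to skip that.
for i in range(0, len(missing), BATCH):
    chunk = missing[i : i + BATCH]
    collection.upsert(
        ids=[d["id"] for d in chunk],
        documents=[d["document"] for d in chunk],
        metadatas=[d["metadata"] for d in chunk],
    )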
Suggestions
Short term: --mode hnsw-only flag
Most "palace won't start" cases I've hit involve a corrupted or bloated HNSW index on top of intact SQLite. Deleting just the HNSW segment directory is enough:
rm -rf ~/.mempalace/palace/<uuid>/
mempalace status
A --mode hnsw-only option doing exactly this would cover the majority of cases safely. No re-embedding, no SQLite surgery, no data-loss risk.
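A sketch of what that mode could boil down to, with the same caveats as above: the segments table and its scope column are standard ChromaDB schema, but the paths are guesses and this is not how mempalace is structured internally.
# Hypothetical hnsw-only repair: just the two manual steps above, nothing more.
import shutil
import sqlite3
import subprocess
from pathlib import Path

PALACE_DIR = Path.home() / ".mempalace" / "palace"
DB_PATH = PALACE_DIR / "chroma.sqlite3"   # assumed ChromaDB store location

# Look up the vector segment UUIDs from SQLite instead of guessing the directory name.
conn = sqlite3.connect(DB_PATH)
vector_segments = [row[0] for row in
                   conn.execute("SELECT id FROM segments WHERE scope = 'VECTOR'")]
conn.close()

# Delete only the on-disk HNSW index directories; SQLite stays untouched.
for seg_id in vector_segments:
    seg_dir = PALACE_DIR / seg_id
    if seg_dir.is_dir():
        shutil.rmtree(seg_dir)

# Any read triggers HNSW recreation from SQLite; `mempalace status` is enough.
subprocess.run(["mempalace", "status"], check=True)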
Medium term: keep the previous backup until rebuild succeeds
repair overwrites palace.backup at the start of each run. Keeping palace.backup.prev until the new build completes would give users a real fallback if the rebuild fails.
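Something as simple as the following rotation would do it. The palace.backup.prev name and the backup being a directory copy are assumptions on my part, not existing mempalace behavior.
# Sketch of the suggested backup rotation.
import shutil
from pathlib import Path

PALACE = Path.home() / ".mempalace" / "palace"
BACKUP = PALACE.with_name("palace.backup")
PREV = PALACE.with_name("palace.backup.prev")

def take_backup() -> None:
    """Rotate the old backup aside instead of overwriting it at the start of repair."""
    if PREV.exists():
        shutil.rmtree(PREV)
    if BACKUP.exists():
        BACKUP.rename(PREV)          # the previous backup survives this repair run
    shutil.copytree(PALACE, BACKUP)

def finish_repair(success: bool) -> None:
    """Discard the previous backup only once the rebuild is known to be good."""
    if success and PREV.exists():
        shutil.rmtree(PREV)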
Long term: catch the compaction error before touching SQLite
If ChromaDB throws Error in compaction, repair should abort before wiping the original collection data, not after.
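One way to get there is to rebuild into a scratch collection and only swap it in once the rebuild has gone through. Plain-chromadb sketch, not how mempalace's repair is actually structured; the collection naming and record fields are illustrative.
# Illustrative ordering only: the original collection is not deleted until the
# rebuilt one exists and can be read back.
import chromadb
from chromadb.errors import InternalError

def safe_rebuild(client, name: str, drawers: list[dict]) -> None:
    scratch = client.get_or_create_collection(f"{name}__rebuild")
    try:
        for i in range(0, len(drawers), 500):
            chunk = drawers[i : i + 500]
            scratch.upsert(
                ids=[d["id"] for d in chunk],
                documents=[d["document"] for d in chunk],
                metadatas=[d["metadata"] for d in chunk],
            )
        scratch.count()  # touch the rebuilt collection so errors surface here
    except InternalError:
        client.delete_collection(f"{name}__rebuild")  # original is still intact
        raise
    # Only after a successful rebuild does the original get removed and replaced.
    client.delete_collection(name)
    scratch.modify(name=name)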
Environment:
- OS: WSL2 / Debian (Linux x86_64)
- Python version: 3.12
- MemPalace version: 3.3.3
- ChromaDB version: 1.5.8