mempalace repair crashes mid-rebuild on ChromaDB 1.5.8, leaving palace unrecoverable #1238

@gounthar


What happened?

I ran mempalace repair --yes on a palace with ~196K drawers. The command crashed mid-rebuild and left the palace with only 235 embeddings, no usable backup, and no way to recover without drawers_export.json.

Two separate failures occurred in sequence.

Failure 1: FTS5 table not registered

Before repair would even start, it hit:

chromadb.errors.InternalError: Database error: error returned from database:
(code: 1) no such table: embedding_fulltext_search

The FTS5 shadow tables were present in sqlite_master, but embedding_fulltext_search itself was not registered. I had to recreate it manually:

DROP TABLE IF EXISTS embedding_fulltext_search_config;
DROP TABLE IF EXISTS embedding_fulltext_search_content;
DROP TABLE IF EXISTS embedding_fulltext_search_data;
DROP TABLE IF EXISTS embedding_fulltext_search_docsize;
DROP TABLE IF EXISTS embedding_fulltext_search_idx;
CREATE VIRTUAL TABLE embedding_fulltext_search USING fts5(string_value, tokenize='trigram');
INSERT INTO embedding_fulltext_search (rowid, string_value)
  SELECT rowid, string_value FROM embedding_metadata;

(~2M rows, took a few minutes.)
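
For anyone checking whether they're in the same state before running the SQL above: the virtual table itself has to appear in sqlite_master, not just its shadow tables. A minimal check, assuming the default chroma.sqlite3 file name and the ~/.mempalace/palace layout from this report:

import os
import sqlite3

# Assumed location of the palace's ChromaDB SQLite file.
db_path = os.path.expanduser("~/.mempalace/palace/chroma.sqlite3")

conn = sqlite3.connect(db_path)
names = {
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE name LIKE 'embedding_fulltext_search%'"
    )
}
conn.close()

print("registered objects:", sorted(names))
# Shadow tables (_config, _content, _data, _docsize, _idx) present but the
# virtual table itself missing is exactly the broken state described above.
print("FTS5 table missing:", "embedding_fulltext_search" not in names)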

Failure 2: Compaction crash mid-rebuild

After the FTS5 fix, repair extracted all drawers fine, then crashed:

chromadb.errors.InternalError: Error in compaction: Failed to apply logs to the metadata segment

State after the crash:

  • The original collection was already deleted from SQLite before the crash hit.
  • Only 235 embeddings had been re-inserted when it died.
  • palace.backup was overwritten at the start of the repair run, so it held the broken 235-embedding state, not the original.
  • The only copy of the full 196K drawers was drawers_export.json (560 MB).

What did you expect?

Either the repair succeeds, or it fails without destroying the original data. Deleting the SQLite collection before the new inserts are done means a mid-rebuild crash leaves nothing to fall back to.

How to reproduce:

  1. Palace with ~196K+ drawers on ChromaDB 1.5.8
  2. mempalace repair --yes
  3. Observe crash with Error in compaction: Failed to apply logs to the metadata segment

I suspect this is more likely to trigger on larger palaces due to how the Rust compaction backend handles big collections, but I haven't confirmed a minimum size threshold.

What actually fixed it:

# 1. Delete only the HNSW segment directory (leaves SQLite untouched)
rm -rf ~/.mempalace/palace/<uuid>/

# 2. Trigger HNSW recreation from SQLite
mempalace status

# 3. Re-import missing drawers using mempalace's ChromaBackend
#    (so _HNSW_BLOAT_GUARD gets applied automatically)
python3 import_missing.py

The SQLite data was intact the whole time. Only the HNSW index was broken. Repair didn't need to touch SQLite at all.
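
For completeness, a stripped-down sketch of what import_missing.py did. The real script goes through mempalace's ChromaBackend (so _HNSW_BLOAT_GUARD applies); this version talks to chromadb directly for illustration only, and it assumes drawers_export.json is a list of objects with id, document, and metadata keys, which may not match the actual export format:

import json
import os
import chromadb

# Assumed paths and collection name -- adjust to your palace.
client = chromadb.PersistentClient(path=os.path.expanduser("~/.mempalace/palace"))
collection = client.get_or_create_collection("drawers")

with open("drawers_export.json") as f:
    exported = json.load(f)  # assumed shape: [{"id": ..., "document": ..., "metadata": ...}, ...]

# Work out which drawers survived the crash.
existing = set(collection.get(include=[])["ids"])
missing = [d for d in exported if d["id"] not in existing]
print(f"{len(missing)} drawers to re-import")

# Re-insert in batches to stay under ChromaDB's per-call limits.
BATCH = 1000
for i in range(0, len(missing), BATCH):
    chunk = missing[i:i + BATCH]
    collection.add(
        ids=[d["id"] for d in chunk],
        documents=[d["document"] for d in chunk],
        metadatas=[d["metadata"] for d in chunk],
    )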

Suggestions

Short term: --mode hnsw-only flag

Most "palace won't start" cases I've hit involve a corrupted or bloated HNSW index on top of intact SQLite. Deleting just the HNSW segment directory is enough:

rm -rf ~/.mempalace/palace/<uuid>/
mempalace status

A --mode hnsw-only option doing exactly this would cover the majority of cases safely. No re-embedding, no SQLite surgery, no data-loss risk.
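
A rough sketch of what the flag could do internally (hypothetical; the function name and the assumption that ChromaDB's segments table maps vector segments to on-disk directory names are mine, not existing mempalace code):

import os
import shutil
import sqlite3

def repair_hnsw_only(palace_dir: str) -> None:
    """Drop only the HNSW segment directories; never touch the SQLite data."""
    db_path = os.path.join(palace_dir, "chroma.sqlite3")
    conn = sqlite3.connect(db_path)
    # Assumption: vector segment ids in the segments table match the
    # per-segment directory names under the palace directory.
    segment_ids = [
        row[0] for row in conn.execute("SELECT id FROM segments WHERE scope = 'VECTOR'")
    ]
    conn.close()

    for segment_id in segment_ids:
        seg_dir = os.path.join(palace_dir, segment_id)
        if os.path.isdir(seg_dir):
            shutil.rmtree(seg_dir)
            print(f"removed HNSW segment {segment_id}")
    # The next collection access (e.g. `mempalace status`) rebuilds the HNSW
    # index from the intact SQLite data.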

Medium term: keep the previous backup until rebuild succeeds

repair overwrites palace.backup at the start of each run. Keeping palace.backup.prev until the new build completes would give users a real fallback if the rebuild fails.
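
Even a simple rotation would be enough, e.g. (hypothetical sketch, file names assumed):

import os
import shutil

def _remove(path: str) -> None:
    if os.path.isdir(path):
        shutil.rmtree(path)
    elif os.path.exists(path):
        os.remove(path)

def start_repair_backup(palace_dir: str) -> None:
    """Rotate the old backup aside instead of overwriting it."""
    backup = os.path.join(palace_dir, "palace.backup")
    prev = os.path.join(palace_dir, "palace.backup.prev")
    if os.path.exists(backup):
        _remove(prev)                # drop an older .prev if one is lying around
        shutil.move(backup, prev)    # last known-good state survives the rebuild

def finish_repair_backup(palace_dir: str) -> None:
    """Call only after the rebuild completed and was verified."""
    _remove(os.path.join(palace_dir, "palace.backup.prev"))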

Long term: catch the compaction error before touching SQLite

If ChromaDB throws Error in compaction, repair should abort before wiping the original collection data, not after.
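
Put differently: populate a new collection first, verify it, and only then drop the original. A hypothetical control flow (not actual mempalace internals), using the plain chromadb client API:

def safe_repair(client, name: str, drawer_batches) -> None:
    """Rebuild into a temporary collection; the original is only dropped on success."""
    rebuilt = client.get_or_create_collection(f"{name}__rebuild")
    expected = 0
    try:
        for batch in drawer_batches:        # batch: {"ids": [...], "documents": [...], ...}
            rebuilt.add(**batch)
            expected += len(batch["ids"])
        if rebuilt.count() != expected:     # insert or compaction lost rows
            raise RuntimeError("rebuild incomplete")
    except Exception:
        client.delete_collection(f"{name}__rebuild")   # abort; original untouched
        raise
    # Only now is it safe to remove the original and promote the rebuilt copy.
    client.delete_collection(name)
    rebuilt.modify(name=name)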

Environment:

  • OS: WSL2 / Debian (Linux x86_64)
  • Python version: 3.12
  • MemPalace version: 3.3.3
  • ChromaDB version: 1.5.8
