
feat: unified memory (v1.42.0)#125

Open
moekatib wants to merge 29 commits into main from feat/unified-memory

Conversation

@moekatib
Contributor

Summary

Folds the standalone @withone/mem package into the One CLI as a pluggable memory subsystem, and makes it the primary target for sync. Ships 18 commits on feat/unified-memory — every commit stands alone and every commit stays green.

What's new

  • Zero-config memory: one mem add note '{...}' on a fresh install auto-initializes pglite with sensible defaults. No mem init prerequisite.
  • Top-level OpenAI key: stored as config.openaiApiKey alongside apiKey, same precedence as ONE_SECRET (env > .onerc > project > global). Three equivalent setups: one init (prompt), one mem config set embedding.apiKey sk-..., or OPENAI_API_KEY=sk-....
  • Upgrade hints: every mem status / mem search / mem doctor response carries a structured _upgrade block when semantic search is available but off — agent can relay it to the user.
  • Agent-declared memory.searchable: profiles carry dot-paths that drive the embedded + FTS text. Supports numeric indexes and [] wildcards (messages[].snippet). Preview via sync test --show-searchable before paying the embedding cost.
  • Memory-primary sync: every sync run writes into mem_records. SQLite dual-write is kept only for profiles with an enrich phase (enrich still reads unenriched rows via SQL — follow-up slice rewrites that).
  • one mem sync, a full alias of one sync: same handlers, same options — single source of truth, one command tree.
  • Sync state moves to mem_sync_state table; legacy .one/sync/state/*.json files auto-migrated on first access.
  • sync query supports dotted --where paths (values.job_title[0].value like %Engineering%). sync sql retired — raw SQL can't safely span PGlite / Postgres / third-party backends.
  • Replace-semantics upsert (p_replace flag on mem_upsert_by_keys): synced rows replace data wholesale so fields removed upstream actually disappear from memory.
  • Test-isolation bug fix: CONFIG_DIR / CONFIG_FILE / PROJECTS_DIR resolved lazily via getters so tests that set $HOME in before() hooks can properly isolate.
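Several of the surfaces above (dotted --where paths, identityKey, idField extraction) rest on the same dot-path resolution idea. A minimal sketch of such a resolver — a hypothetical helper, not the branch's actual implementation:

```typescript
// Resolve "values.job_title[0].value"-style paths against arbitrary JSON.
// Hedged sketch: bracket indexes are normalized into dot segments first.
function getByDotPath(obj: unknown, path: string): unknown {
  // a[0].b -> a.0.b
  const segments = path.replace(/\[(\d+)\]/g, ".$1").split(".").filter(Boolean);
  let cur: any = obj;
  for (const seg of segments) {
    if (cur == null || typeof cur !== "object") return undefined;
    cur = Array.isArray(cur) ? cur[Number(seg)] : cur[seg];
  }
  return cur;
}
```

Missing intermediate keys resolve to undefined rather than throwing, which is what lets a --where filter simply skip non-matching rows.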

Deprecations (non-breaking)

  • --to-memory flag on sync run — silent no-op (memory is now always written). Use --no-memory to opt out.
  • sync sql — errors with a pointer to mem search / mem list. No raw SQL surface in the memory subsystem.

Migration

  • Existing installs: one mem migrate reads legacy .one/sync/data/*.db files into the unified store. --cleanup deletes them after.
  • Legacy sync-state JSON files migrated on first access, then removed.
  • No config changes required — everything auto-detects.

Docs

skills/one/SKILL.md, one guide memory (new topic), one guide sync, README.md, and one --help all updated.

Test plan

  • 65/65 unit tests green (added 4 new wildcard extraction tests)
  • Sub-agent verification: 4 isolated + real-HOME agents covered zero-config, OpenAI key lifecycle, CLI surface + mem regression, and Gmail end-to-end
  • Gmail sync with enrich phase: 100 records list-synced + 100 enriched, landed in memory with full thread bodies
  • Attio sync (non-enrich): runs memory-only, no .db file created, 100 records in 1.7s
  • Semantic search ranks the right Attio people for "venture capital investor" and "head of engineering" after clean memory.searchable paths
  • Upgrade hint appears on FTS-only installs, disappears when key is set
  • Build + typecheck: no new TS errors (5 pre-existing unchanged)
  • Full Gmail / Fathom enrich-profile end-to-end on a second machine
  • one mem migrate against a real legacy .one/sync/data/ directory

Follow-up slices (separate PRs)

  1. Rewrite enrichPhase to query memory (data._enriched_at IS NULL) so SQLite can be dropped for enrich profiles too; better-sqlite3 becomes truly optional.
  2. Gmail passthrough q: URL-encoding bug — q: "category:primary" succeeds via actions execute but fails via sync run. Not in scope for this branch.
  3. Legacy @withone/mem 2.0.0 deprecation release pointing at one mem.

🤖 Generated with Claude Code

moekatib and others added 26 commits April 21, 2026 16:19
Single-branch delivery plan for collapsing @withone/mem and one sync into
one CLI subsystem with a pluggable backend architecture. Default backend is
PGlite (embedded Postgres), with Postgres shipped as a first-party plugin
and third-party plugins supported via dynamic import.

Schema consolidates external_refs as a sources JSONB column on mem_records
keyed by prefixed source ids, and keeps mem_links separate for bidirectional
graph traversal.

Embeddings are optional, configured at one mem init, OpenAI key stored in
~/.one/config.json (mode 0600) alongside ONE_SECRET with the same
env/onerc/project/global precedence chain.
Wires the skeleton for folding @withone/mem and one sync into one CLI
subsystem with a pluggable backend architecture.

Contract + registry:
- MemBackend interface + MemBackendPlugin factory + BackendCapabilities
- plugins.ts registry with dynamic loader for third-party plugins declared
  in memory.plugins config

First-party plugins (stubbed):
- pglite: default, embedded Postgres, caps all true except concurrentWriters
- postgres: Supabase/Neon/self-hosted via node-pg, caps all true

Pure helpers implemented:
- schema.ts: full DDL + PL/pgSQL functions (mem_upsert_by_keys,
  mem_calculate_relevance, mem_hybrid_search, mem_enforce_key_uniqueness)
- canonical.ts: deterministic JSON + sha256 for content_hash
- scoring.ts: TS port of mem_calculate_relevance for in-memory ranking
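The canonical.ts approach — deterministic JSON then sha256 — can be sketched as follows (illustrative, assuming key-sorted serialization; the real module may differ in detail):

```typescript
import { createHash } from "node:crypto";

// Deterministic JSON: sort object keys recursively so the same logical
// payload always serializes to the same bytes, regardless of insertion order.
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonicalize).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const entries = Object.keys(value as object)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonicalize((value as any)[k])}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value);
}

// content_hash: sha256 over the canonical form.
function contentHash(value: unknown): string {
  return createHash("sha256").update(canonicalize(value)).digest("hex");
}
```

Because the hash is insensitive to key order, re-syncing an unchanged record produces the same content_hash and can be skipped.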

Config:
- memory block in ~/.one/config.json alongside ONE_SECRET (mode 0600)
- OPENAI_API_KEY env > .onerc > project > global precedence
- per-backend config keyed by plugin name so new plugins compose in

CLI surface:
- `one mem` registered with all subcommands discoverable via --help
- `one mem status` fully working; returns config + registered plugin
  descriptors + capability matrix
- All other subcommands return a scaffolded-not-implemented note

Tests: canonical.test.ts + scoring.test.ts (20 assertions, all green)
Typecheck: clean on the new module (no new errors vs baseline)
The shared query layer (CoreBackend) implements MemBackend over a PgClient
abstraction. Both first-party plugins are now thin adapters:

- PGlite plugin: lazy-imports @electric-sql/pglite + the vector extension,
  routes no-param queries through exec() (PGlite rejects multi-statement
  SQL via query()), and falls back to an in-memory DB when dbPath is
  ":memory:". Single-writer; transactions reuse the outer client.
- Postgres plugin: lazy-imports node-pg, opens a Pool on the configured
  connection string, transactions take a dedicated client.

CoreBackend implements the full MemBackend surface:
- Records: insert, upsertByKeys (via mem_upsert_by_keys server function
  that merges data + unions tags/keys/sources into existing rows when
  keys overlap), getById (with optional link hydration), update, remove,
  archive/unarchive, list.
- Search: hybrid when embedding provided + vectorSearch:true; otherwise
  FTS-only fallback. trackAccess bumps access_count on returned rows.
- Context: relevance-ranked active records via mem_calculate_relevance.
- Graph: link/unlink/linked with bidirectional traversal semantics.
- Sources: JSONB map ops (addSource also extends the keys array so
  findBySource works via the keys GIN index).
- Sync state: upsert, getOne, listAll.
- Hot columns: partial expression indexes scoped by type.
- Maintenance: vacuum, stats.

Schema change: dropped pg_trgm from EXTENSIONS_SQL. The original mem schema
required it but nothing in the unified query layer uses trigram matching,
and PGlite doesn't ship pg_trgm by default. Vector + tsvector cover every
current query. The comment notes this can be added back via an optional
extension hook if fuzzy-match search lands.

Tests: 12-assertion live integration suite against a fresh in-memory PGlite
(ensureSchema + insert + upsertByKeys merge semantics + findBySource +
addSource + graph link/linked + FTS + context + archive/unarchive +
sync state + stats). All 57 tests green; typecheck clean on the memory
module with no new errors vs baseline.
…raph, admin)

The memory-layer CLI surface is now functional against a live PGlite backend.
Every command listed in docs/plans/unified-memory.md §6 either dispatches to
a working handler or prints a clear deprecation/migration note.

Embedding provider + orchestration:
- lib/memory/embedding.ts: OpenAI embeddings (single + batch), retry with
  backoff, content-hash gate, graceful fallback to null when provider=none
  or no API key. defaultSearchableText() extracts a capped string value
  from arbitrary JSON as the fallback for records with no profile-defined
  searchable template.
- lib/memory/runtime.ts: process-local backend singleton with lazy
  init+ensureSchema, plus addRecord/upsertRecord helpers that derive
  searchable_text, content_hash, and (optionally) embedding before calling
  the backend. Precedence: opts.embed → input.embed → config default.

Config + init + doctor:
- commands/mem/init.ts: interactive @clack/prompts flow + fully-flagged
  non-interactive mode. OpenAI key hidden-input, stored in
  ~/.one/config.json (mode 0600, same file as ONE_SECRET). Backend warmup
  (open + ensureSchema + version check) runs before returning success.
- commands/mem/config.ts: get/set/unset dot-path access with automatic
  secret redaction (embedding.apiKey, postgres.connectionString).
  --show-secrets opt-in to reveal.
- commands/mem/doctor.ts: 7-check health report (config, plugin resolve,
  backend open, schema version, stats, embedding reachable, capability
  consistency). Non-zero exit when any check fails.

Record operations:
- commands/mem/records.ts: add/get/update/archive/weight/flush/list,
  search (FTS-only default; --deep forces semantic when available),
  context (relevance-sorted), link/unlink/linked, sources/find-by-source.

Admin/IO:
- commands/mem/admin.ts: vacuum, reindex (re-embed under a new model).
- commands/mem/export.ts: JSONL export/import, idempotent via keys.
- commands/mem/migrate.ts: imports legacy .one/sync/data/*.db files into
  the unified store using each profile's idField + identityKey. Optional
  --cleanup removes legacy files after confirmation. Respects --dry-run.

Build: tsup externals extended to @electric-sql/pglite (+vector) and pg
so runtime WASM/pure-TS assets load from node_modules instead of being
inlined into the bundle.

Typecheck: clean on the memory module (5 pre-existing errors unchanged).
Tests: 57/57 green. Live smoke test verified init → add → search →
doctor → status pipeline against a throwaway HOME.
Opt-in dual-write integration between the sync runner and the unified
memory backend. Motivated by §9 of the plan but scoped conservatively:
synced rows continue landing in SQLite as they do today; when
`--to-memory` is passed, each page ALSO flows through `upsertRecord` into
mem_records. This lets the dual path be verified on real data before we
flip memory to primary.

- lib/sync/mem-writer.ts: writePageToMemory(profile, records) with
  prefixed source key (`<platform>/<model>:<id>`), optional identity
  key promoted to a second mem key (`email:...`, `phone:...`, `domain:...`,
  or `id:...` heuristic from the profile's identityKey dot path), tags
  = ['synced', platform], source entry with last_synced_at, and a
  per-record strip of any `_`-prefixed sync-internal fields. Errors are
  swallowed per-record so a single bad row never breaks the sync.
- lib/sync/types.ts: SyncRunOptions.toMemory flag.
- lib/sync/runner.ts: after the existing SQLite upsert, if toMemory is
  set, call writePageToMemory for the same page. Failures are logged
  but never fail the sync.
- lib/sync/index.ts: `one sync run --to-memory` surface.
- commands/mem.ts: updated the `one mem sync` placeholder to point at
  `one sync --to-memory` and note that canonical alias delegation is a
  follow-up slice.

Test: lib/sync/mem-writer.test.ts — 4 live-PGlite assertions covering
first-page write with prefixed keys + identity merge, idempotent re-run
(update not insert), skip-on-missing-id, and the underscore-field strip.
4/4 green; full suite 61/61.

Deliberately out-of-scope for this commit (tracked for follow-up):
- Physical folder move src/lib/sync/ → src/lib/memory/sync/. Pure rename,
  touches 3 external imports, zero behavioral impact, cleanly separable.
- Memory as the primary write target (replaces SQLite) — gated on the
  dual-write proving itself in Moe's destructive exploration pass.
- Full `one mem sync` alias delegation — waits on the above.
UX: memory auto-initializes on first `one mem` call with pglite defaults
— no `mem init` prerequisite. Picks `embedding.provider: openai` when an
OpenAI key is already resolvable (env / .onerc / config), else stays
`none`. One-line stderr breadcrumb on TTY; silent in --agent mode.

`one init` grows an optional skip-able OpenAI-key prompt, stored at the
top level as `config.openaiApiKey` (peer of `apiKey`). Full precedence
chain mirrors ONE_SECRET: env > .onerc > project config > global.
`mem config set embedding.apiKey` transparently redirects to the
top-level field and strips any stale value from the memory block.
Secret redaction on all read surfaces.

AX: sync profiles gain `memory.searchable: [dot-paths]`. Declared
fields drive the embedded + FTS text so agents produce clean, signal-
dense embeddings instead of the 90%-noise default walker output. No
declaration = fallback to `defaultSearchableText`.

`one sync test <platform>/<model> --show-searchable` previews the exact
text that would be embedded, with per-path resolution (✓ / empty) and
sample values. Agents iterate before paying the embedding cost.

Fix: CONFIG_DIR / CONFIG_FILE / PROJECTS_DIR now resolved lazily via
getters (were module-bound to os.homedir() at import time). Tests that
set process.env.HOME in before() hooks could not previously isolate —
this surfaced as the real ~/.one/config.json being overwritten with a
test stub during `npm test`.

Verified: Attio people sync with clean `memory.searchable` paths
produces 158-char embedded text (vs 2297 chars from the default
walker) and "venture capital investor" / "head of engineering"
queries return semantically coherent rankings.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
When memory is running in FTS-only mode (no OpenAI key, or provider still
`none`), every agent-facing response from `mem status`, `mem search`, and
`mem doctor` now includes a structured `_upgrade` block:

  {
    "capability": "semantic_search",
    "available": true,
    "currentMode": "fts_only",
    "how": "Add an OpenAI key: `one init` (then \"Add OpenAI key\"), or
            `one mem config set embedding.apiKey sk-...`",
    "benefit": "Ranks memories by meaning, not just keyword overlap..."
  }

The agent can now tell its user "semantic search is available as an
upgrade" — previously this capability was silently degraded, so agents
never mentioned it. Human TTY output gets a matching dim one-liner.

`mem search` also gains a top-level `searchMode` field ("fts_only" |
"hybrid") so the mode is inspectable without re-deriving it from config.

Hint only appears when the capability is actually off — zero noise for
already-configured installs.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Two correctness fixes surfaced during the Attio dual-write derisking.

1. mem_upsert_by_keys learns a p_replace flag (default FALSE). When
   TRUE, the existing record's `data` is REPLACED wholesale by the
   incoming payload instead of shallow-merged. Sync callers pass TRUE
   so fields removed at the source actually disappear from memory;
   interactive `mem add` / `mem update` keep the default merge path
   because those are patches, not snapshots.

   Threaded through: MemBackend.upsertByKeys signature gains an
   UpsertOptions arg, postgres-core plugin passes it to the SQL
   function, runtime.upsertRecord forwards opts.replace, mem-writer
   sets replace: true on every sync page.

2. profiles/attio/attioPeople.json + attioCompanies.json were pointing
   at a stale composer action (::attio-people-list) whose ID fails
   base64-decode at the gateway. Swapped both to the passthrough
   "List an Object's Records" endpoint with:

   - pathVars: { object: "people" | "companies" }
   - resultsPath: "data"
   - identityKey: "values.email_addresses[0].email_address" /
                  "values.domains[0].domain"
   - transform: jq flattens id.record_id → id so the runner's
                bracket-access idField extraction works

   Verified: `one sync init attio attioPeople` now lands a working
   profile with all 6 test checks green, `sync run --to-memory` pulls
   100 records cleanly, find-by-source resolves to the right record
   with the identity key intact.

Also honors the feedback_sync_passthrough memory: built-ins no longer
ship a custom/composer actionId.
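The p_replace distinction from fix 1 above boils down to snapshot-vs-patch semantics on the record's data. An in-memory model of the behavior (the real logic lives in the PL/pgSQL mem_upsert_by_keys; this is only the semantics):

```typescript
type Data = Record<string, unknown>;

// replace=true  → sync snapshot: fields removed upstream disappear.
// replace=false → interactive patch: existing fields survive the merge.
function upsertData(existing: Data, incoming: Data, replace: boolean): Data {
  return replace ? { ...incoming } : { ...existing, ...incoming };
}
```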

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Sync is structurally part of the unified memory subsystem now — its
hot path writes through upsertByKeys into mem_records, and the next
slice kills the SQLite fallback entirely. The folder relocation makes
that ownership explicit.

Pure rename. Internal relative imports rewritten:
  ../memory/X → ../X   (now siblings inside memory/)
  ../X        → ../../X (lib/ peers one level further out)

External importers fixed:
  src/index.ts                — registerSyncCommands path
  src/commands/mem/migrate.ts — readProfile, openDatabase, etc.

Zero behavior changes. All 61 tests green; `sync list`, `sync test`
still function.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Memory is now ALWAYS written on `sync run` — the `--to-memory` flag
was an opt-in during the dual-write derisking window and is now a
silent no-op retained for back-compat. Users who explicitly want the
old single-write behaviour pass `--no-memory`.

`one mem sync` is now a full alias for `one sync`. Same handlers,
same options, same profile format — only the command path differs.
registerSyncCommands was extracted into an inner registerSyncSubcommands
so it can be mounted on multiple parents without forking the
implementation.

Verified end-to-end:
  one mem sync run attio --models attioPeople --max-pages 1
  → 100 records, no opt-in flag needed

Verified back-compat:
  --to-memory still accepted (no-op, doesn't error)
  --no-memory skips memory writes for the minority case that wants it
  `one sync run` still works identically to `one mem sync run`

SQLite is still written in parallel this commit — the drop comes in
the follow-up (memory-only writes, sync state → mem_sync_state,
query commands read memory).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Bug surfaced by sub-agent verification of the unified-memory branch:
after `mem config set embedding.apiKey sk-...` on a store that had
auto-initialized earlier with `provider: none`, the upgrade hint kept
telling agents to "flip the provider on" even though the user had
clearly just opted into semantic search.

The core `setOpenAiApiKey` in lib/config.ts only persists the bytes.
memory/config.ts now wraps it with memory-aware semantics: when a
non-empty key is set AND a memory block exists AND provider is still
`none`, flip provider to `openai` in the same write.

Other paths:
- key cleared (`''`)   → leave provider alone (hint can legitimately
                         reappear; user is rotating credentials)
- memory block missing → auto-init on next `getBackend()` picks
                         `openai` natively, no help needed
- provider already openai → no-op

All setOpenAiApiKey call sites (one init's fresh-setup + existing-
config handler, mem init, mem config set) now go through the
memory-aware wrapper by importing from lib/memory/index.js.

Verified: setting a key flips provider and suppresses the upgrade
hint; unsetting brings back the "Add key" hint (not the provider
hint — that distinction matters so the agent tells the user the
right thing).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…c_state

Sync state now lives in the backend's mem_sync_state table alongside
the rest of the unified memory subsystem. Replaces the per-model JSON
files at .one/sync/state/<platform>/<model>.json (and the older
single-file sync_state.json).

state.ts becomes a thin async adapter over backend.getSyncState /
setSyncState / listSyncStates / (new) removeSyncState. Legacy files
are read-once, imported on first access, then deleted — no data loss
when upgrading an existing install.
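The read-once-then-delete migration order matters: import before unlink, so a crash between the two steps at worst re-imports rather than losing state. A sketch with a hypothetical setState standing in for backend.setSyncState:

```typescript
import * as fs from "node:fs";

// Import a legacy JSON state file into the backend, then delete it so
// subsequent runs hit only mem_sync_state. Returns whether a file was
// migrated. Hypothetical helper; the real state.ts adapter may differ.
async function migrateLegacyState(
  file: string,
  setState: (state: unknown) => Promise<void>,
): Promise<boolean> {
  if (!fs.existsSync(file)) return false; // nothing to migrate
  const state = JSON.parse(fs.readFileSync(file, "utf8"));
  await setState(state); // persist BEFORE deleting — no data-loss window
  fs.unlinkSync(file);
  return true;
}
```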

Call sites updated to await the now-async API: runner.ts (resume
check, transition-to-syncing, per-page progress, final-idle, failure
branch, SIGINT/SIGTERM handler), sync list, sync remove, sync query.

Verified: `sync list` reads 5 profiles from mem_sync_state with
correct per-model status/lastSync; legacy JSON file + state dir
cleaned up on first access.

Backend surface gained MemBackend.removeSyncState(platform, model?)
with matching implementation in postgres-core.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Both query paths now go through the unified memory store:

`sync query <platform>/<model>`:
  - backend.list(type) pulls active records (capped at 10k per scan)
  - --where / --after / --before / --order-by filters run in TS over
    the record's `data` JSONB
  - dotted --where paths work (e.g. values.job_title[0].value like %X%)
    so nested payloads are filterable without pre-flattening
  - `--date-field` still supported; auto-detection picks one of the
    common timestamp keys (created_at, createdAt, updated_at, ...)
  - syncAge + lastSync sourced from mem_sync_state

`sync search <query>`:
  - listProfiles → set of (platform, model) pairs → types
  - backend.search(query, {type, queryEmbedding}) per type
  - Hybrid FTS + semantic when OpenAI key is configured; FTS-only
    otherwise. searchMode in the response reports which ran.
  - Embeds the query once and reuses across types to keep embedding
    spend bounded.

`sync sql` is deprecated: no safe universal raw-SQL surface spans
PGlite + Postgres + third-party plugins without leaking backend
specifics. Returns a pointer to `mem search` / `mem list` which work
against every backend.

Verified:
  sync query attio/attioPeople --where 'values.job_title[0].value like %Engineering%'
  → 5 correct engineers including "Head of Engineering", "VP of Engineering"

  sync search "engineering" --platform attio
  → searchMode: hybrid, 3 engineers ranked

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…rays

Groundwork for the runner's upcoming SQLite-write cutover. No behavior
change in this commit — writePageToMemory still returns the same insert/
update/skip counters, and existing callers (runner.ts dual-write path,
mem-writer.test.ts) pick up the defaults.

Added:
- MemWriteReport.sourceKeysSeen: string[]
    Accumulated across pages, this replaces the SQLite-backed seenIds
    set used by --full-refresh to reconcile deletions. In the next slice
    the runner archives any mem_records of this type whose source key
    didn't appear in the run.

- MemWriteReport.inserts / MemWriteReport.updates: Record<>[]
    Populated only when writePageToMemory is called with
    { capturePerAction: true }. Lets hook dispatch (onInsert, onUpdate,
    onChange) fire with the right per-record events without a second
    classify pass — currently that classification reads SQLite, which
    has to go.

Runner keeps dual-writing to SQLite for now because `enrichPhase`
(profiles/gmail/gmailThreads.json, profiles/fathom/meetings.json)
still reads unenriched rows from the SQLite table. Enrich rewrite +
runner cutover land together in a focused follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Running `sync run --to-memory` against Gmail surfaced three real gaps.
All three fixed + regression-tested.

1. `sync init <platform> <model> --config '{...}'` now seeds from the
   built-in profile when no on-disk draft exists. First-time users
   were hitting `Missing required field: actionId` because the merge
   base was empty. Now the base resolution walks existing-draft →
   built-in → empty, matching the docs.

2. Enrich phase now mirrors merged rows into the unified memory store.
   Before, list sync wrote a partial record to memory and enrich
   updated only SQLite — so `mem_records.data` held the pre-enrich
   shape (ids + snippets + historyId) while the full enriched bodies
   lived only in sqlite. EnrichContext gains `profile?: SyncProfile`,
   the runner threads the profile through, and enrich calls
   writePageToMemory at the end of each batch. Best-effort — logs on
   stderr if memory write fails, never aborts the sync.

3. `memory.searchable` paths support `[]` array wildcards:
      messages[].snippet
      messages[].payload.parts[].body.data
   Previously agents had to hard-index with .0. / .1. which didn't
   scale for Gmail-style message arrays. New `resolveWildcardPath`
   fans out each `[]` segment, concatenates the leaves. Four new
   unit tests cover single / mixed / nested wildcards + missing paths.

Tests: 65/65 pass (was 61; +4 wildcard cases).
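The `[]` fan-out from fix 3 can be sketched as a recursive split on the first wildcard segment (an illustrative resolver, not the branch's exact resolveWildcardPath):

```typescript
// Resolve a searchable path like "messages[].snippet" against JSON.
// Each [] maps over an array; string leaves are collected in order.
function resolveWildcard(obj: unknown, path: string): string[] {
  const idx = path.indexOf("[]");
  if (idx === -1) {
    // No wildcard left: plain dot-path descent to a string leaf.
    let cur: any = obj;
    for (const seg of path.split(".").filter(Boolean)) {
      if (cur == null || typeof cur !== "object") return [];
      cur = cur[seg];
    }
    return typeof cur === "string" ? [cur] : [];
  }
  // Descend to the array before the first [], then fan out recursively.
  const prefix = path.slice(0, idx);
  const suffix = path.slice(idx + 2).replace(/^\./, "");
  let cur: any = obj;
  for (const seg of prefix.split(".").filter(Boolean)) {
    if (cur == null || typeof cur !== "object") return [];
    cur = cur[seg];
  }
  if (!Array.isArray(cur)) return [];
  return cur.flatMap((item: unknown) => resolveWildcard(item, suffix));
}
```

Missing paths and non-array values resolve to an empty list rather than throwing, so one malformed message never poisons a record's searchable text.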

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Nine user-visible surfaces shipped on feat/unified-memory had no docs
yet. This sweep covers all of them so agents loading the skill and
humans running `--help` see the current shape of the world, not the
pre-branch one.

- skills/one/SKILL.md
  - Replaced "Local Data Sync" section with "Unified Memory" — zero-
    config auto-init, OpenAI key setup (three paths), `mem sync`
    alias, `memory.searchable` with `[]` wildcard examples, preview
    loop via `sync test --show-searchable`.

- src/lib/guide-content.ts
  - New GUIDE_MEMORY topic registered in TOPICS + getGuideContent +
    the `all` bundle. Covers records / graph / sources / sync-into-
    memory / diagnostics / admin / backends.
  - GUIDE_SYNC refreshed: init → declare searchable → preview → run
    flow; removed `sync install` and `sync sql` references (sql is
    deprecated with a pointer to mem surfaces); file-layout section
    shows ~/.one/mem.pglite and config.openaiApiKey.
  - GUIDE_OVERVIEW's sync section recast as "Memory + Sync".

- README.md
  - New `one mem` section with add/search/list/link + three key-setup
    paths.
  - `one sync` section merged with mem — documents the declare-then-
    preview workflow, `memory.searchable` paths, --no-memory flag,
    mem sync alias.
  - Commands table refreshed: drop `install`, drop `sql`, add --show-
    searchable note on test.

- src/commands/guide.ts
  - VALID_TOPICS gains 'memory' so `one guide memory` actually
    resolves (was returning Unknown topic).

- src/index.ts
  - Top-level --help gains a Memory section and refreshed Data Sync
    commands. Points at `one guide memory` for the full reference.

Tests: 65/65 still green. Rendered output verified via
`one --agent guide memory` → 4656 chars, title "One CLI — Agent
Guide: Memory".

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Runner now opens SQLite ONLY when the profile declares an enrich phase.
Every other profile (Attio, Stripe, Notion, Hacker News, Fathom
meetings without enrich, …) runs purely against memory — no .db file
created, no ensureTable / evolveSchema / classifyRecords / upsertRecords
/ rebuildFtsIndex calls, no countRecords polls.

Why partial: enrich still reads unenriched rows via SQL (`SELECT *
FROM <table> WHERE _enriched_at IS NULL`). Rewriting enrichPhase to
query memory instead is a dedicated slice (enrich is 468 lines with
its own concurrency/backoff/merge logic). Gated dual-write keeps
Gmail/Fathom-with-enrich working while the memory path proves out on
the common case.

Changes in runner.ts:
- `needsSqlite = !!profile.enrich` flag; `db` stays null otherwise.
- All db.* calls wrapped in `if (db) { ... }` guards or alternate
  memory-path branches.
- writePageToMemory is now the primary write (was opt-in); runs
  with `capturePerAction: hasHooks && !db` so hooks read the action
  flag straight from upsert instead of a second classify pass.
- --full-refresh: two paths. SQLite path (enrich profiles) runs the
  existing NOT IN delete. Memory path walks `backend.list(type)` and
  archives any record whose source key didn't appear in
  `seenSourceKeys`. Both run for enrich profiles so the stores stay
  in sync.
- totalRecords: from `countRecords(db, model)` when SQLite available,
  else from per-page `pageReport.inserted + updated` counter.
- Dry-run, enrich gating, FTS rebuild, final state write all
  null-check db before calling SQLite functions.
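The memory-path --full-refresh reconciliation described above amounts to: walk the type, archive anything whose source key wasn't seen this run. A simplified sketch with a hypothetical backend shape:

```typescript
// Minimal record shape for the sketch; the real mem_records row is richer.
interface MemRecordLite { id: string; sourceKeys: string[]; }

// Archive every record of a type whose source keys were all absent from
// this run's seenSourceKeys set. Returns the number archived.
async function reconcileFullRefresh(
  list: () => Promise<MemRecordLite[]>,
  archive: (id: string) => Promise<void>,
  seenSourceKeys: Set<string>,
): Promise<number> {
  let archived = 0;
  for (const rec of await list()) {
    // A record survives if ANY of its source keys appeared in the run.
    const seen = rec.sourceKeys.some((k) => seenSourceKeys.has(k));
    if (!seen) {
      await archive(rec.id);
      archived++;
    }
  }
  return archived;
}
```

Archiving rather than deleting mirrors the memory model's soft-removal semantics, so an upstream hiccup is recoverable.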

Verified: `sync run attio --models attioPeople` creates NO .db file,
lands 100 rows in memory in 1.7s (was 2+ with dual-write).
`sync list`, `sync query`, `sync search` still work unchanged. 65/65
tests green.

Follow-up: rewrite enrichPhase to query memory so SQLite can be
dropped for enrich profiles too, then the sync engine install step
(`better-sqlite3`) becomes optional entirely.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Covers 17 commits landing the unified-memory subsystem:

- Zero-config `one mem` (auto-inits on first use)
- OpenAI key top-level in config.json with full env > .onerc >
  project > global precedence; redirect from mem config set
- Structured `_upgrade` hint block in agent output when semantic
  search is available but off
- Agent-declared `memory.searchable` with `[]` wildcard support
- `sync test --show-searchable` preview loop
- Replace-semantics upsert (p_replace flag) for sync rows
- Stale Attio built-in profiles rewritten to passthrough
- src/lib/sync → src/lib/memory/sync (folder move)
- Memory is the primary sync target; --no-memory to skip; --to-memory
  back-compat no-op
- `one mem sync` full alias of `one sync`
- Sync state moved from .one/sync/state/*.json to mem_sync_state
- `sync query` + `sync search` read from mem_records; `sync sql`
  deprecated with pointer to mem commands
- Enrich phase mirrors merged rows to memory
- SQLite writes dropped for non-enrich profiles (enrich-only dual-
  write remains pending enrichPhase rewrite)
- Docs + skill + guide + --help refreshed to match

Non-breaking. Deprecations:
- --to-memory flag (silent no-op; memory is always written)
- `sync sql` (errors with pointer to `mem search` / `mem list`)

Tests: 65/65 (+4 new wildcard cases).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…nt data-loss fix)

Sync would stringify a nested-id record as "[object Object]" when a
profile's idField resolved to an object (Attio v2 returns id as
{workspace_id, object_id, record_id}). Every row landed under the
SAME memory key and the last one won — sync reported
"recordsSynced: 2024" while memory held 1 row. Silent and catastrophic.

Three-layer fix:

1. `sync test` — when the sample idField resolves to an object, emit a
   FAIL check with suggested dotted paths (`id.record_id`, `id.id`,
   etc). Auto-discovery now tries `id.record_id` / `id.id` as fallbacks
   after the scalar `id` / `_id` / `uuid` candidates, so a fresh
   `sync init` on a nested-id platform fixes itself.

2. `mem-writer` — resolves idField via getByDotPath (matches what
   identityKey already does), and hard-rejects object values with a
   skip+count rather than silent key collapse. If sync test is
   somehow bypassed, a run will still refuse to corrupt the store.

3. Built-in Attio profiles — dropped the jq transform workaround
   introduced in slice 1.5 and moved to `idField: "id.record_id"`
   directly. Cleaner, no external tool dependency, and the new
   profile format matches what agents would write by hand after
   reading the knowledge.

Runner's --full-refresh seenIds path updated to use getByDotPath too,
so SQLite-backed enrich profiles with nested ids also work.

Verified against the exact user repro: `sync run attio --models
attioCompanies --max-pages 1 --force` lands 100 DISTINCT records
(was 1). Every row has its real UUID as the source key.
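
The dotted-path resolution plus hard-reject described above can be sketched as follows. This is a hypothetical minimal version: the real getByDotPath also supports `[]` wildcards, and the function bodies here are illustrative, not the shipped code.

```typescript
// Minimal dotted-path walk: "id.record_id" descends nested objects;
// numeric segments index arrays transparently.
function getByDotPath(obj: unknown, path: string): unknown {
  let cur: any = obj;
  for (const seg of path.split(".")) {
    if (cur == null) return undefined;
    cur = cur[seg];
  }
  return cur;
}

// Hard-reject objects so a nested id can never stringify to
// "[object Object]" and collapse every row onto one memory key.
function resolveSourceId(row: object, idField: string): string | null {
  const v = getByDotPath(row, idField);
  if (v == null || typeof v === "object") return null; // skip visibly
  return String(v);
}
```

With an Attio-style nested id, `resolveSourceId(row, "id")` returns null (skip + count), while `resolveSourceId(row, "id.record_id")` yields the real UUID.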

65/65 tests green.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…rch / sync list

Four surfaces were returning items.length as the total. That lies past
the first page, breaks pagination-driven scripts, and makes agents
report wrong numbers to users.

Changes:

- Backend gains MemBackend.count(type, { status }) — one COUNT(*) query.
  Implemented in postgres-core; forwarded by both lazy plugin wrappers
  (pglite, postgres).

- mem list response grows `returned` (page size) and `total` (real
  backend count for the filter). Also `limit` and `offset` so agents
  can page deterministically. Human TTY output gains "N of M — pass
  --offset X to page" hint when there's more.

- sync query response grows `returned`, `total` (post --where filter),
  `totalRecordsOfType` (before any filters), and `limit`. Scripts can
  tell at a glance what percentage of the type matched.

- sync search returns `returned` + `total` (total across all searched
  types, pre page-cap).

- sync list no longer reports stale .db rowcounts as `totalRecords`.
  The record count is a real `backend.count(type)` per profile; the
  legacy SQLite footprint surfaces as a separate `legacyDbSize` field
  so the dashboard and reality agree, along with a visible nudge to
  run `mem migrate --cleanup`.
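
The paged response shape described above can be sketched like this. Field names come from this commit; the interface and function names are illustrative, and the real backend methods are async (kept synchronous here for brevity).

```typescript
interface PagedBackend {
  list(type: string, opts: { limit: number; offset: number; status?: string }): unknown[];
  count(type: string, opts: { status?: string }): number; // one COUNT(*) query
}

function memList(
  backend: PagedBackend,
  type: string,
  opts: { limit: number; offset: number; status?: string },
) {
  const items = backend.list(type, opts);                     // one page
  const total = backend.count(type, { status: opts.status }); // real backend count
  return { items, returned: items.length, total, limit: opts.limit, offset: opts.offset };
}
```

A script can then page deterministically: keep incrementing `offset` by `limit` until `offset + returned >= total`.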

Verified against attio/attioCompanies (100 real records):
  --limit 5   → returned: 5,   total: 100
  --limit 200 → returned: 100, total: 100
  sync list   → records=100 legacy=0 B (was "records=2024 dbSize=91.3MB")

Tests: 65/65 green.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
The unified memory cutover retired `sync sql` on the basis that raw
SQL can't safely span PGlite / Postgres / third-party plugins — but
the first-party plugins ARE SQL, and the escape hatch for joins /
aggregates / JSONB path queries is real. Legacy CEO flows that used
`sync sql` silently broke when the command errored out.

Bringing it back capability-gated, with a shared read-only guard.

- MemBackend grows an optional `raw(sql, params?)` method.
  postgres-core implements it against its internal client. pglite +
  postgres plugin wrappers forward to it.
- BackendCapabilities gains `rawSql: boolean`. Both first-party
  plugins advertise true; third-party plugins that opt out get a
  clear error from the command layer.
- New `sql-guard.ts` validates incoming SQL: leading keyword must be
  SELECT / WITH / EXPLAIN; multi-statement input rejected;
  DDL/DML/session-control keywords (INSERT, UPDATE, DELETE, DROP,
  ALTER, CREATE, COPY, PRAGMA, VACUUM, ATTACH, SET SESSION, GRANT,
  CALL, etc.) blocked even inside CTEs. 10 unit tests cover edge
  cases.
- `one mem sql "<SELECT ...>"` — primary surface; returns columns,
  rows, rowCount.
- `one sync sql <platform>/<model> "<sql>"` — thin alias. Doesn't
  rewrite the query (that would need a real SQL parser) but nudges
  the agent on stderr when the expected `WHERE type = '...'` is
  missing so cross-type results aren't a surprise.
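
A minimal sketch of such a read-only guard, assuming the rules listed above. The real sql-guard.ts is stricter (more keywords, multi-word forms like SET SESSION); this shows only the shape.

```typescript
// Illustrative read-only SQL guard. Leading keyword must be a read;
// write/DDL keywords are rejected anywhere, even inside CTE bodies.
const ALLOWED_LEAD = /^\s*(select|with|explain)\b/i;
const BLOCKED =
  /\b(insert|update|delete|drop|alter|create|copy|pragma|vacuum|attach|grant|call)\b/i;

function assertReadOnlySql(sql: string): void {
  const stmt = sql.trim().replace(/;+\s*$/, ""); // tolerate one trailing semicolon
  if (stmt.includes(";")) throw new Error("multi-statement input rejected");
  if (!ALLOWED_LEAD.test(stmt)) throw new Error("must start with SELECT / WITH / EXPLAIN");
  if (BLOCKED.test(stmt)) throw new Error("write/DDL keyword blocked");
}
```

A keyword blocklist applied to the whole string is deliberately over-broad (it would reject a SELECT whose column alias is literally `delete`), which is the safe direction for a guard in front of `raw()`.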

Verified against real memory:
  mem sql "SELECT type, COUNT(*) FROM mem_records GROUP BY type"
  → attio/attioCompanies=100, attio/attioPeople=100, gmail/gmailThreads=100

  mem sql "SELECT data->'values'->'name'->0->>'full_name' AS name
           FROM mem_records WHERE type='attio/attioPeople'
           AND data->'values'->'job_title'->0->>'value' ILIKE '%Engineering%'
           LIMIT 5"
  → 5 engineers by nested JSONB path

  mem sql "DELETE FROM mem_records"  → blocked by guard

Tests: 75/75 green (+10 guard tests).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
… run

Before: users running the new unified-memory CLI against a machine that
had synced with the SQLite-era CLI ended up with two sources of truth.
Legacy .db stayed on disk and `one` from npm kept reading it, while
local-one wrote to memory only. Silently divergent.

Now: `sync run` checks for a legacy `~/.one/sync/data/<platform>.db`
before the runner executes. When the file exists AND memory has zero
records for any of this platform's target models, it auto-invokes
`mem migrate --platform <plat> --yes`:

- --agent mode: silent migrate, one-line stderr log with the detected
  size so the agent can relay to its user.
- TTY mode: interactive confirm (default yes). "Found legacy .db
  (91 MB) with no corresponding memory records. Migrate?"

Detection skip conditions:
- --dry-run skips the check (wouldn't persist anything anyway)
- ANY non-zero record count for this platform's types means memory
  has already absorbed the data; migrate is the user's call via
  `mem migrate --cleanup` when they're ready.

Keeps `mem migrate` as the explicit surface; auto-migrate is only a
first-use nudge that eliminates the "which number do I trust?" problem.
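
The detection gate can be sketched as below. This is an assumed shape: the real wiring lives in sync run's pre-flight, the prompt/--agent branches are elided, and `countActive` stands in for the async backend count.

```typescript
import { existsSync } from "node:fs";

// First-run auto-migrate gate: legacy .db on disk AND zero memory
// records for every target model of this platform.
function shouldAutoMigrate(opts: {
  legacyDbPath: string;                  // ~/.one/sync/data/<platform>.db
  targetTypes: string[];                 // this platform's target models
  countActive: (type: string) => number; // backend.count, sync for brevity
  dryRun: boolean;
}): boolean {
  if (opts.dryRun) return false; // dry-run wouldn't persist anything anyway
  if (!existsSync(opts.legacyDbPath)) return false;
  // ANY non-zero count means memory has already absorbed the data.
  return opts.targetTypes.every((t) => opts.countActive(t) === 0);
}
```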

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Full Attio test report surfaced four issues blocking the "rows in
memory → rows with embeddings" story. All fixed and verified against
2024 attio/attioCompanies + attio/attioPeople.

1. mem reindex JSONB read corruption. PGlite WASM threw
   `Unexpected token 'a', "active_until": acti, ...` errors because
   the previous reindex called backend.context({limit:5000}) +
   getById() per row, pulling the full `data` column into memory at
   scale. Fix: new lean backend.listForReindex({type, limit, offset})
   returns ONLY id / type / searchable_text / content_hash /
   embedding_model. Writes go through new
   backend.updateEmbedding(id, vector, model) which UPDATEs only the
   embedding columns — no `data` round-trip. The data field was never
   actually used for embedding (searchable_text is the sole input).

2. --full-refresh "memory access out of bounds" during stale-delete.
   Same root cause — runner used backend.list(type, {limit:100_000})
   which pulled full JSONB for every row. Fix: new lean
   backend.listKeysByType(type) returns only {id, keys[]} for
   reconcile. Verified: --full-refresh completes cleanly (7.8s for
   attio), no WASM crash, deletedStale count accurate.

3. No per-run embed override. Added `sync run --embed` / `--no-embed`
   flags. Threads through SyncRunOptions → runner → writePageToMemory
   as embedOverride. Lets users backfill embeddings with one flag
   without editing the profile. Commander convention: `--embed` → true,
   `--no-embed` → false, absent → defer to profile + config default.

4. mem config set accepted typo keys silently. `mem config set
   embedOnSync true` (missing `defaults.` prefix) would write a
   no-op top-level field nothing reads, and `unset` couldn't clear
   it. Fix: KNOWN_KEYS allowlist + Levenshtein suggestion ("Did you
   mean `defaults.embedOnSync`?"). Set rejects unknown keys; unset
   accepts them (otherwise orphans would be stuck forever). Plus a
   `replace: true` flag on updateMemoryConfig so unset actually
   deletes — the merge-semantics default was re-adding the deleted
   key from the on-disk copy.
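
The allowlist + suggestion check in fix 4 can be sketched as follows. The KNOWN_KEYS contents and the closeness threshold here are assumptions, not the shipped values; only the Levenshtein-suggestion idea comes from the commit.

```typescript
// Hypothetical subset of the allowlist; the real KNOWN_KEYS is larger.
const KNOWN_KEYS = ["defaults.embedOnSync", "embedding.apiKey", "embedding.model"];

// Classic dynamic-programming edit distance.
function levenshtein(a: string, b: string): number {
  const d = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0)));
  for (let i = 1; i <= a.length; i++)
    for (let j = 1; j <= b.length; j++)
      d[i][j] = Math.min(
        d[i - 1][j] + 1,                                    // deletion
        d[i][j - 1] + 1,                                    // insertion
        d[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1),  // substitution
      );
  return d[a.length][b.length];
}

// "Did you mean ...?" candidate, or null when nothing is close enough.
function suggestKey(input: string): string | null {
  const [best] = KNOWN_KEYS
    .map((k) => ({ k, dist: levenshtein(input, k) }))
    .sort((x, y) => x.dist - y.dist);
  return best.dist <= best.k.length / 2 ? best.k : null;
}
```

So `mem config set embedOnSync true` gets "Did you mean `defaults.embedOnSync`?" and is rejected, while a garbage key gets no suggestion.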

Also: mem reindex gains --type, --force, --limit, --batch flags for
scoping backfill to one platform/model and controlling OpenAI batch
pressure.

Tests: 75/75 still green. Reindex verified against real data:
considered 10, reembedded 10, skipped 0, no WASM errors.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…concile

Three issues surfaced in the follow-up Attio test of slice 2b-iii.

1. `mem reindex` did nothing useful without `--force`.

The listForReindex query ordered by `updated_at DESC` with no
SQL-level filter. Because updating a row's embedding also bumps
`updated_at` (via the BEFORE UPDATE trigger), the query always
returned the most-recently-embedded rows first — every iteration
saw only rows that already had the correct embedding, the inner
loop skipped all of them, and the outer loop terminated with
`considered: N, reembedded: 0`. Backfill was effectively broken.

Fixed: the SQL filter now returns ONLY rows needing work —
`searchable_text IS NOT NULL AND searchable_text <> '' AND
(embedded_at IS NULL OR embedding_model IS DISTINCT FROM <target>)`
— ordered `embedded_at ASC NULLS FIRST, id ASC`. Plumbed the target
model through as a new `targetEmbeddingModel` option, plus an
`includeAlreadyEmbedded` escape hatch for the `--force` path.

Empty-string searchable_text is excluded explicitly. Synced rows can
land with empty text when the profile's memory.searchable paths
resolve to nothing (e.g. an Attio contact with no name / title /
email — "unnamed" in the source). OpenAI rejects empty input, so
keeping them eligible would spin the loop forever on unembeddable
rows.

Admin loop now uses a fixed PAGE=500 scan size and keeps offset=0
(the SQL filter drains eligible rows as we embed them, so the next
page is always fresh — incrementing offset against a moving target
would miss rows).

2. `sync run --embed` wedged after ~700/2024 rows in `__psynch_cvwait`.

Node's `fetch()` has no default timeout; when a TCP connection is
accepted but never responds (mid-run TLS stall, rate-limit queue,
etc.) the embed call hangs indefinitely. AbortController + 30s
timeout wrapper (FETCH_TIMEOUT_MS) around both `embed` and
`embedBatch`. 30s is comfortably above p99 for the embeddings
endpoint; stalls now time out and retry within the existing
3-attempt backoff loop.

3. `--full-refresh` left orphans (keys not starting with type prefix).

Reconcile pass only archived rows whose source key was in the
type-prefix set but not in seenSourceKeys. Rows from earlier buggy
versions that ended up with keys missing the prefix entirely were
never caught. Added a second archive criterion: if the row has NO
key starting with the `<type>:` prefix, it's an orphan — archive.
Plus the existing "source key not seen this run" path.
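
The eligibility filter from item 1 can be sketched as a query builder. Table and column names come from the commit; the function name and parameter plumbing are assumptions.

```typescript
// Hypothetical builder for the lean reindex scan described in item 1.
function listForReindexQuery(opts: {
  targetEmbeddingModel: string;
  includeAlreadyEmbedded: boolean; // the --force escape hatch
  limit: number;                   // fixed PAGE=500 in the admin loop
}): { text: string; params: unknown[] } {
  // Empty searchable_text is excluded: OpenAI rejects empty input, so
  // such rows would spin the loop forever.
  let where = `searchable_text IS NOT NULL AND searchable_text <> ''`;
  const params: unknown[] = [opts.limit];
  if (!opts.includeAlreadyEmbedded) {
    params.push(opts.targetEmbeddingModel);
    where += ` AND (embedded_at IS NULL OR embedding_model IS DISTINCT FROM $2)`;
  }
  return {
    text: `SELECT id, type, searchable_text, content_hash, embedding_model
           FROM mem_records WHERE ${where}
           ORDER BY embedded_at ASC NULLS FIRST, id ASC
           LIMIT $1`,
    params,
  };
}
```

Because the filter drains eligible rows as they get embedded, the admin loop keeps offset=0 and each page is always fresh.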

Tests: 75/75. Verified on real data:
  reindex --type attio/attioPeople (18 NULL rows, 34 empty)
  → considered 0 reembedded 0 (correctly excludes empty-text rows)
  reindex --force --limit 10
  → considered 10 reembedded 10
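
The stall fix in item 2 amounts to bounding every embeddings fetch. A minimal sketch (FETCH_TIMEOUT_MS matches the commit; the wrapper name is illustrative):

```typescript
const FETCH_TIMEOUT_MS = 30_000;

// AbortSignal.timeout() (Node >= 17.3) aborts the underlying socket,
// so a TCP connection that is accepted but never answers rejects
// after 30s and falls back into the existing retry/backoff loop.
async function fetchWithTimeout(url: string, init: RequestInit = {}): Promise<Response> {
  return fetch(url, { ...init, signal: AbortSignal.timeout(FETCH_TIMEOUT_MS) });
}
```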

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…it rates

Both of the UX upgrades the most recent Attio test asked for.

1. `sync test --show-searchable` now samples 5 records, not 1.

The old single-sample preview left a `—` marker ambiguous: is the path
a typo, or does this specific record just not have that field? Real
data routinely has sparse fields (Attio industry set on ~1 in 5
companies) and the agent couldn't tell the two cases apart without
SQL.

Now each declared path shows `hits/total` plus a concrete sample:
  5/5  values.name[].value              → "SimplyWise"        (clean)
  1/5  values.industry[].option.title   → "Financial Services" (sparse, still real)
  0/5  values.nonsense[].foo            (no sample — typo or always absent)

SyncTestReport gains a `samples` array; buildSearchablePreview
aggregates across all of them. JSON response shape: `searchable.paths`
grows `hits` + `total` (was `found: bool`); TTY output gets three-way
markers (✓ / ~ / ✗) with rate. Config not changed for the
default-walker mode (nothing declared → walker preview over first
sample, with the tip now pointing at suggest-searchable instead of
telling the agent to hand-write dot-paths).

2. `sync suggest-searchable <platform>/<model>` — auto-ranked starter.

Walks the first-page records, collects every leaf string/number/
boolean path, scores each by:
  hitRate × log1p(avgLength) × signalPenalty × typePenalty × shortPenalty

where:
- hitRate is per-record (array wildcards can't inflate past 1.0)
- signalPenalty = (1 - noiseFraction)² — penalizes UUIDs / ISO
  timestamps / URLs / numeric strings (lat/long style) / known
  noise enum markers ("system", "text", "personal-name", ...)
- typePenalty = 0.05 boolean, 0.1 number, 0.5 mixed, 1.0 string
- shortPenalty = 0 for ≤2-char leaves (flags / codes), 1 else

Output: ranked list with {path, score, hitRate, avgLength,
noiseFraction, sampleValue} + a paste-ready `configPatch` the agent
drops straight into `sync init --config`.
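
The scoring formula above can be written out directly. Weights are copied from the description; the stats struct and function name are illustrative.

```typescript
interface PathStats {
  hitRate: number;       // per-record, capped at 1.0
  avgLength: number;     // mean length of the leaf values
  noiseFraction: number; // share of UUID/timestamp/URL/numeric-string hits
  kind: "string" | "number" | "boolean" | "mixed";
}

function scorePath(s: PathStats): number {
  const signalPenalty = (1 - s.noiseFraction) ** 2;
  const typePenalty = { boolean: 0.05, number: 0.1, mixed: 0.5, string: 1.0 }[s.kind];
  const shortPenalty = s.avgLength <= 2 ? 0 : 1; // flags / codes score zero
  return s.hitRate * Math.log1p(s.avgLength) * signalPenalty * typePenalty * shortPenalty;
}
```

Long, always-present prose fields dominate; booleans, numerics, and pure-noise paths (UUIDs, timestamps) fall to the bottom or to zero.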

Verified on real Attio companies:
  Top: values.description[].value  (189 chars, 100%, "SimplyWise: Organize...")
       values.domains[].domain     (13 chars,  100%, "simplywise.com")
       values.name[].value         (11 chars,  100%, "SimplyWise")
       values.categories[].option.title (100%, "Financial Services")
  Correctly drops: UUIDs, timestamps, latitude/longitude strings,
  is_archived boolean, actor_id enums.

--show-searchable text preview length dropped from 2727 chars (default
walker) to 223 chars on the same record once the suggested paths are
applied — every line signal, no UUIDs.

Docs refreshed (SKILL.md, guide sync) to describe the two-step
workflow: suggest → preview → run.

Tests: 82/82 (+7 new suggest-searchable cases covering long prose,
UUID/timestamp filter, boolean/number penalty, array hit-rate cap,
wildcard dot-path emission, empty-sample case, numeric-string filter).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Two P0 data-correctness bugs caught by Moe's Attio / Fathom test
pass. The sync reports success; the data is wrong. Same trust-shape
as the original [object Object] silent data loss.

1. `--full-refresh` + `--max-pages` = data-loss command.

Reconcile archives any row whose source key wasn't in
`seenSourceKeys` this run. With pagination truncated (max-pages
cap, empty page, etc.) the keys of unfetched pages never land in
the set, so reconcile marks them `deleted_upstream`. Observed:
`sync run fathom --max-pages 3 --full-refresh` archived 57 valid
meetings after pulling only 30.

Fixed: track `paginationComplete` — true only when the loop exits
via natural exhaustion (empty records page on page>0, or the
profile's paginator returned no nextParams). Reconcile-by-absence
now requires `fullRefresh && pagesProcessed > 0 &&
paginationComplete`. On truncated runs we emit
`reconcileSkipped: true` + stderr warning + `deletedStale: 0`.
No silent damage.

2. `mem_upsert_by_keys` didn't un-archive on resurrection.

When --full-refresh re-pulled a row whose memory record was
archived (from a prior buggy reconcile), the upsert updated
`data / keys / sources / searchable_text` but left
`status = 'archived'`. No self-healing path — 1924 Attio rows
stayed stuck at archived across repeated --full-refresh runs.

Fixed: the UPDATE branch of the upsert SQL now also sets
`status = 'active'` and `archived_reason = NULL`. Semantics:
upsert-by-keys always produces an active row. Verified on live
data — 200 stuck rows resurrected across a 3-page partial refresh.

3. Surface `statusCounts: {active, archived}` in every sync run.

Silent damage was invisible from the happy-path output.
`deletedStale` only reports this-run archives. Now sync result
includes post-run counts; human output highlights imbalance in
red when archived > active. Agents can watch the numbers heal
as upsert-by-keys resurrects previously archived rows.
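
The truncation guard from item 1 can be sketched as below. This is a synchronous sketch with an assumed Page shape; the real runner is async and threads more state.

```typescript
interface Page { records: unknown[]; nextParams: object | null }

function runPages(fetchPage: (params: object | null) => Page, maxPages: number) {
  let params: object | null = null;
  let pagesProcessed = 0;
  let paginationComplete = false;
  while (pagesProcessed < maxPages) {
    const page = fetchPage(params);
    // Natural exhaustion: an empty records page after page 0 ...
    if (page.records.length === 0 && pagesProcessed > 0) { paginationComplete = true; break; }
    pagesProcessed++;
    // ... or the profile's paginator returning no nextParams.
    if (!page.nextParams) { paginationComplete = true; break; }
    params = page.nextParams;
  }
  // Hitting maxPages leaves paginationComplete=false.
  const reconcileAllowed = pagesProcessed > 0 && paginationComplete;
  return { pagesProcessed, paginationComplete, reconcileAllowed };
}
```

On a truncated run, `reconcileAllowed` stays false, which is what drives `reconcileSkipped: true` and `deletedStale: 0` instead of archiving unfetched rows.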

Tests: 83/83. Added an integration test asserting upsertByKeys
flips archived→active with archived_reason cleared.

Verified on live data:
  # before
  attio/attioCompanies  active: 100,  archived: 1925
  # after a 3-page --full-refresh --max-pages 3
  archived: 1724 (200 resurrected), reconcileSkipped: true,
  deletedStale: 0

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
#126 — `sync migrate --dry-run` always reported `inserted` on rows
that a real run would `updated`. Cosmetic but misleading after the
first live migrate. Now: the dry-run path probes `mem_records.keys
&& <candidate keys>` via the backend's raw-SQL escape hatch and
reports `updated` when the keys already exist. Requires a backend
with `rawSql: true` capability (both first-party plugins have it;
third-party plugins fall back to always-inserted, matching
pre-fix behaviour).

#127 — killing a mid-sync process corrupts PGlite. Two parts:

1. Graceful close in signalCleanup. The existing SIGINT/SIGTERM
   handler updated sync_state + released the filesystem lock but
   never touched the memory backend. PGlite is WASM-backed
   Postgres — if it doesn't get a chance to checkpoint its WAL
   before the process exits, the next `open()` aborts inside
   ensureSchema with `Aborted()`. The handler now also calls
   `backend.close()` under a 2s wall-clock cap (so a stuck close
   can't block the exit). SIGKILL / kernel panic still bypass
   this; no prevention is possible for uncatchable signals.

2. Clearer doctor diagnostic. When `mem doctor` hits `Aborted()`
   on the schema-apply check, it now appends the recovery path
   (delete `~/.one/mem.pglite`) instead of leaving the user to
   guess. The `postmaster.pid` holding a placeholder `-42` is
   normal for WASM PGlite, NOT a corruption signal — the Aborted
   comes from unflushed WAL. An earlier draft of this commit
   added stale-pid cleanup based on the negative-PID hypothesis;
   reverted because the heuristic would wrongly delete valid
   lockfiles.

Tests: 83/83.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
moekatib and others added 2 commits April 23, 2026 22:41
Closes #132.

Migrate used `row[idField]` for id resolution — a flat property
lookup, not a dotted path. Combined with the legacy SQLite layer
JSON-stringifying nested columns on INSERT (see sync/db.ts:
prepareValue), any profile with a dotted idField against a table
whose id column holds a stringified object silently dropped every
row. Attio companies: 2024/2024 skipped on an empty memory store,
while attioPeople (scalar idField "id") migrated cleanly.

Three changes:

1. `reviveStringifiedJson(row)` — rehydrates top-level JSON-
   stringified columns before id resolution. Matches the shape
   sync sees live. Conservative: only parses strings that start
   with `{` or `[` and only one level deep (legacy rows never
   nest further).

2. Id resolution now uses `getByDotPath(hydratedRow, idField)`,
   the same mechanism `sync run` uses. Hard-rejects nested
   objects the same way `sync test` / mem-writer do — better to
   skip visibly than stringify to `[object Object]` and collapse
   every row onto one key.

3. Report splits the `skipped` counter into `skippedUnresolvedId`
   (profile missing or idField doesn't resolve) and
   `skippedError` (upsert threw). Human output prints a warning
   when every row is unresolved, so a misconfigured profile
   can't hide. Per-row hint for the first 3 misses so the cause
   is obvious.
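
Change 1 can be sketched as a conservative one-level parse; the real implementation may differ in details.

```typescript
// Rehydrate top-level JSON-stringified columns so dotted idField paths
// resolve against the same shape sync sees live. One level only:
// legacy rows never nest stringified JSON further down.
function reviveStringifiedJson(row: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(row)) {
    if (typeof value === "string" && (value.startsWith("{") || value.startsWith("["))) {
      try {
        out[key] = JSON.parse(value);
        continue;
      } catch {
        // Not valid JSON after all; keep the original string.
      }
    }
    out[key] = value;
  }
  return out;
}
```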

Tests: +6, 89 total. reviveStringifiedJson + dotted-path
resolution on the exact Attio companies shape.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Follow-up to #132. When a user re-migrates after changing
`idField` (the documented fix for #132's silent-drop bug) the
pre-fix cohort's rows have garbage sourceKeys AND no identity
keys in keys[] — getByDotPath couldn't resolve
`email_addresses[0].email_address` through the stringified JSON
of the legacy .db either. The post-fix upsert builds clean
sourceKey + identity keys, but with no overlap against the
pre-fix cohort's keys[], upsertByKeys inserts a duplicate.
Result: active count doubles, half the rows carry legible data
and half carry stringified JSON blobs.

Three changes, no new commands.

1. Identity-merge pre-pass per type. For each type with an
   `identityKey`, query existing active rows for
   `(id, keys, data->path->>...)` via backend.raw and build a
   `normalized-identity → {id, keys}` map. Cost is one SELECT
   per type (not per row); JSONB path projection avoids reading
   the full `data` column (the WASM-memory footgun we've hit
   before). When a legacy row's identity matches the map, its
   new keys array is folded with the existing row's keys so
   upsertByKeys overlaps and hits the update branch.
   `replace: true` ensures the clean hydrated payload wins over
   the old stringified shape.

   Path validation is strict (segments match /^\w+$/) because
   the path is inlined into SQL (column refs can't be
   parameterized). Bad input returns null and migrate falls
   back to plain key-overlap — no regression.

2. Split report counters. `mergedByIdentity` distinguishes
   healing merges from regular sourceKey updates. Totals object
   surfaces the same.

3. Doubling warning. `count(type, {status:'active'})` snapshotted
   before + after migrate per type. When post-migrate growth
   exceeds `inserted + 2`, print a stderr warning with an exact
   SQL probe to inspect: `jsonb_typeof(data->'id') GROUP BY t`
   distinguishes pre-fix rows (t='string') from post-fix rows
   (t='object') so the user can drop the ghost cohort.
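
The strict path validation in change 1 can be sketched like this. The function and parameter names are assumptions; only the segment rule and the fallback behaviour come from the commit.

```typescript
// Build a JSONB projection like data->'a'->>'b' for an identity-key
// path. Segments are inlined into SQL (JSONB keys can't be
// parameterized here), so anything outside \w+ is rejected and the
// caller falls back to plain key-overlap.
function jsonbPathProjection(column: string, path: string): string | null {
  const segs = path.split(".");
  if (!segs.every((s) => /^\w+$/.test(s))) return null;
  const last = segs.pop()!;
  const prefix = segs.map((s) => `->'${s}'`).join("");
  return `${column}${prefix}->>'${last}'`;
}
```

A bracketed path like `email_addresses[0].email_address` also fails the `\w+` check and falls back, so only simple dotted paths ever reach the inlined SQL.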

Tests: 98, including a live in-memory PGlite check that
identity values round-trip through data->JSONB->map, archived
rows are excluded, and the SQL-injection guard rejects unsafe
paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>