
feat: unified memory (v1.42.0)#125

Open
moekatib wants to merge 29 commits into main from feat/unified-memory

Conversation

@moekatib
Contributor

Summary

Folds the standalone @withone/mem package into the One CLI as a pluggable memory subsystem, and makes it the primary target for sync. Ships 18 commits on feat/unified-memory — every commit stands alone and every commit stays green.

What's new

  • Zero-config memory: one mem add note '{...}' on a fresh install auto-initializes pglite with sensible defaults. No mem init prerequisite.
  • Top-level OpenAI key: stored as config.openaiApiKey alongside apiKey, same precedence as ONE_SECRET (env > .onerc > project > global). Three equivalent setups: one init (prompt), one mem config set embedding.apiKey sk-..., or OPENAI_API_KEY=sk-....
  • Upgrade hints: every mem status / mem search / mem doctor response carries a structured _upgrade block when semantic search is available but off — agent can relay it to the user.
  • Agent-declared memory.searchable: profiles carry dot-paths that drive the embedded + FTS text. Supports numeric indexes and [] wildcards (messages[].snippet). Preview via sync test --show-searchable before paying the embedding cost.
  • Memory-primary sync: every sync run writes into mem_records. SQLite dual-write is kept only for profiles with an enrich phase (enrich still reads unenriched rows via SQL — follow-up slice rewrites that).
  • one mem sync, a full alias of one sync: same handlers, same options — single source of truth, one command tree.
  • Sync state moves to mem_sync_state table; legacy .one/sync/state/*.json files auto-migrated on first access.
  • sync query supports dotted --where paths (values.job_title[0].value like %Engineering%). sync sql retired — raw SQL can't safely span PGlite / Postgres / third-party backends.
  • Replace-semantics upsert (p_replace flag on mem_upsert_by_keys): synced rows replace data wholesale so fields removed upstream actually disappear from memory.
  • Test-isolation bug fix: CONFIG_DIR / CONFIG_FILE / PROJECTS_DIR resolved lazily via getters so tests that set $HOME in before() hooks can properly isolate.
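Several of the surfaces above (dotted --where paths, identityKey, idField extraction) rest on the same dot-path resolution idea. A minimal sketch of such a resolver — a hypothetical helper, not the branch's actual implementation:

```typescript
// Resolve "values.job_title[0].value"-style paths against arbitrary JSON.
// Hedged sketch: bracket indexes are normalized into dot segments first.
function getByDotPath(obj: unknown, path: string): unknown {
  // a[0].b -> a.0.b
  const segments = path.replace(/\[(\d+)\]/g, ".$1").split(".").filter(Boolean);
  let cur: any = obj;
  for (const seg of segments) {
    if (cur == null || typeof cur !== "object") return undefined;
    cur = Array.isArray(cur) ? cur[Number(seg)] : cur[seg];
  }
  return cur;
}
```

Missing intermediate keys resolve to undefined rather than throwing, which is what lets a --where filter simply skip non-matching rows.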

Deprecations (non-breaking)

  • --to-memory flag on sync run — silent no-op (memory is now always written). Use --no-memory to opt out.
  • sync sql — errors with a pointer to mem search / mem list. No raw SQL surface in the memory subsystem.

Migration

  • Existing installs: one mem migrate reads legacy .one/sync/data/*.db files into the unified store. --cleanup deletes them after.
  • Legacy sync-state JSON files migrated on first access, then removed.
  • No config changes required — everything auto-detects.

Docs

skills/one/SKILL.md, one guide memory (new topic), one guide sync, README.md, and one --help all updated.

Test plan

  • 65/65 unit tests green (added 4 new wildcard extraction tests)
  • Sub-agent verification: 4 isolated + real-HOME agents covered zero-config, OpenAI key lifecycle, CLI surface + mem regression, and Gmail end-to-end
  • Gmail sync with enrich phase: 100 records list-synced + 100 enriched, landed in memory with full thread bodies
  • Attio sync (non-enrich): runs memory-only, no .db file created, 100 records in 1.7s
  • Semantic search ranks the right Attio people for "venture capital investor" and "head of engineering" after clean memory.searchable paths
  • Upgrade hint appears on FTS-only installs, disappears when key is set
  • Build + typecheck: no new TS errors (5 pre-existing unchanged)
  • Full Gmail / Fathom enrich-profile end-to-end on a second machine
  • one mem migrate against a real legacy .one/sync/data/ directory

Follow-up slices (separate PRs)

  1. Rewrite enrichPhase to query memory (data._enriched_at IS NULL) so SQLite can be dropped for enrich profiles too; better-sqlite3 becomes truly optional.
  2. Gmail passthrough q: URL-encoding bug — q: "category:primary" succeeds via actions execute but fails via sync run. Not in scope for this branch.
  3. Legacy @withone/mem 2.0.0 deprecation release pointing at one mem.

🤖 Generated with Claude Code

moekatib and others added 26 commits April 21, 2026 16:19
Single-branch delivery plan for collapsing @withone/mem and one sync into
one CLI subsystem with a pluggable backend architecture. Default backend is
PGlite (embedded Postgres), with Postgres shipped as a first-party plugin
and third-party plugins supported via dynamic import.

Schema consolidates external_refs as a sources JSONB column on mem_records
keyed by prefixed source ids, and keeps mem_links separate for bidirectional
graph traversal.

Embeddings are optional, configured at one mem init, OpenAI key stored in
~/.one/config.json (mode 0600) alongside ONE_SECRET with the same
env/onerc/project/global precedence chain.
Wires the skeleton for folding @withone/mem and one sync into one CLI
subsystem with a pluggable backend architecture.

Contract + registry:
- MemBackend interface + MemBackendPlugin factory + BackendCapabilities
- plugins.ts registry with dynamic loader for third-party plugins declared
  in memory.plugins config

First-party plugins (stubbed):
- pglite: default, embedded Postgres, caps all true except concurrentWriters
- postgres: Supabase/Neon/self-hosted via node-pg, caps all true

Pure helpers implemented:
- schema.ts: full DDL + PL/pgSQL functions (mem_upsert_by_keys,
  mem_calculate_relevance, mem_hybrid_search, mem_enforce_key_uniqueness)
- canonical.ts: deterministic JSON + sha256 for content_hash
- scoring.ts: TS port of mem_calculate_relevance for in-memory ranking
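The canonical.ts approach — deterministic JSON then sha256 — can be sketched as follows (illustrative, assuming key-sorted serialization; the real module may differ in detail):

```typescript
import { createHash } from "node:crypto";

// Deterministic JSON: sort object keys recursively so the same logical
// payload always serializes to the same bytes, regardless of insertion order.
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonicalize).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const entries = Object.keys(value as object)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonicalize((value as any)[k])}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value);
}

// content_hash: sha256 over the canonical form.
function contentHash(value: unknown): string {
  return createHash("sha256").update(canonicalize(value)).digest("hex");
}
```

Because the hash is insensitive to key order, re-syncing an unchanged record produces the same content_hash and can be skipped.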

Config:
- memory block in ~/.one/config.json alongside ONE_SECRET (mode 0600)
- OPENAI_API_KEY env > .onerc > project > global precedence
- per-backend config keyed by plugin name so new plugins compose in

CLI surface:
- `one mem` registered with all subcommands discoverable via --help
- `one mem status` fully working; returns config + registered plugin
  descriptors + capability matrix
- All other subcommands return a scaffolded-not-implemented note

Tests: canonical.test.ts + scoring.test.ts (20 assertions, all green)
Typecheck: clean on the new module (no new errors vs baseline)
The shared query layer (CoreBackend) implements MemBackend over a PgClient
abstraction. Both first-party plugins are now thin adapters:

- PGlite plugin: lazy-imports @electric-sql/pglite + the vector extension,
  routes no-param queries through exec() (PGlite rejects multi-statement
  SQL via query()), and falls back to an in-memory DB when dbPath is
  ":memory:". Single-writer; transactions reuse the outer client.
- Postgres plugin: lazy-imports node-pg, opens a Pool on the configured
  connection string, transactions take a dedicated client.

CoreBackend implements the full MemBackend surface:
- Records: insert, upsertByKeys (via mem_upsert_by_keys server function
  that merges data + unions tags/keys/sources into existing rows when
  keys overlap), getById (with optional link hydration), update, remove,
  archive/unarchive, list.
- Search: hybrid when embedding provided + vectorSearch:true; otherwise
  FTS-only fallback. trackAccess bumps access_count on returned rows.
- Context: relevance-ranked active records via mem_calculate_relevance.
- Graph: link/unlink/linked with bidirectional traversal semantics.
- Sources: JSONB map ops (addSource also extends the keys array so
  findBySource works via the keys GIN index).
- Sync state: upsert, getOne, listAll.
- Hot columns: partial expression indexes scoped by type.
- Maintenance: vacuum, stats.

Schema change: dropped pg_trgm from EXTENSIONS_SQL. The original mem schema
required it but nothing in the unified query layer uses trigram matching,
and PGlite doesn't ship pg_trgm by default. Vector + tsvector cover every
current query. The comment notes this can be added back via an optional
extension hook if fuzzy-match search lands.

Tests: 12-assertion live integration suite against a fresh in-memory PGlite
(ensureSchema + insert + upsertByKeys merge semantics + findBySource +
addSource + graph link/linked + FTS + context + archive/unarchive +
sync state + stats). All 57 tests green; typecheck clean on the memory
module with no new errors vs baseline.
…raph, admin)

The memory-layer CLI surface is now functional against a live PGlite backend.
Every command listed in docs/plans/unified-memory.md §6 either dispatches to
a working handler or prints a clear deprecation/migration note.

Embedding provider + orchestration:
- lib/memory/embedding.ts: OpenAI embeddings (single + batch), retry with
  backoff, content-hash gate, graceful fallback to null when provider=none
  or no API key. defaultSearchableText() extracts a capped string value
  from arbitrary JSON as the fallback for records with no profile-defined
  searchable template.
- lib/memory/runtime.ts: process-local backend singleton with lazy
  init+ensureSchema, plus addRecord/upsertRecord helpers that derive
  searchable_text, content_hash, and (optionally) embedding before calling
  the backend. Precedence: opts.embed → input.embed → config default.

Config + init + doctor:
- commands/mem/init.ts: interactive @clack/prompts flow + fully-flagged
  non-interactive mode. OpenAI key hidden-input, stored in
  ~/.one/config.json (mode 0600, same file as ONE_SECRET). Backend warmup
  (open + ensureSchema + version check) runs before returning success.
- commands/mem/config.ts: get/set/unset dot-path access with automatic
  secret redaction (embedding.apiKey, postgres.connectionString).
  --show-secrets opt-in to reveal.
- commands/mem/doctor.ts: 7-check health report (config, plugin resolve,
  backend open, schema version, stats, embedding reachable, capability
  consistency). Non-zero exit when any check fails.

Record operations:
- commands/mem/records.ts: add/get/update/archive/weight/flush/list,
  search (FTS-only default; --deep forces semantic when available),
  context (relevance-sorted), link/unlink/linked, sources/find-by-source.

Admin/IO:
- commands/mem/admin.ts: vacuum, reindex (re-embed under a new model).
- commands/mem/export.ts: JSONL export/import, idempotent via keys.
- commands/mem/migrate.ts: imports legacy .one/sync/data/*.db files into
  the unified store using each profile's idField + identityKey. Optional
  --cleanup removes legacy files after confirmation. Respects --dry-run.

Build: tsup externals extended to @electric-sql/pglite (+vector) and pg
so runtime WASM/pure-TS assets load from node_modules instead of being
inlined into the bundle.

Typecheck: clean on the memory module (5 pre-existing errors unchanged).
Tests: 57/57 green. Live smoke test verified init → add → search →
doctor → status pipeline against a throwaway HOME.
Opt-in dual-write integration between the sync runner and the unified
memory backend. Motivated by §9 of the plan but scoped conservatively:
synced rows continue landing in SQLite as they do today; when
`--to-memory` is passed, each page ALSO flows through `upsertRecord` into
mem_records. This lets the dual path be verified on real data before we
flip memory to primary.

- lib/sync/mem-writer.ts: writePageToMemory(profile, records) with
  prefixed source key (`<platform>/<model>:<id>`), optional identity
  key promoted to a second mem key (`email:...`, `phone:...`, `domain:...`,
  or `id:...` heuristic from the profile's identityKey dot path), tags
  = ['synced', platform], source entry with last_synced_at, and a
  per-record strip of any `_`-prefixed sync-internal fields. Errors are
  swallowed per-record so a single bad row never breaks the sync.
- lib/sync/types.ts: SyncRunOptions.toMemory flag.
- lib/sync/runner.ts: after the existing SQLite upsert, if toMemory is
  set, call writePageToMemory for the same page. Failures are logged
  but never fail the sync.
- lib/sync/index.ts: `one sync run --to-memory` surface.
- commands/mem.ts: updated the `one mem sync` placeholder to point at
  `one sync --to-memory` and note that canonical alias delegation is a
  follow-up slice.

Test: lib/sync/mem-writer.test.ts — 4 live-PGlite assertions covering
first-page write with prefixed keys + identity merge, idempotent re-run
(update not insert), skip-on-missing-id, and the underscore-field strip.
4/4 green; full suite 61/61.

Deliberately out-of-scope for this commit (tracked for follow-up):
- Physical folder move src/lib/sync/ → src/lib/memory/sync/. Pure rename,
  touches 3 external imports, zero behavioral impact, cleanly separable.
- Memory as the primary write target (replaces SQLite) — gated on the
  dual-write proving itself in Moe's destructive exploration pass.
- Full `one mem sync` alias delegation — waits on the above.
UX: memory auto-initializes on first `one mem` call with pglite defaults
— no `mem init` prerequisite. Picks `embedding.provider: openai` when an
OpenAI key is already resolvable (env / .onerc / config), else stays
`none`. One-line stderr breadcrumb on TTY; silent in --agent mode.

`one init` grows an optional skip-able OpenAI-key prompt, stored at the
top level as `config.openaiApiKey` (peer of `apiKey`). Full precedence
chain mirrors ONE_SECRET: env > .onerc > project config > global.
`mem config set embedding.apiKey` transparently redirects to the
top-level field and strips any stale value from the memory block.
Secret redaction on all read surfaces.

AX: sync profiles gain `memory.searchable: [dot-paths]`. Declared
fields drive the embedded + FTS text so agents produce clean, signal-
dense embeddings instead of the 90%-noise default walker output. No
declaration = fallback to `defaultSearchableText`.

`one sync test <platform>/<model> --show-searchable` previews the exact
text that would be embedded, with per-path resolution (✓ / empty) and
sample values. Agents iterate before paying the embedding cost.

Fix: CONFIG_DIR / CONFIG_FILE / PROJECTS_DIR now resolved lazily via
getters (were module-bound to os.homedir() at import time). Tests that
set process.env.HOME in before() hooks could not previously isolate —
this surfaced as the real ~/.one/config.json being overwritten with a
test stub during `npm test`.

Verified: Attio people sync with clean `memory.searchable` paths
produces 158-char embedded text (vs 2297 chars from the default
walker) and "venture capital investor" / "head of engineering"
queries return semantically coherent rankings.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
When memory is running in FTS-only mode (no OpenAI key, or provider still
`none`), every agent-facing response from `mem status`, `mem search`, and
`mem doctor` now includes a structured `_upgrade` block:

  {
    "capability": "semantic_search",
    "available": true,
    "currentMode": "fts_only",
    "how": "Add an OpenAI key: `one init` (then \"Add OpenAI key\"), or
            `one mem config set embedding.apiKey sk-...`",
    "benefit": "Ranks memories by meaning, not just keyword overlap..."
  }

The agent can now tell its user "semantic search is available as an
upgrade" — previously this capability was silently degraded, so agents
never mentioned it. Human TTY output gets a matching dim one-liner.

`mem search` also gains a top-level `searchMode` field ("fts_only" |
"hybrid") so the mode is inspectable without re-deriving it from config.

Hint only appears when the capability is actually off — zero noise for
already-configured installs.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Two correctness fixes surfaced during the Attio dual-write derisking.

1. mem_upsert_by_keys learns a p_replace flag (default FALSE). When
   TRUE, the existing record's `data` is REPLACED wholesale by the
   incoming payload instead of shallow-merged. Sync callers pass TRUE
   so fields removed at the source actually disappear from memory;
   interactive `mem add` / `mem update` keep the default merge path
   because those are patches, not snapshots.

   Threaded through: MemBackend.upsertByKeys signature gains an
   UpsertOptions arg, postgres-core plugin passes it to the SQL
   function, runtime.upsertRecord forwards opts.replace, mem-writer
   sets replace: true on every sync page.

2. profiles/attio/attioPeople.json + attioCompanies.json were pointing
   at a stale composer action (::attio-people-list) whose ID fails
   base64-decode at the gateway. Swapped both to the passthrough
   "List an Object's Records" endpoint with:

   - pathVars: { object: "people" | "companies" }
   - resultsPath: "data"
   - identityKey: "values.email_addresses[0].email_address" /
                  "values.domains[0].domain"
   - transform: jq flattens id.record_id → id so the runner's
                bracket-access idField extraction works

   Verified: `one sync init attio attioPeople` now lands a working
   profile with all 6 test checks green, `sync run --to-memory` pulls
   100 records cleanly, find-by-source resolves to the right record
   with the identity key intact.

Also honors the feedback_sync_passthrough memory: built-ins no longer
ship a custom/composer actionId.
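The p_replace distinction from fix 1 above boils down to snapshot-vs-patch semantics on the record's data. An in-memory model of the behavior (the real logic lives in the PL/pgSQL mem_upsert_by_keys; this is only the semantics):

```typescript
type Data = Record<string, unknown>;

// replace=true  → sync snapshot: fields removed upstream disappear.
// replace=false → interactive patch: existing fields survive the merge.
function upsertData(existing: Data, incoming: Data, replace: boolean): Data {
  return replace ? { ...incoming } : { ...existing, ...incoming };
}
```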

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Sync is structurally part of the unified memory subsystem now — its
hot path writes through upsertByKeys into mem_records, and the next
slice kills the SQLite fallback entirely. The folder relocation makes
that ownership explicit.

Pure rename. Internal relative imports rewritten:
  ../memory/X → ../X   (now siblings inside memory/)
  ../X        → ../../X (lib/ peers one level further out)

External importers fixed:
  src/index.ts                — registerSyncCommands path
  src/commands/mem/migrate.ts — readProfile, openDatabase, etc.

Zero behavior changes. All 61 tests green; `sync list`, `sync test`
still function.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Memory is now ALWAYS written on `sync run` — the `--to-memory` flag
was an opt-in during the dual-write derisking window and is now a
silent no-op retained for back-compat. Users who explicitly want the
old single-write behaviour pass `--no-memory`.

`one mem sync` is now a full alias for `one sync`. Same handlers,
same options, same profile format — only the command path differs.
registerSyncCommands was extracted into an inner registerSyncSubcommands
so it can be mounted on multiple parents without forking the
implementation.

Verified end-to-end:
  one mem sync run attio --models attioPeople --max-pages 1
  → 100 records, no opt-in flag needed

Verified back-compat:
  --to-memory still accepted (no-op, doesn't error)
  --no-memory skips memory writes for the minority case that wants it
  `one sync run` still works identically to `one mem sync run`

SQLite is still written in parallel this commit — the drop comes in
the follow-up (memory-only writes, sync state → mem_sync_state,
query commands read memory).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Bug surfaced by sub-agent verification of the unified-memory branch:
after `mem config set embedding.apiKey sk-...` on a store that had
auto-initialized earlier with `provider: none`, the upgrade hint kept
telling agents to "flip the provider on" even though the user had
clearly just opted into semantic search.

The core `setOpenAiApiKey` in lib/config.ts only persists the bytes.
memory/config.ts now wraps it with memory-aware semantics: when a
non-empty key is set AND a memory block exists AND provider is still
`none`, flip provider to `openai` in the same write.

Other paths:
- key cleared (`''`)   → leave provider alone (hint can legitimately
                         reappear; user is rotating credentials)
- memory block missing → auto-init on next `getBackend()` picks
                         `openai` natively, no help needed
- provider already openai → no-op

All setOpenAiApiKey call sites (one init's fresh-setup + existing-
config handler, mem init, mem config set) now go through the
memory-aware wrapper by importing from lib/memory/index.js.

Verified: setting a key flips provider and suppresses the upgrade
hint; unsetting brings back the "Add key" hint (not the provider
hint — that distinction matters so the agent tells the user the
right thing).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…c_state

Sync state now lives in the backend's mem_sync_state table alongside
the rest of the unified memory subsystem. Replaces the per-model JSON
files at .one/sync/state/<platform>/<model>.json (and the older
single-file sync_state.json).

state.ts becomes a thin async adapter over backend.getSyncState /
setSyncState / listSyncStates / (new) removeSyncState. Legacy files
are read-once, imported on first access, then deleted — no data loss
when upgrading an existing install.
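The read-once-then-delete migration order matters: import before unlink, so a crash between the two steps at worst re-imports rather than losing state. A sketch with a hypothetical setState standing in for backend.setSyncState:

```typescript
import * as fs from "node:fs";

// Import a legacy JSON state file into the backend, then delete it so
// subsequent runs hit only mem_sync_state. Returns whether a file was
// migrated. Hypothetical helper; the real state.ts adapter may differ.
async function migrateLegacyState(
  file: string,
  setState: (state: unknown) => Promise<void>,
): Promise<boolean> {
  if (!fs.existsSync(file)) return false; // nothing to migrate
  const state = JSON.parse(fs.readFileSync(file, "utf8"));
  await setState(state); // persist BEFORE deleting — no data-loss window
  fs.unlinkSync(file);
  return true;
}
```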

Call sites updated to await the now-async API: runner.ts (resume
check, transition-to-syncing, per-page progress, final-idle, failure
branch, SIGINT/SIGTERM handler), sync list, sync remove, sync query.

Verified: `sync list` reads 5 profiles from mem_sync_state with
correct per-model status/lastSync; legacy JSON file + state dir
cleaned up on first access.

Backend surface gained MemBackend.removeSyncState(platform, model?)
with matching implementation in postgres-core.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Both query paths now go through the unified memory store:

`sync query <platform>/<model>`:
  - backend.list(type) pulls active records (capped at 10k per scan)
  - --where / --after / --before / --order-by filters run in TS over
    the record's `data` JSONB
  - dotted --where paths work (e.g. values.job_title[0].value like %X%)
    so nested payloads are filterable without pre-flattening
  - `--date-field` still supported; auto-detection picks one of the
    common timestamp keys (created_at, createdAt, updated_at, ...)
  - syncAge + lastSync sourced from mem_sync_state

`sync search <query>`:
  - listProfiles → set of (platform, model) pairs → types
  - backend.search(query, {type, queryEmbedding}) per type
  - Hybrid FTS + semantic when OpenAI key is configured; FTS-only
    otherwise. searchMode in the response reports which ran.
  - Embeds the query once and reuses across types to keep embedding
    spend bounded.

`sync sql` is deprecated: no safe universal raw-SQL surface spans
PGlite + Postgres + third-party plugins without leaking backend
specifics. Returns a pointer to `mem search` / `mem list` which work
against every backend.

Verified:
  sync query attio/attioPeople --where 'values.job_title[0].value like %Engineering%'
  → 5 correct engineers including "Head of Engineering", "VP of Engineering"

  sync search "engineering" --platform attio
  → searchMode: hybrid, 3 engineers ranked

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…rays

Groundwork for the runner's upcoming SQLite-write cutover. No behavior
change in this commit — writePageToMemory still returns the same insert/
update/skip counters, and existing callers (runner.ts dual-write path,
mem-writer.test.ts) pick up the defaults.

Added:
- MemWriteReport.sourceKeysSeen: string[]
    Accumulated across pages, this replaces the SQLite-backed seenIds
    set used by --full-refresh to reconcile deletions. In the next slice
    the runner archives any mem_records of this type whose source key
    didn't appear in the run.

- MemWriteReport.inserts / MemWriteReport.updates: Record<>[]
    Populated only when writePageToMemory is called with
    { capturePerAction: true }. Lets hook dispatch (onInsert, onUpdate,
    onChange) fire with the right per-record events without a second
    classify pass — currently that classification reads SQLite, which
    has to go.

Runner keeps dual-writing to SQLite for now because `enrichPhase`
(profiles/gmail/gmailThreads.json, profiles/fathom/meetings.json)
still reads unenriched rows from the SQLite table. Enrich rewrite +
runner cutover land together in a focused follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Running `sync run --to-memory` against Gmail surfaced three real gaps.
All three fixed + regression-tested.

1. `sync init <platform> <model> --config '{...}'` now seeds from the
   built-in profile when no on-disk draft exists. First-time users
   were hitting `Missing required field: actionId` because the merge
   base was empty. Now the base resolution walks existing-draft →
   built-in → empty, matching the docs.

2. Enrich phase now mirrors merged rows into the unified memory store.
   Before, list sync wrote a partial record to memory and enrich
   updated only SQLite — so `mem_records.data` held the pre-enrich
   shape (ids + snippets + historyId) while the full enriched bodies
   lived only in sqlite. EnrichContext gains `profile?: SyncProfile`,
   the runner threads the profile through, and enrich calls
   writePageToMemory at the end of each batch. Best-effort — logs on
   stderr if memory write fails, never aborts the sync.

3. `memory.searchable` paths support `[]` array wildcards:
      messages[].snippet
      messages[].payload.parts[].body.data
   Previously agents had to hard-index with .0. / .1. which didn't
   scale for Gmail-style message arrays. New `resolveWildcardPath`
   fans out each `[]` segment, concatenates the leaves. Four new
   unit tests cover single / mixed / nested wildcards + missing paths.

Tests: 65/65 pass (was 61; +4 wildcard cases).
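The `[]` fan-out from fix 3 can be sketched as a recursive split on the first wildcard segment (an illustrative resolver, not the branch's exact resolveWildcardPath):

```typescript
// Resolve a searchable path like "messages[].snippet" against JSON.
// Each [] maps over an array; string leaves are collected in order.
function resolveWildcard(obj: unknown, path: string): string[] {
  const idx = path.indexOf("[]");
  if (idx === -1) {
    // No wildcard left: plain dot-path descent to a string leaf.
    let cur: any = obj;
    for (const seg of path.split(".").filter(Boolean)) {
      if (cur == null || typeof cur !== "object") return [];
      cur = cur[seg];
    }
    return typeof cur === "string" ? [cur] : [];
  }
  // Descend to the array before the first [], then fan out recursively.
  const prefix = path.slice(0, idx);
  const suffix = path.slice(idx + 2).replace(/^\./, "");
  let cur: any = obj;
  for (const seg of prefix.split(".").filter(Boolean)) {
    if (cur == null || typeof cur !== "object") return [];
    cur = cur[seg];
  }
  if (!Array.isArray(cur)) return [];
  return cur.flatMap((item: unknown) => resolveWildcard(item, suffix));
}
```

Missing paths and non-array values resolve to an empty list rather than throwing, so one malformed message never poisons a record's searchable text.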

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Nine user-visible surfaces shipped on feat/unified-memory had no docs
yet. This sweep covers all of them so agents loading the skill and
humans running `--help` see the current shape of the world, not the
pre-branch one.

- skills/one/SKILL.md
  - Replaced "Local Data Sync" section with "Unified Memory" — zero-
    config auto-init, OpenAI key setup (three paths), `mem sync`
    alias, `memory.searchable` with `[]` wildcard examples, preview
    loop via `sync test --show-searchable`.

- src/lib/guide-content.ts
  - New GUIDE_MEMORY topic registered in TOPICS + getGuideContent +
    the `all` bundle. Covers records / graph / sources / sync-into-
    memory / diagnostics / admin / backends.
  - GUIDE_SYNC refreshed: init → declare searchable → preview → run
    flow; removed `sync install` and `sync sql` references (sql is
    deprecated with a pointer to mem surfaces); file-layout section
    shows ~/.one/mem.pglite and config.openaiApiKey.
  - GUIDE_OVERVIEW's sync section recast as "Memory + Sync".

- README.md
  - New `one mem` section with add/search/list/link + three key-setup
    paths.
  - `one sync` section merged with mem — documents the declare-then-
    preview workflow, `memory.searchable` paths, --no-memory flag,
    mem sync alias.
  - Commands table refreshed: drop `install`, drop `sql`, add --show-
    searchable note on test.

- src/commands/guide.ts
  - VALID_TOPICS gains 'memory' so `one guide memory` actually
    resolves (was returning Unknown topic).

- src/index.ts
  - Top-level --help gains a Memory section and refreshed Data Sync
    commands. Points at `one guide memory` for the full reference.

Tests: 65/65 still green. Rendered output verified via
`one --agent guide memory` → 4656 chars, title "One CLI — Agent
Guide: Memory".

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Runner now opens SQLite ONLY when the profile declares an enrich phase.
Every other profile (Attio, Stripe, Notion, Hacker News, Fathom
meetings without enrich, …) runs purely against memory — no .db file
created, no ensureTable / evolveSchema / classifyRecords / upsertRecords
/ rebuildFtsIndex calls, no countRecords polls.

Why partial: enrich still reads unenriched rows via SQL (`SELECT *
FROM <table> WHERE _enriched_at IS NULL`). Rewriting enrichPhase to
query memory instead is a dedicated slice (enrich is 468 lines with
its own concurrency/backoff/merge logic). Gated dual-write keeps
Gmail/Fathom-with-enrich working while the memory path proves out on
the common case.

Changes in runner.ts:
- `needsSqlite = !!profile.enrich` flag; `db` stays null otherwise.
- All db.* calls wrapped in `if (db) { ... }` guards or alternate
  memory-path branches.
- writePageToMemory is now the primary write (was opt-in); runs
  with `capturePerAction: hasHooks && !db` so hooks read the action
  flag straight from upsert instead of a second classify pass.
- --full-refresh: two paths. SQLite path (enrich profiles) runs the
  existing NOT IN delete. Memory path walks `backend.list(type)` and
  archives any record whose source key didn't appear in
  `seenSourceKeys`. Both run for enrich profiles so the stores stay
  in sync.
- totalRecords: from `countRecords(db, model)` when SQLite available,
  else from per-page `pageReport.inserted + updated` counter.
- Dry-run, enrich gating, FTS rebuild, final state write all
  null-check db before calling SQLite functions.
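The memory-path --full-refresh reconciliation described above amounts to: walk the type, archive anything whose source key wasn't seen this run. A simplified sketch with a hypothetical backend shape:

```typescript
// Minimal record shape for the sketch; the real mem_records row is richer.
interface MemRecordLite { id: string; sourceKeys: string[]; }

// Archive every record of a type whose source keys were all absent from
// this run's seenSourceKeys set. Returns the number archived.
async function reconcileFullRefresh(
  list: () => Promise<MemRecordLite[]>,
  archive: (id: string) => Promise<void>,
  seenSourceKeys: Set<string>,
): Promise<number> {
  let archived = 0;
  for (const rec of await list()) {
    // A record survives if ANY of its source keys appeared in the run.
    const seen = rec.sourceKeys.some((k) => seenSourceKeys.has(k));
    if (!seen) {
      await archive(rec.id);
      archived++;
    }
  }
  return archived;
}
```

Archiving rather than deleting mirrors the memory model's soft-removal semantics, so an upstream hiccup is recoverable.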

Verified: `sync run attio --models attioPeople` creates NO .db file,
lands 100 rows in memory in 1.7s (was 2+ with dual-write).
`sync list`, `sync query`, `sync search` still work unchanged. 65/65
tests green.

Follow-up: rewrite enrichPhase to query memory so SQLite can be
dropped for enrich profiles too, then the sync engine install step
(`better-sqlite3`) becomes optional entirely.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Covers 17 commits landing the unified-memory subsystem:

- Zero-config `one mem` (auto-inits on first use)
- OpenAI key top-level in config.json with full env > .onerc >
  project > global precedence; redirect from mem config set
- Structured `_upgrade` hint block in agent output when semantic
  search is available but off
- Agent-declared `memory.searchable` with `[]` wildcard support
- `sync test --show-searchable` preview loop
- Replace-semantics upsert (p_replace flag) for sync rows
- Stale Attio built-in profiles rewritten to passthrough
- src/lib/sync → src/lib/memory/sync (folder move)
- Memory is the primary sync target; --no-memory to skip; --to-memory
  back-compat no-op
- `one mem sync` full alias of `one sync`
- Sync state moved from .one/sync/state/*.json to mem_sync_state
- `sync query` + `sync search` read from mem_records; `sync sql`
  deprecated with pointer to mem commands
- Enrich phase mirrors merged rows to memory
- SQLite writes dropped for non-enrich profiles (enrich-only dual-
  write remains pending enrichPhase rewrite)
- Docs + skill + guide + --help refreshed to match

Non-breaking. Deprecations:
- --to-memory flag (silent no-op; memory is always written)
- `sync sql` (errors with pointer to `mem search` / `mem list`)

Tests: 65/65 (+4 new wildcard cases).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…nt data-loss fix)

Sync would stringify a nested-id record as "[object Object]" when a
profile's idField resolved to an object (Attio v2 returns id as
{workspace_id, object_id, record_id}). Every row landed under the
SAME memory key and the last one won — sync reported
"recordsSynced: 2024" while memory held 1 row. Silent and catastrophic.

Three-layer fix:

1. `sync test` — when the sample idField resolves to an object, emit a
   FAIL check with suggested dotted paths (`id.record_id`, `id.id`,
   etc). Auto-discovery now tries `id.record_id` / `id.id` as fallbacks
   after the scalar `id` / `_id` / `uuid` candidates, so a fresh
   `sync init` on a nested-id platform fixes itself.

2. `mem-writer` — resolves idField via getByDotPath (matches what
   identityKey already does), and hard-rejects object values with a
   skip+count rather than silent key collapse. If sync test is
   somehow bypassed, a run will still refuse to corrupt the store.

3. Built-in Attio profiles — dropped the jq transform workaround
   introduced in slice 1.5 and moved to `idField: "id.record_id"`
   directly. Cleaner, no external tool dependency, and the new
   profile format matches what agents would write by hand after
   reading the knowledge.

Runner's --full-refresh seenIds path updated to use getByDotPath too,
so SQLite-backed enrich profiles with nested ids also work.

Verified against the exact user repro: `sync run attio --models
attioCompanies --max-pages 1 --force` lands 100 DISTINCT records
(was 1). Every row has its real UUID as the source key.
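
The dotted-path resolution plus hard-reject described above can be sketched as follows. This is a hypothetical minimal version: the real getByDotPath also supports `[]` wildcards, and the function bodies here are illustrative, not the shipped code.

```typescript
// Minimal dotted-path walk: "id.record_id" descends nested objects;
// numeric segments index arrays transparently.
function getByDotPath(obj: unknown, path: string): unknown {
  let cur: any = obj;
  for (const seg of path.split(".")) {
    if (cur == null) return undefined;
    cur = cur[seg];
  }
  return cur;
}

// Hard-reject objects so a nested id can never stringify to
// "[object Object]" and collapse every row onto one memory key.
function resolveSourceId(row: object, idField: string): string | null {
  const v = getByDotPath(row, idField);
  if (v == null || typeof v === "object") return null; // skip visibly
  return String(v);
}
```

With an Attio-style nested id, `resolveSourceId(row, "id")` returns null (skip + count), while `resolveSourceId(row, "id.record_id")` yields the real UUID.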

65/65 tests green.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…rch / sync list

Four surfaces were returning items.length as the total. That lies past
the first page, breaks pagination-driven scripts, and makes agents
report wrong numbers to users.

Changes:

- Backend gains MemBackend.count(type, { status }) — one COUNT(*) query.
  Implemented in postgres-core; forwarded by both lazy plugin wrappers
  (pglite, postgres).

- mem list response grows `returned` (page size) and `total` (real
  backend count for the filter). Also `limit` and `offset` so agents
  can page deterministically. Human TTY output gains "N of M — pass
  --offset X to page" hint when there's more.

- sync query response grows `returned`, `total` (post --where filter),
  `totalRecordsOfType` (before any filters), and `limit`. Scripts can
  tell at a glance what percentage of the type matched.

- sync search returns `returned` + `total` (total across all searched
  types, pre page-cap).

- sync list no longer reports stale .db rowcounts as `totalRecords`.
  The record count is a real `backend.count(type)` per profile; the
  legacy SQLite footprint surfaces as a separate `legacyDbSize` field
  so the dashboard and reality agree, along with a visible nudge to
  run `mem migrate --cleanup`.
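
The paged response shape described above can be sketched like this. Field names come from this commit; the interface and function names are illustrative, and the real backend methods are async (kept synchronous here for brevity).

```typescript
interface PagedBackend {
  list(type: string, opts: { limit: number; offset: number; status?: string }): unknown[];
  count(type: string, opts: { status?: string }): number; // one COUNT(*) query
}

function memList(
  backend: PagedBackend,
  type: string,
  opts: { limit: number; offset: number; status?: string },
) {
  const items = backend.list(type, opts);                     // one page
  const total = backend.count(type, { status: opts.status }); // real backend count
  return { items, returned: items.length, total, limit: opts.limit, offset: opts.offset };
}
```

A script can then page deterministically: keep incrementing `offset` by `limit` until `offset + returned >= total`.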

Verified against attio/attioCompanies (100 real records):
  --limit 5   → returned: 5,   total: 100
  --limit 200 → returned: 100, total: 100
  sync list   → records=100 legacy=0 B (was "records=2024 dbSize=91.3MB")

Tests: 65/65 green.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
The unified memory cutover retired `sync sql` on the basis that raw
SQL can't safely span PGlite / Postgres / third-party plugins — but
the first-party plugins ARE SQL, and the escape hatch for joins /
aggregates / JSONB path queries is real. Legacy CEO flows that used
`sync sql` silently broke when the command errored out.

Bringing it back capability-gated, with a shared read-only guard.

- MemBackend grows an optional `raw(sql, params?)` method.
  postgres-core implements it against its internal client. pglite +
  postgres plugin wrappers forward to it.
- BackendCapabilities gains `rawSql: boolean`. Both first-party
  plugins advertise true; third-party plugins that opt out get a
  clear error from the command layer.
- New `sql-guard.ts` validates incoming SQL: leading keyword must be
  SELECT / WITH / EXPLAIN; multi-statement input rejected;
  DDL/DML/session-control keywords (INSERT, UPDATE, DELETE, DROP,
  ALTER, CREATE, COPY, PRAGMA, VACUUM, ATTACH, SET SESSION, GRANT,
  CALL, etc.) blocked even inside CTEs. 10 unit tests cover edge
  cases.
- `one mem sql "<SELECT ...>"` — primary surface; returns columns,
  rows, rowCount.
- `one sync sql <platform>/<model> "<sql>"` — thin alias. Doesn't
  rewrite the query (that would need a real SQL parser) but nudges
  the agent on stderr when the expected `WHERE type = '...'` is
  missing so cross-type results aren't a surprise.
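
A minimal sketch of such a read-only guard, assuming the rules listed above. The real sql-guard.ts is stricter (more keywords, multi-word forms like SET SESSION); this shows only the shape.

```typescript
// Illustrative read-only SQL guard. Leading keyword must be a read;
// write/DDL keywords are rejected anywhere, even inside CTE bodies.
const ALLOWED_LEAD = /^\s*(select|with|explain)\b/i;
const BLOCKED =
  /\b(insert|update|delete|drop|alter|create|copy|pragma|vacuum|attach|grant|call)\b/i;

function assertReadOnlySql(sql: string): void {
  const stmt = sql.trim().replace(/;+\s*$/, ""); // tolerate one trailing semicolon
  if (stmt.includes(";")) throw new Error("multi-statement input rejected");
  if (!ALLOWED_LEAD.test(stmt)) throw new Error("must start with SELECT / WITH / EXPLAIN");
  if (BLOCKED.test(stmt)) throw new Error("write/DDL keyword blocked");
}
```

A keyword blocklist applied to the whole string is deliberately over-broad (it would reject a SELECT whose column alias is literally `delete`), which is the safe direction for a guard in front of `raw()`.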

Verified against real memory:
  mem sql "SELECT type, COUNT(*) FROM mem_records GROUP BY type"
  → attio/attioCompanies=100, attio/attioPeople=100, gmail/gmailThreads=100

  mem sql "SELECT data->'values'->'name'->0->>'full_name' AS name
           FROM mem_records WHERE type='attio/attioPeople'
           AND data->'values'->'job_title'->0->>'value' ILIKE '%Engineering%'
           LIMIT 5"
  → 5 engineers by nested JSONB path

  mem sql "DELETE FROM mem_records"  → blocked by guard

Tests: 75/75 green (+10 guard tests).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
… run

Before: users running the new unified-memory CLI against a machine that
had synced with the SQLite-era CLI ended up with two sources of truth.
Legacy .db stayed on disk and `one` from npm kept reading it, while
local-one wrote to memory only. Silently divergent.

Now: `sync run` checks for a legacy `~/.one/sync/data/<platform>.db`
before the runner executes. When the file exists AND memory has zero
records for any of this platform's target models, it auto-invokes
`mem migrate --platform <plat> --yes`:

- --agent mode: silent migrate, one-line stderr log with the detected
  size so the agent can relay to its user.
- TTY mode: interactive confirm (default yes). "Found legacy .db
  (91 MB) with no corresponding memory records. Migrate?"

Detection skip conditions:
- --dry-run skips the check (wouldn't persist anything anyway)
- ANY non-zero record count for this platform's types means memory
  has already absorbed the data; migrate is the user's call via
  `mem migrate --cleanup` when they're ready.

Keeps `mem migrate` as the explicit surface; auto-migrate is only a
first-use nudge that eliminates the "which number do I trust?" problem.
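
The detection gate can be sketched as below. This is an assumed shape: the real wiring lives in sync run's pre-flight, the prompt/--agent branches are elided, and `countActive` stands in for the async backend count.

```typescript
import { existsSync } from "node:fs";

// First-run auto-migrate gate: legacy .db on disk AND zero memory
// records for every target model of this platform.
function shouldAutoMigrate(opts: {
  legacyDbPath: string;                  // ~/.one/sync/data/<platform>.db
  targetTypes: string[];                 // this platform's target models
  countActive: (type: string) => number; // backend.count, sync for brevity
  dryRun: boolean;
}): boolean {
  if (opts.dryRun) return false; // dry-run wouldn't persist anything anyway
  if (!existsSync(opts.legacyDbPath)) return false;
  // ANY non-zero count means memory has already absorbed the data.
  return opts.targetTypes.every((t) => opts.countActive(t) === 0);
}
```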

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Full Attio test report surfaced four issues blocking the "rows in
memory → rows with embeddings" story. All fixed and verified against
2024 attio/attioCompanies + attio/attioPeople.

1. mem reindex JSONB read corruption. PGlite WASM threw
   `Unexpected token 'a', "active_until": acti, ...` errors because
   the previous reindex called backend.context({limit:5000}) +
   getById() per row, pulling the full `data` column into memory at
   scale. Fix: new lean backend.listForReindex({type, limit, offset})
   returns ONLY id / type / searchable_text / content_hash /
   embedding_model. Writes go through new
   backend.updateEmbedding(id, vector, model) which UPDATEs only the
   embedding columns — no `data` round-trip. The data field was never
   actually used for embedding (searchable_text is the sole input).

2. --full-refresh "memory access out of bounds" during stale-delete.
   Same root cause — runner used backend.list(type, {limit:100_000})
   which pulled full JSONB for every row. Fix: new lean
   backend.listKeysByType(type) returns only {id, keys[]} for
   reconcile. Verified: --full-refresh completes cleanly (7.8s for
   attio), no WASM crash, deletedStale count accurate.

3. No per-run embed override. Added `sync run --embed` / `--no-embed`
   flags. Threads through SyncRunOptions → runner → writePageToMemory
   as embedOverride. Lets users backfill embeddings with one flag
   without editing the profile. Commander convention: `--embed` → true,
   `--no-embed` → false, absent → defer to profile + config default.

4. mem config set accepted typo keys silently. `mem config set
   embedOnSync true` (missing `defaults.` prefix) would write a
   no-op top-level field nothing reads, and `unset` couldn't clear
   it. Fix: KNOWN_KEYS allowlist + Levenshtein suggestion ("Did you
   mean `defaults.embedOnSync`?"). Set rejects unknown keys; unset
   accepts them (otherwise orphans would be stuck forever). Plus a
   `replace: true` flag on updateMemoryConfig so unset actually
   deletes — the merge-semantics default was re-adding the deleted
   key from the on-disk copy.
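
The allowlist + suggestion check in fix 4 can be sketched as follows. The KNOWN_KEYS contents and the closeness threshold here are assumptions, not the shipped values; only the Levenshtein-suggestion idea comes from the commit.

```typescript
// Hypothetical subset of the allowlist; the real KNOWN_KEYS is larger.
const KNOWN_KEYS = ["defaults.embedOnSync", "embedding.apiKey", "embedding.model"];

// Classic dynamic-programming edit distance.
function levenshtein(a: string, b: string): number {
  const d = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) => (i === 0 ? j : j === 0 ? i : 0)));
  for (let i = 1; i <= a.length; i++)
    for (let j = 1; j <= b.length; j++)
      d[i][j] = Math.min(
        d[i - 1][j] + 1,                                    // deletion
        d[i][j - 1] + 1,                                    // insertion
        d[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1),  // substitution
      );
  return d[a.length][b.length];
}

// "Did you mean ...?" candidate, or null when nothing is close enough.
function suggestKey(input: string): string | null {
  const [best] = KNOWN_KEYS
    .map((k) => ({ k, dist: levenshtein(input, k) }))
    .sort((x, y) => x.dist - y.dist);
  return best.dist <= best.k.length / 2 ? best.k : null;
}
```

So `mem config set embedOnSync true` gets "Did you mean `defaults.embedOnSync`?" and is rejected, while a garbage key gets no suggestion.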

Also: mem reindex gains --type, --force, --limit, --batch flags for
scoping backfill to one platform/model and controlling OpenAI batch
pressure.

Tests: 75/75 still green. Reindex verified against real data:
considered 10, reembedded 10, skipped 0, no WASM errors.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…concile

Three issues surfaced in the follow-up Attio test of slice 2b-iii.

1. `mem reindex` did nothing useful without `--force`.

The listForReindex query ordered by `updated_at DESC` with no
SQL-level filter. Because updating a row's embedding also bumps
`updated_at` (via the BEFORE UPDATE trigger), the query always
returned the most-recently-embedded rows first — every iteration
saw only rows that already had the correct embedding, the inner
loop skipped all of them, and the outer loop terminated with
`considered: N, reembedded: 0`. Backfill was effectively broken.

Fixed: the SQL filter now returns ONLY rows needing work —
`searchable_text IS NOT NULL AND searchable_text <> '' AND
(embedded_at IS NULL OR embedding_model IS DISTINCT FROM <target>)`
— ordered `embedded_at ASC NULLS FIRST, id ASC`. Plumbed the target
model through as a new `targetEmbeddingModel` option, plus an
`includeAlreadyEmbedded` escape hatch for the `--force` path.

Empty-string searchable_text is excluded explicitly. Synced rows can
land with empty text when the profile's memory.searchable paths
resolve to nothing (e.g. an Attio contact with no name / title /
email — "unnamed" in the source). OpenAI rejects empty input, so
keeping them eligible would spin the loop forever on unembeddable
rows.

Admin loop now uses a fixed PAGE=500 scan size and keeps offset=0
(the SQL filter drains eligible rows as we embed them, so the next
page is always fresh — incrementing offset against a moving target
would miss rows).

2. `sync run --embed` wedged after ~700/2024 rows in `__psynch_cvwait`.

Node's `fetch()` has no default timeout; when a TCP connection is
accepted but never responds (mid-run TLS stall, rate-limit queue,
etc.) the embed call hangs indefinitely. AbortController + 30s
timeout wrapper (FETCH_TIMEOUT_MS) around both `embed` and
`embedBatch`. 30s is comfortably above p99 for the embeddings
endpoint; stalls now time out and retry within the existing
3-attempt backoff loop.

3. `--full-refresh` left orphans (keys not starting with type prefix).

Reconcile pass only archived rows whose source key was in the
type-prefix set but not in seenSourceKeys. Rows from earlier buggy
versions that ended up with keys missing the prefix entirely were
never caught. Added a second archive criterion: if the row has NO
key starting with the `<type>:` prefix, it's an orphan — archive.
Plus the existing "source key not seen this run" path.
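
The eligibility filter from item 1 can be sketched as a query builder. Table and column names come from the commit; the function name and parameter plumbing are assumptions.

```typescript
// Hypothetical builder for the lean reindex scan described in item 1.
function listForReindexQuery(opts: {
  targetEmbeddingModel: string;
  includeAlreadyEmbedded: boolean; // the --force escape hatch
  limit: number;                   // fixed PAGE=500 in the admin loop
}): { text: string; params: unknown[] } {
  // Empty searchable_text is excluded: OpenAI rejects empty input, so
  // such rows would spin the loop forever.
  let where = `searchable_text IS NOT NULL AND searchable_text <> ''`;
  const params: unknown[] = [opts.limit];
  if (!opts.includeAlreadyEmbedded) {
    params.push(opts.targetEmbeddingModel);
    where += ` AND (embedded_at IS NULL OR embedding_model IS DISTINCT FROM $2)`;
  }
  return {
    text: `SELECT id, type, searchable_text, content_hash, embedding_model
           FROM mem_records WHERE ${where}
           ORDER BY embedded_at ASC NULLS FIRST, id ASC
           LIMIT $1`,
    params,
  };
}
```

Because the filter drains eligible rows as they get embedded, the admin loop keeps offset=0 and each page is always fresh.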

Tests: 75/75. Verified on real data:
  reindex --type attio/attioPeople (18 NULL rows, 34 empty)
  → considered 0 reembedded 0 (correctly excludes empty-text rows)
  reindex --force --limit 10
  → considered 10 reembedded 10
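
The stall fix in item 2 amounts to bounding every embeddings fetch. A minimal sketch (FETCH_TIMEOUT_MS matches the commit; the wrapper name is illustrative):

```typescript
const FETCH_TIMEOUT_MS = 30_000;

// AbortSignal.timeout() (Node >= 17.3) aborts the underlying socket,
// so a TCP connection that is accepted but never answers rejects
// after 30s and falls back into the existing retry/backoff loop.
async function fetchWithTimeout(url: string, init: RequestInit = {}): Promise<Response> {
  return fetch(url, { ...init, signal: AbortSignal.timeout(FETCH_TIMEOUT_MS) });
}
```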

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…it rates

Both of the UX upgrades the most recent Attio test asked for.

1. `sync test --show-searchable` now samples 5 records, not 1.

The old single-sample preview left a `—` marker ambiguous: is the path
a typo, or does this specific record just not have that field? Real
data routinely has sparse fields (Attio industry set on ~1 in 5
companies) and the agent couldn't tell the two cases apart without
SQL.

Now each declared path shows `hits/total` plus a concrete sample:
  5/5  values.name[].value              → "SimplyWise"        (clean)
  1/5  values.industry[].option.title   → "Financial Services" (sparse, still real)
  0/5  values.nonsense[].foo            (no sample — typo or always absent)

SyncTestReport gains a `samples` array; buildSearchablePreview
aggregates across all of them. JSON response shape: `searchable.paths`
grows `hits` + `total` (was `found: bool`); TTY output gets three-way
markers (✓ / ~ / ✗) with rate. Config not changed for the
default-walker mode (nothing declared → walker preview over first
sample, with the tip now pointing at suggest-searchable instead of
telling the agent to hand-write dot-paths).

2. `sync suggest-searchable <platform>/<model>` — auto-ranked starter.

Walks the first-page records, collects every leaf string/number/
boolean path, scores each by:
  hitRate × log1p(avgLength) × signalPenalty × typePenalty × shortPenalty

where:
- hitRate is per-record (array wildcards can't inflate past 1.0)
- signalPenalty = (1 - noiseFraction)² — penalizes UUIDs / ISO
  timestamps / URLs / numeric strings (lat/long style) / known
  noise enum markers ("system", "text", "personal-name", ...)
- typePenalty = 0.05 boolean, 0.1 number, 0.5 mixed, 1.0 string
- shortPenalty = 0 for ≤2-char leaves (flags / codes), 1 else

Output: ranked list with {path, score, hitRate, avgLength,
noiseFraction, sampleValue} + a paste-ready `configPatch` the agent
drops straight into `sync init --config`.
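
The scoring formula above can be written out directly. Weights are copied from the description; the stats struct and function name are illustrative.

```typescript
interface PathStats {
  hitRate: number;       // per-record, capped at 1.0
  avgLength: number;     // mean length of the leaf values
  noiseFraction: number; // share of UUID/timestamp/URL/numeric-string hits
  kind: "string" | "number" | "boolean" | "mixed";
}

function scorePath(s: PathStats): number {
  const signalPenalty = (1 - s.noiseFraction) ** 2;
  const typePenalty = { boolean: 0.05, number: 0.1, mixed: 0.5, string: 1.0 }[s.kind];
  const shortPenalty = s.avgLength <= 2 ? 0 : 1; // flags / codes score zero
  return s.hitRate * Math.log1p(s.avgLength) * signalPenalty * typePenalty * shortPenalty;
}
```

Long, always-present prose fields dominate; booleans, numerics, and pure-noise paths (UUIDs, timestamps) fall to the bottom or to zero.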

Verified on real Attio companies:
  Top: values.description[].value  (189 chars, 100%, "SimplyWise: Organize...")
       values.domains[].domain     (13 chars,  100%, "simplywise.com")
       values.name[].value         (11 chars,  100%, "SimplyWise")
       values.categories[].option.title (100%, "Financial Services")
  Correctly drops: UUIDs, timestamps, latitude/longitude strings,
  is_archived boolean, actor_id enums.

--show-searchable text preview length dropped from 2727 chars (default
walker) to 223 chars on the same record once the suggested paths are
applied — every line signal, no UUIDs.

Docs refreshed (SKILL.md, guide sync) to describe the two-step
workflow: suggest → preview → run.

Tests: 82/82 (+7 new suggest-searchable cases covering long prose,
UUID/timestamp filter, boolean/number penalty, array hit-rate cap,
wildcard dot-path emission, empty-sample case, numeric-string filter).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Two P0 data-correctness bugs caught by Moe's Attio / Fathom test
pass. The sync reports success; the data is wrong. Same trust-shape
as the original [object Object] silent data loss.

1. `--full-refresh` + `--max-pages` = data-loss command.

Reconcile archives any row whose source key wasn't in
`seenSourceKeys` this run. With pagination truncated (max-pages
cap, empty page, etc.) the keys of unfetched pages never land in
the set, so reconcile marks them `deleted_upstream`. Observed:
`sync run fathom --max-pages 3 --full-refresh` archived 57 valid
meetings after pulling only 30.

Fixed: track `paginationComplete` — true only when the loop exits
via natural exhaustion (empty records page on page>0, or the
profile's paginator returned no nextParams). Reconcile-by-absence
now requires `fullRefresh && pagesProcessed > 0 &&
paginationComplete`. On truncated runs we emit
`reconcileSkipped: true` + stderr warning + `deletedStale: 0`.
No silent damage.

2. `mem_upsert_by_keys` didn't un-archive on resurrection.

When --full-refresh re-pulled a row whose memory record was
archived (from a prior buggy reconcile), the upsert updated
`data / keys / sources / searchable_text` but left
`status = 'archived'`. No self-healing path — 1924 Attio rows
stayed stuck at archived across repeated --full-refresh runs.

Fixed: the UPDATE branch of the upsert SQL now also sets
`status = 'active'` and `archived_reason = NULL`. Semantics:
upsert-by-keys always produces an active row. Verified on live
data — 200 stuck rows resurrected across a 3-page partial refresh.

3. Surface `statusCounts: {active, archived}` in every sync run.

Silent damage was invisible from the happy-path output.
`deletedStale` only reports this-run archives. Now sync result
includes post-run counts; human output highlights imbalance in
red when archived > active. Agents can watch the numbers heal
as upsert-by-keys resurrects previously archived rows.
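
The truncation guard from item 1 can be sketched as below. This is a synchronous sketch with an assumed Page shape; the real runner is async and threads more state.

```typescript
interface Page { records: unknown[]; nextParams: object | null }

function runPages(fetchPage: (params: object | null) => Page, maxPages: number) {
  let params: object | null = null;
  let pagesProcessed = 0;
  let paginationComplete = false;
  while (pagesProcessed < maxPages) {
    const page = fetchPage(params);
    // Natural exhaustion: an empty records page after page 0 ...
    if (page.records.length === 0 && pagesProcessed > 0) { paginationComplete = true; break; }
    pagesProcessed++;
    // ... or the profile's paginator returning no nextParams.
    if (!page.nextParams) { paginationComplete = true; break; }
    params = page.nextParams;
  }
  // Hitting maxPages leaves paginationComplete=false.
  const reconcileAllowed = pagesProcessed > 0 && paginationComplete;
  return { pagesProcessed, paginationComplete, reconcileAllowed };
}
```

On a truncated run, `reconcileAllowed` stays false, which is what drives `reconcileSkipped: true` and `deletedStale: 0` instead of archiving unfetched rows.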

Tests: 83/83. Added an integration test asserting upsertByKeys
flips archived→active with archived_reason cleared.

Verified on live data:
  # before
  attio/attioCompanies  active: 100,  archived: 1925
  # after a 3-page --full-refresh --max-pages 3
  archived: 1724 (200 resurrected), reconcileSkipped: true,
  deletedStale: 0

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
#126 — `sync migrate --dry-run` always reported `inserted` on rows
that a real run would `updated`. Cosmetic but misleading after the
first live migrate. Now: the dry-run path probes `mem_records.keys
&& <candidate keys>` via the backend's raw-SQL escape hatch and
reports `updated` when the keys already exist. Requires a backend
with `rawSql: true` capability (both first-party plugins have it;
third-party plugins fall back to always-inserted, matching
pre-fix behaviour).

#127 — killing a mid-sync process corrupts PGlite. Two parts:

1. Graceful close in signalCleanup. The existing SIGINT/SIGTERM
   handler updated sync_state + released the filesystem lock but
   never touched the memory backend. PGlite is WASM-backed
   Postgres — if it doesn't get a chance to checkpoint its WAL
   before the process exits, the next `open()` aborts inside
   ensureSchema with `Aborted()`. The handler now also calls
   `backend.close()` under a 2s wall-clock cap (so a stuck close
   can't block the exit). SIGKILL / kernel panic still bypass
   this; no prevention is possible for uncatchable signals.

2. Clearer doctor diagnostic. When `mem doctor` hits `Aborted()`
   on the schema-apply check, it now appends the recovery path
   (delete `~/.one/mem.pglite`) instead of leaving the user to
   guess. The `postmaster.pid` holding a placeholder `-42` is
   normal for WASM PGlite, NOT a corruption signal — the Aborted
   comes from unflushed WAL. An earlier draft of this commit
   added stale-pid cleanup based on the negative-PID hypothesis;
   reverted because the heuristic would wrongly delete valid
   lockfiles.

Tests: 83/83.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
moekatib and others added 2 commits April 23, 2026 22:41
Closes #132.

Migrate used `row[idField]` for id resolution — a flat property
lookup, not a dotted path. Combined with the legacy SQLite layer
JSON-stringifying nested columns on INSERT (see sync/db.ts:
prepareValue), any profile with a dotted idField against a table
whose id column holds a stringified object silently dropped every
row. Attio companies: 2024/2024 skipped on an empty memory store,
while attioPeople (scalar idField "id") migrated cleanly.

Three changes:

1. `reviveStringifiedJson(row)` — rehydrates top-level JSON-
   stringified columns before id resolution. Matches the shape
   sync sees live. Conservative: only parses strings that start
   with `{` or `[` and only one level deep (legacy rows never
   nest further).

2. Id resolution now uses `getByDotPath(hydratedRow, idField)`,
   the same mechanism `sync run` uses. Hard-rejects nested
   objects the same way `sync test` / mem-writer do — better to
   skip visibly than stringify to `[object Object]` and collapse
   every row onto one key.

3. Report splits the `skipped` counter into `skippedUnresolvedId`
   (profile missing or idField doesn't resolve) and
   `skippedError` (upsert threw). Human output prints a warning
   when every row is unresolved, so a misconfigured profile
   can't hide. Per-row hint for the first 3 misses so the cause
   is obvious.
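
Change 1 can be sketched as a conservative one-level parse; the real implementation may differ in details.

```typescript
// Rehydrate top-level JSON-stringified columns so dotted idField paths
// resolve against the same shape sync sees live. One level only:
// legacy rows never nest stringified JSON further down.
function reviveStringifiedJson(row: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(row)) {
    if (typeof value === "string" && (value.startsWith("{") || value.startsWith("["))) {
      try {
        out[key] = JSON.parse(value);
        continue;
      } catch {
        // Not valid JSON after all; keep the original string.
      }
    }
    out[key] = value;
  }
  return out;
}
```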

Tests: +6, 89 total. reviveStringifiedJson + dotted-path
resolution on the exact Attio companies shape.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Follow-up to #132. When a user re-migrates after changing
`idField` (the documented fix for #132's silent-drop bug) the
pre-fix cohort's rows have garbage sourceKeys AND no identity
keys in keys[] — getByDotPath couldn't resolve
`email_addresses[0].email_address` through the stringified JSON
of the legacy .db either. The post-fix upsert builds clean
sourceKey + identity keys, but with no overlap against the
pre-fix cohort's keys[], upsertByKeys inserts a duplicate.
Result: active count doubles, half the rows carry legible data
and half carry stringified JSON blobs.

Three changes, no new commands.

1. Identity-merge pre-pass per type. For each type with an
   `identityKey`, query existing active rows for
   `(id, keys, data->path->>...)` via backend.raw and build a
   `normalized-identity → {id, keys}` map. Cost is one SELECT
   per type (not per row); JSONB path projection avoids reading
   the full `data` column (the WASM-memory footgun we've hit
   before). When a legacy row's identity matches the map, its
   new keys array is folded with the existing row's keys so
   upsertByKeys overlaps and hits the update branch.
   `replace: true` ensures the clean hydrated payload wins over
   the old stringified shape.

   Path validation is strict (segments match /^\w+$/) because
   the path is inlined into SQL (column refs can't be
   parameterized). Bad input returns null and migrate falls
   back to plain key-overlap — no regression.

2. Split report counters. `mergedByIdentity` distinguishes
   healing merges from regular sourceKey updates. Totals object
   surfaces the same.

3. Doubling warning. `count(type, {status:'active'})` snapshotted
   before + after migrate per type. When post-migrate growth
   exceeds `inserted + 2`, print a stderr warning with an exact
   SQL probe to inspect: `jsonb_typeof(data->'id') GROUP BY t`
   distinguishes pre-fix rows (t='string') from post-fix rows
   (t='object') so the user can drop the ghost cohort.
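
The strict path validation in change 1 can be sketched like this. The function and parameter names are assumptions; only the segment rule and the fallback behaviour come from the commit.

```typescript
// Build a JSONB projection like data->'a'->>'b' for an identity-key
// path. Segments are inlined into SQL (JSONB keys can't be
// parameterized here), so anything outside \w+ is rejected and the
// caller falls back to plain key-overlap.
function jsonbPathProjection(column: string, path: string): string | null {
  const segs = path.split(".");
  if (!segs.every((s) => /^\w+$/.test(s))) return null;
  const last = segs.pop()!;
  const prefix = segs.map((s) => `->'${s}'`).join("");
  return `${column}${prefix}->>'${last}'`;
}
```

A bracketed path like `email_addresses[0].email_address` also fails the `\w+` check and falls back, so only simple dotted paths ever reach the inlined SQL.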

Tests: 98, including a live in-memory PGlite check that
identity values round-trip through data->JSONB->map, archived
rows are excluded, and the SQL-injection guard rejects unsafe
paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>