Single-branch delivery plan for collapsing @withone/mem and `one sync` into one CLI subsystem with a pluggable backend architecture. The default backend is PGlite (embedded Postgres), with Postgres shipped as a first-party plugin and third-party plugins supported via dynamic import. The schema consolidates external_refs as a sources JSONB column on mem_records keyed by prefixed source ids, and keeps mem_links separate for bidirectional graph traversal. Embeddings are optional, configured at `one mem init`; the OpenAI key is stored in ~/.one/config.json (mode 0600) alongside ONE_SECRET with the same env > .onerc > project > global precedence chain.
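To make the consolidation concrete, a minimal sketch of what a mem_records row might look like with the sources JSONB keyed by prefixed source ids. The type names and field shapes here are illustrative assumptions, not the actual schema types:

```typescript
// Hypothetical shapes for illustration only — not the shipped schema types.
type SourceEntry = { last_synced_at: string };

interface MemRecordShape {
  id: string;
  type: string;                         // e.g. "attio/attioPeople"
  data: unknown;                        // raw payload JSONB
  keys: string[];                       // GIN-indexed lookup keys
  sources: Record<string, SourceEntry>; // keyed by "<platform>/<model>:<id>"
}

const row: MemRecordShape = {
  id: "r1",
  type: "attio/attioPeople",
  data: { name: "Ada" },
  keys: ["attio/attioPeople:abc-123", "email:ada@example.com"],
  sources: {
    "attio/attioPeople:abc-123": { last_synced_at: "2024-01-01T00:00:00Z" },
  },
};

console.log(Object.keys(row.sources)[0]); // the prefixed source id
```

Folding external_refs into the record row this way means a single JSONB map lookup answers "which syncs touched this record?", while the separate mem_links table keeps graph traversal symmetric.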
Wires the skeleton for folding @withone/mem and one sync into one CLI subsystem with a pluggable backend architecture.

Contract + registry:
- MemBackend interface + MemBackendPlugin factory + BackendCapabilities
- plugins.ts registry with dynamic loader for third-party plugins declared in memory.plugins config

First-party plugins (stubbed):
- pglite: default, embedded Postgres, caps all true except concurrentWriters
- postgres: Supabase/Neon/self-hosted via node-pg, caps all true

Pure helpers implemented:
- schema.ts: full DDL + PL/pgSQL functions (mem_upsert_by_keys, mem_calculate_relevance, mem_hybrid_search, mem_enforce_key_uniqueness)
- canonical.ts: deterministic JSON + sha256 for content_hash
- scoring.ts: TS port of mem_calculate_relevance for in-memory ranking

Config:
- memory block in ~/.one/config.json alongside ONE_SECRET (mode 0600)
- OPENAI_API_KEY precedence: env > .onerc > project > global
- per-backend config keyed by plugin name so new plugins compose in

CLI surface:
- `one mem` registered with all subcommands discoverable via --help
- `one mem status` fully working; returns config + registered plugin descriptors + capability matrix
- All other subcommands return a scaffolded-not-implemented note

Tests: canonical.test.ts + scoring.test.ts (20 assertions, all green)
Typecheck: clean on the new module (no new errors vs baseline)
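A minimal sketch of the contract + registry idea described above. Interface members and names are paraphrased assumptions (the real MemBackend surface is much larger); the point is how a name-keyed registry lets first-party and dynamically imported third-party plugins compose:

```typescript
// Illustrative sketch of the plugin contract — not the shipped interfaces.
interface BackendCapabilities {
  vectorSearch: boolean;
  concurrentWriters: boolean;
}

interface MemBackend {
  open(): Promise<void>;
  close(): Promise<void>;
  // records / search / graph / sync-state methods elided for brevity
}

interface MemBackendPlugin {
  name: string;
  capabilities: BackendCapabilities;
  create(config: Record<string, unknown>): MemBackend;
}

// Registry keyed by plugin name; third-party plugins declared in
// memory.plugins config would be dynamic-imported and registered the same way.
const registry = new Map<string, MemBackendPlugin>();

function registerPlugin(p: MemBackendPlugin): void {
  registry.set(p.name, p);
}

registerPlugin({
  name: "pglite",
  capabilities: { vectorSearch: true, concurrentWriters: false },
  create: () => ({ open: async () => {}, close: async () => {} }),
});

console.log(registry.get("pglite")?.capabilities.concurrentWriters); // false
```

Keying per-backend config by the same plugin name (as the Config section notes) means resolving a backend is a single map lookup from config to factory.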
The shared query layer (CoreBackend) implements MemBackend over a PgClient abstraction. Both first-party plugins are now thin adapters:
- PGlite plugin: lazy-imports @electric-sql/pglite + the vector extension, routes no-param queries through exec() (PGlite rejects multi-statement SQL via query()), and falls back to an in-memory DB when dbPath is ":memory:". Single-writer; transactions reuse the outer client.
- Postgres plugin: lazy-imports node-pg, opens a Pool on the configured connection string; transactions take a dedicated client.

CoreBackend implements the full MemBackend surface:
- Records: insert, upsertByKeys (via the mem_upsert_by_keys server function, which merges data and unions tags/keys/sources into existing rows when keys overlap), getById (with optional link hydration), update, remove, archive/unarchive, list.
- Search: hybrid when an embedding is provided and vectorSearch is true; otherwise FTS-only fallback. trackAccess bumps access_count on returned rows.
- Context: relevance-ranked active records via mem_calculate_relevance.
- Graph: link/unlink/linked with bidirectional traversal semantics.
- Sources: JSONB map ops (addSource also extends the keys array so findBySource works via the keys GIN index).
- Sync state: upsert, getOne, listAll.
- Hot columns: partial expression indexes scoped by type.
- Maintenance: vacuum, stats.

Schema change: dropped pg_trgm from EXTENSIONS_SQL. The original mem schema required it, but nothing in the unified query layer uses trigram matching, and PGlite doesn't ship pg_trgm by default. Vector + tsvector cover every current query. A comment notes it can come back via an optional extension hook if fuzzy-match search lands.

Tests: 12-assertion live integration suite against a fresh in-memory PGlite (ensureSchema + insert + upsertByKeys merge semantics + findBySource + addSource + graph link/linked + FTS + context + archive/unarchive + sync state + stats). All 57 tests green; typecheck clean on the memory module with no new errors vs baseline.
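The thin-adapter idea can be sketched with a tiny PgClient interface; both the interface shape and the fake client below are illustrative assumptions, standing in for the PGlite and node-pg drivers:

```typescript
// Hypothetical minimal abstraction — the real PgClient surface is richer.
interface PgClient {
  query(sql: string, params?: unknown[]): Promise<{ rows: any[] }>;
}

// A fake in-memory client standing in for either driver; it just echoes
// the statement back as a single row.
const fakeClient: PgClient = {
  async query(sql, params = []) {
    return { rows: [{ sql, params }] };
  },
};

// CoreBackend programs only against PgClient, never against a driver.
class CoreBackend {
  constructor(private client: PgClient) {}
  async stats(): Promise<number> {
    const res = await this.client.query("SELECT COUNT(*) FROM mem_records");
    return res.rows.length;
  }
}

new CoreBackend(fakeClient).stats().then((n) => console.log(n)); // 1
```

Because every query flows through one abstraction, swapping PGlite for a Postgres Pool (or a third-party plugin) changes the adapter, not the query layer.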
…raph, admin)

The memory-layer CLI surface is now functional against a live PGlite backend. Every command listed in docs/plans/unified-memory.md §6 either dispatches to a working handler or prints a clear deprecation/migration note.

Embedding provider + orchestration:
- lib/memory/embedding.ts: OpenAI embeddings (single + batch), retry with backoff, content-hash gate, graceful fallback to null when provider=none or no API key. defaultSearchableText() extracts a capped string value from arbitrary JSON as the fallback for records with no profile-defined searchable template.
- lib/memory/runtime.ts: process-local backend singleton with lazy init + ensureSchema, plus addRecord/upsertRecord helpers that derive searchable_text, content_hash, and (optionally) embedding before calling the backend. Precedence: opts.embed → input.embed → config default.

Config + init + doctor:
- commands/mem/init.ts: interactive @clack/prompts flow + fully-flagged non-interactive mode. OpenAI key hidden-input, stored in ~/.one/config.json (mode 0600, same file as ONE_SECRET). Backend warmup (open + ensureSchema + version check) runs before returning success.
- commands/mem/config.ts: get/set/unset dot-path access with automatic secret redaction (embedding.apiKey, postgres.connectionString). --show-secrets opt-in to reveal.
- commands/mem/doctor.ts: 7-check health report (config, plugin resolve, backend open, schema version, stats, embedding reachable, capability consistency). Non-zero exit when any check fails.

Record operations:
- commands/mem/records.ts: add/get/update/archive/weight/flush/list, search (FTS-only default; --deep forces semantic when available), context (relevance-sorted), link/unlink/linked, sources/find-by-source.

Admin/IO:
- commands/mem/admin.ts: vacuum, reindex (re-embed under a new model).
- commands/mem/export.ts: JSONL export/import, idempotent via keys.
- commands/mem/migrate.ts: imports legacy .one/sync/data/*.db files into the unified store using each profile's idField + identityKey. Optional --cleanup removes legacy files after confirmation. Respects --dry-run.

Build: tsup externals extended to @electric-sql/pglite (+vector) and pg so runtime WASM/pure-TS assets load from node_modules instead of being inlined into the bundle.
Typecheck: clean on the memory module (5 pre-existing errors unchanged).
Tests: 57/57 green. Live smoke test verified the init → add → search → doctor → status pipeline against a throwaway HOME.
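The content-hash gate mentioned above (only re-embed when the payload actually changed) depends on canonical JSON: key order must not affect the hash. A minimal sketch, assuming sorted-key serialization plus sha256; function bodies are illustrative, not the shipped canonical.ts:

```typescript
import { createHash } from "node:crypto";

// Deterministic serialization: objects emit keys in sorted order so the
// same logical payload always produces the same bytes.
function canonicalJson(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonicalJson).join(",")}]`;
  if (value && typeof value === "object") {
    const entries = Object.keys(value as object)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonicalJson((value as any)[k])}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value);
}

function contentHash(data: unknown): string {
  return createHash("sha256").update(canonicalJson(data)).digest("hex");
}

// Key order doesn't change the hash — this is the gate that lets the
// embedding step skip unchanged records.
const a = contentHash({ name: "Ada", role: "eng" });
const b = contentHash({ role: "eng", name: "Ada" });
console.log(a === b); // true
```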
Opt-in dual-write integration between the sync runner and the unified memory backend. Motivated by §9 of the plan but scoped conservatively: synced rows continue landing in SQLite as they do today; when `--to-memory` is passed, each page ALSO flows through `upsertRecord` into mem_records. This lets the dual path be verified on real data before we flip memory to primary.

- lib/sync/mem-writer.ts: writePageToMemory(profile, records) with a prefixed source key (`<platform>/<model>:<id>`), an optional identity key promoted to a second mem key (`email:...`, `phone:...`, `domain:...`, or `id:...` heuristic from the profile's identityKey dot path), tags = ['synced', platform], a source entry with last_synced_at, and a per-record strip of any `_`-prefixed sync-internal fields. Errors are swallowed per-record so a single bad row never breaks the sync.
- lib/sync/types.ts: SyncRunOptions.toMemory flag.
- lib/sync/runner.ts: after the existing SQLite upsert, if toMemory is set, call writePageToMemory for the same page. Failures are logged but never fail the sync.
- lib/sync/index.ts: `one sync run --to-memory` surface.
- commands/mem.ts: updated the `one mem sync` placeholder to point at `one sync --to-memory` and note that canonical alias delegation is a follow-up slice.

Test: lib/sync/mem-writer.test.ts — 4 live-PGlite assertions covering first-page write with prefixed keys + identity merge, idempotent re-run (update, not insert), skip-on-missing-id, and the underscore-field strip. 4/4 green; full suite 61/61.

Deliberately out of scope for this commit (tracked for follow-up):
- Physical folder move src/lib/sync/ → src/lib/memory/sync/. Pure rename, touches 3 external imports, zero behavioral impact, cleanly separable.
- Memory as the primary write target (replacing SQLite) — gated on the dual-write proving itself in Moe's destructive exploration pass.
- Full `one mem sync` alias delegation — waits on the above.
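The key-derivation scheme in mem-writer can be sketched as two small helpers. The heuristic table below is a simplified assumption of what the text describes, not the actual code:

```typescript
// Prefixed source key: makes every synced row addressable by its origin.
function sourceKey(platform: string, model: string, id: string): string {
  return `${platform}/${model}:${id}`;
}

// Promote the profile's identityKey dot path into a typed mem key.
// The pattern matching here is an illustrative guess at the heuristic.
function identityMemKey(dotPath: string, value: string): string {
  if (/email/i.test(dotPath)) return `email:${value}`;
  if (/phone/i.test(dotPath)) return `phone:${value}`;
  if (/domain/i.test(dotPath)) return `domain:${value}`;
  return `id:${value}`; // fallback
}

console.log(sourceKey("attio", "attioPeople", "abc-123"));
// → "attio/attioPeople:abc-123"
console.log(identityMemKey("values.email_addresses[0].email_address", "ada@example.com"));
// → "email:ada@example.com"
```

Writing both keys onto the same record is what lets a later re-sync find and merge into the existing row instead of inserting a duplicate.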
UX: memory auto-initializes on first `one mem` call with pglite defaults — no `mem init` prerequisite. Picks `embedding.provider: openai` when an OpenAI key is already resolvable (env / .onerc / config), else stays `none`. One-line stderr breadcrumb on TTY; silent in --agent mode. `one init` grows an optional skip-able OpenAI-key prompt, stored at the top level as `config.openaiApiKey` (peer of `apiKey`). Full precedence chain mirrors ONE_SECRET: env > .onerc > project config > global. `mem config set embedding.apiKey` transparently redirects to the top-level field and strips any stale value from the memory block. Secret redaction on all read surfaces.

AX: sync profiles gain `memory.searchable: [dot-paths]`. Declared fields drive the embedded + FTS text so agents produce clean, signal-dense embeddings instead of the 90%-noise default walker output. No declaration = fallback to `defaultSearchableText`. `one sync test <platform>/<model> --show-searchable` previews the exact text that would be embedded, with per-path resolution (✓ / empty) and sample values. Agents iterate before paying the embedding cost.

Fix: CONFIG_DIR / CONFIG_FILE / PROJECTS_DIR now resolved lazily via getters (they were module-bound to os.homedir() at import time). Tests that set process.env.HOME in before() hooks could not previously isolate — this surfaced as the real ~/.one/config.json being overwritten with a test stub during `npm test`.

Verified: Attio people sync with clean `memory.searchable` paths produces 158-char embedded text (vs 2297 chars from the default walker), and "venture capital investor" / "head of engineering" queries return semantically coherent rankings.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
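The lazy-resolution fix is worth a concrete sketch, since the failure mode is subtle: a module-bound constant captures os.homedir() once at import time, so a test that sets process.env.HOME afterwards cannot redirect it. A getter re-resolves on every call. (Names mirror the commit; this is an illustration, not the shipped module.)

```typescript
import os from "node:os";
import path from "node:path";

// BEFORE (the bug): frozen at import time, ignores later HOME changes.
const EAGER_CONFIG_DIR = path.join(os.homedir(), ".one");

// AFTER (the fix): resolved at call time, so test HOME overrides apply.
function getConfigDir(): string {
  return path.join(os.homedir(), ".one");
}

process.env.HOME = "/tmp/one-test-home"; // what a test's before() hook does
// On POSIX, os.homedir() follows $HOME, so only the getter tracks the change.
console.log(EAGER_CONFIG_DIR === getConfigDir());
```

This is why the real config file was being clobbered: the eager constant still pointed at the developer's actual ~/.one while the test believed it had redirected everything.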
When memory is running in FTS-only mode (no OpenAI key, or provider still
`none`), every agent-facing response from `mem status`, `mem search`, and
`mem doctor` now includes a structured `_upgrade` block:
{
  "capability": "semantic_search",
  "available": true,
  "currentMode": "fts_only",
  "how": "Add an OpenAI key: `one init` (then \"Add OpenAI key\"), or `one mem config set embedding.apiKey sk-...`",
  "benefit": "Ranks memories by meaning, not just keyword overlap..."
}
The agent can now tell its user "semantic search is available as an
upgrade" — previously this capability was silently degraded, so agents
never mentioned it. Human TTY output gets a matching dim one-liner.
`mem search` also gains a top-level `searchMode` field ("fts_only" |
"hybrid") so the mode is inspectable without re-deriving it from config.
Hint only appears when the capability is actually off — zero noise for
already-configured installs.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Two correctness fixes surfaced during the Attio dual-write derisking.
1. mem_upsert_by_keys learns a p_replace flag (default FALSE). When
TRUE, the existing record's `data` is REPLACED wholesale by the
incoming payload instead of shallow-merged. Sync callers pass TRUE
so fields removed at the source actually disappear from memory;
interactive `mem add` / `mem update` keep the default merge path
because those are patches, not snapshots.
Threaded through: MemBackend.upsertByKeys signature gains an
UpsertOptions arg, postgres-core plugin passes it to the SQL
function, runtime.upsertRecord forwards opts.replace, mem-writer
sets replace: true on every sync page.
2. profiles/attio/attioPeople.json + attioCompanies.json were pointing
at a stale composer action (::attio-people-list) whose ID fails
base64-decode at the gateway. Swapped both to the passthrough
"List an Object's Records" endpoint with:
- pathVars: { object: "people" | "companies" }
- resultsPath: "data"
- identityKey: "values.email_addresses[0].email_address" /
"values.domains[0].domain"
- transform: jq flattens id.record_id → id so the runner's
bracket-access idField extraction works
Verified: `one sync init attio attioPeople` now lands a working
profile with all 6 test checks green, `sync run --to-memory` pulls
100 records cleanly, find-by-source resolves to the right record
with the identity key intact.
Also honors the feedback_sync_passthrough memory: built-ins no longer
ship a custom/composer actionId.
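The replace-vs-merge distinction in fix 1 can be sketched in a few lines; this is an illustrative TS analogue of the p_replace behavior described above, not the PL/pgSQL function itself:

```typescript
interface UpsertOptions { replace?: boolean }

// Merge (default): the incoming payload patches the existing data.
// Replace: the incoming payload IS the data — fields removed at the
// source disappear from memory.
function applyData(
  existing: Record<string, unknown>,
  incoming: Record<string, unknown>,
  opts: UpsertOptions = {}
): Record<string, unknown> {
  return opts.replace ? { ...incoming } : { ...existing, ...incoming };
}

const before = { name: "Ada", legacyField: "stale" };
const page = { name: "Ada L." };

console.log(applyData(before, page));                    // keeps legacyField (patch)
console.log(applyData(before, page, { replace: true })); // drops legacyField (snapshot)
```

Sync pages are snapshots, so mem-writer passes replace: true; interactive `mem add` / `mem update` are patches, so they keep the merge default.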
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Sync is structurally part of the unified memory subsystem now — its hot path writes through upsertByKeys into mem_records, and the next slice kills the SQLite fallback entirely. The folder relocation makes that ownership explicit.

Pure rename. Internal relative imports rewritten:
- ../memory/X → ../X (now siblings inside memory/)
- ../X → ../../X (lib/ peers one level further out)

External importers fixed:
- src/index.ts — registerSyncCommands path
- src/commands/mem/migrate.ts — readProfile, openDatabase, etc.

Zero behavior changes. All 61 tests green; `sync list` and `sync test` still function.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Memory is now ALWAYS written on `sync run` — the `--to-memory` flag was an opt-in during the dual-write derisking window and is now a silent no-op retained for back-compat. Users who explicitly want the old single-write behaviour pass `--no-memory`.

`one mem sync` is now a full alias for `one sync`. Same handlers, same options, same profile format — only the command path differs. registerSyncCommands was extracted into an inner registerSyncSubcommands so it can be mounted on multiple parents without forking the implementation.

Verified end-to-end:
- one mem sync run attio --models attioPeople --max-pages 1 → 100 records, no opt-in flag needed

Verified back-compat:
- --to-memory still accepted (no-op, doesn't error)
- --no-memory skips memory writes for the minority case that wants it
- `one sync run` still works identically to `one mem sync run`

SQLite is still written in parallel this commit — the drop comes in the follow-up (memory-only writes, sync state → mem_sync_state, query commands read memory).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
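The mount-on-multiple-parents technique can be sketched without the real CLI framework; the Command shape below is a minimal stand-in for illustration:

```typescript
// Hypothetical minimal command tree — not the actual CLI framework types.
interface Command {
  name: string;
  subcommands: Map<string, Command>;
}

function makeCommand(name: string): Command {
  return { name, subcommands: new Map() };
}

// One registration function, callable with any parent: this is the
// "inner registerSyncSubcommands" pattern described above.
function registerSyncSubcommands(parent: Command): void {
  for (const sub of ["run", "list", "test", "query", "search"]) {
    parent.subcommands.set(sub, makeCommand(sub));
  }
}

const sync = makeCommand("sync");     // mounted at top level: `one sync`
const memSync = makeCommand("sync");  // mounted under mem: `one mem sync`
registerSyncSubcommands(sync);
registerSyncSubcommands(memSync);     // same handlers, no forked implementation

console.log(sync.subcommands.has("run"), memSync.subcommands.has("run")); // true true
```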
Bug surfaced by sub-agent verification of the unified-memory branch:
after `mem config set embedding.apiKey sk-...` on a store that had
auto-initialized earlier with `provider: none`, the upgrade hint kept
telling agents to "flip the provider on" even though the user had
clearly just opted into semantic search.
The core `setOpenAiApiKey` in lib/config.ts only persists the bytes.
memory/config.ts now wraps it with memory-aware semantics: when a
non-empty key is set AND a memory block exists AND provider is still
`none`, flip provider to `openai` in the same write.
Other paths:
- key cleared (`''`) → leave provider alone (hint can legitimately
reappear; user is rotating credentials)
- memory block missing → auto-init on next `getBackend()` picks
`openai` natively, no help needed
- provider already openai → no-op
All setOpenAiApiKey call sites (one init's fresh-setup + existing-
config handler, mem init, mem config set) now go through the
memory-aware wrapper by importing from lib/memory/index.js.
Verified: setting a key flips provider and suppresses the upgrade
hint; unsetting brings back the "Add key" hint (not the provider
hint — that distinction matters so the agent tells the user the
right thing).
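The wrapper's decision table can be sketched over an in-memory config object (the real code persists to ~/.one/config.json; shapes below are assumptions):

```typescript
// Illustrative config shape — not the shipped types.
interface OneConfig {
  openaiApiKey?: string;
  memory?: { embedding: { provider: "none" | "openai" } };
}

function setOpenAiApiKeyMemoryAware(config: OneConfig, key: string): void {
  config.openaiApiKey = key || undefined;
  // Flip the provider only when: a non-empty key arrives, a memory block
  // exists, and the provider is still `none`. Clearing the key leaves the
  // provider alone (the user may just be rotating credentials).
  if (key && config.memory && config.memory.embedding.provider === "none") {
    config.memory.embedding.provider = "openai";
  }
}

const cfg: OneConfig = { memory: { embedding: { provider: "none" } } };
setOpenAiApiKeyMemoryAware(cfg, "sk-test");
console.log(cfg.memory!.embedding.provider); // "openai" — flipped in the same write
setOpenAiApiKeyMemoryAware(cfg, "");
console.log(cfg.memory!.embedding.provider); // still "openai" — rotation path
```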
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…c_state

Sync state now lives in the backend's mem_sync_state table alongside the rest of the unified memory subsystem. Replaces the per-model JSON files at .one/sync/state/<platform>/<model>.json (and the older single-file sync_state.json).

state.ts becomes a thin async adapter over backend.getSyncState / setSyncState / listSyncStates / (new) removeSyncState. Legacy files are read once, imported on first access, then deleted — no data loss when upgrading an existing install.

Call sites updated to await the now-async API: runner.ts (resume check, transition-to-syncing, per-page progress, final-idle, failure branch, SIGINT/SIGTERM handler), sync list, sync remove, sync query.

Verified: `sync list` reads 5 profiles from mem_sync_state with correct per-model status/lastSync; legacy JSON file + state dir cleaned up on first access.

Backend surface gained MemBackend.removeSyncState(platform, model?) with a matching implementation in postgres-core.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Both query paths now go through the unified memory store:
`sync query <platform>/<model>`:
- backend.list(type) pulls active records (capped at 10k per scan)
- --where / --after / --before / --order-by filters run in TS over
the record's `data` JSONB
- dotted --where paths work (e.g. values.job_title[0].value like %X%)
so nested payloads are filterable without pre-flattening
- `--date-field` still supported; auto-detection picks one of the
common timestamp keys (created_at, createdAt, updated_at, ...)
- syncAge + lastSync sourced from mem_sync_state
`sync search <query>`:
- listProfiles → set of (platform, model) pairs → types
- backend.search(query, {type, queryEmbedding}) per type
- Hybrid FTS + semantic when OpenAI key is configured; FTS-only
otherwise. searchMode in the response reports which ran.
- Embeds the query once and reuses across types to keep embedding
spend bounded.
`sync sql` is deprecated: no safe universal raw-SQL surface spans
PGlite + Postgres + third-party plugins without leaking backend
specifics. Returns a pointer to `mem search` / `mem list` which work
against every backend.
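The dotted --where path resolution described above (bracket indices included) can be sketched as a small resolver; the implementation is an illustrative assumption, not the shipped helper:

```typescript
// Resolve "values.job_title[0].value" against a nested record.
function getByDotPath(obj: unknown, dotPath: string): unknown {
  // Normalize bracket indices into dot segments:
  // "values.job_title[0].value" → ["values", "job_title", "0", "value"]
  const segments = dotPath
    .replace(/\[(\d+)\]/g, ".$1")
    .split(".")
    .filter(Boolean);
  let cur: any = obj;
  for (const seg of segments) {
    if (cur == null) return undefined; // missing path → undefined, not a throw
    cur = cur[seg];
  }
  return cur;
}

const record = {
  values: { job_title: [{ value: "Head of Engineering" }] },
};
console.log(getByDotPath(record, "values.job_title[0].value")); // "Head of Engineering"
```

Running the filter in TS over the record's `data` JSONB (rather than compiling it to SQL) is what keeps the same --where syntax working across every backend.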
Verified:
sync query attio/attioPeople --where 'values.job_title[0].value like %Engineering%'
→ 5 correct engineers including "Head of Engineering", "VP of Engineering"
sync search "engineering" --platform attio
→ searchMode: hybrid, 3 engineers ranked
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…rays
Groundwork for the runner's upcoming SQLite-write cutover. No behavior
change in this commit — writePageToMemory still returns the same insert/
update/skip counters, and existing callers (runner.ts dual-write path,
mem-writer.test.ts) pick up the defaults.
Added:
- MemWriteReport.sourceKeysSeen: string[]
Accumulated across pages, this replaces the SQLite-backed seenIds
set used by --full-refresh to reconcile deletions. In the next slice
the runner archives any mem_records of this type whose source key
didn't appear in the run.
- MemWriteReport.inserts / MemWriteReport.updates: Record<>[]
Populated only when writePageToMemory is called with
{ capturePerAction: true }. Lets hook dispatch (onInsert, onUpdate,
onChange) fire with the right per-record events without a second
classify pass — currently that classification reads SQLite, which
has to go.
Runner keeps dual-writing to SQLite for now because `enrichPhase`
(profiles/gmail/gmailThreads.json, profiles/fathom/meetings.json)
still reads unenriched rows from the SQLite table. Enrich rewrite +
runner cutover land together in a focused follow-up.
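The extended report shape can be sketched as a type; field names come from the text above, everything else is assumed for illustration:

```typescript
// Illustrative sketch — not the shipped MemWriteReport type.
interface MemWriteReport {
  inserted: number;
  updated: number;
  skipped: number;
  // Accumulated across pages; replaces the SQLite-backed seenIds set
  // used by --full-refresh to reconcile deletions.
  sourceKeysSeen: string[];
  // Populated only with { capturePerAction: true }, so hooks can fire
  // per-record without a second classify pass.
  inserts?: Record<string, unknown>[];
  updates?: Record<string, unknown>[];
}

const report: MemWriteReport = {
  inserted: 2,
  updated: 1,
  skipped: 0,
  sourceKeysSeen: ["attio/attioPeople:a", "attio/attioPeople:b", "attio/attioPeople:c"],
};
console.log(report.sourceKeysSeen.length); // 3
```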
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Running `sync run --to-memory` against Gmail surfaced three real gaps.
All three fixed + regression-tested.
1. `sync init <platform> <model> --config '{...}'` now seeds from the
built-in profile when no on-disk draft exists. First-time users
were hitting `Missing required field: actionId` because the merge
base was empty. Now the base resolution walks existing-draft →
built-in → empty, matching the docs.
2. Enrich phase now mirrors merged rows into the unified memory store.
Before, list sync wrote a partial record to memory and enrich
updated only SQLite — so `mem_records.data` held the pre-enrich
shape (ids + snippets + historyId) while the full enriched bodies
lived only in SQLite. EnrichContext gains `profile?: SyncProfile`,
the runner threads the profile through, and enrich calls
writePageToMemory at the end of each batch. Best-effort — logs on
stderr if memory write fails, never aborts the sync.
3. `memory.searchable` paths support `[]` array wildcards:
messages[].snippet
messages[].payload.parts[].body.data
Previously agents had to hard-index with .0. / .1. which didn't
scale for Gmail-style message arrays. New `resolveWildcardPath`
fans out each `[]` segment, concatenates the leaves. Four new
unit tests cover single / mixed / nested wildcards + missing paths.
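The `[]` wildcard fan-out can be sketched as follows; this is an illustrative reconstruction assumed to approximate the described resolveWildcardPath, not the shipped code:

```typescript
// Resolve paths like "messages[].snippet": each "[]" segment maps over the
// array at that point; string leaves are collected.
function resolveWildcardPath(obj: unknown, path: string): string[] {
  const [head, ...rest] = path.split("[].");
  const walk = (o: any, p: string): any =>
    p.split(".").filter(Boolean).reduce((cur, seg) => cur?.[seg], o);
  let current: any[] = [walk(obj, head)];
  for (const seg of rest) {
    current = current
      .filter(Array.isArray)
      .flatMap((arr: any[]) => arr.map((item) => walk(item, seg)));
  }
  return current.filter((v) => typeof v === "string");
}

const thread = {
  messages: [{ snippet: "hello" }, { snippet: "world" }],
};
console.log(resolveWildcardPath(thread, "messages[].snippet")); // ["hello", "world"]
```

Nested wildcards like `messages[].payload.parts[].body.data` fall out naturally: each `[]` adds one fan-out level, and missing paths simply contribute nothing.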
Tests: 65/65 pass (was 61; +4 wildcard cases).
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Nine user-visible surfaces shipped on feat/unified-memory had no docs
yet. This sweep covers all of them so agents loading the skill and
humans running `--help` see the current shape of the world, not the
pre-branch one.
- skills/one/SKILL.md
- Replaced "Local Data Sync" section with "Unified Memory" — zero-
config auto-init, OpenAI key setup (three paths), `mem sync`
alias, `memory.searchable` with `[]` wildcard examples, preview
loop via `sync test --show-searchable`.
- src/lib/guide-content.ts
- New GUIDE_MEMORY topic registered in TOPICS + getGuideContent +
the `all` bundle. Covers records / graph / sources / sync-into-
memory / diagnostics / admin / backends.
- GUIDE_SYNC refreshed: init → declare searchable → preview → run
flow; removed `sync install` and `sync sql` references (sql is
deprecated with a pointer to mem surfaces); file-layout section
shows ~/.one/mem.pglite and config.openaiApiKey.
- GUIDE_OVERVIEW's sync section recast as "Memory + Sync".
- README.md
- New `one mem` section with add/search/list/link + three key-setup
paths.
- `one sync` section merged with mem — documents the declare-then-
preview workflow, `memory.searchable` paths, --no-memory flag,
mem sync alias.
- Commands table refreshed: drop `install`, drop `sql`, add --show-
searchable note on test.
- src/commands/guide.ts
- VALID_TOPICS gains 'memory' so `one guide memory` actually
resolves (was returning Unknown topic).
- src/index.ts
- Top-level --help gains a Memory section and refreshed Data Sync
commands. Points at `one guide memory` for the full reference.
Tests: 65/65 still green. Rendered output verified via
`one --agent guide memory` → 4656 chars, title "One CLI — Agent
Guide: Memory".
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Runner now opens SQLite ONLY when the profile declares an enrich phase.
Every other profile (Attio, Stripe, Notion, Hacker News, Fathom
meetings without enrich, …) runs purely against memory — no .db file
created, no ensureTable / evolveSchema / classifyRecords / upsertRecords
/ rebuildFtsIndex calls, no countRecords polls.
Why partial: enrich still reads unenriched rows via SQL (`SELECT *
FROM <table> WHERE _enriched_at IS NULL`). Rewriting enrichPhase to
query memory instead is a dedicated slice (enrich is 468 lines with
its own concurrency/backoff/merge logic). Gated dual-write keeps
Gmail/Fathom-with-enrich working while the memory path proves out on
the common case.
Changes in runner.ts:
- `needsSqlite = !!profile.enrich` flag; `db` stays null otherwise.
- All db.* calls wrapped in `if (db) { ... }` guards or alternate
memory-path branches.
- writePageToMemory is now the primary write (was opt-in); runs
with `capturePerAction: hasHooks && !db` so hooks read the action
flag straight from upsert instead of a second classify pass.
- --full-refresh: two paths. SQLite path (enrich profiles) runs the
existing NOT IN delete. Memory path walks `backend.list(type)` and
archives any record whose source key didn't appear in
`seenSourceKeys`. Both run for enrich profiles so the stores stay
in sync.
- totalRecords: from `countRecords(db, model)` when SQLite available,
else from per-page `pageReport.inserted + updated` counter.
- Dry-run, enrich gating, FTS rebuild, final state write all
null-check db before calling SQLite functions.
Verified: `sync run attio --models attioPeople` creates NO .db file,
lands 100 rows in memory in 1.7s (was 2+ with dual-write).
`sync list`, `sync query`, `sync search` still work unchanged. 65/65
tests green.
Follow-up: rewrite enrichPhase to query memory so SQLite can be
dropped for enrich profiles too, then the sync engine install step
(`better-sqlite3`) becomes optional entirely.
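The gating condition itself is one line; sketched here with a stand-in profile type to make the branch explicit:

```typescript
// Illustrative stand-in — the real SyncProfile carries much more.
interface SyncProfile { enrich?: unknown }

// Open SQLite only when the profile declares an enrich phase;
// everything else runs purely against memory.
function needsSqlite(profile: SyncProfile): boolean {
  return !!profile.enrich;
}

console.log(needsSqlite({}));              // false — memory-only path, no .db file
console.log(needsSqlite({ enrich: {} }));  // true  — gated dual-write for enrich
```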
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Covers 17 commits landing the unified-memory subsystem:
- Zero-config `one mem` (auto-inits on first use)
- OpenAI key top-level in config.json with full env > .onerc > project > global precedence; redirect from mem config set
- Structured `_upgrade` hint block in agent output when semantic search is available but off
- Agent-declared `memory.searchable` with `[]` wildcard support
- `sync test --show-searchable` preview loop
- Replace-semantics upsert (p_replace flag) for sync rows
- Stale Attio built-in profiles rewritten to passthrough
- src/lib/sync → src/lib/memory/sync (folder move)
- Memory is the primary sync target; --no-memory to skip; --to-memory back-compat no-op
- `one mem sync` full alias of `one sync`
- Sync state moved from .one/sync/state/*.json to mem_sync_state
- `sync query` + `sync search` read from mem_records; `sync sql` deprecated with a pointer to mem commands
- Enrich phase mirrors merged rows to memory
- SQLite writes dropped for non-enrich profiles (enrich-only dual-write remains pending the enrichPhase rewrite)
- Docs + skill + guide + --help refreshed to match

Non-breaking. Deprecations:
- --to-memory flag (silent no-op; memory is always written)
- `sync sql` (errors with a pointer to `mem search` / `mem list`)

Tests: 65/65 (+4 new wildcard cases).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…nt data-loss fix)
Sync would stringify a nested-id record as "[object Object]" when a
profile's idField resolved to an object (Attio v2 returns id as
{workspace_id, object_id, record_id}). Every row landed under the
SAME memory key and the last one won — sync reported
"recordsSynced: 2024" while memory held 1 row. Silent and catastrophic.
Three-layer fix:
1. `sync test` — when the sample idField resolves to an object, emit a
FAIL check with suggested dotted paths (`id.record_id`, `id.id`,
etc). Auto-discovery now tries `id.record_id` / `id.id` as fallbacks
after the scalar `id` / `_id` / `uuid` candidates, so a fresh
`sync init` on a nested-id platform fixes itself.
2. `mem-writer` — resolves idField via getByDotPath (matches what
identityKey already does), and hard-rejects object values with a
skip+count rather than silent key collapse. If sync test is
somehow bypassed, a run will still refuse to corrupt the store.
3. Built-in Attio profiles — dropped the jq transform workaround
introduced in slice 1.5 and moved to `idField: "id.record_id"`
directly. Cleaner, no external tool dependency, and the new
profile format matches what agents would write by hand after
reading the knowledge.
Runner's --full-refresh seenIds path updated to use getByDotPath too,
so SQLite-backed enrich profiles with nested ids also work.
Verified against the exact user repro: `sync run attio --models
attioCompanies --max-pages 1 --force` lands 100 DISTINCT records
(was 1). Every row has its real UUID as the source key.
65/65 tests green.
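Layer 2's hard-reject can be sketched in a few lines; this is an illustrative analogue of the mem-writer guard, not the shipped code:

```typescript
// Resolve idField via a dot path and refuse object values, which would
// otherwise stringify to "[object Object]" and collapse every row onto
// one memory key.
function resolveRecordId(record: any, idField: string): string | null {
  const value = idField
    .split(".")
    .reduce((cur: any, seg) => cur?.[seg], record);
  if (value == null || typeof value === "object") return null; // skip + count
  return String(value);
}

// Attio v2 shape: id is an object, so a bare "id" idField must be rejected.
const attioRow = { id: { workspace_id: "w", object_id: "o", record_id: "uuid-1" } };
console.log(resolveRecordId(attioRow, "id"));           // null — rejected, row skipped
console.log(resolveRecordId(attioRow, "id.record_id")); // "uuid-1"
```

Even if `sync test` is bypassed, a run refuses to corrupt the store: a rejected id becomes a counted skip instead of a silent key collision.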
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…rch / sync list
Four surfaces were returning items.length as the total. That lies past
the first page, breaks pagination-driven scripts, and makes agents
report wrong numbers to users.
Changes:
- Backend gains MemBackend.count(type, { status }) — one COUNT(*) query.
Implemented in postgres-core; forwarded by both lazy plugin wrappers
(pglite, postgres).
- mem list response grows `returned` (page size) and `total` (real
backend count for the filter). Also `limit` and `offset` so agents
can page deterministically. Human TTY output gains "N of M — pass
--offset X to page" hint when there's more.
- sync query response grows `returned`, `total` (post --where filter),
`totalRecordsOfType` (before any filters), and `limit`. Scripts can
tell at a glance what percentage of the type matched.
- sync search returns `returned` + `total` (total across all searched
types, pre page-cap).
- sync list no longer reports stale .db rowcounts as `totalRecords`.
The record count is a real `backend.count(type)` per profile; the
legacy SQLite footprint surfaces as a separate `legacyDbSize` field
so the dashboard and reality agree, and a visible nudge to run
`mem migrate --cleanup`.
Verified against attio/attioCompanies (100 real records):
--limit 5 → returned: 5, total: 100
--limit 200 → returned: 100, total: 100
sync list → records=100 legacy=0 B (was "records=2024 dbSize=91.3MB")
Tests: 65/65 green.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
The unified memory cutover retired `sync sql` on the basis that raw
SQL can't safely span PGlite / Postgres / third-party plugins — but
the first-party plugins ARE SQL, and the escape hatch for joins /
aggregates / JSONB path queries is real. Legacy CEO flows that used
`sync sql` silently broke when the command errored out.
Bringing it back capability-gated, with a shared read-only guard.
- MemBackend grows an optional `raw(sql, params?)` method. postgres-
core implements it against its internal client. pglite + postgres
plugin wrappers forward to it.
- BackendCapabilities gains `rawSql: boolean`. Both first-party
plugins advertise true; third-party plugins that opt out get a
clear error from the command layer.
- New `sql-guard.ts` validates incoming SQL: leading keyword must be
SELECT / WITH / EXPLAIN; multi-statement input rejected; DDL/DML
/session-control keywords (INSERT, UPDATE, DELETE, DROP, ALTER,
CREATE, COPY, PRAGMA, VACUUM, ATTACH, SET SESSION, GRANT, CALL,
etc.) blocked even inside CTEs. 10 unit tests cover edge cases.
- `one mem sql "<SELECT ...>"` — primary surface; returns columns,
rows, rowCount.
- `one sync sql <platform>/<model> "<sql>"` — thin alias. Doesn't
rewrite the query (that would need a real SQL parser) but nudges
the agent on stderr when the expected `WHERE type = '...'` is
missing so cross-type results aren't a surprise.
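The read-only guard can be sketched as follows. This is a simplified assumption of sql-guard.ts (the real keyword list and edge handling are richer), kept to the three checks the text names: leading-keyword allowlist, multi-statement rejection, and blocked write/DDL keywords anywhere, even inside CTEs:

```typescript
const BLOCKED =
  /\b(INSERT|UPDATE|DELETE|DROP|ALTER|CREATE|TRUNCATE|COPY|GRANT|REVOKE|VACUUM|CALL)\b/i;

function guardReadOnlySql(sql: string): { ok: boolean; reason?: string } {
  const trimmed = sql.trim().replace(/;\s*$/, ""); // allow one trailing semicolon
  if (trimmed.includes(";")) return { ok: false, reason: "multi-statement input" };
  if (!/^(SELECT|WITH|EXPLAIN)\b/i.test(trimmed))
    return { ok: false, reason: "must start with SELECT / WITH / EXPLAIN" };
  if (BLOCKED.test(trimmed)) return { ok: false, reason: "write/DDL keyword" };
  return { ok: true };
}

console.log(guardReadOnlySql("SELECT type, COUNT(*) FROM mem_records GROUP BY type").ok); // true
console.log(guardReadOnlySql("DELETE FROM mem_records").ok);                              // false
console.log(guardReadOnlySql("WITH x AS (DELETE FROM t RETURNING *) SELECT * FROM x").ok); // false
```

Blocking keywords anywhere (not just at the start) is what closes the CTE loophole: `WITH ... AS (DELETE ...)` starts with an allowed keyword but still mutates.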
Verified against real memory:
mem sql "SELECT type, COUNT(*) FROM mem_records GROUP BY type"
→ attio/attioCompanies=100, attio/attioPeople=100, gmail/gmailThreads=100
mem sql "SELECT data->'values'->'name'->0->>'full_name' AS name
FROM mem_records WHERE type='attio/attioPeople'
AND data->'values'->'job_title'->0->>'value' ILIKE '%Engineering%'
LIMIT 5"
→ 5 engineers by nested JSONB path
mem sql "DELETE FROM mem_records" → blocked by guard
Tests: 75/75 green (+10 guard tests).
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
… run

Before: users running the new unified-memory CLI against a machine that had synced with the SQLite-era CLI ended up with two sources of truth. The legacy .db stayed on disk and `one` from npm kept reading it, while local-one wrote to memory only. Silently divergent.

Now: `sync run` checks for a legacy `~/.one/sync/data/<platform>.db` before the runner executes. When the file exists AND memory has zero records for any of this platform's target models, it auto-invokes `mem migrate --platform <plat> --yes`:
- --agent mode: silent migrate, one-line stderr log with the detected size so the agent can relay it to its user.
- TTY mode: interactive confirm (default yes). "Found legacy .db (91 MB) with no corresponding memory records. Migrate?"

Detection skip conditions:
- --dry-run skips the check (it wouldn't persist anything anyway)
- ANY non-zero record count for this platform's types means memory has already absorbed the data; migrating is the user's call via `mem migrate --cleanup` when they're ready.

Keeps `mem migrate` as the explicit surface; auto-migrate is only a first-use nudge that eliminates the "which number do I trust?" problem.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Full Attio test report surfaced four issues blocking the "rows in
memory → rows with embeddings" story. All fixed and verified against
2024 attio/attioCompanies + attio/attioPeople.
1. mem reindex JSONB read corruption. PGlite WASM threw
   `Unexpected token 'a', "active_until": acti, ...` errors because the
   previous reindex called backend.context({limit:5000}) + getById()
   per row,
pulling the full `data` column into memory at scale. Fix: new
lean backend.listForReindex({type, limit, offset}) returns ONLY
id / type / searchable_text / content_hash / embedding_model.
Writes go through new backend.updateEmbedding(id, vector, model)
which UPDATEs only the embedding columns — no `data` round-trip.
   The `data` field was never actually used for embedding
   (searchable_text is the sole input).
2. --full-refresh "memory access out of bounds" during stale-delete.
Same root cause — runner used backend.list(type, {limit:100_000})
which pulled full JSONB for every row. Fix: new lean backend.
listKeysByType(type) returns only {id, keys[]} for reconcile.
Verified: --full-refresh completes cleanly (7.8s for attio), no
WASM crash, deletedStale count accurate.
3. No per-run embed override. Added `sync run --embed` / `--no-embed`
flags. Threads through SyncRunOptions → runner → writePageToMemory
as embedOverride. Lets users backfill embeddings with one flag
without editing the profile. Commander convention: `--embed` → true,
`--no-embed` → false, absent → defer to profile + config default.
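The tri-state resolution in item 3 can be sketched as a single helper (the function name and signature are illustrative; Commander's negatable-boolean convention supplies `true` / `false` / `undefined` for `--embed` / `--no-embed` / absent):

```typescript
// Illustrative sketch: CLI flag > profile setting > config default.
// `undefined` at each level means "defer to the next one down".
export function resolveEmbed(
  cliFlag: boolean | undefined,       // --embed → true, --no-embed → false
  profileSetting: boolean | undefined, // per-profile embed setting
  configDefault: boolean,             // defaults.embedOnSync
): boolean {
  if (cliFlag !== undefined) return cliFlag;             // per-run override wins
  if (profileSetting !== undefined) return profileSetting; // then the profile
  return configDefault;                                   // then the global default
}
```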
4. mem config set accepted typo keys silently. `mem config set
   embedOnSync true` (missing the `defaults.` prefix) would write a
   no-op top-level field that nothing reads, and `unset` couldn't clear
it. Fix: KNOWN_KEYS allowlist + Levenshtein suggestion ("Did you
mean `defaults.embedOnSync`?"). Set rejects unknown keys; unset
accepts them (otherwise orphans would be stuck forever). Plus a
`replace: true` flag on updateMemoryConfig so unset actually
deletes — the merge-semantics default was re-adding the deleted
key from the on-disk copy.
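A minimal sketch of the "Did you mean …?" suggestion in item 4, assuming a KNOWN_KEYS-style allowlist (function names and the distance threshold are illustrative, not the actual implementation):

```typescript
// Plain dynamic-programming Levenshtein distance.
function levenshtein(a: string, b: string): number {
  const dp = Array.from({ length: a.length + 1 }, (_, i) => [i, ...Array(b.length).fill(0)]);
  for (let j = 1; j <= b.length; j++) dp[0][j] = j;
  for (let i = 1; i <= a.length; i++)
    for (let j = 1; j <= b.length; j++)
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1,                                    // deletion
        dp[i][j - 1] + 1,                                    // insertion
        dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1),  // substitution
      );
  return dp[a.length][b.length];
}

export function suggestKey(input: string, knownKeys: string[]): string | undefined {
  // A missing "defaults." prefix is the common typo, so check suffix match first.
  const suffixHit = knownKeys.find((k) => k.endsWith(`.${input}`));
  if (suffixHit) return suffixHit;
  let best: string | undefined;
  let bestDist = Infinity;
  for (const key of knownKeys) {
    const d = levenshtein(input, key);
    if (d < bestDist) { bestDist = d; best = key; }
  }
  // Only suggest genuinely close typos.
  return bestDist <= 3 ? best : undefined;
}
```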
Also: mem reindex gains --type, --force, --limit, --batch flags for
scoping backfill to one platform/model and controlling OpenAI batch
pressure.
Tests: 75/75 still green. Reindex verified against real data:
considered 10, reembedded 10, skipped 0, no WASM errors.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
…concile

Three issues surfaced in the follow-up Attio test of slice 2b-iii.

1. `mem reindex` did nothing useful without `--force`. The
   listForReindex query ordered by `updated_at DESC` with no SQL-level
   filter. Because updating a row's embedding also bumps `updated_at`
   (via the BEFORE UPDATE trigger), the query always returned the
   most-recently-embedded rows first — every iteration saw only rows
   that already had the correct embedding, the inner loop skipped all
   of them, and the outer loop terminated with `considered: N,
   reembedded: 0`. Backfill was effectively broken.
   Fixed: the SQL filter now returns ONLY rows needing work —
   `searchable_text IS NOT NULL AND searchable_text <> '' AND
   (embedded_at IS NULL OR embedding_model IS DISTINCT FROM <target>)`
   — ordered `embedded_at ASC NULLS FIRST, id ASC`. Plumbed the target
   model through as a new `targetEmbeddingModel` option, plus an
   `includeAlreadyEmbedded` escape hatch for the `--force` path.
   Empty-string searchable_text is excluded explicitly. Synced rows can
   land with empty text when the profile's memory.searchable paths
   resolve to nothing (e.g. an Attio contact with no name / title /
   email — "unnamed" in the source). OpenAI rejects empty input, so
   keeping them eligible would spin the loop forever on unembeddable
   rows.
   The admin loop now uses a fixed PAGE=500 scan size and keeps
   offset=0 (the SQL filter drains eligible rows as we embed them, so
   the next page is always fresh — incrementing offset against a moving
   target would miss rows).
2. `sync run --embed` wedged after ~700/2024 rows in `__psynch_cvwait`.
   Node's `fetch()` has no default timeout; when a TCP connection is
   accepted but never responds (mid-run TLS stall, rate-limit queue,
   etc.) the embed call hangs indefinitely. Fixed: AbortController +
   30s timeout wrapper (FETCH_TIMEOUT_MS) around both `embed` and
   `embedBatch`. 30s is comfortably above p99 for the embeddings
   endpoint; stalls now time out and retry within the existing
   3-attempt backoff loop.
3. `--full-refresh` left orphans (keys not starting with the type
   prefix). The reconcile pass only archived rows whose source key was
   in the type-prefix set but not in seenSourceKeys. Rows from earlier
   buggy versions that ended up with keys missing the prefix entirely
   were never caught. Added a second archive criterion: if a row has NO
   key starting with the `<type>:` prefix, it's an orphan — archive.
   Plus the existing "source key not seen this run" path.

Tests: 75/75. Verified on real data:
  reindex --type attio/attioPeople (18 NULL rows, 34 empty)
    → considered 0 reembedded 0 (correctly excludes empty-text rows)
  reindex --force --limit 10 → considered 10 reembedded 10

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
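The stall fix in item 2 above could be sketched roughly as follows, assuming Node 18+ (`FETCH_TIMEOUT_MS` comes from the commit text; the helper name and shape are illustrative):

```typescript
// Illustrative timeout wrapper: an AbortController plus a wall-clock timer
// converts a hung request into a retryable rejection, which the existing
// 3-attempt backoff loop can then handle.
const FETCH_TIMEOUT_MS = 30_000;

export function withTimeout<T>(
  work: (signal: AbortSignal) => Promise<T>,
  ms: number = FETCH_TIMEOUT_MS,
): Promise<T> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(new Error(`timed out after ${ms}ms`)), ms);
  return work(controller.signal).finally(() => clearTimeout(timer));
}

// Usage sketch (request body elided):
// const res = await withTimeout((signal) =>
//   fetch("https://api.openai.com/v1/embeddings", { method: "POST", signal }));
```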
…it rates
Both of the UX upgrades the most recent Attio test asked for.
1. `sync test --show-searchable` now samples 5 records, not 1.
The old single-sample preview left a `—` marker ambiguous: is the path
a typo, or does this specific record just not have that field? Real
data routinely has sparse fields (Attio industry set on ~1 in 5
companies) and the agent couldn't tell the two cases apart without
SQL.
Now each declared path shows `hits/total` plus a concrete sample:
5/5 values.name[].value → "SimplyWise" (clean)
1/5 values.industry[].option.title → "Financial Services" (sparse, still real)
0/5 values.nonsense[].foo (no sample — typo or always absent)
SyncTestReport gains a `samples` array; buildSearchablePreview
aggregates across all of them. JSON response shape: `searchable.paths`
grows `hits` + `total` (was `found: bool`); TTY output gets three-way
markers (✓ / ~ / ✗) with rate. Config is unchanged for the
default-walker mode (nothing declared → walker preview over the first
sample, with the tip now pointing at suggest-searchable instead of
the agent to hand-write dot-paths).
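The three-way marker logic pins down in a few lines (function name illustrative):

```typescript
// Illustrative marker classifier for a declared searchable path:
// all samples hit → ✓ (clean), some hit → ~ (sparse, still real),
// none hit → ✗ (typo or always absent).
export function pathMarker(hits: number, total: number): "✓" | "~" | "✗" {
  if (hits === 0) return "✗";
  return hits === total ? "✓" : "~";
}
```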
2. `sync suggest-searchable <platform>/<model>` — auto-ranked starter.
Walks the first-page records, collects every leaf string/number/
boolean path, scores each by:
hitRate × log1p(avgLength) × signalPenalty × typePenalty × shortPenalty
where:
- hitRate is per-record (array wildcards can't inflate past 1.0)
- signalPenalty = (1 - noiseFraction)² — penalizes UUIDs / ISO
timestamps / URLs / numeric strings (lat/long style) / known
noise enum markers ("system", "text", "personal-name", ...)
- typePenalty = 0.05 boolean, 0.1 number, 0.5 mixed, 1.0 string
- shortPenalty = 0 for ≤2-char leaves (flags / codes), 1 else
Output: ranked list with {path, score, hitRate, avgLength,
noiseFraction, sampleValue} + a paste-ready `configPatch` the agent
drops straight into `sync init --config`.
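The scoring formula above transcribes directly to TypeScript. The penalty constants are from the text; the function shape and input stats record are an assumption:

```typescript
// hitRate × log1p(avgLength) × signalPenalty × typePenalty × shortPenalty
export function scorePath(stats: {
  hitRate: number;       // per-record hit rate, capped at 1.0
  avgLength: number;     // mean length of the leaf values
  noiseFraction: number; // fraction of values that look like UUIDs/timestamps/URLs
  leafType: "string" | "number" | "boolean" | "mixed";
  maxLeafChars: number;  // longest observed leaf value
}): number {
  const signalPenalty = (1 - stats.noiseFraction) ** 2;
  const typePenalty =
    stats.leafType === "boolean" ? 0.05 :
    stats.leafType === "number" ? 0.1 :
    stats.leafType === "mixed" ? 0.5 : 1.0;
  const shortPenalty = stats.maxLeafChars <= 2 ? 0 : 1; // flags / codes score zero
  return stats.hitRate * Math.log1p(stats.avgLength) * signalPenalty * typePenalty * shortPenalty;
}
```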
Verified on real Attio companies:
Top: values.description[].value (189 chars, 100%, "SimplyWise: Organize...")
values.domains[].domain (13 chars, 100%, "simplywise.com")
values.name[].value (11 chars, 100%, "SimplyWise")
values.categories[].option.title (100%, "Financial Services")
Correctly drops: UUIDs, timestamps, latitude/longitude strings,
is_archived boolean, actor_id enums.
--show-searchable text preview length dropped from 2727 chars (default
walker) to 223 chars on the same record once the suggested paths are
applied — every line is signal, no UUIDs.
Docs refreshed (SKILL.md, guide sync) to describe the two-step
workflow: suggest → preview → run.
Tests: 82/82 (+7 new suggest-searchable cases covering long prose,
UUID/timestamp filter, boolean/number penalty, array hit-rate cap,
wildcard dot-path emission, empty-sample case, numeric-string filter).
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Two P0 data-correctness bugs caught by Moe's Attio / Fathom test
pass. The sync reports success; the data is wrong. Same trust-shape
as the original [object Object] silent data loss.
1. `--full-refresh` + `--max-pages` = data-loss command.
Reconcile archives any row whose source key wasn't in
`seenSourceKeys` this run. With pagination truncated (max-pages
cap, empty page, etc.) the keys of unfetched pages never land in
the set, so reconcile marks them `deleted_upstream`. Observed:
`sync run fathom --max-pages 3 --full-refresh` archived 57 valid
meetings after pulling only 30.
Fixed: track `paginationComplete` — true only when the loop exits
via natural exhaustion (empty records page on page>0, or the
profile's paginator returned no nextParams). Reconcile-by-absence
now requires `fullRefresh && pagesProcessed > 0 &&
paginationComplete`. On truncated runs we emit
`reconcileSkipped: true` + stderr warning + `deletedStale: 0`.
No silent damage.
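The reconcile precondition reduces to one predicate (the field names mirror the commit text; the exact signature is illustrative):

```typescript
// Reconcile-by-absence is only safe when we are certain we saw every
// upstream row this run: a full refresh that paginated to natural exhaustion.
export function shouldReconcileByAbsence(run: {
  fullRefresh: boolean;
  pagesProcessed: number;
  paginationComplete: boolean; // true only on natural exhaustion, never on --max-pages truncation
}): boolean {
  return run.fullRefresh && run.pagesProcessed > 0 && run.paginationComplete;
}
```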
2. `mem_upsert_by_keys` didn't un-archive on resurrection.
When --full-refresh re-pulled a row whose memory record was
archived (from a prior buggy reconcile), the upsert updated
`data / keys / sources / searchable_text` but left
`status = 'archived'`. No self-healing path — 1924 Attio rows
stayed stuck at archived across repeated --full-refresh runs.
Fixed: the UPDATE branch of the upsert SQL now also sets
`status = 'active'` and `archived_reason = NULL`. Semantics:
upsert-by-keys always produces an active row. Verified on live
data — 200 stuck rows resurrected across a 3-page partial refresh.
3. Surface `statusCounts: {active, archived}` in every sync run.
Silent damage was invisible from the happy-path output.
`deletedStale` only reports this-run archives. Now sync result
includes post-run counts; human output highlights imbalance in
red when archived > active. Agents can watch the numbers heal
as upsert-by-keys resurrects previously archived rows.
Tests: 83/83. Added an integration test asserting upsertByKeys
flips archived→active with archived_reason cleared.
Verified on live data:
# before
attio/attioCompanies active: 100, archived: 1925
# after a 3-page --full-refresh --max-pages 3
archived: 1724 (200 resurrected), reconcileSkipped: true,
deletedStale: 0
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
#126 — `sync migrate --dry-run` always reported `inserted` on rows that
a real run would have `updated`. Cosmetic but misleading after the
first live migrate. Now the dry-run path probes
`mem_records.keys && <candidate keys>` via the backend's raw-SQL escape
hatch and reports `updated` when the keys already exist. Requires a
backend with the `rawSql: true` capability (both first-party plugins
have it; third-party plugins fall back to always-inserted, matching
pre-fix behaviour).

#127 — killing a mid-sync process corrupts PGlite. Two parts:
1. Graceful close in signalCleanup. The existing SIGINT/SIGTERM handler
   updated sync_state and released the filesystem lock but never
   touched the memory backend. PGlite is WASM-backed Postgres — if it
   doesn't get a chance to checkpoint its WAL before the process exits,
   the next `open()` aborts inside ensureSchema with `Aborted()`. The
   handler now also calls `backend.close()` under a 2s wall-clock cap
   (so a stuck close can't block the exit). SIGKILL / kernel panic
   still bypass this; no prevention is possible for uncatchable
   signals.
2. Clearer doctor diagnostic. When `mem doctor` hits `Aborted()` on the
   schema-apply check, it now appends the recovery path (delete
   `~/.one/mem.pglite`) instead of leaving the user to guess. The
   `postmaster.pid` holding a placeholder `-42` is normal for WASM
   PGlite, NOT a corruption signal — the Aborted comes from unflushed
   WAL. An earlier draft of this commit added stale-pid cleanup based
   on the negative-PID hypothesis; it was reverted because the
   heuristic would wrongly delete valid lockfiles.

Tests: 83/83.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Closes #132. Migrate used `row[idField]` for id resolution — a flat
property lookup, not a dotted path. Combined with the legacy SQLite
layer JSON-stringifying nested columns on INSERT (see sync/db.ts:
prepareValue), any profile with a dotted idField against a table whose
id column holds a stringified object silently dropped every row. Attio
companies: 2024/2024 skipped on an empty memory store, while
attioPeople (scalar idField "id") migrated cleanly.

Three changes:
1. `reviveStringifiedJson(row)` — rehydrates top-level
   JSON-stringified columns before id resolution. Matches the shape
   sync sees live. Conservative: only parses strings that start with
   `{` or `[`, and only one level deep (legacy rows never nest
   further).
2. Id resolution now uses `getByDotPath(hydratedRow, idField)`, the
   same mechanism `sync run` uses. Hard-rejects nested objects the
   same way `sync test` / mem-writer do — better to skip visibly than
   stringify to `[object Object]` and collapse every row onto one key.
3. The report splits the `skipped` counter into `skippedUnresolvedId`
   (profile missing or idField doesn't resolve) and `skippedError`
   (upsert threw). Human output prints a warning when every row is
   unresolved, so a misconfigured profile can't hide, plus a per-row
   hint for the first 3 misses so the cause is obvious.

Tests: +6, 89 total. reviveStringifiedJson + dotted-path resolution on
the exact Attio companies shape.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
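The two helpers named above might look like this sketch. It is deliberately narrower than the real code: one-level parse only, and bracket indexes like `[0]` are omitted from the dot-path walker for brevity:

```typescript
// Rehydrate top-level JSON-stringified columns (legacy SQLite shape).
// Conservative: only strings starting with "{" or "[", one level deep.
export function reviveStringifiedJson(row: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [k, v] of Object.entries(row)) {
    if (typeof v === "string" && (v.startsWith("{") || v.startsWith("["))) {
      try {
        out[k] = JSON.parse(v);
        continue;
      } catch {
        // Not valid JSON after all: keep the original string.
      }
    }
    out[k] = v;
  }
  return out;
}

// Simplified dotted-path resolver (no [n] index support in this sketch).
export function getByDotPath(obj: unknown, path: string): unknown {
  return path.split(".").reduce<unknown>((cur, seg) => {
    if (cur == null || typeof cur !== "object") return undefined;
    return (cur as Record<string, unknown>)[seg];
  }, obj);
}
```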
Follow-up to #132. When a user re-migrates after changing `idField`
(the documented fix for #132's silent-drop bug), the pre-fix cohort's
rows have garbage sourceKeys AND no identity keys in keys[] —
getByDotPath couldn't resolve `email_addresses[0].email_address`
through the stringified JSON of the legacy .db either. The post-fix
upsert builds a clean sourceKey + identity keys, but with no overlap
against the pre-fix cohort's keys[], upsertByKeys inserts a duplicate.
Result: the active count doubles, half the rows carry legible data and
half carry stringified JSON blobs.

Three changes, no new commands.
1. Identity-merge pre-pass per type. For each type with an
   `identityKey`, query existing active rows for
   `(id, keys, data->path->>...)` via backend.raw and build a
   `normalized-identity → {id, keys}` map. Cost is one SELECT per type
   (not per row); JSONB path projection avoids reading the full `data`
   column (the WASM-memory footgun we've hit before). When a legacy
   row's identity matches the map, its new keys array is folded with
   the existing row's keys so upsertByKeys overlaps and hits the
   update branch. `replace: true` ensures the clean hydrated payload
   wins over the old stringified shape. Path validation is strict
   (segments match /^\w+$/) because the path is inlined into SQL
   (column refs can't be parameterized). Bad input returns null and
   migrate falls back to plain key-overlap — no regression.
2. Split report counters. `mergedByIdentity` distinguishes healing
   merges from regular sourceKey updates. The totals object surfaces
   the same.
3. Doubling warning. `count(type, {status:'active'})` is snapshotted
   before and after migrate per type. When post-migrate growth exceeds
   `inserted + 2`, print a stderr warning with an exact SQL probe to
   inspect: `jsonb_typeof(data->'id') GROUP BY t` distinguishes
   pre-fix rows (t='string') from post-fix rows (t='object') so the
   user can drop the ghost cohort.
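The strict path validation in change 1 can be sketched as a projection builder that refuses anything unsafe to inline into SQL. The function name and exact projection shape are illustrative; the /^\w+$/ segment rule is from the text:

```typescript
// Turn a dotted identity path into a JSONB projection over the `data` column,
// e.g. "values.email" → data->'values'->>'email'. Because the result is
// inlined into SQL (column refs can't be parameterized), every segment must
// match /^\w+$/; anything else returns null and the caller falls back to
// plain key-overlap.
export function identityPathToJsonbProjection(path: string): string | null {
  const segments = path.split(".");
  if (segments.length === 0 || !segments.every((s) => /^\w+$/.test(s))) return null;
  const last = segments.pop()!;
  const mids = segments.map((s) => `->'${s}'`).join("");
  return `data${mids}->>'${last}'`; // ->> extracts the final value as text
}
```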
Tests: 98, including a live in-memory PGlite check that identity
values round-trip through data->JSONB->map, that archived rows are
excluded, and that the SQL-injection guard rejects unsafe paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Summary
Folds the standalone @withone/mem package into the One CLI as a
pluggable memory subsystem, and makes it the primary target for sync.
Ships 18 commits on feat/unified-memory — every commit stands alone and
every commit stays green.

What's new
- `one mem add note '{...}'` on a fresh install auto-initializes pglite
  with sensible defaults. No `mem init` prerequisite.
- `config.openaiApiKey` alongside `apiKey`, same precedence as
  `ONE_SECRET` (env > `.onerc` > project > global). Three equivalent
  setups: `one init` (prompt), `one mem config set embedding.apiKey
  sk-...`, or `OPENAI_API_KEY=sk-...`.
- The `mem status` / `mem search` / `mem doctor` response carries a
  structured `_upgrade` block when semantic search is available but
  off — the agent can relay it to the user.
- `memory.searchable`: profiles carry dot-paths that drive embedded +
  FTS text. Supports numeric indexes and `[]` wildcards
  (`messages[].snippet`). Preview via `sync test --show-searchable`
  before paying the embedding cost.
- `sync run` writes into `mem_records`. SQLite dual-write is kept only
  for profiles with an enrich phase (enrich still reads unenriched
  rows via SQL — a follow-up slice rewrites that).
- `one mem sync` is a full alias of `one sync`: same handlers, same
  options — single source of truth, one command tree.
- `mem_sync_state` table; legacy `.one/sync/state/*.json` files are
  auto-migrated on first access.
- `sync query` supports dotted `--where` paths
  (`values.job_title[0].value like %Engineering%`). `sync sql` retired
  — raw SQL can't safely span PGlite / Postgres / third-party
  backends.
- `p_replace` flag on `mem_upsert_by_keys`: synced rows replace `data`
  wholesale so fields removed upstream actually disappear from memory.
- `CONFIG_DIR` / `CONFIG_FILE` / `PROJECTS_DIR` resolved lazily via
  getters so tests that set `$HOME` in `before()` hooks can properly
  isolate.

Deprecations (non-breaking)
- `--to-memory` flag on `sync run` — silent no-op (memory is now
  always written). Use `--no-memory` to opt out.
- `sync sql` — errors with a pointer to `mem search` / `mem list`. No
  raw SQL surface in the memory subsystem.

Migration
`one mem migrate` reads legacy `.one/sync/data/*.db` files into the
unified store. `--cleanup` deletes them after.

Docs
`skills/one/SKILL.md`, `one guide memory` (new topic), `one guide
sync`, `README.md`, and `one --help` all updated.

Test plan
- `.db` file created, 100 records in 1.7s
- `memory.searchable` paths
- `one mem migrate` against a real legacy `.one/sync/data/` directory

Follow-up slices (separate PRs)
- `enrichPhase` to query memory (`data._enriched_at IS NULL`) so
  SQLite can be dropped for enrich profiles too; `better-sqlite3`
  becomes truly optional.
- `q:` URL-encoding bug — `q: "category:primary"` succeeds via
  `actions execute` but fails via `sync run`. Not in scope for this
  branch.
- `@withone/mem` 2.0.0 deprecation release pointing at `one mem`.

🤖 Generated with Claude Code