# OpenClaw Memory Stack

Semantic memory architecture for OpenClaw. Vibe coded with Agent: Gaho at the mission control of FAR.
Traditional AI memory treats `MEMORY.md` as a monolith — one file that has to contain everything, because retrieval is keyword/grep. The bigger it gets, the more you prune it. You're constantly fighting context limits.
This stack flips that model.
With local embeddings (`nomic-embed-text`, 768-dim, 8192-token context), every file in `memory/` is searchable by meaning — not just keywords. That changes the architecture completely:

- `MEMORY.md` becomes a slim index — just pointers, under 5KB
- Detailed context lives in typed subdirectories (`daily/`, `projects/`, `contacts/`, `decisions/`)
- Files backlink to each other through semantic proximity — query "what GPU does the cluster use?" and the embedding layer surfaces `projects/local-compute.md` even if it never says "GPU" explicitly
- Daily logs are append-only archives, never curated — the embeddings do the retrieval work
- Project files are living docs, updated in place — one file per project, always current
The result: memory that scales. 500 files, millisecond search, no cloud, no per-query cost, nothing leaves the machine.
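To make that concrete, here is a minimal sketch of what a semantic lookup looks like at the API level. It assumes only Ollama's `/api/embeddings` endpoint plus a hypothetical `embeddings.json` cache of per-file vectors; OpenClaw's actual `memory_search` index works differently under the hood, but the ranking principle is the same.

```bash
# Sketch: embed a query, then rank memory files by cosine similarity.
# embeddings.json is a hypothetical {"path": [768 floats]} cache used for
# illustration only; OpenClaw maintains its own index.
QUERY="what GPU does the cluster use?"
curl -s http://localhost:11434/api/embeddings \
  -d "{\"model\":\"nomic-embed-text\",\"prompt\":\"$QUERY\"}" \
| python3 -c "
import json, math, sys
q = json.load(sys.stdin)['embedding']
files = json.load(open('embeddings.json'))  # {path: vector}
def cos(a, b):
    dot = sum(x*y for x, y in zip(a, b))
    na = math.sqrt(sum(x*x for x in a)); nb = math.sqrt(sum(x*x for x in b))
    return dot / (na * nb)
ranked = sorted(((cos(q, v), p) for p, v in files.items()), reverse=True)
for score, path in ranked[:3]:
    print(f'{score:.3f}  {path}')
"
```

A file like `projects/local-compute.md` can rank first for a GPU query even without the literal keyword, which is the semantic-proximity behavior described above.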
See `ARCHITECTURE.md` for full design rationale.
This is the part that makes it real: the entire embedding pipeline runs on your local machine — Apple Silicon or any box running Ollama.
nomic-embed-text is a 274MB model — small enough to keep loaded permanently with `keep_alive: -1`, while still leaving plenty of headroom alongside generation models. Once pinned, it never unloads between heartbeats, so there's no cold start when the agent goes to search.
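To confirm the pin took effect, `ollama ps` lists resident models and their expiry; with `keep_alive: -1` the model should report that it never expires (exact column layout varies by Ollama version):

```bash
# Check that the embedding model is pinned in memory.
ollama ps
# NAME                     SIZE     UNTIL
# nomic-embed-text:latest  ~274MB   Forever
```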
What that means in practice:
- Zero API cost for memory search — no tokens, no billing, no rate limits
- Zero privacy exposure — your memory files, your hardware, your network
- Works offline — the agent's long-term memory functions completely air-gapped
- No contention with generation models — 274MB is trivial overhead on any modern GPU or unified memory setup
The LaunchAgent included in this repo warms the model on boot so it's ready before OpenClaw's first heartbeat. It's the difference between "semantic memory" and "semantic memory that actually works every morning."
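The warm-up itself is simple. Here is a minimal sketch of what a boot script like `scripts/ensure_embedding_model.sh` is described as doing; the script in this repo remains the source of truth:

```bash
#!/usr/bin/env bash
# Sketch: pull the embedding model if missing, then pin it with keep_alive -1.
set -euo pipefail

MODEL="nomic-embed-text"
OLLAMA_URL="http://127.0.0.1:11434"

# Pull only if the model is not already present locally.
ollama list | grep -q "$MODEL" || ollama pull "$MODEL"

# A single embedding request with keep_alive -1 loads the model and keeps it
# loaded, so the first real memory_search never pays a cold start.
curl -s "$OLLAMA_URL/api/embeddings" \
  -d "{\"model\":\"$MODEL\",\"prompt\":\"warmup\",\"keep_alive\":-1}" > /dev/null

echo "$MODEL warmed and pinned"
```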
This stack plugs into OpenClaw's built-in `memory_search` tool. OpenClaw's memory layer supports local Ollama as a provider — configure it to point at your local instance and the two-tier directory structure, and searches route to the local model automatically.
OpenClaw's `openclaw.json` config schema for this looks like:

```json
"memorySearch": {
  "enabled": true,
  "provider": "ollama",
  "remote": {
    "baseUrl": "http://127.0.0.1:11434"
  },
  "model": "nomic-embed-text:latest"
}
```

That's the live schema — verified against a running instance. The `enabled` flag is required; without it the memory tool falls back to a different provider.
Every `memory_search` call the agent makes — for recall, for context, for decisions — hits the local model. No cloud hop.
This stack pairs well with contextgraph by Rich DeVaul — an OpenClaw plugin for smarter context window management.
Standard LLM context is a flat sliding window: recent messages in, old messages out, unrelated topics blended together. ContextGraph replaces that with a DAG-based retrieval system. Every message gets tagged; context assembly pulls from two layers — a recency slice and a topic-matched slice — so the model sees relevant history even if it happened 500 messages ago.
Where the two projects complement each other:
- This stack: "What do I remember about X?" → searches long-term `memory/` files by embedding
- ContextGraph: "What did we talk about near X?" → retrieves relevant conversation history by topic tag
They operate at different layers and don't overlap. Running both gives the agent coherent long-term memory and smarter in-context recall.
Setup notes from our deployment are in the **Optional: ContextGraph Plugin** section below.

---

## What's in the Stack

- Two-tier memory structure — daily logs in `memory/daily/`, living docs in `memory/projects/`, `contacts/`, `decisions/`
- Local semantic search — `nomic-embed-text` via Ollama. 768-dim, 8192-token context, Apache 2.0
- Boot resilience — LaunchAgent ensures the embedding model survives reboots
- Health checks — `memory_health.sh` validates structure + model on every heartbeat
Designed for macOS + Ollama. Scripts are portable (tested on macOS 15, Ubuntu 24.04).
---

## Requirements

- OpenClaw installed and running
- Ollama installed
- macOS (for LaunchAgents) or adapt `launchagents/` for systemd on Linux
---

## Setup

Clone into your workspace:

```bash
cd ~/.openclaw/workspace
git clone https://github.com/your-org/openclaw-memory-stack.git
```

Or copy the `scripts/` directory into your existing `skills/sysadmin/scripts/`.
Pull the embedding model:

```bash
ollama pull nomic-embed-text
```

Verify it works:

```bash
curl -s http://localhost:11434/api/embeddings \
  -d '{"model":"nomic-embed-text","prompt":"test","keep_alive":-1}' \
  | python3 -c "import json,sys; e=json.load(sys.stdin)['embedding']; print(f'ok — {len(e)} dims')"
```

Expected: `ok — 768 dims`
Point OpenClaw's memory search at the local model (the `enabled` flag is required, per the schema above):

```bash
openclaw config set memorySearch.enabled true
openclaw config set memorySearch.provider ollama
openclaw config set memorySearch.model nomic-embed-text:latest
openclaw config set memorySearch.remote.baseUrl http://127.0.0.1:11434
```

Or add to `openclaw.json` directly:
```json
{
  "memorySearch": {
    "enabled": true,
    "provider": "ollama",
    "model": "nomic-embed-text:latest",
    "remote": {
      "baseUrl": "http://127.0.0.1:11434"
    }
  }
}
```

Then restart:

```bash
openclaw gateway restart
```

Next, edit `launchagents/ai.openclaw.embeddings.plist` — replace `WORKSPACE_PATH` with your actual workspace path.
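To do the substitution in one line, assuming the template uses the literal placeholder `WORKSPACE_PATH` (BSD `sed` syntax shown; on Linux, drop the empty `''`):

```bash
# Substitute your workspace path into the plist before installing.
sed -i '' "s|WORKSPACE_PATH|$HOME/.openclaw/workspace|g" \
  launchagents/ai.openclaw.embeddings.plist
```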
Then install:

```bash
cp launchagents/ai.openclaw.embeddings.plist ~/Library/LaunchAgents/
launchctl load ~/Library/LaunchAgents/ai.openclaw.embeddings.plist
```

If you have daily files at `memory/YYYY-MM-DD.md` (flat root), migrate them:
```bash
bash scripts/migrate_memory.sh --dry-run   # preview
bash scripts/migrate_memory.sh             # execute
```

Then trigger a reindex:

```bash
bash scripts/memory_reindex.sh
```

Or wait for the next heartbeat — OpenClaw auto-indexes on access.
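For reference, the migration is conceptually just a move of date-named logs into `daily/`. A sketch of that behavior, not the script itself:

```bash
# Sketch: what the migration amounts to. Date-named logs move from the
# memory root into daily/. Run the real script; this is illustration only.
MEMORY_DIR="${MEMORY_DIR:-$HOME/.openclaw/workspace/memory}"
mkdir -p "$MEMORY_DIR/daily"
for f in "$MEMORY_DIR"/[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9].md; do
  [ -e "$f" ] && mv "$f" "$MEMORY_DIR/daily/"
done
```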
Run the health check:

```bash
bash scripts/memory_health.sh
```

Expected output:

```
[OK] No unmigrated daily files at root
[OK] nomic-embed-text model available
[OK] MEMORY.md size: X.XKB (slim)
Status: HEALTHY
```
---

## Memory Layout

```
memory/
├── daily/       ← append-only daily logs (YYYY-MM-DD.md)
├── projects/    ← living docs per project (updated in place)
├── contacts/    ← people, preferences, relationships
└── decisions/   ← significant choices + reasoning (dated, immutable)
MEMORY.md        ← slim index only — pointers to subdirs, <5KB
```
**Key principle:** `MEMORY.md` is an index, not a knowledge base. Detailed context lives in the subdirectories. The embedding layer makes it all findable.
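As an illustration (hypothetical content, not a file shipped with this repo), a slim `MEMORY.md` can be as little as:

```markdown
# Memory Index

- Daily logs: memory/daily/ (append-only, YYYY-MM-DD.md)
- Projects: memory/projects/ (one living doc per project)
- People: memory/contacts/
- Decisions: memory/decisions/ (dated, immutable)
- Search: semantic via memory_search; don't duplicate content here
```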
---

## Heartbeat Integration

Add this block to your `HEARTBEAT.md` to run the health check every heartbeat:

````markdown
## Memory Health (always run)
```bash
bash /path/to/scripts/memory_health.sh
```
Alert if: exit code is 1 (unmigrated files, model unavailable, MEMORY.md >5KB).
````
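For reference, here is a sketch of the three checks behind that output, assuming the layout above; `scripts/memory_health.sh` in the repo is the source of truth:

```bash
#!/usr/bin/env bash
# Sketch of the checks memory_health.sh is described as performing.
WORKSPACE_DIR="${WORKSPACE_DIR:-$HOME/.openclaw/workspace}"
MEMORY_DIR="${MEMORY_DIR:-$WORKSPACE_DIR/memory}"
fail=0

# 1. No unmigrated date-named daily files at the memory root.
if ls "$MEMORY_DIR"/[0-9][0-9][0-9][0-9]-*.md >/dev/null 2>&1; then
  echo "[FAIL] unmigrated daily files at root"; fail=1
fi

# 2. Embedding model is available in Ollama.
ollama list | grep -q nomic-embed-text || { echo "[FAIL] model unavailable"; fail=1; }

# 3. MEMORY.md stays under the 5KB cap.
size=$(wc -c < "$WORKSPACE_DIR/MEMORY.md")
if [ "$size" -gt 5120 ]; then
  echo "[FAIL] MEMORY.md ${size}B (>5KB cap)"; fail=1
fi

[ "$fail" -eq 0 ] && echo "Status: HEALTHY"
exit "$fail"
```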
---
## Scripts
| Script | Purpose |
|---|---|
| `scripts/ensure_embedding_model.sh` | Pull nomic-embed-text if missing, warm it |
| `scripts/memory_health.sh` | Validate memory structure + model health |
| `scripts/migrate_memory.sh` | One-shot migration of flat daily files → `daily/` |
| `scripts/memory_reindex.sh` | Trigger OpenClaw embedding reindex |
All scripts respect `WORKSPACE_DIR` and `MEMORY_DIR` environment variables. Default: `~/.openclaw/workspace`.
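For example, to run the health check against a non-default workspace:

```bash
# One-off run against a different workspace location.
WORKSPACE_DIR=/Volumes/work/.openclaw/workspace \
MEMORY_DIR=/Volumes/work/.openclaw/workspace/memory \
  bash scripts/memory_health.sh
```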
---
## Optional: ContextGraph Plugin
For graph-based context (conversation history as a semantic graph), see **[contextgraph](https://github.com/rdevaul/contextgraph)** by Rich DeVaul. It integrates as an OpenClaw `contextEngine` plugin alongside this embedding stack.
Setup notes from our deployment:
- Install: `openclaw plugins install --link` from the plugin dir
- Config: `plugins.slots.contextEngine = contextgraph`, `plugins.allow = ["contextgraph"]`
- Run: uvicorn on `:8300` (LaunchAgent template in `launchagents/ai.openclaw.contextgraph.plist.template`)
- Known issue: `deap` is an undeclared dependency — `pip install deap` before starting
- Known issue: `server.py __main__` defaults to `:8350`; run uvicorn explicitly with `--port 8300` (see the sketch after this list)
- Graph mode is OFF by default — enable per-session with `/graph on`
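Putting those notes together, a manual start looks roughly like this; `server:app` is an assumption about the module path, so check the plugin's own README:

```bash
# Start contextgraph on :8300 (its __main__ defaults to :8350).
cd /path/to/contextgraph
pip install deap                    # undeclared dependency, per the note above
uvicorn server:app --host 127.0.0.1 --port 8300
```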
---
## Design Notes
### Why local embeddings?
- No per-query API cost
- Privacy: your memory files never leave the machine
- nomic-embed-text at 274MB is small enough to always keep loaded
- 8192-token context handles full files without chunking
### Why two tiers?
Daily files are append-only raw logs — they shouldn't be curated or edited. Project/contact/decision files are living docs that should be maintained. Mixing them creates friction.
### What about MEMORY.md?
With semantic search, MEMORY.md doesn't need to be comprehensive. It should be a slim index — pointers to where real context lives. The health check enforces a 5KB cap.
---
## Status
Built and deployed: 2026-03-16
Platform: macOS 15 (Mac Mini M4), OpenClaw 2026.3.13
Tested: memory_health.sh, migrate_memory.sh, reindex — all passing