The following is mostly AI-written directions from my agent, which assumed everyone will work the same way I did. They won't. So there are a lot of assumptions baked in. I'll come in with edits ("not really," etc.) to try and right the ship where I can.
A local "second brain" that passively captures, classifies, embeds, and stores context from AI interactions, then exposes that memory through a chat interface (Open WebUI) and MCP tools for Claude Code. It's also supposed to be discoverable and searchable by claude code and able to be linked to that automatically on startup. I mean you could just ask claude code to spin up a project and do that, but I'd like to have some instructions for the casuals, too. This is the real utility of this tool IMHO: to be able to have your agents search relevant data from your ever-evolving memory database.
Everything runs on your local workstation. No cloud services required (though you can optionally connect external APIs for premium chat models).
```
┌─────────────────┐      ┌──────────────┐      ┌──────────────────┐
│   Open WebUI    │─────▶│   Pipeline   │─────▶│   Capture API    │
│ localhost:3000  │      │    Filter    │      │  localhost:8100  │
│ (chat with AI)  │      │ (auto-hook)  │      │                  │
└─────────────────┘      └──────────────┘      └────────┬─────────┘
                                                        │
                  ┌─────────────────────────────────────┼──────────────┐
                  ▼                                     ▼              ▼
            ┌───────────┐                       ┌──────────────┐  ┌──────────┐
            │  Ollama   │                       │  PostgreSQL  │  │  Ollama  │
            │ Classifier│                       │  + pgvector  │  │ Embedder │
            │ qwen3:8b  │                       │  port 5433   │  │mxbai-emb │
            └───────────┘                       └──────┬───────┘  └──────────┘
                                                       │
                  ┌────────────────────────────────────┼───────────────┐
                  ▼                                    ▼               ▼
           ┌─────────────┐                     ┌──────────────┐  ┌───────────┐
           │ MCP Server  │                     │ Brain Search │  │  Context  │
           │(Claude Code)│                     │  (semantic)  │  │ Injection │
           │   4 tools   │                     │              │  │(into chat)│
           └─────────────┘                     └──────────────┘  └───────────┘
```
Data flow: You chat → the pipeline filter captures your message → the classifier decides if it's worth storing → the metadata extractor and embedder run concurrently → stored in PostgreSQL → searchable via MCP tools or auto-injected into future chat context.
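To make the flow concrete, here's a minimal sketch of pushing a message into the Capture API by hand. The `/capture` route and payload fields are assumptions for illustration only; check `capture/api.py` for the real FastAPI routes.

```python
import json
import urllib.request

# Hypothetical request to the Capture API on localhost:8100.
# The /capture route and payload shape are assumptions; the actual
# routes live in capture/api.py.
payload = {
    "text": "We decided to use Redis for caching because...",
    "source_client": "manual_test",
}
req = urllib.request.Request(
    "http://localhost:8100/capture",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp))  # e.g. whether the entry was stored, and why
```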
- Python 3.11+
- Docker Desktop (or Docker Engine)
- Ollama (install from ollama.com)
```bash
git clone https://github.com/YOUR_USERNAME/open-brain.git
cd open-brain
cp .env.example .env
cp docker/.env.example docker/.env
```

Edit `.env` and `docker/.env` to set your preferred models, database password, and optional API keys.
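Roughly what you'll be setting (the model variables are the ones used later in this README; the database password variable name is my guess, so trust the two `.env.example` files over this sketch):

```
# .env: model selection
OLLAMA_EMBEDDING_MODEL=mxbai-embed-large
OLLAMA_CLASSIFIER_MODEL=qwen3:8b
OLLAMA_EXTRACTOR_MODEL=qwen3:8b

# docker/.env: secrets (POSTGRES_PASSWORD is an assumed name;
# check docker/.env.example for the real one)
POSTGRES_PASSWORD=change-me
NVIDIA_API_KEY=nvapi-your-key-here   # optional, for premium chat models
```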
So you don't strictly need these exact models: gpt-oss:20b is kinda bad at this even if it's great at chat, IMHO. Really, just use a good, fast classifier (like qwen3:8b) and a small embedding model (like mxbai-embed-large). Any other models are only if you want to chat with the WebUI window about what it's recorded about you.
```bash
# Embedding model (1024 dimensions)
ollama pull mxbai-embed-large

# Classifier and extractor model
ollama pull qwen3:8b
```

First, though, make sure Ollama is up; run it and the models will be available to consume. Pretty sure you could reconfigure this to work with other local LLM hosting tools, too, but this is the way I'm using it now (subject to change).
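If you want to double-check that Ollama is actually serving before moving on, its local API lists pulled models at `/api/tags`; a quick sketch:

```python
import json
import urllib.request

# Ollama's REST API listens on port 11434 by default; /api/tags lists
# the models you've pulled locally.
with urllib.request.urlopen("http://localhost:11434/api/tags") as resp:
    models = [m["name"] for m in json.load(resp)["models"]]

for required in ("mxbai-embed-large", "qwen3:8b"):
    # Pulled models often carry a tag (e.g. ":latest"), so match on prefix.
    if not any(name.startswith(required) for name in models):
        print(f"Missing model: {required} (run `ollama pull {required}`)")
print("Ollama is serving:", models)
```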
```bash
cd docker
docker compose up -d
```

Or just open Docker Desktop from the search bar; it will do what it needs to do once configured. This starts four services:
| Service | Container | Port | Purpose |
|---|---|---|---|
| PostgreSQL + pgvector | `open_brain_db` | 5433 | Vector storage and search |
| Open WebUI | `open_brain_webui` | 3000 | Chat interface |
| Pipelines | `open_brain_pipelines` | 9099 | Message capture filter |
| Capture API | `open_brain_capture` | 8100 | Classification, extraction, embedding |
Verify everything is healthy:
```bash
docker compose ps
```

Maybe do this before you start up the whole stack... thinking my agent messed this up, but whatever.
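If `docker compose ps` looks off, a quick way to see which of the four services is actually reachable (ports taken from the table above):

```python
import socket

# Probe each service port; "CLOSED" usually means the container didn't
# start or the port mapping changed.
SERVICES = [
    ("PostgreSQL + pgvector", 5433),
    ("Open WebUI", 3000),
    ("Pipelines", 9099),
    ("Capture API", 8100),
]

for name, port in SERVICES:
    with socket.socket() as sock:
        sock.settimeout(1)
        status = "open" if sock.connect_ex(("localhost", port)) == 0 else "CLOSED"
    print(f"{name:22} port {port}: {status}")
```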
```bash
pip install -r requirements.txt
```

Go to http://localhost:3000 and create your account. This is a local-only account, so you can put whatever you want in the login; Open WebUI runs entirely on your machine.
Every message you send in Open WebUI is automatically intercepted by the pipeline filter. You don't need to do anything special; just chat naturally. The system:

- Classifies the text: only substantive content is stored (decisions, observations, action items, references, project notes). Chit-chat, greetings, and system noise are filtered out.
- Extracts metadata: entry type, topics, people, projects, action items
- Generates embeddings: 1024-dim vectors via mxbai-embed-large for semantic search
- Detects duplicates: skips near-identical entries (cosine similarity > 0.98; see the sketch below)
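The duplicate check is plain cosine similarity between the new entry's embedding and existing ones. A minimal sketch of the idea (hypothetical helper, not the actual `capture/` internals, where the comparison likely happens inside pgvector rather than Python):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Dot product over the product of magnitudes; 1.0 means same direction.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_near_duplicate(new_vec: np.ndarray,
                      stored_vecs: list[np.ndarray],
                      threshold: float = 0.98) -> bool:
    # Skip storing when any existing entry is nearly identical in meaning.
    return any(cosine_similarity(new_vec, v) > threshold for v in stored_vecs)
```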
When you send a message, the filter also searches the brain for relevant memories and injects them into the conversation context. The LLM sees your past decisions and knowledge without you having to remind it.
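In other words, this is retrieval-augmented generation over your own history: embed the message, fetch the nearest stored entries, prepend them. A rough, self-contained sketch with stand-in helpers (the real logic lives in `pipelines/open_brain_filter.py` and uses Ollama embeddings plus pgvector):

```python
import asyncio

# Tiny in-memory stand-ins for the embedder and vector search, just to make
# the shape of the injection step concrete; these are NOT the real functions.
async def embed(text: str) -> list[float]:
    return [float(len(text))]  # placeholder vector, not a real embedding

async def search_by_vector(vec: list[float], limit: int) -> list[dict]:
    return [{"content": "We chose Rust over Go for performance reasons"}][:limit]

async def inject_context(user_message: str, top_k: int = 5) -> str:
    query_vec = await embed(user_message)
    memories = await search_by_vector(query_vec, top_k)
    # Prepend retrieved memories so the LLM sees past decisions as context.
    recalled = "\n".join(f"- {m['content']}" for m in memories)
    return f"Relevant memories:\n{recalled}\n\nUser message: {user_message}"

print(asyncio.run(inject_context("Why did we pick Rust?")))
```

For a sense of what the classifier keeps versus drops: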
| Stored | Filtered |
|---|---|
| "I decided to use Redis for caching because..." | "hello" |
| "The deployment is scheduled for Friday" | "thanks!" |
| "We chose Rust over Go for performance reasons" | "ok sounds good" |
| "TODO: update the API docs before release" | "what time is it?" |
Open WebUI supports external APIs alongside local Ollama models. To add an OpenAI-compatible API (NVIDIA, OpenAI, etc.):
- Add your API key to `docker/.env`:

  ```
  NVIDIA_API_KEY=nvapi-your-key-here
  ```

- The docker-compose.yml already configures the NVIDIA endpoint. For other providers, edit the `OPENAI_API_BASE_URLS` and `OPENAI_API_KEYS` environment variables (semicolon-separated for multiple).

- Restart Open WebUI:

  ```bash
  cd docker && docker compose up -d open-webui
  ```
Premium models appear in the model dropdown alongside your local Ollama models.
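For example, to point Open WebUI at both NVIDIA and OpenAI at once (the base URLs are the standard public OpenAI-compatible endpoints; the keys are placeholders):

```
OPENAI_API_BASE_URLS=https://integrate.api.nvidia.com/v1;https://api.openai.com/v1
OPENAI_API_KEYS=nvapi-your-key-here;sk-your-openai-key
```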
Open Brain exposes four MCP tools that Claude Code can use from any project directory:
| Tool | Purpose |
|---|---|
| `capture_text_tool` | Store an entry into the brain |
| `search_brain_tool` | Semantic search over memories |
| `recent_entries` | Browse recent entries by time or type |
| `brain_stats` | Entry counts, top topics, storage overview |
Add the MCP server to `~/.claude.json` (replace the path with where you cloned this repo):

```json
{
  "mcpServers": {
    "open-brain": {
      "command": "python",
      "args": ["/path/to/open-brain/mcp_server/server.py"],
      "scope": "user"
    }
  }
}
```

Or add a `.mcp.json` to any project root for project-level access:
```json
{
  "mcpServers": {
    "open-brain": {
      "command": "python",
      "args": ["/path/to/open-brain/mcp_server/server.py"]
    }
  }
}
```

A status line indicator shows brain status at the bottom of every Claude Code session:
- 🧠 Brain online (green): Docker + Ollama both running
- 🧠 Ollama not running (yellow): containers up but no LLM
- 🧠 Docker not running (yellow): Ollama up but no database
- 🧠 Brain offline (red): both down
The emojis don't display in the terminal, FYI (not yet anyway, but LMK if you fix it somehow). The other parts do work, though.
Configured in `~/.claude/settings.json` and `~/.claude/statusline.sh`.
To store text from your own code:

```python
import asyncio

from capture.pipeline import capture

async def main() -> None:
    # capture() classifies, embeds, and stores the text if it's substantive.
    result = await capture(
        "We decided to use Rust for the CLI because...",
        source_client="my_app",
    )
    # {"stored": True, "id": "uuid-here", "reason": "Stored as decision (confidence 0.95)"}
    print(result)

asyncio.run(main())
```

All models are configured via environment variables in `.env`:
```
OLLAMA_EMBEDDING_MODEL=mxbai-embed-large
OLLAMA_CLASSIFIER_MODEL=qwen3:8b
OLLAMA_EXTRACTOR_MODEL=qwen3:8b
```

Embedding models:

| Model | Dimensions | Size | Notes |
|---|---|---|---|
| `mxbai-embed-large` | 1024 | 669 MB | Best quality, recommended |
| `nomic-embed-text` | 768 | 274 MB | Lighter alternative (requires schema change) |

Classifier/extractor models:

| Model | Size | Notes |
|---|---|---|
| `qwen3:8b` | 5.2 GB | Good balance of speed and accuracy |
| `llama3.1:8b` | 4.7 GB | Solid alternative |
| `mistral:7b` | 4.1 GB | Fastest of the three |
If you change the embedding model to one with a different dimension, you must update `db/schema.sql` (change `vector(1024)`) and recreate the database volume (e.g. `cd docker && docker compose down -v && docker compose up -d`). This deletes all stored entries.
```
open-brain/
├── docker/
│   ├── docker-compose.yml       # Full stack: DB, WebUI, Pipelines, Capture API
│   ├── Dockerfile.capture       # Capture API container image
│   └── .env                     # Docker-specific secrets (API keys)
├── db/
│   └── schema.sql               # PostgreSQL schema + pgvector indexes
├── capture/
│   ├── pipeline.py              # Main capture pipeline
│   ├── classifier.py            # Relevance gate (qwen3:8b)
│   ├── extractor.py             # Metadata extraction
│   ├── embedder.py              # Embedding generation (mxbai-embed-large)
│   ├── api.py                   # FastAPI HTTP wrapper (runs in container)
│   └── prompts.py               # Ollama prompt templates
├── mcp_server/
│   ├── server.py                # MCP stdio server (4 tools)
│   └── tools/
│       ├── capture.py           # capture_text_tool
│       ├── search.py            # search_brain_tool
│       ├── recent.py            # recent_entries
│       └── stats.py             # brain_stats
├── db_client/
│   └── client.py                # Async PostgreSQL client (asyncpg + pgvector)
├── pipelines/
│   └── open_brain_filter.py     # Open WebUI pipeline filter (capture + retrieval)
├── scripts/
│   ├── seed_knowledge.py        # Seed brain with foundational knowledge
│   └── seed_direct.py           # Direct DB seeding (bypasses classifier)
├── docs/
│   └── system_prompt.md         # System prompt for Open WebUI
├── tests/                       # Full test suite
├── .env                         # Local development config
├── .env.example
├── .mcp.json                    # Project-level MCP config
├── config.py                    # Central configuration
└── requirements.txt
```
- Start Docker Desktop → all containers auto-start
- Start Ollama → needed for classification, embedding, and local chat
- Chat at localhost:3000 → your messages are passively captured
- Use Claude Code anywhere → brain tools available via MCP, status line shows connection
| Problem | Fix |
|---|---|
| Brain tools fail with connection error | Start Docker Desktop and Ollama |
| Capture API returns embedding errors | Ollama isn't running; start it |
| Chat models don't appear in Open WebUI | Check that Ollama is running; restart the Open WebUI container |
| Pipeline filter not loading | Run `docker compose restart pipelines` |
| NVIDIA models not showing | Check `NVIDIA_API_KEY` in `docker/.env`; restart Open WebUI |