Base URL: http://localhost:5002
Version: v5.0.1 current (on top of v5.0.0 GA shipped 2026-05-02)
Format: JSON
Full OpenAPI reference: once the server is running, the complete, live, always-accurate endpoint list is at
http://localhost:5002/docs (Swagger UI) and http://localhost:5002/redoc. This document describes the core surface and common examples; /docs is the source of truth.
The unified /v1/ namespace is the supported API surface.
- Authentication
- Health & Status
- Memory Operations
- Knowledge Graph
- Consultations (GRAEAE)
- Providers & Models
- OpenAI-Compatible Gateway
- Sessions
- MORPHEUS
- Webhooks
- OAuth / OIDC
- Federation
- Portability (MPF)
- Error Handling
- Examples
Two authentication surfaces coexist:
- Bearer API key: set MNEMOS_API_KEY on the server and send Authorization: Bearer <key> on every request. Suitable for service-to-service and CLI access.
- Browser session (cookie-based, see OAuth / OIDC): for users who sign in through Google / GitHub / Azure AD / generic OIDC.

get_current_user checks Bearer first, then the mnemos_session cookie.
Personal installs may run without auth; team and production installs should
always set MNEMOS_API_KEY and/or enable OAuth.
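
The lookup order described above (Bearer first, then the mnemos_session cookie) can be sketched in a few lines. This is an illustration only, not the get_current_user implementation: the function name resolve_credential and the plain-dict inputs are hypothetical, and the source does not say what happens when a Bearer key is present but invalid, so the sketch only decides which credential would be examined.

```python
from typing import Mapping, Optional, Tuple


def resolve_credential(
    headers: Mapping[str, str], cookies: Mapping[str, str]
) -> Optional[Tuple[str, str]]:
    """Pick the credential to examine: Bearer header first, then session cookie.

    Hypothetical helper. Returns ("bearer", key), ("session", value), or None
    when the request carries neither credential.
    """
    scheme, _, token = headers.get("Authorization", "").partition(" ")
    if scheme.lower() == "bearer" and token.strip():
        return ("bearer", token.strip())

    session = cookies.get("mnemos_session")
    if session:
        return ("session", session)

    return None
```

A request that carries both credentials resolves to the Bearer key under this ordering; a request with neither resolves to None, which a personal install running without auth might still accept.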
Liveness + readiness check (no auth required).
Response (200):
{
"status": "healthy",
"timestamp": "2026-05-02T14:30:00.000Z",
"database_connected": true,
"version": "5.0.1",
"profile": "edge",
"distillation_worker": "idle"
}
Example:
curl -X GET http://localhost:5002/health
System statistics: memory counts by category and task type, compression stats, unreviewed compressions.
curl -H "Authorization: Bearer $MNEMOS_API_KEY" \
  http://localhost:5002/stats
All memory routes are under /v1/memories.
Read behavior is symmetric across REST, gateway context, and MCP:
non-root callers can read a memory when they are the owner, when the
row is federated, when the Unix world-read bit is set, or when the
Unix group-read bit is set for one of their groups. The shared helper
is read_visibility_predicate (mnemos/core/visibility.py). Writes
remain owner+namespace scoped.
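
The four read conditions can be expressed as a small in-memory predicate. This is a sketch under stated assumptions, not the contents of read_visibility_predicate: the MemoryRow field names (owner_id, group_id, mode, federated) are hypothetical, the permission bits follow the usual Unix octal layout, and the root bypass is inferred from the phrase "non-root callers" rather than stated outright.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

GROUP_READ = 0o040  # Unix group-read bit
WORLD_READ = 0o004  # Unix world-read bit


@dataclass(frozen=True)
class MemoryRow:
    # Hypothetical column names, chosen for illustration only.
    owner_id: str
    group_id: Optional[str]
    mode: int
    federated: bool


def can_read(
    row: MemoryRow,
    caller_id: str,
    caller_groups: FrozenSet[str],
    is_root: bool = False,
) -> bool:
    """Return True when the caller passes any one of the read conditions."""
    if is_root:  # assumption: root is not subject to the predicate
        return True
    if row.owner_id == caller_id:
        return True
    if row.federated:
        return True
    if row.mode & WORLD_READ:
        return True
    return bool(row.mode & GROUP_READ) and row.group_id in caller_groups
```

With mode 0o640, the owner and members of the row's group can read, and everyone else is refused unless the row is federated.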
Create a memory. This writes the memory row and emits a transactional
memory.created webhook event. Compression has been operator-batched since v4.0:
root users enqueue memories through /admin/compression/enqueue or
/admin/compression/enqueue-all; the distillation worker then drains
memory_compression_queue.
Request Body:
{
"content": "Memory content (required)",
"category": "facts|identity|preferences|projects (required)",
"task_type": "reasoning",
"metadata": { "source": "import" }
}
Retrieve a single memory. Non-root callers get 404 when the memory
does not pass the shared read predicate; the API does not reveal
cross-tenant existence.
Update content, category, subcategory, metadata, or verbatim content.
The update is owner+namespace scoped. If the version trigger detects a
missing, NULL, or cross-memory branch HEAD, the API returns 409 with
manual reconciliation guidance (handle_trigger_pgerror in
mnemos/core/visibility.py).
Delete a memory under the same owner+namespace write scope. The v3.5
trigger migration keeps DELETE snapshot handling live for deployments
that still attach trg_memory_version_delete; broken branch state maps
to 409.
Semantic + keyword search.
{ "query": "infrastructure", "limit": 5 }
Search hits update recall_count and last_recalled_at in the background.
compression_applied and compression_metadata are reserved response fields
on MemoryListResponse; current search responses set
compression_applied=false. Use /v1/memories/rehydrate to receive compressed
variants when present, or /v1/memories/{id}/compression-manifests to inspect
contest output.
- GET /v1/memories/{id}/compression-manifests: current winning variant plus every historical contest candidate, score, manifest, and reject reason.
- POST /v1/memories/rehydrate: token-budgeted context assembly. It prefers a memory_compressed_variants winner when one exists and reports compression_applied=true only when a variant was actually used.

- GET /v1/memories/{id}/versions: version summaries on a branch, filtered per snapshot by version_visibility_predicate (mnemos/core/visibility.py).
- GET /v1/memories/{id}/versions/{n}: one version on a branch, filtered by that snapshot's own owner/namespace/permission mode.
- GET /v1/memories/{id}/diff: diff between visible snapshots.
- POST /v1/memories/{id}/revert/{n}: revert to version n. Main-branch revert updates the live row under the trigger after a live-row/main-HEAD drift guard; feature-branch revert inserts a new DAG row and leaves memories tracking main.
- GET /v1/memories/{id}/log: commit history from branch HEAD. Recursive walks stay within one memory, and parent_hash is returned only when the actual immediate parent is visible.
- GET /v1/memories/{id}/commits/{commit}: one visible commit by hash.
- GET /v1/memories/{id}/branches: branch list. Non-root callers do not see branches whose head snapshot is invisible; corrupt heads are omitted and logged.
- POST /v1/memories/{id}/branch: create a branch from main HEAD or a specified commit. The handler locks the parent memory row with FOR SHARE and uses ON CONFLICT DO NOTHING RETURNING; duplicate branch names return 409.
- POST /v1/memories/{id}/merge: merge source branch into target branch. Latest-wins is implemented; manual strategy returns not-implemented. Merge commits copy content/provenance from source and tenancy from target, and branch writers serialize on _branch_advisory_lock_key (mnemos/api/routes/dag.py).
See ANTI_MEMORY_POISONING.md for the rationale and drift-detection
workflow.
Structured triples live beside free-text memories and use the same owner+namespace tenancy model.
- POST /v1/kg/triples: create a subject/predicate/object triple
- GET /v1/kg/triples: list triples with filters
- GET /v1/kg/timeline/{subject}: subject timeline
- PATCH /v1/kg/triples/{triple_id}: update a triple
- DELETE /v1/kg/triples/{triple_id}: delete a triple
KG tenancy columns were added in v3.1.x and namespace parity became the standard across memory and entity surfaces in v3.2-v3.5.
ConsultationRequest.mode is a literal string. The seven accepted modes are:
| Mode | Type | Meaning |
|---|---|---|
| auto | routing strategy | Use the engine default for the task type. |
| local | routing strategy | Prefer local/self-hosted muses. |
| external | routing strategy | Prefer external provider muses. |
| all | routing strategy | Fan out to every available muse. |
| single | reasoning shape | Select one highest-weighted muse for a fast or low-cost answer. |
| debate | reasoning shape | Run a two-round multi-muse debate and synthesize the result. |
| majority | reasoning shape | Query up to three muses and report whether quorum was reached. |
Unknown modes are rejected by request validation with HTTP 422.
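
A client can check a mode locally before sending the request, which avoids a round trip that would end in a 422. The sketch below mirrors the seven literals and the two mode types from the table; the names ConsultationMode, ROUTING_STRATEGIES, and classify_mode are hypothetical client-side helpers, not part of the server schema.

```python
from typing import Literal, get_args

# The seven literals accepted by ConsultationRequest.mode.
ConsultationMode = Literal[
    "auto", "local", "external", "all", "single", "debate", "majority"
]

# Modes the table lists as routing strategies; the rest are reasoning shapes.
ROUTING_STRATEGIES = frozenset({"auto", "local", "external", "all"})


def classify_mode(mode: str) -> str:
    """Return the mode's type, or raise ValueError for an unknown mode."""
    if mode not in get_args(ConsultationMode):
        raise ValueError(f"unknown consultation mode: {mode!r}")
    if mode in ROUTING_STRATEGIES:
        return "routing strategy"
    return "reasoning shape"
```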
Run a multi-LLM consensus consultation. Writes a tamper-evident audit row.
Request Body:
{
"prompt": "Design a microservices architecture.",
"task_type": "architecture_design",
"mode": "debate",
"context": "optional context to prepend",
"limit_chars": 12000,
"format": "full",
"models": ["openai/gpt-5.2-chat-latest"],
"providers": null,
"tier": null
}
Schema summary:
mode: Literal["auto", "local", "external", "all", "single", "debate", "majority"] = "auto"
Response includes: id, consensus_response, consensus_score,
winning_muse, cost, latency_ms, memory_refs (citations).
Retrieve a consultation by ID.
Retrieve citations and injected memory refs for a consultation (EMIR Article 57 audit support).
- GET /v1/consultations/audit: list audit entries
- GET /v1/consultations/audit/verify: verify SHA-256 hash chain integrity
Static /audit routes are mounted before dynamic /{consultation_id} so
path-param matching does not shadow them.
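
To show what verifying a SHA-256 hash chain involves, here is a generic sketch of the technique. It does not reproduce the MNEMOS audit schema: the entry layout (payload, prev_hash, hash), the all-zero genesis value, and the canonical JSON serialization are all assumptions made for the example.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS_HASH = "0" * 64  # assumed starting value for the first entry


def entry_hash(prev_hash: str, payload: Dict[str, Any]) -> str:
    """Hash the previous entry's hash together with a canonical payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(chain: List[Dict[str, Any]], payload: Dict[str, Any]) -> None:
    """Append a new entry linked to the current tail of the chain."""
    prev = chain[-1]["hash"] if chain else GENESIS_HASH
    chain.append(
        {"payload": payload, "prev_hash": prev, "hash": entry_hash(prev, payload)}
    )


def verify_chain(chain: List[Dict[str, Any]]) -> bool:
    """Recompute every link; any edited, removed, or reordered entry fails."""
    prev = GENESIS_HASH
    for entry in chain:
        if entry["prev_hash"] != prev:
            return False
        if entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True
```

Because each hash covers the previous one, changing any stored payload invalidates that entry and every entry after it, which is what makes the log tamper-evident.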
- GET /v1/providers: unified catalog (health-tracked)
- GET /v1/providers/health: per-provider availability + circuit-breaker state
- GET /v1/providers/recommend?task_type=...&budget=...: task-aware routing
On a fresh install with an empty model_registry table, /recommend falls
back to the static GRAEAE provider config so that new deployments do not get a 404.
Drop-in for OpenAI SDK consumers. Point OPENAI_BASE_URL at MNEMOS.
- POST /v1/chat/completions: chat completions with memory injection, propagated generation controls, and OpenAI-format SSE when stream=true.
- GET /v1/models: registry-backed model list only.
- GET /v1/models/{model_id}: registry lookup; unregistered IDs return 404.
Field support is intentionally pass-or-reject:
| Request field | OpenAI-style providers | Anthropic | Gemini | Other/text-only providers |
|---|---|---|---|---|
| temperature, max_tokens, top_p | Passed through (max_tokens maps to max_completion_tokens for GPT-5 models) | Mapped to Messages API fields | Mapped to generationConfig | Provider default unless adapter supports it |
| stream | Native SSE where available | Single-shot fallback wrapped as OpenAI SSE | Single-shot fallback wrapped as OpenAI SSE | Single-shot fallback wrapped as OpenAI SSE |
| tools, tool_choice | Passed for the OpenAI provider | Converted to Claude tool schema | 400 | 400 |
| response_format | Passed through | 400 | json_object maps to responseMimeType=application/json | 400 |
| stop, n, penalties | Passed through | stop maps to stop_sequences; unsupported penalties return 400 | Native generationConfig mapping | 400 when not honored |
| content blocks / images | OpenAI vision-capable models | Claude vision | Gemini vision | 400 |
Memory injection is enabled by default. Disable it for one request with either
header X-Mnemos-Inject-Memory: false or the non-OpenAI extension body field
"mnemos_inject_memory": false. Malformed header values are treated as
default-on. When the header is supplied, non-streaming JSON responses include
mnemos_metadata.memory_injected.
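
The opt-out rules above reduce to a short decision function. This is a sketch of the described behavior rather than the gateway's code: the function name is hypothetical, the case-insensitive match on "false" is an assumption, and so is the choice to let either opt-out win when the header and body field disagree.

```python
from typing import Any, Mapping


def memory_injection_enabled(
    headers: Mapping[str, str], body: Mapping[str, Any]
) -> bool:
    """Return False only when the request explicitly opts out.

    Any header value other than "false" (including malformed ones) leaves
    the default-on behavior in place.
    """
    header = headers.get("X-Mnemos-Inject-Memory")
    if header is not None and header.strip().lower() == "false":
        return False
    if body.get("mnemos_inject_memory") is False:
        return False
    return True
```

Note that a plain dict lookup is case-sensitive; a real HTTP framework normally exposes headers through a case-insensitive mapping.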
Stateful multi-turn chat with memory injection at turn boundaries.
- POST /v1/sessions: create
- POST /v1/sessions/{id}/messages: post a turn
- GET /v1/sessions/{id}: retrieve transcript
- GET /v1/sessions/{id}/history: full history
- DELETE /v1/sessions/{id}: close
Dream-state generation is operator-triggered. v5.0.0 includes the REPLAY,
CLUSTER, SYNTHESISE, CONSOLIDATE, and EXTRACT phases. Generated memories are
tagged with morpheus_run_id; rollback deletes generated memories for that run.
- GET /v1/morpheus/runs: list runs
- GET /v1/morpheus/runs/{run_id}: run detail
- GET /v1/morpheus/runs/{run_id}/clusters: cluster membership and synthesized memory IDs
- POST /admin/morpheus/runs: trigger a synchronous run (root only)
- DELETE /admin/morpheus/runs/{run_id}: roll back a run (root only)
Outbound event delivery with HMAC-SHA256 signatures and a durable retry log (1m / 5m / 30m / 2h). Delivery log is replayed on server restart via the recovery worker.
- POST /v1/webhooks: subscribe
- GET /v1/webhooks: list
- DELETE /v1/webhooks/{id}: revoke (soft-delete; delivery log retained)
- GET /v1/webhooks/{id}/deliveries: per-subscription delivery log
Events emitted: memory.created, memory.updated, memory.deleted,
consultation.completed.
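
A receiver would typically verify the HMAC-SHA256 signature before trusting a delivery. The sketch below shows the standard technique with Python's hmac module. It assumes the signature is the hex digest of the raw request body under the subscription secret; the source does not state the header name, the encoding, or whether a timestamp is part of the signed material, so check the live /docs before relying on it. The retry list reads the 1m / 5m / 30m / 2h schedule as the delay before each successive retry, which is also an assumption.

```python
import hashlib
import hmac
from typing import Optional

# Assumed reading of the documented schedule, in seconds: 1m, 5m, 30m, 2h.
RETRY_DELAYS_SECONDS = [60, 300, 1800, 7200]


def sign_payload(secret: bytes, body: bytes) -> str:
    """Compute the hex HMAC-SHA256 of the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, body: bytes, received_hex: str) -> bool:
    """Compare signatures in constant time to avoid timing leaks."""
    return hmac.compare_digest(sign_payload(secret, body), received_hex)


def next_retry_delay(failed_attempts: int) -> Optional[int]:
    """Delay before the next retry, or None once the schedule is exhausted."""
    if 0 <= failed_attempts < len(RETRY_DELAYS_SECONDS):
        return RETRY_DELAYS_SECONDS[failed_attempts]
    return None
```

Verify against the raw bytes as received; re-serializing parsed JSON can change whitespace or key order and break the comparison.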
Browser-based sign-in for Google, GitHub, Azure AD, or any generic OIDC provider (Keycloak, Authentik, Auth0, Okta).
- GET /auth/oauth/{provider}/login: redirect to provider
- GET /auth/oauth/{provider}/callback: provider → MNEMOS
- POST /auth/oauth/logout: invalidate session
- GET /auth/oauth/me: current-user profile
- Admin: GET /admin/oauth/providers, GET /admin/oauth/identities
Sessions are DB-backed (revocable, 30-day default TTL), with an hourly GC
worker. Session cookies are HttpOnly, SameSite=Lax, and Secure when
served over HTTPS.
OAuth state (PKCE verifier + CSRF nonce) lives in a separate short-lived
signed cookie (mnemos_oauth_state, 10-min TTL) distinct from the
application session cookie. The signing key is read from MNEMOS_SESSION_SECRET
and is auto-generated on startup if unset.
Pull-based one-way sync between MNEMOS instances. Federated memories are
stored with IDs fed:{peer_name}:{remote_id}, owner_id='federation', and
are read-only by convention. Loop prevention via
federation_source IS NOT NULL exclusion.
- GET/POST /v1/federation/peers: admin peer CRUD
- DELETE /v1/federation/peers/{id}: remove peer
- POST /v1/federation/peers/{id}/sync: trigger sync
- GET /v1/federation/peers/{id}/log: per-peer sync log
- GET /v1/federation/status: aggregate status
- GET /v1/federation/feed: outbound feed (requires role IN ('federation','root'))
Background sync runs every 60 seconds.
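
Clients that need to tell federated memories apart from local ones can split the ID on its documented fed:{peer_name}:{remote_id} shape. The helpers below are hypothetical and make one assumption the source does not confirm: peer names contain no colon, so everything after the second colon is treated as the remote ID.

```python
from typing import Optional, Tuple


def make_federated_id(peer_name: str, remote_id: str) -> str:
    """Build an ID in the fed:{peer_name}:{remote_id} form."""
    return f"fed:{peer_name}:{remote_id}"


def parse_federated_id(memory_id: str) -> Optional[Tuple[str, str]]:
    """Return (peer_name, remote_id), or None for a non-federated ID."""
    prefix, sep, rest = memory_id.partition(":")
    if prefix != "fed" or not sep:
        return None
    peer_name, sep, remote_id = rest.partition(":")
    if not sep or not peer_name or not remote_id:
        return None
    return peer_name, remote_id
```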
MNEMOS Portability Format (MPF) is the native import/export envelope.
- GET /v1/export: export MPF v0.1.x envelope. include_sidecars=true includes KG triples, memory-version DAGs, and compression manifests where available.
- POST /v1/import: import an MPF envelope. Root plus preserve_owner=true is for authoritative restore/migration; non-root imports are scoped to the caller's owner+namespace.
CLI helpers: mnemos export, mnemos import, mnemos validate-mpf, and the
modules under mnemos/tools/.
Standard HTTP status codes. Error responses are JSON:
{ "detail": "Consultation not found" }
Common codes:
- 400: request validation error
- 401: missing or invalid auth
- 403: role check failed
- 404: entity not found
- 409: conflict requiring operator action, including incompatible federation schema and v3.5 MN001 branch-state reconciliation
- 413: request body exceeds MAX_BODY_BYTES (default 5 MB)
- 429: rate-limited (when RATE_LIMIT_ENABLED=true)
- 503: database pool unavailable
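
A client can turn these responses into a retry decision plus a readable message. The sketch below extracts the detail field from the JSON error body and flags 429 and 503 as worth retrying. That retry policy is an assumption made for the example; the source lists the codes but does not say which ones are safe to retry. The name describe_error is hypothetical.

```python
import json
from typing import Tuple

# Assumed client policy: rate limiting and pool exhaustion are transient.
RETRYABLE_STATUS = frozenset({429, 503})


def describe_error(status: int, body: bytes) -> Tuple[bool, str]:
    """Return (should_retry, detail) for an error response.

    Falls back to an empty detail string when the body is missing, is not
    valid JSON, or is not a JSON object.
    """
    try:
        detail = json.loads(body).get("detail", "")
    except (ValueError, AttributeError):
        detail = ""
    return status in RETRYABLE_STATUS, str(detail)
```

A 409 is deliberately not retried here, since the text above describes it as a conflict that needs operator action.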
BASE_URL="http://localhost:5002"
AUTH="Authorization: Bearer $MNEMOS_API_KEY"

# Health
curl -s "$BASE_URL/health" | jq .

# Create a memory
curl -s -X POST "$BASE_URL/v1/memories" \
  -H "$AUTH" -H 'Content-Type: application/json' \
  -d '{"content":"Example fact","category":"facts"}' | jq .

# Semantic search
curl -s -X POST "$BASE_URL/v1/memories/search" \
  -H "$AUTH" -H 'Content-Type: application/json' \
  -d '{"query":"project completion","limit":5}' | jq .

# Run a consultation
curl -s -X POST "$BASE_URL/v1/consultations" \
  -H "$AUTH" -H 'Content-Type: application/json' \
  -d '{"prompt":"Design a REST API for restaurant inspections","task_type":"architecture_design"}' | jq .

# Verify the consultation audit chain
curl -s -H "$AUTH" "$BASE_URL/v1/consultations/audit/verify" | jq .

# OpenAI-compatible chat completion
curl -s -X POST "$BASE_URL/v1/chat/completions" \
  -H "$AUTH" -H 'Content-Type: application/json' \
  -d '{"model":"auto","messages":[{"role":"user","content":"Hello"}]}' | jq .

For the full, always-accurate list of routes and schemas, use the live
OpenAPI at http://localhost:5002/docs.