# Kairix — Docker Compose example with bounded resources for shared hosts
#
# Companion to docker-compose.yml. This example uses the modern
# `deploy.resources` syntax (Compose Spec) instead of the legacy service-level
# `mem_limit` / `cpus` keys so the same file works under `docker compose up`
# AND `docker stack deploy`. Both syntaxes are equivalent at the cgroup level
# when Docker Compose drives the runtime; the `deploy.resources` form is
# preferred for new deployments because it nests reservations + limits in one
# block per service and matches Swarm/Kubernetes vocabulary.
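#
# For comparison, a sketch of the same memory/CPU cap in both spellings.
# The values are illustrative (copied from the kairix service in this file),
# and the legacy form is shown only for reference:
#
#   # Legacy service-level keys:
#   kairix:
#     cpus: "1.0"
#     mem_limit: 1g
#     mem_reservation: 512m
#
#   # deploy.resources form used in this file:
#   kairix:
#     deploy:
#       resources:
#         limits:
#           cpus: "1.0"
#           memory: 1Gi
#         reservations:
#           cpus: "0.5"
#           memory: 512Mi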
#
# When to use this file
# ---------------------
# Use this example when kairix shares a host with other latency-sensitive
# services (chat agents, ingest pipelines, a graph database, etc.) — see
# docs/operations/SHARED-HOSTS.md for the full guidance. On a dedicated host
# you can keep docker-compose.yml's looser caps.
#
# How to use this file
# --------------------
#   # Override the default compose file:
#   docker compose -f docker-compose.example.yml up -d
#
#   # Or layer it on top of docker-compose.yml so you only override the
#   # services you want bounded (recommended for production):
#   docker compose -f docker-compose.yml -f docker-compose.example.yml up -d
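#
#   # Optional sanity check before starting anything: `docker compose config`
#   # prints the fully merged and interpolated file, so you can confirm which
#   # limits actually win when the two files are layered:
#   docker compose -f docker-compose.yml -f docker-compose.example.yml config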
#
# Tuning notes
# ------------
# The numbers below are starting points sized for a 4 vCPU / 8 GiB shared
# VM running kairix + neo4j + one or two co-located agents. Watch
# `docker stats` for a week and tighten or loosen from there. Don't lower
# reservations below the steady-state RSS reported by `docker stats` — the
# scheduler will swap or OOM-kill instead of throttling.
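#
# One way to sample those numbers (a sketch; the container names you see
# depend on your Compose project name):
#   docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}"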
#
# Failure modes you'll see if these caps are too tight:
#   - kairix:        503s from /mcp during rerank-model load; OOMKilled on
#                    cold start with a small embedding cache.
#   - kairix-worker: scan takes longer per cycle (acceptable) or gets
#                    OOMKilled mid-embed (raise mem limit).
#   - neo4j:         page-cache thrash, slow Cypher; raise mem limit before
#                    raising heap.
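#
# To confirm an OOM kill rather than guess, something like the following
# should work (<container> is a placeholder for the container name or ID;
# `docker events` keeps streaming until interrupted):
#   docker inspect --format '{{.State.OOMKilled}}' <container>
#   docker events --since 1h --filter event=oom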
services:
  kairix:
    image: ghcr.io/three-cubes/kairix:latest
    ports:
      - "127.0.0.1:8080:8080"  # localhost-only; kairix has no built-in auth.
    volumes:
      - ./documents:/data/documents
      - kairix-data:/data/kairix
      # Layered config: the image ships a complete canonical config at
      # /opt/kairix/kairix.config.yaml (containing _schema_version, provider:,
      # collections, retrieval defaults). Operators overlay sparse host-side
      # overrides via the OVERLAY env-var below — the overlay only needs to
      # declare the keys you're changing; required keys come from the image.
      # NEVER bind-mount over /opt/kairix/kairix.config.yaml — that shadow-
      # mount drops required keys silently (the v2026.5.17a9 incident).
      #
      # Declaring topology v2 (v2026.5.24a1+):
      #
      # The new connector / collection / scope / skill surface lives
      # under a ``topology_v2:`` block — declare it inside your
      # kairix.config.local.yaml overlay. See kairix.config.example.yaml
      # at the repo root for the full operator-facing shape (2 connectors,
      # 1 credential, 2 cc_pairs, 2 collections, 1 scope_profile, 1 skill).
      #
      # Minimal overlay shape (kairix.config.local.yaml):
      #   features:
      #     topology_v2_config: true      # parse + apply the block
      #     topology_v2_obsidian: true    # per-folder containers + walk
      #     connector_sharepoint: true    # enable SharePoint connector
      #   topology_v2:
      #     connectors: [...]
      #     credentials: [...]
      #     cc_pairs: [...]
      #     collections: [...]
      #     scope_profiles: [...]
      #     skills: [...]
      #
      # For SharePoint: set the three M365 client-credentials secrets in
      # the .env file referenced below (CONNECTOR_M365_TENANT_ID +
      # CONNECTOR_M365_CLIENT_ID + CONNECTOR_M365_CLIENT_SECRET). The
      # connector resolves them via kairix.secrets.get_secret with the
      # logical names connector-m365-tenant-id / -client-id / -client-secret.
      - ./kairix.config.local.yaml:/opt/kairix/kairix.config.local.yaml:ro
    env_file:
      - .env
    environment:
      - KAIRIX_NEO4J_URI=bolt://neo4j:7687
      - KAIRIX_NEO4J_USER=neo4j
      - KAIRIX_NEO4J_PASSWORD=${KAIRIX_NEO4J_PASSWORD}
      # Point kairix at the host-side overlay file. The image-bundled base
      # at /opt/kairix/kairix.config.yaml is the source of required keys;
      # the overlay supplies operator-specific values (vault paths, agent
      # registry, retrieval tuning, topology_v2 block). See
      # docs/operations/runbooks/config-upgrade.md.
      - KAIRIX_CONFIG_OVERLAY_PATH=/opt/kairix/kairix.config.local.yaml
    depends_on:
      neo4j:
        condition: service_healthy
    restart: unless-stopped
    # Healthcheck uses `kairix onboard ready` — the narrow readiness probe
    # that exits 0 only once kairix has finished warming. Distinct from
    # `kairix onboard check` (configuration diagnostic — secrets, paths,
    # neo4j, embed pipeline). `ready` answers the deploy-wait question:
    # "will the next agent call succeed without a cold-start envelope?".
    # Using it as the healthcheck means `docker compose up --wait` blocks
    # until the first real request can actually return search results,
    # not just until the process binds its port.
    #
    # ``start_period`` is generous so the initial warm (~7-10s on a hot
    # node, ~30s on a fresh container) is treated as "starting", not
    # "unhealthy".
    healthcheck:
      test: ["CMD", "kairix", "onboard", "ready"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 60s
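    # A sketch of the deploy-wait flow this healthcheck enables (the service
    # name `kairix` comes from this file; the exact STATUS text may vary by
    # Compose version):
    #   docker compose -f docker-compose.example.yml up --wait
    #   docker compose -f docker-compose.example.yml ps kairix   # STATUS should include "(healthy)"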
    # API / MCP service. Steady-state RSS is ~1.2 GiB after rerank-model
    # load; cold-start spikes briefly higher. Reservation guarantees the
    # scheduler hands kairix enough memory to come up cleanly. Limit caps
    # how much it can grow before Docker OOM-kills the container.
    #
    # Bump limits.cpus if /mcp shows queueing under concurrent tool calls
    # (async-tool refactor in v2026.5.10.5 — see #177 — gives concurrent
    # tool calls real parallelism). Bump limits.memory if you see OOMKilled
    # events in `docker events` during rerank load.
    deploy:
      resources:
        limits:
          cpus: "1.0"
          memory: 1Gi
        reservations:
          cpus: "0.5"
          memory: 512Mi
    # Bound /tmp on tmpfs so connector fetches (SharePoint binary
    # downloads, markitdown PDF / PPTX conversions) cannot spill onto
    # the host root filesystem. Without this, a backfill can fill the
    # OS disk and OOM-kill co-located services (the v2026.5.24 incident).
    tmpfs:
      - /tmp:size=2G,mode=1777
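    # To confirm the /tmp cap is in effect, one option is to inspect the
    # mount from inside the running container (a sketch; it assumes `df` is
    # available in the image, which is not verified here):
    #   docker compose -f docker-compose.example.yml exec kairix df -h /tmp
    # Keep in mind that tmpfs pages are normally charged to the container's
    # memory cgroup, so heavy /tmp use counts against the memory limit above.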
  kairix-worker:
    image: ghcr.io/three-cubes/kairix:latest
    command: ["worker"]
    volumes:
      - ./documents:/data/documents
      - kairix-data:/data/kairix
      # Layered config: the image ships a complete canonical config at
      # /opt/kairix/kairix.config.yaml (containing _schema_version, provider:,
      # collections, retrieval defaults). Operators overlay sparse host-side
      # overrides via the OVERLAY env-var below — the overlay only needs to
      # declare the keys you're changing; required keys come from the image.
      # NEVER bind-mount over /opt/kairix/kairix.config.yaml — that shadow-
      # mount drops required keys silently (the v2026.5.17a9 incident).
      - ./kairix.config.local.yaml:/opt/kairix/kairix.config.local.yaml:ro
    env_file:
      - .env
    environment:
      - KAIRIX_NEO4J_URI=bolt://neo4j:7687
      - KAIRIX_NEO4J_USER=neo4j
      - KAIRIX_NEO4J_PASSWORD=${KAIRIX_NEO4J_PASSWORD}
      - KAIRIX_CONFIG_OVERLAY_PATH=/opt/kairix/kairix.config.local.yaml
    depends_on:
      neo4j:
        condition: service_healthy
    # The worker is the noisy neighbour. Idle worker should sit at ~0 CPU
    # after #224 phase 1 (idle backoff). When it scans / embeds it bursts.
    # Reservation is low so the scheduler doesn't strand CPU during idle.
    # Limit is high so legitimate embed bursts don't take an hour. If you
    # see request timeouts on co-located services during a worker burst,
    # tighten limits.cpus toward 0.5 — embed cycles will get longer but
    # the host stops choking. If you see worker OOMKilled during a large
    # backfill, raise limits.memory (large vector index loads dominate RSS).
    #
    # Using `restart: on-failure:5` (instead of `unless-stopped`) caps
    # automatic restarts at five attempts, which prevents a restart storm
    # if the worker keeps failing after start; see
    # docs/operations/SHARED-HOSTS.md "Restart-storm anti-pattern".
    restart: on-failure:5
    deploy:
      resources:
        limits:
          cpus: "1.0"
          memory: 1Gi
        reservations:
          cpus: "0.25"
          memory: 256Mi
    # Same /tmp tmpfs rationale as the kairix service — worker is the
    # heavier consumer (extractor temp files, batch downloads).
    tmpfs:
      - /tmp:size=2G,mode=1777
  neo4j:
    image: neo4j:5-community
    # Pin for production: neo4j:5.24.0-community
    environment:
      NEO4J_AUTH: neo4j/${KAIRIX_NEO4J_PASSWORD}
      NEO4J_PLUGINS: '[]'
      # Heap and page-cache sizing must fit inside the container memory
      # limit below. Rule of thumb: heap_max + pagecache <= limits.memory
      # minus ~512Mi for the JVM and OS. Defaults are conservative; tune
      # via NEO4J_server_memory_heap_max__size (the Neo4j image expects
      # underscores inside a setting name to be doubled) and
      # NEO4J_server_memory_pagecache_size if your graph grows.
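      # Worked example of the rule of thumb, assuming the 2Gi memory limit
      # set on this service: 2048Mi - 512Mi leaves about 1536Mi to split.
      # One possible split (illustrative values, not measured for any
      # particular graph) is 1024Mi of heap plus 512Mi of page cache. Note
      # that the Neo4j Docker image expects an underscore inside a setting
      # name to be written twice, hence `max__size`:
      #   NEO4J_server_memory_heap_max__size: 1G
      #   NEO4J_server_memory_pagecache_size: 512M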
    volumes:
      - neo4j-data:/data
    healthcheck:
      test: ["CMD-SHELL", "cypher-shell -u neo4j -p ${KAIRIX_NEO4J_PASSWORD} 'RETURN 1' || exit 1"]
      interval: 10s
      timeout: 10s
      retries: 5
      start_period: 30s
    restart: unless-stopped
    # Neo4j on a shared host. The community edition's defaults are tuned
    # for a dedicated machine and will gladly take everything you give it;
    # explicit caps stop a runaway query from OOMing the host. Reservation
    # guarantees enough headroom for the page cache to be useful even when
    # other tenants are busy. If you raise heap_max, raise limits.memory
    # in step — Neo4j will swap silently long before it errors.
    deploy:
      resources:
        limits:
          cpus: "2.0"
          memory: 2Gi
        reservations:
          cpus: "0.5"
          memory: 512Mi

volumes:
  kairix-data:
  neo4j-data: