feat(k8s): scale core-api to 2 replicas with PDB and surge-only rollouts #208

Merged: doughknee merged 1 commit into main from feat/core-api-two-replicas on Jun 10, 2026
@doughknee

Owner

Summary

Action items 3 + 5 of ADR-0001 — the final step. Prereqs already merged: sse:ctl:resubscribe control channel (#206) and Redis-backed rate-limit counters (#207). With those deployed, no remaining core-api state assumes a single pod.

  • replicas: 2 with maxUnavailable: 0 / maxSurge: 1 — rollouts surge a new pod before killing an old one, so deploys stop being micro-outages; SSE clients ride through on their 3s retry
  • PodDisruptionBudget (minAvailable: 1) so node drains and cluster upgrades can't take the whole API down
  • cdc-runbook architecture diagram updated to show per-replica Redis pub/sub fan-out and the resubscribe control channel
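
For reference, the first two bullets could be expressed roughly as the manifests below. This is a sketch, not the actual diff: the `scrollr` namespace and `app=core-api` label come from the test plan, while the resource names, image, container port, and the readiness probe on `/health` are assumptions made for illustration. The surge-only strategy only gives zero-drop rollouts if a readiness probe gates traffic to the new pod, which is why one is included here.

```yaml
# Sketch only. Values marked "assumed" or "hypothetical" are not from the PR.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: core-api              # assumed resource name
  namespace: scrollr          # namespace from the test plan
spec:
  replicas: 2
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0       # never remove an old pod before its replacement is Ready
      maxSurge: 1             # allow one extra pod during a rollout
  selector:
    matchLabels:
      app: core-api           # label from the test plan
  template:
    metadata:
      labels:
        app: core-api
    spec:
      containers:
        - name: core-api
          image: example.invalid/core-api:placeholder   # hypothetical image
          readinessProbe:     # assumed; the PR does not show the probe config
            httpGet:
              path: /health
              port: 8080      # hypothetical port
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: core-api              # assumed resource name
  namespace: scrollr
spec:
  minAvailable: 1             # voluntary evictions must leave at least one pod running
  selector:
    matchLabels:
      app: core-api
```

With 2 replicas and `minAvailable: 1`, a node drain can evict at most one core-api pod at a time. The PDB only covers voluntary disruptions such as drains and upgrades, not node failures.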

Merge order matters: merge only after #206 and #207 have deployed to the cluster (both are in main; confirm the core-api rollout finished).

Test plan

  • After merge: kubectl -n scrollr get pods -l app=core-api shows 2 Ready pods
  • In-cluster verification from the ADR (action item 4): open an SSE stream against pod A, have pod B serve a channel-config change, and confirm the stream picks up the new topic without reconnecting
  • During the next deploy, confirm /health never 503s from the gateway (zero-drop rollout)

🤖 Generated with Claude Code

Implements action items 3 and 5 of docs/adr/0001-sse-multi-replica.md.
Prereqs landed first: sse:ctl:resubscribe control channel (#206) and
Redis-backed rate-limit counters (#207). With those in, no remaining
core-api state assumes a single pod.

- replicas: 2 with maxUnavailable: 0 / maxSurge: 1 so deploys never
  drop below capacity; SSE clients ride through on their 3s retry
- PodDisruptionBudget (minAvailable: 1) so node drains and cluster
  upgrades cannot take the whole API down
- cdc-runbook diagram now shows the per-replica Redis pub/sub fan-out
  and the resubscribe control channel

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@doughknee merged commit 9b659e1 into main on Jun 10, 2026
@doughknee deleted the feat/core-api-two-replicas branch on Jun 10, 2026, 19:41