
feat(sse): broadcast topic-resubscribe via Redis for multi-replica (ADR-0001) #206

Merged
doughknee merged 1 commit into main from feat/sse-resubscribe-control
Jun 10, 2026

Conversation

@doughknee

Owner

Summary

Action item 1 of ADR-0001 — fixes the one real multi-replica correctness bug before scaling core-api past one pod.

  • Channel CRUD used to call UpdateUserTopicSubscriptions directly, which no-ops unless the request-serving process holds the user's SSE connection (events.go). Correct at replicas: 1; at replicas: 2 a config change served by pod A leaves pod B's registry stale until the client reconnects
  • Channel CRUD now publishes the user's sub on a new sse:ctl:resubscribe Redis channel; every replica's hub listener handles it and refreshes locally. Replicas without that user's connection no-op exactly as before, so the broadcast is safe
  • Falls back to the direct local call if the publish fails — a Redis hiccup never makes single-replica behavior worse than before
  • Listener loop body extracted into handleTopicMessage so routing is testable without a live Redis subscription

Test plan

  • New tests: publish side (miniredis pub/sub round-trip), refresh-when-connected (integration-gated, runs in CI's postgres container), strict no-op-when-absent, and a regression guard on the pre-existing cdc:core:user:* dispatch path
  • go vet + full unit suite pass locally
  • backend-tests green on this PR (integration leg included)
  • After merge + deploy: single-replica smoke — change channel config while an SSE stream is open, confirm the stream picks up the new topic (same behavior as today, now routed through Redis)
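
For readers unfamiliar with the listener, the routing these tests exercise might look something like the sketch below. Only the name `handleTopicMessage`, the `sse:ctl:resubscribe` channel, and the `cdc:core:user:*` pattern come from this PR. The `hub` struct, its fields, the string return values, and the assumption that the user's sub is the channel suffix are hypothetical simplifications chosen so the example runs without Redis.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	resubscribeChannel = "sse:ctl:resubscribe" // control channel from this PR
	userEventPrefix    = "cdc:core:user:"      // prefix of the pre-existing dispatch path
)

// hub is a hypothetical, simplified stand-in for the per-replica SSE hub.
type hub struct {
	connected map[string]bool // user subs with an SSE connection on this replica
	refreshes []string        // users whose topic subscriptions were refreshed
	delivered []string        // user events handed to a local connection
}

// handleTopicMessage routes one pub/sub message by channel name. It takes
// plain strings so it can be called directly from a unit test.
func (h *hub) handleTopicMessage(channel, payload string) string {
	switch {
	case channel == resubscribeChannel:
		// The payload is assumed to be the user's sub. Replicas that do not
		// hold that user's connection do nothing.
		if !h.connected[payload] {
			return "noop"
		}
		h.refreshes = append(h.refreshes, payload)
		return "refreshed"
	case strings.HasPrefix(channel, userEventPrefix):
		// Assumed layout: the user's sub is the suffix of the channel name.
		user := strings.TrimPrefix(channel, userEventPrefix)
		if !h.connected[user] {
			return "noop"
		}
		h.delivered = append(h.delivered, user+":"+payload)
		return "dispatched"
	default:
		return "ignored"
	}
}

func main() {
	h := &hub{connected: map[string]bool{"user-123": true}}

	fmt.Println(h.handleTopicMessage(resubscribeChannel, "user-123")) // connected here
	fmt.Println(h.handleTopicMessage(resubscribeChannel, "user-999")) // held elsewhere or offline
	fmt.Println(h.handleTopicMessage(userEventPrefix+"user-123", `{"op":"update"}`))
}
```

Because the handler takes the channel and payload as arguments instead of reading from a subscription, the no-op-when-absent case and the `cdc:core:user:*` regression guard reduce to ordinary function calls.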

🤖 Generated with Claude Code

…DR-0001)

Channel CRUD used to call UpdateUserTopicSubscriptions directly, which
no-ops unless the request-serving process holds the user's SSE
connection — correct at replicas=1, silently stale at replicas>1 (the
connection-holding replica keeps serving old topics until reconnect).

Publish the user's sub on a new sse:ctl:resubscribe channel instead.
Every replica's hub listener receives it and refreshes locally;
replicas without a connection for that user no-op exactly as before.
Falls back to the direct local call if the publish fails, so a Redis
hiccup never makes a single replica worse than the old behavior.

Extract the listener loop body into handleTopicMessage so the routing
is testable without a live Redis subscription; cover publish, refresh-
when-connected (integration-gated), no-op-when-absent, and the
pre-existing cdc:core:user:* dispatch path.

Implements action item 1 of docs/adr/0001-sse-multi-replica.md.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

doughknee merged commit d827b13 into main on Jun 10, 2026
8 checks passed
doughknee deleted the feat/sse-resubscribe-control branch on June 10, 2026, 19:32
doughknee added a commit that referenced this pull request Jun 10, 2026
…uts (#208)

Implements action items 3 and 5 of docs/adr/0001-sse-multi-replica.md.
Prereqs landed first: sse:ctl:resubscribe control channel (#206) and
Redis-backed rate-limit counters (#207). With those in, no remaining
core-api state assumes a single pod.

- replicas: 2 with maxUnavailable: 0 / maxSurge: 1 so deploys never
  drop below capacity; SSE clients ride through on their 3s retry
- PodDisruptionBudget (minAvailable: 1) so node drains and cluster
  upgrades cannot take the whole API down
- cdc-runbook diagram now shows the per-replica Redis pub/sub fan-out
  and the resubscribe control channel

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>