Skip to content

Add Redis Sentinel (HA) support#149

Merged
MastaP merged 5 commits into
mainfrom
feat/redis-sentinel-support
May 4, 2026
Merged

Add Redis Sentinel (HA) support#149
MastaP merged 5 commits into
mainfrom
feat/redis-sentinel-support

Conversation

@MastaP
Copy link
Copy Markdown
Member

@MastaP MastaP commented May 4, 2026

Closes #148.

Summary

  • Adds Redis Sentinel support for the commitment queue while keeping the existing single-endpoint path byte-equivalent for current deployments.
  • RedisConfig gains SentinelAddrs, MasterName, SentinelPassword, SentinelUsername (env vars REDIS_SENTINEL_ADDRS, REDIS_MASTER_NAME, REDIS_SENTINEL_PASSWORD, REDIS_SENTINEL_USERNAME).
  • internal/storage/factory.go::createRedisCommitmentQueue branches on len(SentinelAddrs) > 0: Sentinel mode uses redislib.NewFailoverClient, otherwise the original redislib.NewClient path runs unchanged. Both constructors return *redis.Client, so no downstream signature changes. The factory now takes a context.Context so the initial Ping honors upstream timeouts.
  • Symmetric misconfiguration guards: setting REDIS_SENTINEL_ADDRS without REDIS_MASTER_NAME (and vice versa) returns a clear error at startup instead of silently falling back to single-endpoint mode.
  • Logs the active mode (sentinel/direct) at queue construction time so HA wiring is verifiable from startup logs.
  • Wires the existing REDIS_MIN_IDLE_CONNS env var through both client constructors (it was previously loaded into RedisConfig but never passed to the redis options).
  • README.md documents the new env vars; changes.txt has a dated entry. Two pre-existing stale defaults in the README's Storage Configuration table (REDIS_FLUSH_INTERVAL, REDIS_MAX_BATCH_SIZE) are corrected to match the code.

Why no ReplicaOnly / RouteByLatency / RouteRandomly knobs

go-redis exposes these FailoverOptions to send some or all commands to read-only replicas. Most queue commands (XADD/XACK/XReadGroup/XAutoClaim) are classified as writes and would hit master regardless, but XPendingExt is read-only — with any of these flags on it would route to a replica and observe replication-lagged consumer-group state, silently re-delivering already-acked entries. They are intentionally not exposed; there is no legitimate use case for any of them in this component.

Test plan

  • go build ./... clean
  • go vet ./... clean
  • go test -count=1 ./internal/config/... — sentinel env parsing, defaults, comma-separated slice helper
  • go test -count=1 ./internal/storage/... — factory validation guards (both directions) plus existing storage tests including the testcontainer-backed Redis stream tests
  • go test -count=1 -race ./internal/config/... ./internal/storage/... clean
  • go test -count=1 -run TestSentinelCommitmentQueueIntegration ./internal/storage/redis/ — new integration test brings up redis master + redis-sentinel via testcontainers and runs the full Store → Stream → MarkProcessed pipeline through Sentinel-discovered master resolution

Backwards compatibility

Existing deployments using REDIS_HOST/REDIS_PORT continue to work without any config change. UniversalClient was considered but rejected in favor of NewFailoverClient because it returns *redis.Client directly, keeping internal/storage/redis/commitment.go and other downstream consumers untouched.

Out of scope

  • Redis Cluster mode.
  • A sentinel-compose.yml example deployment.
  • An end-to-end failover test (kill master, observe queue recovery) — worth a follow-up issue.

MastaP added 2 commits May 4, 2026 17:57
Branches createRedisCommitmentQueue on REDIS_SENTINEL_ADDRS to use
NewFailoverClient (still *redis.Client) while keeping the single-endpoint
path unchanged. New env vars: REDIS_SENTINEL_ADDRS, REDIS_MASTER_NAME,
REDIS_SENTINEL_PASSWORD, REDIS_SENTINEL_USERNAME, REDIS_ROUTE_BY_LATENCY,
REDIS_ROUTE_RANDOMLY, REDIS_REPLICA_ONLY.
…#148)

Addresses review feedback on the previous commit:

- Symmetric validation: setting REDIS_MASTER_NAME without
  REDIS_SENTINEL_ADDRS would have silently fallen back to single-endpoint
  mode against localhost:6379, dropping the master name on the floor.
  This is a real footgun for operators who intended to enable Sentinel,
  so the factory now returns a clear error in that case (mirroring the
  existing check for sentinel addrs without master name).
- Operator visibility: log the active mode (sentinel/direct) and the
  endpoint(s) at queue creation time so HA wiring is verifiable from
  startup logs.
- Tests: add the factory-level unit test that closes the coverage gap
  on the sentinel-addrs-without-master-name validation, plus a symmetric
  test for the new master-name-without-sentinel-addrs validation. Both
  exercise the unexported helper directly without spinning up Redis or
  MongoDB.
Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces Redis Sentinel (HA) support for the commitment queue, allowing for failover-capable Redis configurations. The changes include updates to the configuration structure, environment variable parsing, and the storage factory to support failover clients. A review comment identifies an opportunity to improve context propagation in the Redis client creation process to ensure connection tests respect upstream timeouts.

Comment thread internal/storage/factory.go Outdated
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds Redis Sentinel (high availability) support for the Redis-backed commitment queue by extending Redis configuration, switching the storage factory to build a Sentinel failover client when configured, and documenting/testing the new behavior.

Changes:

  • Extend RedisConfig with Sentinel-specific fields and env var parsing (including a new comma-slice helper).
  • Update internal/storage/factory.go to construct either a direct Redis client or a Sentinel failover client, with symmetric startup validation errors for misconfiguration.
  • Add unit tests for config parsing and factory validation, and document the new env vars in README.md (plus a changes.txt entry).

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
README.md Documents Redis Sentinel env vars and usage example; updates Redis section wording.
internal/storage/factory.go Adds Sentinel/direct branching and startup validation/logging for Redis commitment queue client construction.
internal/storage/factory_test.go Adds unit tests for the two Sentinel/direct misconfiguration guards.
internal/config/config.go Extends RedisConfig, wires new env vars, and adds getEnvStringSliceOrDefault.
internal/config/config_test.go Adds tests for Sentinel env parsing, defaults, and the new string-slice helper.
changes.txt Adds a changelog entry describing the Sentinel support work.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread README.md Outdated
Comment thread internal/storage/factory.go Outdated
MastaP added 2 commits May 4, 2026 19:19
- Plumb context.Context through createRedisCommitmentQueue from
  NewStorage so the initial Redis Ping respects upstream timeouts and
  cancellation instead of using a fresh context.Background() (Gemini
  review comment).
- Drop the ReplicaOnly knob entirely: the commitment queue is
  write-heavy (XADD, XACK, XReadGroup, XAutoClaim) and routing all
  commands to read-only replicas would break the queue with READONLY
  errors at runtime. Removing the knob is safer than documenting it
  as incompatible (Copilot review comment).
- Correct two pre-existing stale defaults in the README's Storage
  Configuration table: REDIS_FLUSH_INTERVAL is 100ms (not 50ms) and
  REDIS_MAX_BATCH_SIZE is 5000 (not 2000). Spotted by Copilot while
  reviewing the Sentinel section in the same table.
- Update changelog entry to reflect the final state.
Brings up redis master + redis-sentinel as separate containers on a
shared Docker network, then exercises the same FailoverClient
construction the production factory uses for REDIS_SENTINEL_ADDRS to
verify the full Store -> Stream -> MarkProcessed pipeline works
end-to-end through Sentinel-discovered master resolution.

The factory itself is bypassed because Sentinel resolves the master
alias to the container's internal IP at monitor time and returns that
to clients, which is not reachable from the host where the test runs.
A test-only Dialer redirects any port-6379 dial to the host-mapped
master address; sentinel dials (port 26379) pass through. Factory
validation logic is covered separately by factory_test.go.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread internal/storage/redis/sentinel_integration_test.go
Comment thread changes.txt Outdated
Following review-driven simplification of the Sentinel options surface:

- MinIdleConns is now wired through to both NewClient and
  NewFailoverClient. The field was already on RedisConfig and loaded
  from REDIS_MIN_IDLE_CONNS, but neither client constructor was
  passing it. Cold pool reconnects on burst traffic are a real cost
  for a high-throughput service; closing the gap is a one-line fix
  per branch.
- RouteByLatency and RouteRandomly are removed from RedisConfig, env
  wiring, FailoverOptions, README and tests. Same shape of footgun as
  ReplicaOnly, narrower scope: most queue commands (XADD/XACK/
  XReadGroup/XAutoClaim) are writes and hit master regardless, but
  XPendingExt IS a read-only command. With these flags on it would
  route to a replica and observe replication-lagged consumer-group
  state, causing silent re-delivery of already-acked entries. Better
  to not expose them at all than to document them as incompatible.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread internal/storage/redis/sentinel_integration_test.go
@MastaP MastaP merged commit d92f338 into main May 4, 2026
6 checks passed
@MastaP MastaP deleted the feat/redis-sentinel-support branch May 4, 2026 19:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add Redis Sentinel (HA) support

2 participants