Skip to content

fix(transport): forward connection-resilience kwargs to the PEL reclaimer#243

Merged
tsiaeggai merged 2 commits into
mainfrom
fix/reclaimer-connection-resilience
Jun 15, 2026
Merged

fix(transport): forward connection-resilience kwargs to the PEL reclaimer#243
tsiaeggai merged 2 commits into
mainfrom
fix/reclaimer-connection-resilience

Conversation

@tsiaeggai

@tsiaeggai tsiaeggai commented Jun 8, 2026

Copy link
Copy Markdown
Collaborator

The PEL reclaimer runs its own background Redis client, created with
aioredis.from_url(url, decode_responses=False) and nothing else — so it
ignored whatever connection settings the transport was given. Even when the
broker was configured with socket_timeout / socket_keepalive /
health_check_interval, the reclaimer's client had none of them. When the
underlying connection was silently dropped (cloud Redis failover, an
idle-connection reaper, a network blip), the reclaimer's blocking reads had no
socket timeout or keepalive to break them and hung indefinitely, with no path
to recovery short of a process restart.

This PR forwards the transport's connection-resilience kwargs to the reclaimer's
client so both connections recover the same way. The forwarded keys are a
whitelist of redis-py connection/resilience options valid for
aioredis.from_url; broker/FastStream-only kwargs (decoder, middlewares,
asyncapi_*, …) are intentionally excluded. decode_responses stays pinned to
False for binary passthrough and cannot be overridden by callers.

The change is opt-in and fully backward compatible: when no connection kwargs
are passed (or a pre-built broker= is supplied), the reclaimer behaves exactly
as before.

Closes #244

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactoring (no functional changes)
  • Performance improvement
  • Test coverage improvement
  • CI/CD update
  • Dependency update

Related Issue

Fixes #244

Changes Made

  • RedisTransport.__init__ captures connection-resilience kwargs from **kwargs into self._connection_kwargs via a _RECLAIMER_CONNECTION_KEYS whitelist (socket_timeout, socket_connect_timeout, socket_keepalive, socket_keepalive_options, health_check_interval, retry_on_timeout, retry_on_error, max_connections, ssl_*). Copied, not popped — the broker still receives them too.
  • _setup_reclaimer passes connection_kwargs=self._connection_kwargs to PendingReclaimerManager.
  • PendingReclaimerManager.__init__ accepts connection_kwargs; start() applies them in from_url, with decode_responses=False pinned last so callers cannot break binary passthrough.
  • Added test_connection_kwargs_propagate_to_reclaimer (transport → manager propagation, broker-only kwargs excluded) and test_reclaimer_start_applies_connection_kwargs (kwargs reach from_url; decode_responses forced to False).
  • Updated sdk/CHANGELOG.md under [Unreleased] → Fixed.

Migration

None — fully backward compatible. No on-wire or API changes; the reclaimer only
gains the connection settings already passed to the transport.

Testing

  • Existing tests pass locally (make test)
  • Added new tests for new functionality

Local run against a Redis 7 container: 48 passed including the 2 new tests. Two
pre-existing failures (test_backoff_*) are unrelated to this change — a local
faststream-version mismatch (BinaryMessageFormatV1.encode not awaitable) that
also fails on the clean base commit.

Changelog

  • I have updated sdk/CHANGELOG.md under [Unreleased] section (required for code changes)

Checklist

  • My code follows the project's code style (make lint passes)
  • My code is properly formatted (make format applied)
  • I have added/updated tests that prove my fix/feature works
  • I have added/updated documentation as needed
  • All tests pass locally
  • I have reviewed my own code
  • My changes generate no new warnings
  • My commit messages follow the Conventional Commits standard

…imer

The reclaimer created its own Redis client with no socket timeout or
keepalive regardless of the settings passed to the transport, so a
silently dropped connection (cloud Redis failover, idle-connection
reaper) left its blocking reads hung indefinitely with no way to
recover.

Forward the broker's connection-resilience kwargs (socket_timeout,
socket_keepalive, health_check_interval, retry_on_timeout, ssl_*, …) to
PendingReclaimerManager so its independent client recovers the same way
the broker does. decode_responses stays pinned to False for binary
passthrough and cannot be overridden by callers.
@github-actions

github-actions Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

QualOps Code Quality Analysis

Status: ⚠️ WARNINGS - Medium severity issues found

Summary

  • Total Issues: 4
  • Critical: 0 🔴
  • High: 0 🟠
  • Medium: 2 🟡
  • Low: 2 🟢
  • Files Analyzed: 4

🟡 Medium Issues (2)

  • sdk/eggai/transport/redis.py:162 - bug
    The _monitor_stream_groups background client is constructed with **{**self._connection_kwargs, 'decode_responses': True}, but self._connection_kwargs is populated from kwargs at construction time — if the user passed a broker= instance (line 154-155), kwargs is still forwarded to _connection_kwargs even though it was not used to build the broker, potentially injecting unexpected kwargs into aioredis.from_url.

  • sdk/eggai/transport/redis.py:35 - maintainability
    The _BACKGROUND_CLIENT_CONNECTION_KEYS tuple omits retry (the redis-py Retry object) and ssl (the boolean SSL flag), which are valid aioredis.from_url kwargs that users may pass to RedisTransport.

📊 Full Report

View detailed report


Powered by QualOps

@tsiaeggai tsiaeggai marked this pull request as draft June 10, 2026 05:59
…itor

The consumer-group monitor (_monitor_stream_groups) created its own
long-lived background Redis client with no socket timeout or keepalive,
so a silently dropped connection left its blocking reads hung
indefinitely — and that hang struck during the very failover the
monitor exists to recover from.

Forward the same resilience kwargs already harvested for the PEL
reclaimer to the monitor's client, pinning decode_responses=True for its
string commands. Rename _RECLAIMER_CONNECTION_KEYS to
_BACKGROUND_CLIENT_CONNECTION_KEYS since it now feeds both background
clients.
@tsiaeggai tsiaeggai marked this pull request as ready for review June 15, 2026 07:43
@tsiaeggai tsiaeggai merged commit 66d1668 into main Jun 15, 2026
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Redis transport: PEL reclaimer ignores connection-resilience settings and hangs on a silently dropped connection

2 participants