Timeout writing to socket when using redis-py with configured timeouts and health check #714

@sandeepk97

We’re using redis-om on top of redis-py with retries configured:

self._async_connection = get_redis_connection(
    host=self.redis_configuration.host,
    port=self.redis_configuration.port,
    ssl_certfile=cert,
    ssl_keyfile=key,
    ssl=True,
    ssl_check_hostname=False,
    password=self.redis_configuration.password,
    decode_responses=True,
    socket_keepalive=True,
    socket_connect_timeout=15,
    socket_timeout=5,
    retry=Retry(ExponentialBackoff(cap=10, base=1), 25),
    retry_on_error=[ConnectionError, TimeoutError, ConnectionResetError],
    health_check_interval=5,
)

Do we also need to add asyncio.TimeoutError to retry_on_error, like this?
retry_on_error=[ConnectionError, TimeoutError, ConnectionResetError, asyncio.TimeoutError]
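One way to check whether the extra entry changes anything on a given interpreter is sketched below. Since Python 3.11 (the version shown in the traceback further down), asyncio.TimeoutError is documented as an alias of the built-in TimeoutError, so listing both should be redundant there. This assumes the TimeoutError in the config above is the built-in one; if it is imported from redis.exceptions, that is a separate class and this comparison says nothing about it.

```python
import asyncio
import sys

# On Python 3.11+, asyncio.TimeoutError is the same class object as the
# built-in TimeoutError; on older versions it is a distinct class.
same_class = asyncio.TimeoutError is TimeoutError

print(f"Python {sys.version_info.major}.{sys.version_info.minor}: "
      f"asyncio.TimeoutError is TimeoutError -> {same_class}")
```

If this prints True, adding asyncio.TimeoutError next to the built-in TimeoutError would not change which errors are matched.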

We are seeing this error intermittently. I think retries are not happening here, because the error is logged almost immediately instead of after the backoff delays.

If we wrap this connection block in a try/except:

try:
    self._async_connection = get_redis_connection(...)
except Exception as e:
    logger.error("Redis connection failed: %s", e)

will the exception surface immediately on the first failed attempt, or will it only be raised after all 25 retries have been exhausted?
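To make the question concrete, here is a minimal, self-contained sketch of the behavior I would expect from a retry wrapper: the exception surfaces only after the retry budget is spent, and only errors in the retry list are retried at all. It is not redis-py's implementation; `retry_with_backoff`, `fake_sleep`, and the two failing operations are hypothetical names for illustration, and the backoff formula is a simplification. Also, as far as I understand, redis-py opens sockets lazily when the first command is sent, so a try/except around client construction would probably not see network errors; they would be raised by the awaited command instead.

```python
import asyncio


async def retry_with_backoff(op, retries, retry_on, base=1.0, cap=10.0,
                             sleep=asyncio.sleep):
    """Hypothetical helper: retry `op` on the listed errors with capped backoff."""
    failures = 0
    while True:
        try:
            return await op()
        except retry_on:
            failures += 1
            if failures > retries:
                raise  # surfaces only once the retry budget is exhausted
            await sleep(min(cap, base * 2 ** failures))


async def demo():
    attempts = {"reset": 0, "other": 0}
    delays = []

    async def fake_sleep(seconds):
        delays.append(seconds)  # record the delay instead of actually waiting

    async def always_reset():
        attempts["reset"] += 1
        raise ConnectionResetError(104, "Connection reset by peer")

    async def not_retryable():
        attempts["other"] += 1
        raise ValueError("not in retry_on")

    for op in (always_reset, not_retryable):
        try:
            await retry_with_backoff(op, retries=3,
                                     retry_on=(ConnectionError, TimeoutError),
                                     sleep=fake_sleep)
        except Exception:
            pass  # the error reaches the caller here
    return attempts, delays


attempts, delays = asyncio.run(demo())
print(attempts, delays)
```

In this sketch the retryable error is attempted four times (one initial call plus three retries) before reaching the caller, while the non-retryable error reaches the caller on the first attempt with no delay.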

After enabling debug logs, I found that this error is not being retried internally.

The error sometimes occurs on this call:

result_ping = await self._async_connection.info("memory")
[asyncio] DEBUG: got a new connection from ('10.16.177.28', 36358)
[aiohttp.access] INFO: GET /health HTTP/1.1 200 326 - "kube-probe/1.25"
[BaseCacheHandler.py] INFO: Checking connection to Redis
[asyncio] DEBUG: Fatal read error on socket transport
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/asyncio/selector_events.py", line 988, in _read_ready__get_buffer
    nbytes = self._sock.recv_into(buf)
ConnectionResetError: [Errno 104] Connection reset by peer

{"asctime":"2025-09-03T21:31:29.431867Z","LEVEL":"ERROR","name":".BaseCacheHandler",
"filename":"BaseCacheHandler.py","lineno":70,"message":"redis_check_failed",
"error":"Error while reading from sample-cache02.np.cache.cloud.net:443 (104, 'Connection reset by peer')",
"redis_configuration":"host=... port=443 retry_exponential.cap=10 retry_count=25 health_check_interval=5 ssl=True ..."}
[valkeysample2-debug] [min] WARNING: Readiness check failed: Redis is not healthy

The error that reaches our handler is not of type ConnectionResetError. For reference, this is the asyncio code from the traceback:

https://github.com/python/cpython/blob/c22cc8fccdd299fa923f04e253a3f7c59ce88bfe/Lib/asyncio/selector_events.py#L990
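A minimal sketch of why the type seen by the handler can differ from the one in the asyncio debug log, assuming the client library catches the low-level error and re-raises its own exception type. `ClientConnectionError` and `read_response` are hypothetical stand-ins, not redis-py names; the point is only that an `isinstance` check against ConnectionResetError fails on the wrapper while the original error stays reachable through `__cause__`.

```python
class ClientConnectionError(Exception):
    """Hypothetical stand-in for a client library's own connection error type."""


def read_response():
    try:
        # Simulate the low-level socket failure seen in the asyncio debug log.
        raise ConnectionResetError(104, "Connection reset by peer")
    except OSError as exc:
        # Re-raise as the library-level type, keeping the original as __cause__.
        raise ClientConnectionError(f"Error while reading: {exc.args}") from exc


try:
    read_response()
except ClientConnectionError as err:
    caught = err

is_reset = isinstance(caught, ConnectionResetError)
is_builtin_conn = isinstance(caught, ConnectionError)
cause_is_reset = isinstance(caught.__cause__, ConnectionResetError)
print(is_reset, is_builtin_conn, cause_is_reset)
```

If the real client behaves like this, a retry list would need to contain the library-level exception type for the wrapped error to match, since the built-in types match only the original cause.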

Observed behavior

Connection succeeds sometimes.

Some requests fail with:

Timeout writing to socket

Expected behavior
With the above retry/backoff + timeouts, I expect the client to automatically handle transient network issues instead of failing with write socket timeouts.
