Description
We’re using redis-om on top of redis-py with retries configured:
self._async_connection = get_redis_connection(
    host=self.redis_configuration.host,
    port=self.redis_configuration.port,
    ssl_certfile=cert,
    ssl_keyfile=key,
    ssl=True,
    ssl_check_hostname=False,
    password=self.redis_configuration.password,
    decode_responses=True,
    socket_keepalive=True,
    socket_connect_timeout=15,
    socket_timeout=5,
    retry=Retry(ExponentialBackoff(cap=10, base=1), 25),
    retry_on_error=[ConnectionError, TimeoutError, ConnectionResetError],
    health_check_interval=5,
)
Do we also need to add asyncio.TimeoutError to retry_on_error, like this?
retry_on_error=[ConnectionError, TimeoutError, ConnectionResetError, asyncio.TimeoutError]
We are seeing this error intermittently. I think retries are not happening here, because the error is logged right away rather than after the delay that the backoff would add.
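To put a number on "retries would take longer", here is a rough sketch of the delay schedule for cap=10, base=1 and 25 retries. It assumes the delay after the n-th failure is min(cap, base * 2**n), which is my reading of ExponentialBackoff and not something confirmed in this issue. backoff_schedule is a hypothetical helper, not a redis-py function.

```python
def backoff_schedule(base: float, cap: float, retries: int) -> list[float]:
    """Delay before each retry, ASSUMING delay_n = min(cap, base * 2**n).

    This formula is an assumption about the backoff policy, used only
    to estimate how long a fully retried call would take.
    """
    return [min(cap, base * 2 ** n) for n in range(1, retries + 1)]


# Values taken from the configuration above.
schedule = backoff_schedule(base=1, cap=10, retries=25)

print(schedule[:5])   # first few delays, in seconds
print(sum(schedule))  # total time spent sleeping if every retry fails
```

Under that assumption the delays reach the 10 second cap after three retries, and a call that exhausted all 25 retries would spend about 234 seconds sleeping, on top of the socket timeouts. An error that is logged within a few seconds is therefore unlikely to have gone through the full retry loop.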
If we wrap this connection block in a try/except:
try:
    self._async_connection = get_redis_connection(...)
except Exception as e:
    logger.error("Redis connection failed: %s", e)
will the exception surface immediately on the first failed attempt, or will it only be raised after all 25 retries have been exhausted?
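To make the question concrete, this is a minimal sketch of how a generic retry loop usually behaves. It is not redis-py's implementation. call_with_retry, FakeLibraryError and the other helpers are hypothetical names used for illustration only. In a loop like this, an exception type listed in retry_on reaches the caller only after the last retry, while a type that is not listed reaches the caller on the first failure.

```python
import asyncio


class FakeLibraryError(Exception):
    """Hypothetical stand-in for a client library's own error class."""


async def call_with_retry(op, retries, retry_on, delay=0.0):
    """Await op(); retry up to `retries` times on the types in `retry_on`."""
    failures = 0
    while True:
        try:
            return await op()
        except retry_on:
            failures += 1
            if failures > retries:
                raise  # retries exhausted: the caller sees the error now
            await asyncio.sleep(delay)


def make_failing_op(exc_type):
    """Build an operation that always fails and counts its calls."""
    calls = {"count": 0}

    async def op():
        calls["count"] += 1
        raise exc_type("simulated failure")

    return op, calls


async def attempts_before_raise(exc_type, retries, retry_on):
    """Return how many times op ran before the error reached the caller."""
    op, calls = make_failing_op(exc_type)
    try:
        await call_with_retry(op, retries, retry_on)
    except Exception:
        pass
    return calls["count"]


# ConnectionResetError is a subclass of the builtin ConnectionError.
retryable = asyncio.run(
    attempts_before_raise(ConnectionResetError, 3, (ConnectionError,))
)
# FakeLibraryError is not in the retry list, so it is never retried.
not_retryable = asyncio.run(
    attempts_before_raise(FakeLibraryError, 3, (ConnectionError,))
)
print(retryable, not_retryable)
```

With 3 retries, the retryable error is attempted 4 times before it surfaces, and the unlisted error surfaces after 1 attempt. If the real client behaves similarly, an error that appears immediately would suggest its type is not matched by retry_on_error.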
After enabling debug logs, I found that the client is not retrying this error internally.
The error happens intermittently on this call:
result_ping = await self._async_connection.info("memory")
[asyncio] DEBUG: got a new connection from ('10.16.177.28', 36358)
[aiothttp.access] INFO: GET /health HTTP/1.1 200 326 - "kube-probe/1.25"
[BaseCacheHandler.py] INFO: Checking connection to Redis
[asyncio] DEBUG: Fatal read error on socket transport
Traceback (most recent call last):
File "/usr/local/lib/python3.11/asyncio/selector_events.py", line 988, in _read_ready__get_buffer
nbytes = self._sock.recv_into(buf)
ConnectionResetError: [Errno 104] Connection reset by peer
{"asctime":"2025-09-03T21:31:29.431867Z","LEVEL":"ERROR","name":".BaseCacheHandler",
"filename":"BaseCacheHandler.py","lineno":70,"message":"redis_check_failed",
"error":"Error while reading from sample-cache02.np.cache.cloud.net:443 (104, 'Connection reset by peer')",
"redis_configuration":"host=... port=443 retry_exponential.cap=10 retry_count=25 health_check_interval=5 ssl=True ..."}
[valkeysample2-debug] [min] WARNING: Readiness check failed: Redis is not healthy
Note that the error that reaches our code is not of type ConnectionResetError.
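One way to check which type actually reaches the application is to log the exception class and the chain of exceptions it was raised from. The sketch below uses only the standard library. describe_exception is a hypothetical helper, and WrappedReadError is a made-up class standing in for whatever the client library raises; neither is part of redis-py.

```python
def describe_exception(exc: BaseException) -> list[str]:
    """Return the qualified type name of exc and of each chained cause."""
    chain = []
    seen = set()
    current = exc
    while current is not None and id(current) not in seen:
        seen.add(id(current))  # guard against cycles in the chain
        cls = type(current)
        chain.append(f"{cls.__module__}.{cls.__qualname__}")
        # Prefer the explicit cause, fall back to the implicit context.
        current = current.__cause__ or current.__context__
    return chain


class WrappedReadError(Exception):
    """Hypothetical library error that wraps a low-level socket error."""


try:
    try:
        raise ConnectionResetError(104, "Connection reset by peer")
    except OSError as low_level:
        # Simulate a library re-raising the socket error as its own type.
        raise WrappedReadError("Error while reading from host") from low_level
except WrappedReadError as exc:
    caught = exc

print(describe_exception(caught))
print(isinstance(caught, ConnectionResetError))
```

In this simulation the outer exception is not an instance of ConnectionResetError even though the underlying cause is. Logging describe_exception(e) in the health check's except block would show whether the same thing happens with the real error, and which class would need to be listed in retry_on_error.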
Observed behavior
Connection succeeds sometimes.
Some requests fail with:
Timeout writing to socket
Expected behavior
With the above retry/backoff + timeouts, I expect the client to automatically handle transient network issues instead of failing with write socket timeouts.