Skip to content

Added retry + delay for TRYAGAIN error#62

Open
Candyman770 wants to merge 3 commits intojuspay:masterfrom
Candyman770:tryAgain-error-retry
Open

Added retry + delay for TRYAGAIN error#62
Candyman770 wants to merge 3 commits intojuspay:masterfrom
Candyman770:tryAgain-error-retry

Conversation

@Candyman770
Copy link
Copy Markdown

When a multi-key operation targets keys where some are still on the source node and others have been migrated to the destination node, a TRYAGAIN error is returned to prompt the client to retry after the migration is complete

previously we only retried the request once when we received TRYAGAIN error, but we have come across cases where we received ASK error after this retry.

New changed include handling of MOVED and ASK error even after retry.
Also added configurable delay before retry

(Error errString) | (B.isPrefixOf "TRYAGAIN" errString) ->
if retryCount > 0
then do
tryAgainDelayed <- IOR.readIORef tryAgainDelayIORef
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need tryAgainDelayIORef and I think retryCount should do

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to avoid doing threadDelay again if multiple TRYAGAIN's are received in pipeline

nodeConns <- nodeConnections
shardNodeVar <- newMVar (shardMap, nodeConns)
nodeRequestTimeout <- round . (\x -> (x :: Time.NominalDiffTime) * 1000000) . realToFrac . fromMaybe (5 :: Double) . (requestTimeoutSeconds <|> ). (>>= readMaybe) <$> lookupEnv "REDIS_REQUEST_NODE_TIMEOUT"
tryAgainDelayTime <- round . (\x -> (x :: Time.NominalDiffTime) * 1000000) . realToFrac . fromMaybe (0.1 :: Double) . (tryAgainDelaySeconds <|> ). (>>= readMaybe) <$> lookupEnv "REDIS_TRYAGAIN_ERROR_DELAY"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets remove from env lookup as it is already part redis config

(Error errString) | (B.isPrefixOf "TRYAGAIN" errString) ->
if retryCount > 0
then do
whenJust tryAgainDelayTime $ \_tryAgainDelay -> do
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When tryAgainDelay is not set, should we put a very small delay by default, say of 50ms or something?

then do
whenJust tryAgainDelayTime $ \_tryAgainDelay -> do
tryAgainDelayed <- IOR.readIORef tryAgainDelayIORef
unless tryAgainDelayed $ do
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets discuss this, on what exactly this is solving. @neeraj97 @Candyman770

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Talked to Neeraj, this works.

@neeraj97
Copy link
Copy Markdown

neeraj97 commented Sep 16, 2025

@Candyman770 Kindly resolve the conflicts
@kalyanburri @aravindgopall Kindly review, approve and merge

-- TODO add for non cluster redis also
, tryAgainDelay :: Maybe Double
-- ^ retry delay for a redis command request when TRYAGAIN error is received during cluster slot migration
-- default value is 100 ms
Copy link
Copy Markdown

@neeraj97 neeraj97 Sep 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets remove this comment -- default value is 100 ms

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants