Added retry + delay for TRYAGAIN error #62
src/Database/Redis/Cluster.hs
Outdated
| (Error errString) | (B.isPrefixOf "TRYAGAIN" errString) -> | ||
| if retryCount > 0 | ||
| then do | ||
| tryAgainDelayed <- IOR.readIORef tryAgainDelayIORef |
Do we really need tryAgainDelayIORef? I think retryCount should be enough.
It is needed to avoid calling threadDelay again if multiple TRYAGAIN errors are received in a pipeline.
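For reference, the "delay only once per pipeline" idea can be sketched roughly as below. This is a minimal illustration, not the PR's code: delayOnce and example are hypothetical names, and the flag is assumed to be created once per pipeline execution.

```haskell
import Control.Concurrent (threadDelay)
import Control.Monad (unless)
import Data.IORef (IORef, atomicModifyIORef', newIORef, readIORef)

-- | Sleep for the given number of microseconds, but only the first time
-- this is called for a given flag. Later calls return immediately.
-- 'delayOnce' is a hypothetical helper, not a function from the PR.
delayOnce :: IORef Bool -> Int -> IO ()
delayOnce delayedRef micros = do
  -- Atomically set the flag and get back its previous value.
  alreadyDelayed <- atomicModifyIORef' delayedRef (\old -> (True, old))
  unless alreadyDelayed (threadDelay micros)

-- | Two TRYAGAIN replies in the same pipeline trigger a single sleep.
-- Returns the final state of the flag.
example :: IO Bool
example = do
  delayedRef <- newIORef False
  delayOnce delayedRef 1000 -- first TRYAGAIN: sleeps for 1 ms
  delayOnce delayedRef 1000 -- second TRYAGAIN: skips the sleep
  readIORef delayedRef
```

A retry counter alone bounds the number of retries; the flag is what prevents the delay from being paid once per failed command in the same pipeline.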
src/Database/Redis/Cluster.hs
Outdated
| nodeConns <- nodeConnections | ||
| shardNodeVar <- newMVar (shardMap, nodeConns) | ||
| nodeRequestTimeout <- round . (\x -> (x :: Time.NominalDiffTime) * 1000000) . realToFrac . fromMaybe (5 :: Double) . (requestTimeoutSeconds <|> ). (>>= readMaybe) <$> lookupEnv "REDIS_REQUEST_NODE_TIMEOUT" | ||
| tryAgainDelayTime <- round . (\x -> (x :: Time.NominalDiffTime) * 1000000) . realToFrac . fromMaybe (0.1 :: Double) . (tryAgainDelaySeconds <|> ). (>>= readMaybe) <$> lookupEnv "REDIS_TRYAGAIN_ERROR_DELAY" |
Let's remove the env lookup, as it is already part of the Redis config.
| (Error errString) | (B.isPrefixOf "TRYAGAIN" errString) -> | ||
| if retryCount > 0 | ||
| then do | ||
| whenJust tryAgainDelayTime $ \_tryAgainDelay -> do |
When tryAgainDelay is not set, should we apply a very small delay by default, say 50 ms?
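A rough sketch of what such a fallback could look like, assuming the config value is an optional delay in seconds (as tryAgainDelay :: Maybe Double suggests) and that threadDelay expects microseconds. The function name and the 50 ms default are illustrative assumptions, not part of the PR.

```haskell
import Data.Maybe (fromMaybe)

-- | Convert an optional delay in seconds to microseconds for 'threadDelay',
-- falling back to 50 ms when the config value is unset.
-- The name and the 50 ms default are assumptions for illustration.
resolveTryAgainDelay :: Maybe Double -> Int
resolveTryAgainDelay mSeconds = round (fromMaybe 0.05 mSeconds * 1000000)
```

With this shape, the delay is resolved once from the config and the retry path no longer needs the whenJust branch.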
| then do | ||
| whenJust tryAgainDelayTime $ \_tryAgainDelay -> do | ||
| tryAgainDelayed <- IOR.readIORef tryAgainDelayIORef | ||
| unless tryAgainDelayed $ do |
Let's discuss what exactly this is solving. @neeraj97 @Candyman770
@Candyman770 Kindly resolve the conflicts.
| -- TODO add for non cluster redis also | ||
| , tryAgainDelay :: Maybe Double | ||
| -- ^ retry delay for a redis command request when TRYAGAIN error is received during cluster slot migration | ||
| -- default value is 100 ms |
Let's remove this comment: -- default value is 100 ms
When a multi-key operation targets keys where some are still on the source node and others have already been migrated to the destination node, a TRYAGAIN error is returned to prompt the client to retry after the migration is complete.
Previously we retried the request only once when we received a TRYAGAIN error, but we have come across cases where we received an ASK error after this retry.
The new changes include handling of MOVED and ASK errors even after a retry.
Also added a configurable delay before the retry.
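The behaviour described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code in src/Database/Redis/Cluster.hs: Reply, Action, classify, requestWithRetry, and example are hypothetical stand-ins, and the redirect step is passed in as a plain callback.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Concurrent (threadDelay)
import qualified Data.ByteString.Char8 as B
import Data.IORef (atomicModifyIORef', newIORef)

-- Simplified stand-in for a Redis reply; not the library's actual type.
data Reply = Ok B.ByteString | Error B.ByteString
  deriving (Eq, Show)

-- What the client should do next for a given reply.
data Action = Retry | Redirect | Done
  deriving (Eq, Show)

-- | Classify a reply by its error prefix.
classify :: Reply -> Action
classify (Error e)
  | "TRYAGAIN" `B.isPrefixOf` e = Retry
  | "MOVED" `B.isPrefixOf` e || "ASK" `B.isPrefixOf` e = Redirect
classify _ = Done

-- | Send a request. On TRYAGAIN, wait delayMicros and retry, at most
-- retryCount times. MOVED and ASK are passed to the redirect callback
-- on every attempt, including attempts made after a retry.
requestWithRetry :: Int -> Int -> IO Reply -> (Reply -> IO Reply) -> IO Reply
requestWithRetry retryCount delayMicros send redirect = go retryCount
  where
    go n = do
      reply <- send
      case classify reply of
        Retry | n > 0 -> threadDelay delayMicros >> go (n - 1)
        Redirect      -> redirect reply
        _             -> pure reply -- success, or retries exhausted

-- | First attempt gets TRYAGAIN, the retry gets ASK, the redirect succeeds.
-- The error strings are sample data for the sketch.
example :: IO Reply
example = do
  replies <- newIORef
    [ Error "TRYAGAIN Multiple keys request during rehashing of slot"
    , Error "ASK 3999 127.0.0.1:6381"
    ]
  let send = atomicModifyIORef' replies pop
  requestWithRetry 1 1000 send (\_ -> pure (Ok "redirected"))
  where
    pop (r : rest) = (rest, r)
    pop []         = ([], Ok "OK")
```

In this sketch the retry budget and the delay are plain parameters, which mirrors the idea of taking both from the connection config rather than from the environment.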