Description
We are looking for a way to solve the problem of Lambdas being invoked multiple times for the same event. We have some long-running Lambdas (processing uploaded files up to 10 GB in size), and sometimes the Lambda is triggered a second (or third) time while the first invocation is still running normally.
I thought we could use this library to acquire a lock in the lambda, but if the lease duration approaches the runtime limit of a lambda (currently 15 minutes), there will never be enough time after detecting a stale lock for the next client to do its work. For example:
- We have some files that take 12-15 minutes to process, so we want to set our maximum lease time to 15 minutes.
- If a lambda invocation crashed without releasing the lock, another lambda would wait the lease duration (15 minutes) to determine that the lock was stale. Thus, the second lambda will use its entire invocation time limit just waiting to acquire the lock.
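To make the arithmetic in the example above concrete, here is a minimal sketch of how much time a second invocation has left once it has waited long enough to declare the lock stale. The function name and constant are illustrative assumptions, not part of the library.

```python
# Current Lambda runtime limit, as stated in the description (15 minutes).
LAMBDA_TIMEOUT_S = 15 * 60


def remaining_work_time(stale_detection_wait_s: float,
                        timeout_s: float = LAMBDA_TIMEOUT_S) -> float:
    """Seconds left for real work after waiting to declare a lock stale.

    Hypothetical helper for illustration only.
    """
    return max(0.0, timeout_s - stale_detection_wait_s)


# Waiting out a full 15-minute lease leaves no time for processing.
print(remaining_work_time(15 * 60))  # 0.0
```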
Would there be any drawback to using the heartbeat period (plus some configurable buffer) to detect stale locks instead of the lease duration? This would require adding the heartbeatPeriod to the stored item. With that in place, the "How we handle clock skew" section could become:
...a call to acquireLock reads in the current lock, checks the RecordVersionNumber of the lock (which is a GUID) and starts a timer. If the lock still has the same GUID after the heartbeat period plus a buffer (instead of the lease duration time) has passed, the client will determine that the lock is stale and expire it.
With this approach, I could set my lease duration to 15 minutes and my heartbeat period to something like 5s (plus a 5s buffer). Then I could detect a stale lock within ~10s rather than 15 minutes.
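The proposed check can be sketched roughly as follows. This is an in-memory illustration, not the library's implementation: `LockItem`, `stale_threshold_s`, `is_stale`, and the `heartbeat_period_s` field are all hypothetical names standing in for the stored item and the comparison a waiting client would make.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LockItem:
    """Hypothetical view of the stored lock item.

    heartbeat_period_s is the field this proposal would add.
    """
    record_version_number: str
    lease_duration_s: float
    heartbeat_period_s: float


def stale_threshold_s(item: LockItem, buffer_s: float, use_heartbeat: bool) -> float:
    """How long a waiting client watches an unchanged GUID before expiring the lock."""
    if use_heartbeat:
        # Proposed behavior: heartbeat period plus a configurable buffer.
        return item.heartbeat_period_s + buffer_s
    # Behavior described in the quoted section: the full lease duration.
    return item.lease_duration_s


def is_stale(first_seen: LockItem, current: LockItem,
             elapsed_s: float, threshold_s: float) -> bool:
    """Stale if the RecordVersionNumber is unchanged once the threshold has elapsed."""
    unchanged = current.record_version_number == first_seen.record_version_number
    return unchanged and elapsed_s >= threshold_s


lock = LockItem("guid-1", lease_duration_s=15 * 60, heartbeat_period_s=5)
print(stale_threshold_s(lock, buffer_s=5, use_heartbeat=False))  # 900
print(stale_threshold_s(lock, buffer_s=5, use_heartbeat=True))   # 10
```

One thing to weigh when picking the buffer: a lock holder that is still alive but misses heartbeats for longer than the threshold (for example, during a long pause) would presumably be treated as stale under this scheme.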