Detect stale locks by heartbeat period rather than lease duration #34

@mmindenhall

We are looking for a way to stop Lambdas from being invoked multiple times for the same event. We have some long-running Lambdas (processing uploaded files of up to 10 GB), and sometimes a Lambda is triggered a second (or third) time while the first invocation is still running normally.

I thought we could use this library to acquire a lock in the Lambda, but if the lease duration approaches the Lambda runtime limit (currently 15 minutes), a client that detects a stale lock will never have enough time left to do its work. For example:

  1. We have some files that take 12-15 minutes to process, so we want to set our maximum lease time to 15 minutes.
  2. If a lambda invocation crashed without releasing the lock, another lambda would wait the lease duration (15 minutes) to determine that the lock was stale. Thus, the second lambda will use its entire invocation time limit just waiting to acquire the lock.
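The arithmetic behind this example can be sketched as follows. This is only an illustration of the time budget: the function name and constant are made up for this sketch, and the only figure taken from above is the 15-minute runtime limit.

```python
LAMBDA_TIMEOUT_S = 15 * 60  # runtime limit mentioned above (15 minutes)


def remaining_budget_s(stale_wait_s: float, timeout_s: float = LAMBDA_TIMEOUT_S) -> float:
    """Seconds left for real work after waiting to declare a lock stale."""
    return max(0.0, timeout_s - stale_wait_s)


# Waiting out a full 15-minute lease leaves no time to process the file.
print(remaining_budget_s(15 * 60))  # 0.0

# Even a 12-minute lease leaves only 3 minutes for a 12-15 minute job.
print(remaining_budget_s(12 * 60))  # 180.0
```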

Would there be any drawback to using the heartbeat period (plus some configurable buffer) instead of the lease duration to detect stale locks? This would require adding the heartbeatPeriod to the stored lock item. With that in place, the "How we handle clock skew" section could become:

...a call to acquireLock reads in the current lock, checks the RecordVersionNumber of the lock (which is a GUID) and starts a timer. If the lock still has the same GUID after the ~~lease duration time~~ heartbeat period (plus a buffer) has passed, the client will determine that the lock is stale and expire it.

With this approach, I could set my lease duration to 15 minutes and my heartbeat period to something like 5s (plus a 5s buffer). Then I could detect a stale lock within ~10s rather than 15 minutes.
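To make the proposal concrete, here is a minimal in-memory sketch of the check described above. It is not the library's API: `LockItem`, `stale_wait_s`, `can_take_over`, and the `read_item`/`sleep` callables are hypothetical names for illustration, and the DynamoDB reads and conditional writes are left out. Like the existing scheme, it relies only on a local timer, so it does not compare clocks across hosts.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class LockItem:
    """Hypothetical view of the stored lock item."""
    record_version_number: str  # GUID that changes on every heartbeat
    lease_duration_s: float
    heartbeat_period_s: float   # the attribute this proposal would add


def stale_wait_s(item: LockItem, buffer_s: float) -> float:
    """How long the GUID must stay unchanged before the lock counts as stale."""
    return item.heartbeat_period_s + buffer_s


def can_take_over(
    read_item: Callable[[], Optional[LockItem]],
    buffer_s: float,
    sleep: Callable[[float], None],
) -> bool:
    """Read the lock, wait heartbeat period + buffer, then read it again.

    Returns True if the lock is absent or its GUID did not change, meaning
    the owner missed a heartbeat and the lock may be expired.
    """
    first = read_item()
    if first is None:
        return True  # nobody holds the lock
    sleep(stale_wait_s(first, buffer_s))
    second = read_item()
    return second is None or second.record_version_number == first.record_version_number


# A crashed owner never heartbeats, so the GUID stays the same.
crashed = LockItem("guid-1", lease_duration_s=900.0, heartbeat_period_s=5.0)
waits = []  # records requested sleeps instead of actually sleeping
print(can_take_over(lambda: crashed, buffer_s=5.0, sleep=waits.append))  # True
print(waits)  # [10.0]
```

In real use `sleep` would be `time.sleep` and `read_item` a consistent read of the lock row; a healthy owner that heartbeats within the window changes the GUID, so the check returns False and the waiter tries again.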
