feat(backend): ensure a task is only executed once regardless of how many backend instances are run #6711
anna-parker wants to merge 24 commits into
Claude finished @anna-parker's task in 2m 50s
Code Review
The approach is clean and correct: a PostgreSQL A few things worth addressing before merge (see inline comments for details):
6e197bb to 3909681
@claude review
Claude finished @anna-parker's task in 3m 46s
Code Review
The core mechanism is solid: the PostgreSQL Bug: Missing
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b95ac625fd
…TaskLockService.kt Co-authored-by: Cornelius Roemer <cornelius.roemer@gmail.com>
## Summary

This PR replaces the repeated task-lock acquire/release boilerplate in scheduled backend tasks with a reusable `@TaskLock` annotation. The annotation keeps the lock name and schedule interval at the method callsite, while `TaskLockAspect` wraps the method execution with the existing `TaskLockService` acquire/release behavior.

The affected scheduled tasks now keep their bodies focused on task logic: cleaning stale sequences, checking for newer preprocessing pipeline versions, and updating SeqSet CrossRef citations.

The aspect resolves Spring property placeholders from the annotation, converts non-second intervals when needed, skips the task body if the lock is already held, and releases the lock in a `finally` block after execution.

This also adds Spring Boot AOP support and locks the new AOP dependencies in `gradle.lockfile`. A focused unit test covers the aspect behavior for skipped execution, successful execution, exception handling, and minute-to-second interval conversion.

## Validation

Ran with Docker-backed test infrastructure:

```bash
./gradlew test --tests org.loculus.backend.service.submission.CleanUpStaleSequencesInProcessingTaskTest --tests org.loculus.backend.service.submission.UseNewerProcessingPipelineVersionTaskTest --tests org.loculus.backend.service.scheduler.TaskLockAspectTest --console=plain
./gradlew ktlintFormat --console=plain
```

Both commands completed successfully.

🚀 Preview: Add `preview` label to enable
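
The wrapping behavior described in the summary (convert the interval to seconds, skip the body if the lock is held, release in `finally`) can be illustrated with a plain higher-order function. This is only a sketch under stated assumptions: `InMemoryTaskLockService` and `withTaskLock` are hypothetical names, the lock store is an in-memory map rather than the database, and the actual `TaskLockAspect` and `TaskLockService` signatures are not shown in this conversation and may differ.

```kotlin
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.TimeUnit

// Hypothetical in-memory stand-in for a lock service (not the PR's TaskLockService).
class InMemoryTaskLockService {
    private val held = ConcurrentHashMap<String, Long>()

    // Interval (in seconds) passed to the most recent acquire attempt, kept for inspection.
    var lastIntervalSeconds: Long = -1
        private set

    // Returns true only if no lock for taskName was held before this call.
    fun tryAcquire(taskName: String, intervalSeconds: Long): Boolean {
        lastIntervalSeconds = intervalSeconds
        return held.putIfAbsent(taskName, intervalSeconds) == null
    }

    fun release(taskName: String) {
        held.remove(taskName)
    }

    fun isHeld(taskName: String): Boolean = held.containsKey(taskName)
}

// Runs body only if the lock could be acquired and always releases it afterwards.
// Returns true if the body ran, false if it was skipped.
fun withTaskLock(
    locks: InMemoryTaskLockService,
    taskName: String,
    interval: Long,
    unit: TimeUnit,
    body: () -> Unit,
): Boolean {
    // Convert non-second intervals (e.g. minutes) to seconds before acquiring.
    val intervalSeconds = unit.toSeconds(interval)
    if (!locks.tryAcquire(taskName, intervalSeconds)) return false
    try {
        body()
    } finally {
        // Release even if the body throws.
        locks.release(taskName)
    }
    return true
}
```

A scheduled method would then reduce to a call such as `withTaskLock(locks, "cleanup", 2, TimeUnit.MINUTES) { /* task logic */ }`, where the task name and interval are again illustrative.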
resolves #6704
Alternative to #6705. ShedLock-style locking seems quite easy to implement ourselves, which means one less dependency and a mechanism that we fully understand.
Background
Today every `@Scheduled` task runs independently on each backend replica, so the effective execution frequency scales with the replica count (e.g. with 3 replicas the cleanup task runs ~3× as often, and the CrossRef citation fetch hits the external service 3×).
Overview
This PR adds a new table called `task_lock` with the columns `task_name`, `started_at` and `locked_until`. When a task attempts to start, it tries to acquire a lock in this table for that task. This is only possible if there is no row for `task_name` or if `locked_until` is earlier than NOW. When it acquires the lock, it populates the row with `task_name`, `started_at=NOW` and `locked_until=NOW + maxLockTime`.
When a task finishes, it deletes the row if NOW is past `locked_until`; otherwise it updates `locked_until` to `started_at + minLockTime`.
`minLockTime` and `maxLockTime` are set as factors of the desired `frequency` at which a task runs, where `frequency` must be set higher than the anticipated run time of the task. A task can run for up to `maxLockTime` without risk of parallel execution; `maxLockTime` defaults to 5 × `frequency`. `minLockTime` is set to 0.9 × the expected `frequency` so that the next scheduled run is not blocked by a previous lock whose removal was delayed, for example by clock skew.
Note that we need to use the database time for both the `task_lock` timestamps and the NOW comparisons.
Changes in behavior
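
The acquire and release rules from the Overview can be modelled in a few lines. The following is a minimal in-memory sketch, not the PR's implementation: `LockRow` and `TaskLockTable` are hypothetical names, the map stands in for the `task_lock` table, and `now` is passed in explicitly where the PR would read the database clock. It also ignores the question of which instance owns a lock, which the description does not cover.

```kotlin
import java.time.Duration
import java.time.Instant

// Hypothetical in-memory model of one task_lock row.
data class LockRow(val startedAt: Instant, val lockedUntil: Instant)

// Hypothetical model of the task_lock table, keyed by task_name.
class TaskLockTable(frequency: Duration) {
    private val rows = mutableMapOf<String, LockRow>()

    // Factors taken from the description: 5 * frequency and 0.9 * frequency.
    val maxLockTime: Duration = frequency.multipliedBy(5)
    val minLockTime: Duration = Duration.ofMillis(frequency.toMillis() * 9 / 10)

    // Succeeds if there is no row for the task or its locked_until is earlier than now.
    fun tryAcquire(taskName: String, now: Instant): Boolean {
        val existing = rows[taskName]
        if (existing != null && !existing.lockedUntil.isBefore(now)) return false
        rows[taskName] = LockRow(startedAt = now, lockedUntil = now.plus(maxLockTime))
        return true
    }

    // Deletes the row if now is past locked_until,
    // otherwise shortens the lock to started_at + minLockTime.
    fun release(taskName: String, now: Instant) {
        val row = rows[taskName] ?: return
        if (now.isAfter(row.lockedUntil)) {
            rows.remove(taskName)
        } else {
            rows[taskName] = row.copy(lockedUntil = row.startedAt.plus(minLockTime))
        }
    }

    fun lockedUntil(taskName: String): Instant? = rows[taskName]?.lockedUntil
}
```

With an illustrative `frequency` of 100 seconds, a task that acquires the lock at time t and finishes after 20 seconds leaves the row locked until t + 90 s, so a second replica firing shortly after t is rejected while the next regular run at t + 100 s can acquire the lock.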
PR Checklist
🚀 Preview: https://shedlock-alternative.loculus.org