fix: avoid duplicate policy updates during scheduler cancellation #2004

Merged
samsja merged 2 commits into main from sebastian/nccl-broadcast-deadlock-2026-03-07
Mar 12, 2026

Conversation

@samsja (Member) commented Mar 10, 2026

Summary

generate_batch() cancels the previous step's update_policy_task before starting the next step. If that cancellation lands after update_weights() has started but before ckpt_step is committed, the next maybe_update_policy() can replay the same checkpoint update.

That duplicates both the weight sync and the off-policy bookkeeping for a single checkpoint step.
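The race described above can be reproduced in isolation. The following is a minimal sketch, assuming a simplified pre-fix control flow; `NaiveScheduler`, its `weight_syncs` list, and the sleep durations are hypothetical stand-ins for illustration, not the real `Scheduler` code.

```python
import asyncio


class NaiveScheduler:
    """Hypothetical stand-in for the pre-fix behaviour; not the real Scheduler."""

    def __init__(self):
        self.ckpt_step = 0
        self.weight_syncs = []  # records every update_weights() call

    async def update_weights(self, step):
        self.weight_syncs.append(step)  # the side effect happens first
        await asyncio.sleep(0.05)       # simulated slow weight sync

    async def maybe_update_policy(self, latest_step):
        if latest_step > self.ckpt_step:
            await self.update_weights(latest_step)
            self.ckpt_step = latest_step  # never reached if cancelled above


async def demo():
    sched = NaiveScheduler()
    task = asyncio.create_task(sched.maybe_update_policy(1))
    await asyncio.sleep(0.01)  # let update_weights() start
    task.cancel()              # mimics generate_batch() cancelling the old task
    try:
        await task
    except asyncio.CancelledError:
        pass
    # ckpt_step was never committed, so the same checkpoint is applied again.
    await sched.maybe_update_policy(1)
    return sched.weight_syncs


syncs = asyncio.run(demo())
print(syncs)
```

In this sketch the weight sync for step 1 is recorded twice, because the cancellation lands between the start of `update_weights()` and the `ckpt_step` assignment.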

This PR fixes the race by:

  • moving the checkpoint application into a shared in-flight task
  • reusing that task across callers instead of starting duplicate updates
  • shielding the in-flight task from outer-task cancellation
  • committing ckpt_step immediately after the weight update completes
  • explicitly cancelling the in-flight update task during shutdown so stop() does not hang
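The first four points above can be sketched as follows. This is an illustration of the general pattern under stated assumptions, not the code merged in this PR: `SketchScheduler`, `_inflight`, `_lock`, and `weight_syncs` are hypothetical names, and only `maybe_update_policy`, `_apply_policy_update`, `update_weights`, and `ckpt_step` echo names from the description.

```python
import asyncio
from typing import Optional


class SketchScheduler:
    """Hypothetical illustration of the pattern; not the real Scheduler."""

    def __init__(self):
        self.ckpt_step = 0
        self.weight_syncs = []
        self._lock = asyncio.Lock()
        self._inflight: Optional[asyncio.Task] = None

    async def update_weights(self, step):
        self.weight_syncs.append(step)
        await asyncio.sleep(0.05)  # simulated slow weight sync

    async def _apply_policy_update(self, step):
        await self.update_weights(step)
        self.ckpt_step = step  # commit right after the weight update

    async def maybe_update_policy(self, latest_step):
        async with self._lock:
            task = self._inflight
            if task is None or task.done():
                if latest_step <= self.ckpt_step:
                    return  # nothing new to apply
                task = asyncio.create_task(self._apply_policy_update(latest_step))
                self._inflight = task
        # shield() lets the caller be cancelled without cancelling the update.
        await asyncio.shield(task)


async def demo():
    sched = SketchScheduler()  # built inside the running event loop
    outer = asyncio.create_task(sched.maybe_update_policy(1))
    await asyncio.sleep(0.01)  # let the in-flight update start
    outer.cancel()             # cancels only the waiting caller
    try:
        await outer
    except asyncio.CancelledError:
        pass
    await sched.maybe_update_policy(1)  # reuses the in-flight task
    return sched.weight_syncs, sched.ckpt_step


result = asyncio.run(demo())
print(result)
```

Here the second caller finds the unfinished task under the lock and awaits it instead of starting a new one, so the weight sync runs once. A simplification to note: a caller asking for a newer step while an older update is in flight just waits for the older one in this sketch.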

Testing

  • uv run pytest tests/unit/orchestrator/test_scheduler.py -q

Note

Medium Risk
Touches async concurrency/cancellation paths for policy checkpoint updates; mistakes could cause missed updates or scheduler stalls, though changes are localized and unit-tested.

Overview
Prevents duplicate checkpoint applications in Scheduler.maybe_update_policy() by introducing a shared, lock-guarded inflight_policy_update_task and awaiting it via asyncio.shield() so outer cancellation doesn’t restart the same weight sync/off-policy update.

Refactors policy update logic into _compute_next_ckpt_step() and _apply_policy_update(), commits ckpt_step immediately after update_weights() completes, and extends stop() to cancel any in-flight policy update task. Adds unit tests covering reuse of an in-flight update after cancellation and shutdown cancellation behavior.

Written by Cursor Bugbot for commit 1392836.
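The need for explicit cancellation in stop() follows from how asyncio.shield() behaves: cancelling a waiter does not cancel the shielded task. The sketch below shows that behaviour with hypothetical helpers (`slow_update`, `waiter`, `wait_quietly`); it is not taken from the PR's diff.

```python
import asyncio


async def slow_update(log):
    try:
        await asyncio.sleep(10)  # stand-in for a long-running weight sync
    except asyncio.CancelledError:
        log.append("inner cancelled")
        raise


async def waiter(inner):
    await asyncio.shield(inner)  # caller-side wait, as in the shielded await


async def wait_quietly(task):
    try:
        await task
    except asyncio.CancelledError:
        pass


async def demo():
    log = []
    inner = asyncio.create_task(slow_update(log))
    outer = asyncio.create_task(waiter(inner))
    await asyncio.sleep(0.01)

    outer.cancel()                  # cancelling the caller ...
    await wait_quietly(outer)
    still_running = not inner.done()  # ... leaves the shielded task alive

    inner.cancel()                  # the explicit step a stop() would need
    await wait_quietly(inner)
    return still_running, inner.cancelled(), log


outcome = asyncio.run(demo())
print(outcome)
```

Without the explicit `inner.cancel()`, anything that waits for the shielded task during shutdown would wait for the full update to finish, which matches the hang the description warns about.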

@samsja changed the title from "WIP, maybe Fix duplicate policy updates during cancellation" to "fix: avoid duplicate policy updates during scheduler cancellation" on Mar 10, 2026
@samsja marked this pull request as ready for review on March 10, 2026 02:18

@cursor (bot) left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

3 comment threads on src/prime_rl/orchestrator/scheduler.py
@samsja merged commit 67de232 into main on Mar 12, 2026
11 checks passed

3 participants