[Refactor] Refactor the weight update logic #2914
base: gh/vmoens/130/base
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/2914
Note: Links to docs will display an error until the docs builds have been completed.
❌ 13 New Failures, 1 Cancelled Job, 1 Unrelated Failure as of commit b9e7568 with merge base 0475cbf
NEW FAILURES - The following jobs have failed:
CANCELLED JOB - The following job was cancelled. Please retry:
BROKEN TRUNK - The following job failed but was present on the merge base: 👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes.
ghstack-source-id: f685b500e05e61c421297bd9f0215167a4e5642f Pull Request resolved: #2914
ghstack-source-id: fe044d88e919be026afb2e1f8756ff986e9a65b0 Pull Request resolved: #2914
I'm trying to rethink sender and receiver one last time.
I think we always need a sender: in some way, you always need to push the weights somewhere (because vllm will never ask for weights, you push the weights to vllm).
In centralized settings, where you have a central collector orchestrating satellite ones, the responsibility of the central collector is to push weights to the workers (note that this is not the scheme we are using, which is decentralized).
The receiver, on the other hand, is accessory: it covers the kind of setting where a worker can ask for weights by itself at a given interval or when some conditions are met.
The update_policy_weights_
function then looks like
def update_policy_weights_(self, *args, **kwargs):
weights = self.receive(*args, **kwargs) # this is a no-op if the weights (handle) are in the args
self.send(weights) # this should never be a no-op, as this is where the weight update actually occurs
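A minimal, self-contained sketch of this receive-then-send flow. The class and its dict-based "weights" are hypothetical stand-ins to show the control flow, not the torchrl implementation:

```python
# Hypothetical sketch of the receive-then-send flow described above.
# `SimpleWeightUpdater` and its dict-based weights are illustrative only.
class SimpleWeightUpdater:
    def __init__(self, source_weights):
        self._source = source_weights  # e.g. trainer-side parameters
        self._policy = {}              # e.g. inference-side parameters

    def receive(self, weights=None):
        # No-op when the weights (or a handle to them) are passed in directly.
        return weights if weights is not None else dict(self._source)

    def send(self, weights):
        # The actual update: push the weights into the policy. Never a no-op.
        self._policy.update(weights)

    def update_policy_weights_(self, weights=None):
        weights = self.receive(weights)
        self.send(weights)
        return self._policy
```

With this shape, passing explicit weights skips the fetch, while calling with no arguments pulls from the source before pushing.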
@@ -63,72 +66,80 @@ def register_collector(self, collector: DataCollectorBase):  # noqa

    @property
    def collector(self) -> torchrl.collectors.DataCollectorBase:  # noqa
        return self._collector_wr() if self._collector_wr is not None else None
        """The collector or container of the receiver.
I'm saying collector or container because we may want to use these classes with something other than a collector (e.g. have a sender in a parameter server).
cc @mikaylagawarecki
ghstack-source-id: a53a09e4ff0c8ddd1cde46009481f8a8e43afbd7 Pull Request resolved: #2914
.. figure:: /_static/img/param-update.svg

In this setting, a parameter server holds various copies of the parameters. The "pulling" of the weights from the
why do you envision this to hold various copies rather than one?
.. figure:: /_static/img/param-update.svg

In this setting, a parameter server holds various copies of the parameters. The "pulling" of the weights from the
parameter server is handled by the main collector receiver. The main collector server sender instance sends the
"main collector server"
Is it accurate to think of this as the main thread in RayCollector?
the local inference worker. It is particularly useful when the training and inference occur on the same machine but on
- :class:`~torchrl.collectors.WeightUpdateSenderBase`: This component handles the distribution of policy weights to
  the policy or to remote inference workers. Every collector -- server or worker -- should have a `WeightUpdateSenderBase`
  instance to handle the "push" operation of the weights to the policy.
I think "push/pull" and "sender/receiver" are confusing 🫤 In particular, for me the Receiver == "Puller" part is tough to wrap my head around.
Pull architecture: the client sends the request, and the server responds accordingly
Push architecture: the server pushes data to clients as updates become available
The confusion for me is that I think of sender --> receiver as "sender actively pushes, receiver passively receives". Hence receiver == puller is not intuitive
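A toy contrast of the two styles under discussion may make the distinction concrete. The names and dict-based "weights" here are made up for illustration, not torchrl API:

```python
# Toy contrast of push vs. pull weight updates (illustrative names only).
server_weights = {"version": 1}

class PushStyleSender:
    """Push architecture: the server sends updates as they become available."""
    def __init__(self):
        self.worker_weights = {}

    def on_new_weights(self):
        # Server-driven: called by the trainer whenever weights change.
        self.worker_weights.update(server_weights)

class PullStyleReceiver:
    """Pull architecture: the worker requests weights on its own schedule."""
    def __init__(self):
        self.worker_weights = {}

    def maybe_pull(self, step, interval=10):
        # Worker-driven: fetch only when the worker's own condition is met.
        if step % interval == 0:
            self.worker_weights.update(server_weights)
```

The naming tension in the thread is visible here: the "receiver" is the active party in the pull style, which is exactly the counter-intuitive part.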
Got it
In this context I'm starting to think that having 2 separate classes will always be confusing so perhaps we should just have one that can be customized at will.
In every case I've dealt with so far, it has never occurred that I could write senders and receivers that compose freely, so that tells me that making a perfectly composable API may be an illusion.
I'm myself a bit confused about what should live within each of these classes to be honest...
I'll refactor this to have a single Updater class that gives a somewhat unopinionated implementation of the update functionality!
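One possible shape for such a single, unopinionated updater class, sketched here as an assumption (the name `WeightUpdater` and the callable-based customization are hypothetical, not the torchrl API): transport and application become user-supplied callables, so push- and pull-style behaviour is configuration rather than a separate class.

```python
# Hypothetical sketch of a single, unopinionated updater class.
class WeightUpdater:
    def __init__(self, get_weights, apply_weights):
        self._get_weights = get_weights      # where weights come from (pull)
        self._apply_weights = apply_weights  # how they reach the policy (push)

    def update_policy_weights_(self, weights=None):
        # Fetch only if no weights (or handle) were passed in directly.
        if weights is None:
            weights = self._get_weights()
        # Applying the weights is the one step that is never a no-op.
        self._apply_weights(weights)
```

A caller could then wire `get_weights` to a parameter server, a shared buffer, or a no-op, without subclassing two cooperating classes.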
@@ -118,27 +118,44 @@ try to limit the cases where a deepcopy will be executed. The following chart sh

Policy copy decision tree in Collectors.

Weight Synchronization in Distributed Environments
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
--------------------------------------------------
I read the diagram above as:
- CollectorServer: main thread of RayCollector
- Collector Worker {i}: remote DataCollector
If this read is correct, in my mind, it might sometimes make sense to have the receiver on the collector worker rather than the collector server
e.g. if the number of remote workers is sufficiently high, a collector worker might not be colocated with the collector server; in that case it might not make sense to pass the weights "two hops" to get to the worker
Separate qn -- from the diagram it looks like the collector server chooses when to pull from the param server and then "forcefully pushes" to all the workers at once. Is this design intentional? (e.g. Is the purpose of this to batch up workers to different collector servers and update them in batches?)
Stack from ghstack (oldest at bottom):