Initial pipeline parallelism support #1008
Draft
Goal: Introduce pipeline parallelism without requiring changes to the weight irpa files or to the forward passes of the different layers (see PPFFN.forward in the example file).
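To make the goal concrete, here is a minimal sketch of the idea behind PPFFN.forward. This is not the code from the example file; the weight names and the `ops.matmul`/`ops.elementwise` calls are stand-ins for whatever the example actually uses. The point it illustrates: the layer is written as ordinary math, and pipeline placement comes entirely from the device metadata carried by the sharded weights.

```python
import torch
from sharktank import ops  # the sharded-op dispatch layer referenced in this PR


class PPFFN(torch.nn.Module):
    """Illustrative pipeline-parallel FFN: the weights are ShardedTensors whose
    shards are pinned to the devices of one pipeline stage."""

    def __init__(self, w1, w2):
        super().__init__()
        self.w1 = w1  # e.g. pinned to this stage's devices [0, 1, 2, 3]
        self.w2 = w2  # e.g. pinned to the next stage's devices [4, 5, 6, 7]

    def forward(self, x):
        # No explicit device management here: because w1/w2 are pinned, the ops
        # layer transfers the unpinned activation shards onto the weights'
        # devices before each matmul, which is what moves the activation from
        # one pipeline stage to the next.
        h = ops.elementwise(torch.relu, ops.matmul(x, self.w1))
        return ops.matmul(h, self.w2)
```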
Changes

- Sharded tensors now have a `.devices` attribute. Previously the implicit convention was that shard_i lived on device_i.
- Sharded tensors now have a `.devices_pinned` attribute. `ops.foo(t1 on devs [1,2,3], t2 pinned on devs [5,6,7])` would transfer the shards of t1 onto devices [5,6,7] before performing the operation.
- `ops` can take in a `torch.Tensor` and therefore won't know what devices to place them on, e.g. `def replicate(input: AnyTensor, count: int) -> ShardedTensor:`. I've added `devices` and `devices_pinned` as extra parameters and used defaults to keep the current behaviour unchanged (rough sketch after this list).
Discussion points

- `devices` and `devices_pinned` are currently required parameters. Should either, especially `devices_pinned`, be optional and have defaults?
- How `ops.replicate` should handle the extra parameters needs more thought.
TODOs

- Turn `transfer_if_needed` into a decorator to automatically perform the transfers (rough sketch below).
- `ops`: Change signatures to accept the new parameters, adding current behavior as the default.
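For the decorator TODO, one possible shape is sketched below. It is purely illustrative: `_move_shards` is a stand-in for whatever `transfer_if_needed` actually does in the diff, and the `.devices` / `.devices_pinned` attributes are the ones introduced in the changes above.

```python
import functools
from typing import Sequence

from sharktank.types import ShardedTensor  # assumed import path


def _move_shards(t: ShardedTensor, devices: Sequence[int]) -> ShardedTensor:
    """Hypothetical stand-in for the transfer performed by transfer_if_needed."""
    raise NotImplementedError


def auto_transfer(op):
    """Sketch of the decorator idea: before `op` runs, move unpinned sharded
    operands onto the devices of the pinned operands so the op body never has
    to deal with placement."""

    @functools.wraps(op)
    def wrapper(*args, **kwargs):
        sharded = [a for a in args if isinstance(a, ShardedTensor)]
        pinned = [t for t in sharded if t.devices_pinned]
        if pinned:
            target = pinned[0].devices
            args = tuple(
                _move_shards(a, target)
                if isinstance(a, ShardedTensor) and not a.devices_pinned
                else a
                for a in args
            )
        return op(*args, **kwargs)

    return wrapper


@auto_transfer
def foo(t1: ShardedTensor, t2: ShardedTensor) -> ShardedTensor:
    ...  # the op body never worries about where the shards live
```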