Initial pipeline parallelism support #1008
Draft
Goal: Introduce pipeline parallelism without requiring changes to the weight irpa files or to the forward passes of the different layers (see PPFFN.forward in the example file).
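To make the goal concrete, here is a minimal sketch of the idea behind PPFFN.forward. This is not the code from the example file; the weight names and the `ops.matmul`/`ops.elementwise` calls are stand-ins for whatever the example actually uses. The point it illustrates: the layer is written as ordinary math, and pipeline placement comes entirely from the device metadata carried by the sharded weights.

```python
import torch
from sharktank import ops  # the sharded-op dispatch layer referenced in this PR


class PPFFN(torch.nn.Module):
    """Illustrative pipeline-parallel FFN: the weights are ShardedTensors whose
    shards are pinned to the devices of one pipeline stage."""

    def __init__(self, w1, w2):
        super().__init__()
        self.w1 = w1  # e.g. pinned to this stage's devices [0, 1, 2, 3]
        self.w2 = w2  # e.g. pinned to the next stage's devices [4, 5, 6, 7]

    def forward(self, x):
        # No explicit device management here: because w1/w2 are pinned, the ops
        # layer transfers the unpinned activation shards onto the weights'
        # devices before each matmul, which is what moves the activation from
        # one pipeline stage to the next.
        h = ops.elementwise(torch.relu, ops.matmul(x, self.w1))
        return ops.matmul(h, self.w2)
```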
Changes

- Sharded tensors now have a `.devices` attribute. Previously the implicit convention was that shard_i lived on device_i.
- Sharded tensors now have a `.devices_pinned` attribute. `ops.foo(t1 on devs [1,2,3], t2 pinned on devs [5,6,7])` would transfer the shards of t1 onto devices [5,6,7] before performing the operation.
- `ops` can take in a `torch.Tensor` and therefore won't know what devices to place them on, e.g. `def replicate(input: AnyTensor, count: int) -> ShardedTensor:`. I've added `devices` and `devices_pinned` as extra parameters and used defaults to keep the current behaviour unchanged (rough sketch after this list).
Discussion points

- `devices` and `devices_pinned` are currently required parameters. Should either, especially `devices_pinned`, be optional and have defaults?
- How `ops.replicate` should handle the extra parameters needs more thought.
TODOs

- Turn `transfer_if_needed` into a decorator to automatically perform the transfers (rough sketch below).
- `ops`: Change signatures to accept the new parameters, adding current behavior as the default.
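For the decorator TODO, one possible shape is sketched below. It is purely illustrative: `_move_shards` is a stand-in for whatever `transfer_if_needed` actually does in the diff, and the `.devices` / `.devices_pinned` attributes are the ones introduced in the changes above.

```python
import functools
from typing import Sequence

from sharktank.types import ShardedTensor  # assumed import path


def _move_shards(t: ShardedTensor, devices: Sequence[int]) -> ShardedTensor:
    """Hypothetical stand-in for the transfer performed by transfer_if_needed."""
    raise NotImplementedError


def auto_transfer(op):
    """Sketch of the decorator idea: before `op` runs, move unpinned sharded
    operands onto the devices of the pinned operands so the op body never has
    to deal with placement."""

    @functools.wraps(op)
    def wrapper(*args, **kwargs):
        sharded = [a for a in args if isinstance(a, ShardedTensor)]
        pinned = [t for t in sharded if t.devices_pinned]
        if pinned:
            target = pinned[0].devices
            args = tuple(
                _move_shards(a, target)
                if isinstance(a, ShardedTensor) and not a.devices_pinned
                else a
                for a in args
            )
        return op(*args, **kwargs)

    return wrapper


@auto_transfer
def foo(t1: ShardedTensor, t2: ShardedTensor) -> ShardedTensor:
    ...  # the op body never worries about where the shards live
```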