Commit dfe09bf

option 1 - use block_current to overlap compute/communication
1 parent dc7b2e0 commit dfe09bf

1 file changed: +1 -2 lines changed

torchft/manager.py

Lines changed: 1 addition & 2 deletions
@@ -385,8 +385,7 @@ def allreduce(self, tensor: torch.Tensor, should_quantize: bool = False) -> Work
         )
     else:
         work = self._pg.allreduce([tensor], ReduceOp.SUM)
-        # TODO(tushar00jain): Set up the stream dependency correctly so it doesn't block cpu when using gloo
-        work.wait()
+        work.block_current_stream()
 
 fut = work.get_future()
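
The change above swaps a host-side wait for a stream-side dependency. The sketch below is a standard-library analogy of that idea, not torchft or PyTorch code: a single-worker executor stands in for the current stream, and a timer-completed `Future` stands in for the allreduce. The names `start_allreduce`, `host_blocking`, and `stream_blocking` are hypothetical and exist only for illustration.

```python
import threading
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Tuple


def start_allreduce(delay: float) -> Future:
    """Hypothetical stand-in for an async collective; completes after `delay` seconds."""
    fut: Future = Future()
    threading.Timer(delay, fut.set_result, args=("reduced",)).start()
    return fut


def host_blocking(delay: float) -> float:
    """Analogue of work.wait(): the calling thread stalls until the collective is done."""
    start = time.monotonic()
    work = start_allreduce(delay)
    work.result()  # the host thread blocks here
    return time.monotonic() - start


def stream_blocking(delay: float, stream: ThreadPoolExecutor) -> Tuple[float, Future]:
    """Analogue of a stream dependency: only the in-order 'stream' waits on the collective."""
    start = time.monotonic()
    work = start_allreduce(delay)
    # Enqueue the dependency on the stream; the host thread returns immediately.
    stream.submit(work.result)
    # Work queued afterwards runs only once the collective has finished.
    consumer = stream.submit(lambda: f"used {work.result()}")
    return time.monotonic() - start, consumer


if __name__ == "__main__":
    stream = ThreadPoolExecutor(max_workers=1)  # one worker keeps tasks in order
    print("host blocked for", round(host_blocking(0.3), 2), "s")
    elapsed, consumer = stream_blocking(0.3, stream)
    print("host returned after", round(elapsed, 2), "s")
    print(consumer.result())
    stream.shutdown()
```

In the second variant the host is free to queue more compute while the communication is still in flight, which is the overlap the commit message refers to.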
