In MoE model architectures, especially when the model size is quite large, we found that throughput is limited by communication (all-gather / reduce-scatter / all-to-all). All-gather and reduce-scatter are mainly used in ZeRO-3 or FSDP, while all-to-all is mainly used in expert parallelism. The communication volume is quite large and ultimately becomes the bottleneck.
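For context, the expert-parallel dispatch we care about looks roughly like the sketch below (not TE or Megatron code; shapes and the process group are hypothetical). Today the payload is BF16, which is the traffic we would like to shrink with FP8:

```python
import torch
import torch.distributed as dist

def dispatch_tokens(tokens_for_each_rank: torch.Tensor, ep_group) -> torch.Tensor:
    # tokens_for_each_rank: [ep_world_size * tokens_per_rank, hidden], bf16.
    # The all-to-all moves (hidden * 2 bytes) per token in each direction;
    # sending FP8 payloads instead would roughly halve this traffic.
    out = torch.empty_like(tokens_for_each_rank)
    dist.all_to_all_single(out, tokens_for_each_rank, group=ep_group)
    return out
```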
We found that another FP8 library, torchao, has FP8 all-gather communication enabled, but I cannot find a similar FP8 communication API in TE.
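For reference, this is roughly the pattern torchao documents for FP8 all-gather with FSDP2 (a sketch only; the exact flag name and import paths are assumptions and may differ across torchao/PyTorch versions):

```python
import torch
from torchao.float8 import Float8LinearConfig, convert_to_float8_training
from torch.distributed.fsdp import fully_shard  # FSDP2; location varies by PyTorch version

model = build_model()  # hypothetical: your MoE model

# Swap nn.Linear -> Float8Linear. With this flag, the sharded weights are cast to FP8
# before the FSDP all-gather, so the collective itself moves FP8 bytes instead of BF16.
convert_to_float8_training(
    model,
    config=Float8LinearConfig(enable_fsdp_float8_all_gather=True),
)

fully_shard(model)
```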
So, does TransformerEngine support FP8 communication such as all-gather / reduce-scatter or all-to-all?