[mxfp8 moe training] Add mxfp8 to FSDP tests #2849
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2849
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 7b0add0 with merge base df7bf37.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@liangel-02 we need to use this torchtitan API:

    set_token_group_alignment_size_m(16)  # fp8
    set_token_group_alignment_size_m(32)  # mxfp8

This is because TMA (some background here) requires the stride-1 dim to be 128-bit (16-byte) aligned. In the backward pass, when grad_weight = grad_output_t @ input, the "M" dimension (flattened token groups) becomes this stride-1 dim. Therefore, each token group must be 16-byte aligned for this grouped gemm.
This might all be a bit confusing without background knowledge of GPU architecture and MoE models, but don't worry! There's plenty of time to learn. For now, just implement this setting in some clean/interpretable way in the test code and it should (hopefully) work.
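A minimal sketch of how this setting could be applied in the test, assuming `set_token_group_alignment_size_m` is importable from torchtitan (the import path and the recipe strings below are assumptions, not the exact names in the test):

```python
# Sketch only: the exact torchtitan import path for this API may differ.
from torchtitan.distributed.expert_parallel import set_token_group_alignment_size_m  # assumed path


def configure_token_group_alignment(recipe: str) -> None:
    # TMA requires the stride-1 dim of each token group to be 16-byte aligned,
    # so pick the alignment size per recipe: 32 for mxfp8, 16 for fp8 rowwise.
    if recipe == "mxfp8":  # hypothetical recipe name
        set_token_group_alignment_size_m(32)
    else:  # fp8 rowwise
        set_token_group_alignment_size_m(16)
```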
lgtm! 1 minor comment before landing, thanks
@@ -83,7 +95,8 @@ def moe_module_filter_fn(mod: nn.Module, cur_fqn: str) -> bool:
         return False

     # quantize test model
-    config = MoETrainingConfig()
+    config = MoETrainingConfig(recipe)
+    # config = MoETrainingConfig()
remove commented code before landing
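For context, a hedged sketch of how the recipe-parameterized config might drive quantization in the FSDP test. The recipe values, import paths, and the `build_test_moe_model` helper are assumptions based on the diff above, not the exact contents of test_fsdp.py:

```python
# Sketch under assumptions: MoETrainingConfig taking a recipe argument (as in the diff),
# torchao's quantize_ entry point, and pytest parametrization over recipes.
import pytest
import torch.nn as nn
from torchao.quantization import quantize_  # assumed import path
from torchao.prototype.moe_training.conversion_utils import MoETrainingConfig  # assumed path


@pytest.mark.parametrize("recipe", ["fp8_rowwise", "mxfp8"])  # hypothetical recipe names
def test_moe_training_fsdp(recipe):
    target_fqns = ("experts",)  # hypothetical target modules
    model = build_test_moe_model()  # hypothetical helper from the existing test

    def moe_module_filter_fn(mod: nn.Module, cur_fqn: str) -> bool:
        # only convert modules whose fully qualified name matches a target FQN
        return any(fqn in cur_fqn for fqn in target_fqns)

    # quantize test model with the recipe under test
    config = MoETrainingConfig(recipe)
    quantize_(model, config, filter_fn=moe_module_filter_fn)
```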
Addressing #2833.
Updating the test to include mxfp8. Running torchrun --nproc_per_node=${NUM_GPUS} -m pytest test_fsdp.py, all tests pass.
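For orientation, a rough sketch of the FSDP side of the test. The fully_shard import path and the model's module structure are assumptions; the real test_fsdp.py may wrap modules at a different granularity:

```python
# Sketch: shard the quantized MoE model with FSDP2's fully_shard before running
# forward/backward under torchrun. Module names here are assumptions.
import torch
from torch.distributed.fsdp import fully_shard  # FSDP2 API; path may differ by torch version


def apply_fsdp(model: torch.nn.Module) -> torch.nn.Module:
    # shard each transformer block individually, then the root module
    for block in model.layers:  # hypothetical attribute
        fully_shard(block)
    fully_shard(model)
    return model
```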