Add ZeRO-3 leaf module support for Qwen MoE models #701
Summary
This PR adds explicit support for Qwen Mixture-of-Experts (MoE) models when running with DeepSpeed ZeRO-3.
When model.config.model_type == "qwen3_moe", the script sets Qwen3MoeSparseMoeBlock as a ZeRO-3 leaf module. This ensures that collective communication works correctly during training.
Changes
- Import deepspeed and Qwen3MoeSparseMoeBlock inside the Qwen MoE branch.
- Call deepspeed.utils.set_z3_leaf_modules() to register the MoE block (a minimal sketch follows the Notes section below).
Notes
- If deepspeed or the Qwen MoE block import fails, the script logs a warning and continues without modification.
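The registration amounts to a small guarded block. The sketch below is illustrative only: the helper name maybe_set_qwen3_moe_leaf_modules and its placement are assumptions, and the transformers import path for Qwen3MoeSparseMoeBlock assumes a release that ships the Qwen3 MoE modeling file.

```python
import logging

logger = logging.getLogger(__name__)


def maybe_set_qwen3_moe_leaf_modules(model) -> None:
    """Hypothetical helper: register Qwen3MoeSparseMoeBlock as a ZeRO-3 leaf module.

    Leaf modules tell DeepSpeed not to hook/partition parameters below this
    module boundary, which keeps collective ops aligned across ranks even
    though each token only activates a subset of experts.
    """
    if getattr(model.config, "model_type", None) != "qwen3_moe":
        return

    try:
        # Lazy imports so non-MoE / non-DeepSpeed runs are unaffected.
        from deepspeed.utils import set_z3_leaf_modules
        from transformers.models.qwen3_moe.modeling_qwen3_moe import Qwen3MoeSparseMoeBlock
    except ImportError as exc:
        # Fallback described in the Notes: warn and continue without the hint.
        logger.warning("Could not register ZeRO-3 leaf module for Qwen3 MoE: %s", exc)
        return

    set_z3_leaf_modules(model, [Qwen3MoeSparseMoeBlock])
    logger.info("Registered Qwen3MoeSparseMoeBlock as a ZeRO-3 leaf module.")
```

In practice the call needs to happen after the model object exists but before ZeRO-3 partitions it, so immediately after the model is loaded and before the trainer or DeepSpeed engine is constructed is the natural spot.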
Motivation
Without this fix, training Qwen MoE models under ZeRO-3 may hang or suffer from incorrect collective operations. This patch enables stable fine-tuning of Qwen MoE models within the open-r1 training pipeline.
Limitations / Future Work
Reference
A similar fix has been applied in the OpenRLHF repository:
OpenRLHF/OpenRLHF@d5fcb42#diff-da77c0ae1d958e6b8c491f9d6f1f8ad54ee9ab21c231d4b2490fb1c09af1046f