Summary

This PR adds explicit support for Qwen Mixture-of-Experts (MoE) models when running with DeepSpeed ZeRO-3.
When model.config.model_type == "qwen3_moe", the script sets the Qwen3MoeSparseMoeBlock as a ZeRO-3 leaf module.
This ensures that collective communication works correctly during training.

Changes

  • Added a lazy import of deepspeed and Qwen3MoeSparseMoeBlock inside the Qwen MoE branch.
  • Applied deepspeed.utils.set_z3_leaf_modules() to register the MoE block (see the sketch after this list).
  • Added logging to indicate whether the setup was successful or skipped.
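For concreteness, here is a minimal sketch of what the Qwen MoE branch might look like. The helper name `maybe_set_qwen_moe_leaf_modules`, the `logger` object, and the log messages are illustrative assumptions rather than code from this PR; the lazy imports and the `deepspeed.utils.set_z3_leaf_modules()` call are the pieces described above.

```python
import logging

logger = logging.getLogger(__name__)


def maybe_set_qwen_moe_leaf_modules(model) -> None:
    """Register the Qwen3 MoE block as a ZeRO-3 leaf module, if applicable.

    Hypothetical helper illustrating the branch described in this PR;
    `model` is assumed to be an already-loaded transformers PreTrainedModel.
    """
    if getattr(model.config, "model_type", None) != "qwen3_moe":
        return  # Non-Qwen models are left untouched.

    try:
        # Lazy imports so non-MoE runs never require these modules.
        import deepspeed
        from transformers.models.qwen3_moe.modeling_qwen3_moe import (
            Qwen3MoeSparseMoeBlock,
        )

        # Mark the sparse MoE block as a ZeRO-3 leaf so DeepSpeed gathers its
        # parameters as a unit instead of hooking each expert, which keeps
        # collective operations consistent when ranks activate different experts.
        deepspeed.utils.set_z3_leaf_modules(model, [Qwen3MoeSparseMoeBlock])
        logger.info("Registered Qwen3MoeSparseMoeBlock as a ZeRO-3 leaf module.")
    except ImportError as err:
        # If deepspeed or the Qwen MoE block is unavailable, warn and continue.
        logger.warning(
            "Skipping ZeRO-3 leaf module setup for Qwen MoE: %s", err
        )
```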

Notes

  • This change does not affect non-Qwen models.
  • If deepspeed or the Qwen MoE block import fails, the script logs a warning and continues without modification.
  • No other functionality is changed.

Motivation

Without this fix, training Qwen MoE models under ZeRO-3 may hang or suffer from incorrect collective operations.
This patch enables stable fine-tuning of Qwen MoE models within the open-r1 training pipeline.

Limitations / Future Work

  • This commit only covers Qwen MoE models; other types of MoE models are not yet supported.
  • Incorrect collective operations may still occur for other MoE implementations.
  • Future work should extend ZeRO-3 leaf module registration to other MoE model types.

Reference

A similar fix has been applied in the OpenRLHF repository:
OpenRLHF/OpenRLHF@d5fcb42#diff-da77c0ae1d958e6b8c491f9d6f1f8ad54ee9ab21c231d4b2490fb1c09af1046f
