Releases: modelscope/mcore-bridge

v1.5.2

@Jintao-Huang Jintao-Huang released this 28 Jun 15:31

New Features

  1. Added model_type support for GLM-5.2 (glm_moe_dsa now supports indexer_type = 'shared') and minicpmv4_6.
  2. moe_router_load_balancing_type now accepts multiple types at once.
  3. Fixed issues related to DeepSeek-V4 FP8.

Full Changelog: v1.5.1...v1.5.2

Patch release v1.5.1

@Jintao-Huang Jintao-Huang released this 21 Jun 18:33

Full Changelog: v1.5.0...v1.5.1

v1.5.0

@Jintao-Huang Jintao-Huang released this 16 Jun 15:38

New Features

  1. Reduced the GPU memory used by the lm_head when training the generative_reranker task: only the logits at the positive/negative token positions are extracted, instead of computing the full logits.
  2. Dropped compatibility with megatron-core 0.15.
  3. Fixed several bugs.
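
The memory saving in item 1 can be illustrated with a small pure-Python sketch. It assumes one reading of the note: the reranker score only needs the logits of a positive and a negative vocabulary token, so only those two lm_head rows have to be multiplied with the hidden state. The function names, the toy weights, and the token ids are hypothetical and are not taken from Mcore-Bridge.

```python
from typing import List, Sequence


def full_logits(hidden: Sequence[float], lm_head: Sequence[Sequence[float]]) -> List[float]:
    """Baseline: one logit per vocabulary entry (uses every lm_head row)."""
    return [sum(h * w for h, w in zip(hidden, row)) for row in lm_head]


def selected_logits(
    hidden: Sequence[float],
    lm_head: Sequence[Sequence[float]],
    token_ids: Sequence[int],
) -> List[float]:
    """Compute logits only for the requested token ids (uses only those rows)."""
    return [sum(h * w for h, w in zip(hidden, lm_head[t])) for t in token_ids]


# Toy setup: hidden size 3, vocabulary size 5.
hidden = [1.0, 2.0, 3.0]
lm_head = [
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 1.0],
]
POSITIVE_ID, NEGATIVE_ID = 4, 1  # hypothetical ids of the positive/negative tokens

# Two values per position instead of a vocabulary-sized vector.
pos_logit, neg_logit = selected_logits(hidden, lm_head, [POSITIVE_ID, NEGATIVE_ID])
```

With a realistic vocabulary, the selected output holds two values per sequence position rather than one per vocabulary entry, which is where the saving would come from under this assumption.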

Full Changelog: v1.4.3...v1.5.0

v1.4.3

@Jintao-Huang Jintao-Huang released this 07 Jun 14:57

New Features

  1. Added model_type support for gemma4_unified; added multimodal support for kimi_k25.
  2. Added the language_model_only parameter. When enabled, only the language model component is created, and only language-model weights are loaded and saved.
  3. Fixed several bugs.
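
As a rough illustration of what "only language-model weights" could mean in item 2, the sketch below filters a checkpoint-style dictionary by key prefix. The helper name, the "language_model." prefix, and the example keys are all assumptions made for illustration; the actual key layout and selection logic in Mcore-Bridge may differ.

```python
from typing import Any, Dict


def keep_language_model_weights(
    state_dict: Dict[str, Any], prefix: str = "language_model."
) -> Dict[str, Any]:
    """Return only the entries whose key starts with the (assumed) language-model prefix."""
    return {name: value for name, value in state_dict.items() if name.startswith(prefix)}


# Hypothetical multimodal checkpoint with a vision tower and a language model.
checkpoint = {
    "visual.patch_embed.weight": "vision tensor",
    "language_model.embedding.weight": "embedding tensor",
    "language_model.decoder.layers.0.weight": "decoder tensor",
}

language_only = keep_language_model_weights(checkpoint)
```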

Full Changelog: v1.4.2...v1.4.3

v1.4.2

@Jintao-Huang Jintao-Huang released this 31 May 12:05

New Features

  1. Added model_type support for bailing_hybrid.
  2. Fixed abnormal loss for olmoe/bailing_moe when TP > 1.

Full Changelog: v1.4.1...v1.4.2

v1.4.1

@Jintao-Huang Jintao-Huang released this 27 May 15:23

New Features

  1. Added model_type support for: gemma4, deepseek_v4.
  2. Added a minimal example to the README showing how to create a model with Mcore-Bridge, run a forward pass, and compute the loss.
  3. Added compatibility with both the main and dev branches of megatron-core.

Full Changelog: v1.4.0...v1.4.1

v1.4.0

@Jintao-Huang Jintao-Huang released this 17 May 15:50

New Features

  1. Added model_type support for bailing_moe and qwen3_asr.
  2. Qwen3-Next now runs with Mcore-GDN by default, which enables sequence packing, FP8, and CP.
  3. Refactored transformer_block / transformer_layer with an inheritable design to simplify the integration of new models.
  4. Added compatibility with Python 3.13.
  5. Added LoRA weight saving and loading for MoE models whose experts are organized in grouped mode in transformers. (Note: these LoRA weights cannot be loaded directly via transformers, but can be loaded via Megatron for continued training.)
  6. Added padding_mask support, fixing an issue where moe_aux_loss incorrectly computed routing loss on padding tokens when padding_free=False.
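
To show why padding tokens distort the routing loss in item 6, here is a minimal pure-Python sketch of a Switch-style load-balancing loss, num_experts * sum_e(f_e * p_e), with an optional padding mask. It is an illustration under stated assumptions, not the Mcore-Bridge or megatron-core implementation: the function name, the top-1 routing, and the mask convention (True marks a padding token) are all hypothetical.

```python
from typing import List, Optional, Sequence


def load_balancing_loss(
    router_probs: Sequence[Sequence[float]],
    padding_mask: Optional[Sequence[bool]] = None,
) -> float:
    """Switch-style auxiliary loss over the tokens that are not padding.

    router_probs[t][e] is the router probability of expert e for token t.
    padding_mask[t] is True for padding tokens (convention assumed here).
    f_e is the fraction of counted tokens whose top-1 expert is e;
    p_e is the mean router probability of expert e over counted tokens.
    """
    num_experts = len(router_probs[0])
    if padding_mask is None:
        padding_mask = [False] * len(router_probs)
    real = [probs for probs, is_pad in zip(router_probs, padding_mask) if not is_pad]
    if not real:
        return 0.0
    n = len(real)
    f: List[float] = [0.0] * num_experts
    p: List[float] = [0.0] * num_experts
    for probs in real:
        top1 = max(range(num_experts), key=probs.__getitem__)
        f[top1] += 1.0 / n
        for e, prob in enumerate(probs):
            p[e] += prob / n
    return num_experts * sum(fe * pe for fe, pe in zip(f, p))


# Two real tokens routed evenly, followed by two padding tokens that both pick expert 0.
probs = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 0.0]]
mask = [False, False, True, True]

unmasked = load_balancing_loss(probs)        # padding counted: looks imbalanced
masked = load_balancing_loss(probs, mask)    # padding ignored: balanced
```

In this toy case the real tokens are perfectly balanced, yet the unmasked loss is higher because the padding tokens are counted as traffic to expert 0. Excluding them with the mask removes that penalty.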

Full Changelog: v1.3.0...v1.4.0

Patch release v1.3.2

@Jintao-Huang Jintao-Huang released this 12 May 14:41

Full Changelog: v1.3.1...v1.3.2

Patch release v1.3.1

@Jintao-Huang Jintao-Huang released this 10 May 05:29

Full Changelog: v1.3.0...v1.3.1

v1.3.0

@Jintao-Huang Jintao-Huang released this 07 May 02:51

New Features

  1. Added model_type support for kimi_k25, hy_v3, and llava_onevision.
  2. mlp_padding_free is now compatible with Sequence Parallelism.
  3. Dropped support for megatron-core versions 0.12 through 0.14.

Full Changelog: v1.2.0...v1.3.0