MCore-Bridge: Making Megatron Training as Simple as Transformers

Megatron-Core model definitions for state-of-the-art large models

📖 Table of Contents

User Group
News
Installation
Model List
Quick Start
License

☎ User Group

Scan the QR code below to join our discussion group:

WeChat Group

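📝 Introduction

mcore-bridge is a model definition library for large language models and multimodal large models, released by the ModelScope community and built on the Megatron-Core ecosystem. It currently supports 300+ text-only models and 200+ multimodal models. Supported large language models include Qwen3-Next, GLM-5.1, DeepSeek-V4, Minimax-2.7, Kimi-K2.5, and GPT-OSS; supported multimodal models include Qwen3.5, Qwen3-Omni, Gemma4, GLM4.6-V, InternVL3.5, and Ovis2.5.

Why mcore-bridge?

Model coverage: supports 300+ text-only large models and 200+ multimodal large models, with Day 0 support for popular models.
Hardware support: runs on A10/A100/H100/B200, RTX-series GPUs, and Chinese-made hardware such as Ascend NPUs.
Training methods: supports full-parameter training and LoRA training, and is compatible with the PEFT ecosystem.
Parallelism: supports the parallel strategies provided by Megatron Core (tensor parallelism, pipeline parallelism, sequence parallelism, context parallelism, expert parallelism, and virtual pipeline parallelism).
Multimodal capabilities: supports multimodal FP8 training, MTP, padding-free sequences, and packing.
Task types: supports causal language modeling (Causal LM), sequence classification, embedding, and reranker tasks.
Ecosystem compatibility: loads and saves LoRA and full-parameter safetensors weights directly, and is compatible with mainstream inference frameworks such as Transformers, vLLM, and SGLang.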

Related documentation:

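🎉 News

🎉 2026.03.30: MCore-Bridge is officially released! It provides Megatron-Core model definitions for state-of-the-art large models, making Megatron training as simple as Transformers.

🛠️ Installation

Install with pip: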

pip install mcore-bridge -U

# Using uv
pip install uv
uv pip install mcore-bridge -U --torch-backend=auto

Install from source:

# pip install git+https://github.com/modelscope/mcore-bridge.git

git clone https://github.com/modelscope/mcore-bridge.git
cd mcore-bridge
pip install -e .

# Using uv
uv pip install -e . --torch-backend=auto

Recommended environment:

	Range	Recommended	Notes
python	>=3.10	3.12
cuda		cuda12.8/13.0
torch	>=2.0	2.8.0/2.11.0
transformer-engine	>=2.3	2.14.1
apex		0.1	Optional
megatron-core	>=0.16,<0.20	0.17.1
flash-attn		2.8.3/3.0.0b1	Optional
transformers	>=4.33	4.57.6/5.8.1
modelscope	>=1.23
peft	>=0.11,<0.20		LoRA
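
The ranges in the table can also be checked from Python. The following is a minimal sketch and not part of mcore-bridge: the helper names `parse_version`, `in_range`, and `check_environment` are hypothetical, the bounds are copied by hand from the "Range" column, and the simple numeric parser ignores suffixes such as `b1` or `+cu128`.

```python
import re
from importlib import metadata

# Bounds copied from the "Range" column: (inclusive lower, exclusive upper).
REQUIREMENTS = {
    'torch': ('2.0', None),
    'transformer-engine': ('2.3', None),
    'megatron-core': ('0.16', '0.20'),
    'transformers': ('4.33', None),
    'modelscope': ('1.23', None),
    'peft': ('0.11', '0.20'),
}


def parse_version(text):
    """Turn '0.17.1' or '2.8.0+cu128' into a tuple of ints; suffixes are ignored."""
    parts = []
    for piece in text.split('+')[0].split('.'):
        match = re.match(r'\d+', piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def in_range(version, lower=None, upper=None):
    """Check lower <= version < upper, skipping any bound that is None."""
    current = parse_version(version)
    if lower is not None and current < parse_version(lower):
        return False
    if upper is not None and current >= parse_version(upper):
        return False
    return True


def check_environment():
    """Return {package: (installed version or None, whether it is in range)}."""
    report = {}
    for name, (lower, upper) in REQUIREMENTS.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)
            continue
        report[name] = (installed, in_range(installed, lower, upper))
    return report


if __name__ == '__main__':
    for name, (installed, ok) in check_environment().items():
        status = 'ok' if ok else 'check'
        print(f"{name}: {installed or 'not installed'} -> {status}")
```

A package reported as "check" is either missing or outside the listed range; optional packages such as apex and flash-attn are deliberately left out of the sketch.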

✨ Model List

Text-only models:

Series	model_type
Qwen	qwen2, qwen2_moe, qwen3, qwen3_moe, qwen3_next
DeepSeek	deepseek_v3, deepseek_v32, deepseek_v4
GLM	glm4, glm4_moe, glm4_moe_lite, glm_moe_dsa
MiniMax	minimax_m2
Kimi	kimi_k2
Bailing	bailing_moe, bailing_hybrid
InternLM	internlm3
Llama	llama
GPT-OSS	gpt_oss
Hunyuan	hy_v3
ERNIE	ernie4_5, ernie4_5_moe
MiMo	mimo
Dots	dots1
OLMoE	olmoe

Multimodal models:

Series	model_type
Qwen	qwen2_vl, qwen2_5_vl, qwen2_5_omni, qwen3_vl, qwen3_vl_moe, qwen3_omni_moe, qwen3_asr, qwen3_5, qwen3_5_moe
Gemma	gemma4, gemma4_unified
GLM	glm4v, glm4v_moe
Kimi	kimi_vl, kimi_k25
OpenBMB	minicpmv4_6
InternVL	internvl_chat, internvl
Ovis	ovis2_5
Llama	llama4
Llava	llava-onevision
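
One way to see whether a local checkpoint falls under the tables above is to compare the model_type field of its Hugging Face config.json with the listed values. The following is a minimal sketch under that assumption: the two sets are copied by hand from the tables (and abbreviated to the Qwen, DeepSeek, GLM, and Gemma rows), and `read_model_type` and `classify` are hypothetical helpers, not MCore-Bridge APIs.

```python
import json
import os

# Copied by hand from the tables above; only a few series are included here.
TEXT_MODEL_TYPES = {
    'qwen2', 'qwen2_moe', 'qwen3', 'qwen3_moe', 'qwen3_next',
    'deepseek_v3', 'deepseek_v32', 'deepseek_v4',
    'glm4', 'glm4_moe', 'glm4_moe_lite', 'glm_moe_dsa',
}
MULTIMODAL_MODEL_TYPES = {
    'qwen2_vl', 'qwen2_5_vl', 'qwen2_5_omni', 'qwen3_vl', 'qwen3_vl_moe',
    'qwen3_omni_moe', 'qwen3_asr', 'qwen3_5', 'qwen3_5_moe',
    'gemma4', 'gemma4_unified', 'glm4v', 'glm4v_moe',
}


def read_model_type(model_dir):
    """Read the top-level model_type from <model_dir>/config.json."""
    with open(os.path.join(model_dir, 'config.json'), encoding='utf-8') as f:
        return json.load(f).get('model_type')


def classify(model_dir):
    """Return 'text', 'multimodal', or None if the model_type is not in either set."""
    model_type = read_model_type(model_dir)
    if model_type in TEXT_MODEL_TYPES:
        return 'text'
    if model_type in MULTIMODAL_MODEL_TYPES:
        return 'multimodal'
    return None
```

A result of None only means the value is missing from these hand-copied sets; the tables above remain the reference.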

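🚀 Quick Start

For training with MCore-Bridge, see the ms-swift project. This section shows how to use MCore-Bridge programmatically.

Create the following file (test.py), then run CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 test.py. The example uses MCore-Bridge to create a model, load weights, export them, and save them.

For inference with the saved model, refer to the example code in the model card.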

import os
import torch
import torch.distributed as dist
from megatron.core import mpu
from modelscope import snapshot_download
from transformers import AutoConfig, AutoProcessor
from mcore_bridge import ModelConfig, get_mcore_model, hf_to_mcore_config

is_rank0 = int(os.getenv('RANK')) == 0
torch.cuda.set_device(f"cuda:{os.getenv('LOCAL_RANK')}")
dist.init_process_group(backend='nccl')
TP, PP, EP, ETP = 2, 2, 2, 1
mpu.initialize_model_parallel(
    tensor_model_parallel_size=TP,
    pipeline_model_parallel_size=PP,
    expert_model_parallel_size=EP,
    expert_tensor_parallel_size=ETP,
)

model_dir = snapshot_download('Qwen/Qwen3.5-35B-A3B')
hf_config = AutoConfig.from_pretrained(model_dir, trust_remote_code=True)
processor = AutoProcessor.from_pretrained(model_dir, trust_remote_code=True)
config_kwargs = hf_to_mcore_config(hf_config)
config = ModelConfig(
    params_dtype=torch.bfloat16,
    tensor_model_parallel_size=TP,
    pipeline_model_parallel_size=PP,
    expert_model_parallel_size=EP,
    expert_tensor_parallel_size=ETP,
    sequence_parallel=True,
    mtp_num_layers=1,
    **config_kwargs)

# Create the model
mg_models = get_mcore_model(config)

# Load weights
bridge = config.bridge
bridge.load_weights(mg_models, model_dir)

# Export weights
for name, parameter in bridge.export_weights(mg_models):
    pass

# Save weights
output_dir = 'Qwen3.5-35B-A3B-HF'
bridge.save_weights(mg_models, output_dir)
if is_rank0:
    processor.save_pretrained(output_dir)
    hf_config.save_pretrained(output_dir)
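
The script above runs on 4 processes with TP=2, PP=2, EP=2, and ETP=1. The sketch below works through the arithmetic behind such a choice. It assumes Megatron-Core's usual rule that the world size must be divisible by TP × PP × CP for dense layers and by ETP × EP × PP for expert layers; `derive_data_parallel_sizes` is a hypothetical helper for illustration, not a Megatron-Core or MCore-Bridge API.

```python
def derive_data_parallel_sizes(world_size, tp=1, pp=1, cp=1, ep=1, etp=1):
    """Return (dense data-parallel size, expert data-parallel size).

    Raises ValueError when the world size cannot be split evenly.
    """
    dense_model_size = tp * pp * cp
    if world_size % dense_model_size != 0:
        raise ValueError(
            f'world_size={world_size} is not divisible by TP*PP*CP={dense_model_size}')
    expert_model_size = etp * ep * pp
    if world_size % expert_model_size != 0:
        raise ValueError(
            f'world_size={world_size} is not divisible by ETP*EP*PP={expert_model_size}')
    return world_size // dense_model_size, world_size // expert_model_size


if __name__ == '__main__':
    # Settings from the script above: 4 GPUs, TP=2, PP=2, EP=2, ETP=1.
    dense_dp, expert_dp = derive_data_parallel_sizes(4, tp=2, pp=2, ep=2, etp=1)
    print(f'dense DP={dense_dp}, expert DP={expert_dp}')
```

With these settings both data-parallel sizes come out as 1, so the 4 GPUs are fully consumed by model parallelism.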

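Using PEFT

MCore-Bridge is fully compatible with LoRA training via PEFT. The following shows how to prepare a PeftModel with peft and save the incremental (adapter) weights.

Create the following file (test.py), then run CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 test.py.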

import copy
import os
import torch
import torch.distributed as dist
from megatron.core import mpu
from modelscope import snapshot_download
from peft import LoraConfig, get_peft_model
from transformers import AutoConfig, AutoProcessor

from mcore_bridge import ModelConfig, get_mcore_model, hf_to_mcore_config, set_random_seed

is_rank0 = int(os.getenv('RANK')) == 0
torch.cuda.set_device(f"cuda:{os.getenv('LOCAL_RANK')}")
dist.init_process_group(backend='nccl')
TP, PP = 2, 2
mpu.initialize_model_parallel(
    tensor_model_parallel_size=TP,
    pipeline_model_parallel_size=PP,
)
# Set the random seed so that the model (full-parameter or LoRA) is randomly initialized correctly
set_random_seed(42)

model_dir = snapshot_download('Qwen/Qwen3.5-4B')
hf_config = AutoConfig.from_pretrained(model_dir, trust_remote_code=True)
config_kwargs = hf_to_mcore_config(hf_config)
config = ModelConfig(
    params_dtype=torch.bfloat16,
    tensor_model_parallel_size=TP,
    pipeline_model_parallel_size=PP,
    sequence_parallel=True,
    **config_kwargs)

# Create the model and load weights
mg_models = get_mcore_model(config)
bridge = config.bridge
bridge.load_weights(mg_models, model_dir)

# Prepare the PeftModel and load LoRA weights
# For multimodal models, specifying target_modules with a regular expression is recommended
target_modules = r'^language_model.*\.(in_proj|out_proj|linear_fc1|linear_fc2|linear_qkv|linear_proj)$'
# When saving to safetensors, store the target_modules that correspond to the HF model
hf_target_modules = r'^model.language_model.*\.(in_proj_qkv|in_proj_z|in_proj_b|in_proj_a|out_proj|gate_proj|up_proj|down_proj|q_proj|k_proj|v_proj|o_proj)$'
lora_config = LoraConfig(task_type='CAUSAL_LM', r=8, lora_alpha=32, lora_dropout=0.05, target_modules=target_modules)
peft_models = [get_peft_model(model, lora_config) for model in mg_models]
# Optional
# bridge.load_weights(peft_models, model_dir, peft_format=True)

# Export LoRA weights
for name, parameter in bridge.export_weights(mg_models, peft_format=True):
    pass

# Save LoRA weights
output_dir = 'Qwen3.5-4B-LoRA'
bridge.save_weights(mg_models, output_dir, peft_format=True)
if is_rank0:
    hf_lora_config = copy.copy(lora_config)
    hf_lora_config.target_modules = hf_target_modules
    hf_lora_config.save_pretrained(output_dir)
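
When target_modules is a string, PEFT treats it as a regular expression matched against module names. The sketch below applies the Megatron-side pattern from the script to a few names with Python's re module, assuming full-match semantics. The module names are hypothetical examples chosen for illustration and are not taken from an actual MCore-Bridge model; `select_lora_targets` is likewise a hypothetical helper.

```python
import re

# Same pattern as target_modules in the script above.
target_modules = r'^language_model.*\.(in_proj|out_proj|linear_fc1|linear_fc2|linear_qkv|linear_proj)$'

# Hypothetical module names, for illustration only.
candidate_names = [
    'language_model.decoder.layers.0.self_attention.linear_qkv',
    'language_model.decoder.layers.0.mlp.linear_fc1',
    'language_model.decoder.layers.0.self_attention.q_layernorm',
    'visual.blocks.0.attn.linear_qkv',
]


def select_lora_targets(pattern, names):
    """Return the names that the regular expression matches in full."""
    compiled = re.compile(pattern)
    return [name for name in names if compiled.fullmatch(name)]


if __name__ == '__main__':
    for name in select_lora_targets(target_modules, candidate_names):
        print(name)
```

Only the two language_model linear layers are selected here: the layer norm has a non-listed suffix, and the visual branch does not start with language_model, which is how the pattern keeps LoRA off the vision tower.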

Using the saved LoRA weights:

from transformers import Qwen3_5ForConditionalGeneration
from modelscope import snapshot_download
from peft import PeftModel

model_dir = snapshot_download('Qwen/Qwen3.5-4B')
model = Qwen3_5ForConditionalGeneration.from_pretrained(model_dir)
peft_model = PeftModel.from_pretrained(model, 'Qwen3.5-4B-LoRA')

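Minimal forward example

MCore-Bridge integrates seamlessly with ms-swift templates for model training. You can also replace the ms-swift template module with your own data processing pipeline.

The minimal example below shows how to run a forward pass and compute the loss with a model created by MCore-Bridge, to help you integrate MCore-Bridge into other projects quickly.

Create the following file (test.py), then run: CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node=2 test.py.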

import os
import torch
import torch.distributed as dist
from megatron.core import mpu
from modelscope import snapshot_download
from swift import get_processor, get_template
from swift.megatron.utils import get_packed_seq_params, get_padding_to
from swift.utils import to_device

from mcore_bridge import ModelConfig, get_mcore_model, hf_to_mcore_config, set_random_seed

data = {
    'messages': [{
        'role': 'user',
        'content': '<image>describe the image.'
    }, {
        'role':
        'assistant',
        'content':
        'The image depicts a close-up of a kitten with striking features. '
        'The kitten has a white and gray coat with distinct black stripes, '
        'particularly noticeable on its face and ears. Its eyes are large '
        'and expressive, with a captivating blue hue that stands out against '
        "the darker fur around them. The kitten's nose is small and pink, "
        'and it has long, delicate whiskers extending from either side of its mouth. '
        "The background is blurred, drawing attention to the kitten's face and "
        'making it the focal point of the image. The overall impression is '
        'one of cuteness and charm.'
    }],
    'images': ['http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png']
}


def forward_mg_model(mg_model, template):
    template.use_megatron = True
    template.set_mode('train')
    inputs = template.encode(data, return_length=True)
    mg_inputs = to_device(template.data_collator([inputs], padding_to=get_padding_to(mg_model.config)), 'cuda')
    text_position_ids = mg_inputs.pop('text_position_ids', None)
    if text_position_ids is None:
        text_position_ids = mg_inputs.get('position_ids')
    for key in ['num_samples', 'attention_mask_2d', 'loss_scale']:
        mg_inputs.pop(key, None)
    if template.padding_free:
        mg_inputs['packed_seq_params'] = get_packed_seq_params(text_position_ids)
    mg_inputs['labels'] = torch.roll(mg_inputs['labels'], -1, dims=-1)
    loss = mg_model(**mg_inputs)
    loss_mask = mg_inputs['labels'] != -100
    loss = loss * loss_mask
    return loss.sum() / loss_mask.sum()


torch.cuda.set_device(f"cuda:{os.getenv('LOCAL_RANK')}")
dist.init_process_group(backend='nccl')
TP, PP, EP, ETP = 2, 1, 2, 1
mpu.initialize_model_parallel(
    tensor_model_parallel_size=TP,
    pipeline_model_parallel_size=PP,
    expert_model_parallel_size=EP,
    expert_tensor_parallel_size=ETP,
)
set_random_seed(42)

model_dir = snapshot_download('Qwen/Qwen3.5-35B-A3B')
template = get_template(get_processor(model_dir), padding_free=True)
config_kwargs = hf_to_mcore_config(template.config)
config = ModelConfig(
    params_dtype=torch.bfloat16,
    tensor_model_parallel_size=TP,
    pipeline_model_parallel_size=PP,
    expert_model_parallel_size=EP,
    expert_tensor_parallel_size=ETP,
    sequence_parallel=True,
    mtp_num_layers=1,
    **config_kwargs)

mg_model = get_mcore_model(config)[0]
mg_model.cuda()
config.bridge.load_weights([mg_model], model_dir)
loss = forward_mg_model(mg_model, template)
print(f'loss: {loss}')  # loss: 0.8161308169364929
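
forward_mg_model shifts the labels with torch.roll so that each position is scored against the next token, then averages the per-token loss over positions whose label is not -100. The following plain-Python sketch reproduces that arithmetic on a toy sequence. The numbers are made up, the helper names are hypothetical, and the real code operates on tensors rather than lists.

```python
IGNORE_INDEX = -100


def shift_labels(labels):
    """Rotate left by one position, like torch.roll(labels, -1, dims=-1)."""
    return labels[1:] + labels[:1]


def masked_mean(per_token_loss, labels):
    """Average the loss over positions whose label is not IGNORE_INDEX."""
    kept = [loss for loss, label in zip(per_token_loss, labels) if label != IGNORE_INDEX]
    return sum(kept) / len(kept)


if __name__ == '__main__':
    # Two prompt tokens (ignored) followed by three response tokens.
    labels = [IGNORE_INDEX, IGNORE_INDEX, 11, 12, 13]
    shifted = shift_labels(labels)
    # Made-up per-token losses, one per position.
    per_token_loss = [0.5, 1.0, 2.0, 3.0, 4.0]
    print(shifted)
    print(masked_mean(per_token_loss, shifted))
```

The rotation wraps the first label around to the last position. In this toy case that label is an ignored prompt token, so the wrapped position drops out of the average.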

🏛 License

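This framework is licensed under the Apache License (Version 2.0). For models and datasets, please refer to the original resource pages and comply with their respective licenses.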
