Merged

75 commits
784c3ca
update
Jintao-Huang Oct 17, 2025
5294a1e
Merge branch 'main' into support_mcore_bridge
Jintao-Huang Oct 22, 2025
9545030
update
Jintao-Huang Oct 23, 2025
29a487d
update
Jintao-Huang Oct 23, 2025
879819e
Merge branch 'main' into support_mcore_bridge
Jintao-Huang Oct 23, 2025
4be6ae1
update
Jintao-Huang Oct 24, 2025
4e6e41f
update
Jintao-Huang Oct 24, 2025
bd14154
Merge branch 'main' into support_mcore_bridge
Jintao-Huang Oct 24, 2025
4900c15
update
Jintao-Huang Oct 24, 2025
302fbf4
updae
Jintao-Huang Oct 24, 2025
33a4ef0
update
Jintao-Huang Oct 24, 2025
8e8fdb1
update
Jintao-Huang Oct 24, 2025
385f5bc
update
Jintao-Huang Oct 26, 2025
4e2cee2
update
Jintao-Huang Oct 26, 2025
8bec008
update
Jintao-Huang Oct 27, 2025
d036165
update
Jintao-Huang Oct 28, 2025
5d7233d
update
Jintao-Huang Oct 28, 2025
7647fb9
Merge branch 'main' into support_mcore_bridge
Jintao-Huang Oct 28, 2025
b8c1746
update
Jintao-Huang Oct 28, 2025
43830cb
update
Jintao-Huang Oct 28, 2025
9a968b5
update
Jintao-Huang Oct 28, 2025
15e5d07
support pp tp
Jintao-Huang Oct 28, 2025
856c52e
update
Jintao-Huang Oct 28, 2025
8344a92
update
Jintao-Huang Oct 28, 2025
c294248
update
Jintao-Huang Oct 28, 2025
51b6411
update
Jintao-Huang Oct 29, 2025
20d9fb3
update
Jintao-Huang Oct 29, 2025
74b7456
update
Jintao-Huang Oct 29, 2025
296a16f
support vpp
Jintao-Huang Oct 29, 2025
aa15b97
update
Jintao-Huang Oct 29, 2025
333d42e
support lora
Jintao-Huang Oct 30, 2025
e26587d
support lora
Jintao-Huang Oct 30, 2025
793fbb2
Merge branch 'main' into support_mcore_bridge
Jintao-Huang Oct 30, 2025
26a46ea
fix lora
Jintao-Huang Oct 30, 2025
763d091
update
Jintao-Huang Oct 30, 2025
c456e2b
update
Jintao-Huang Oct 31, 2025
c2241a9
update
Jintao-Huang Nov 1, 2025
c5064d4
update
Jintao-Huang Nov 1, 2025
649e2fe
update
Jintao-Huang Nov 1, 2025
8c90464
update
Jintao-Huang Nov 2, 2025
ec6d5be
update
Jintao-Huang Nov 2, 2025
ea276a1
update
Jintao-Huang Nov 2, 2025
0295ec3
Merge branch 'main' into support_mcore_bridge
Jintao-Huang Nov 2, 2025
bc48b6c
update
Jintao-Huang Nov 2, 2025
8ab3d50
Merge remote-tracking branch 'refs/remotes/origin/support_mcore_bridg…
Jintao-Huang Nov 2, 2025
c882414
update
Jintao-Huang Nov 2, 2025
e761b03
update
Jintao-Huang Nov 2, 2025
209abf2
update
Jintao-Huang Nov 2, 2025
cda6c72
Merge remote-tracking branch 'refs/remotes/origin/support_mcore_bridg…
Jintao-Huang Nov 2, 2025
38be0f0
Merge branch 'main' into support_mcore_bridge
Jintao-Huang Nov 2, 2025
325f621
update
Jintao-Huang Nov 3, 2025
8e833a9
update
Jintao-Huang Nov 3, 2025
7c2139f
Merge remote-tracking branch 'refs/remotes/origin/support_mcore_bridg…
Jintao-Huang Nov 3, 2025
884711f
update
Jintao-Huang Nov 3, 2025
c86c695
Merge branch 'main' into support_mcore_bridge
Jintao-Huang Nov 3, 2025
927304e
fix
Jintao-Huang Nov 3, 2025
088fbef
Merge remote-tracking branch 'refs/remotes/origin/support_mcore_bridg…
Jintao-Huang Nov 3, 2025
b7ecd34
update
Jintao-Huang Nov 3, 2025
d48f8e7
update
Jintao-Huang Nov 3, 2025
f56b94d
update
Jintao-Huang Nov 3, 2025
bec6965
Merge remote-tracking branch 'refs/remotes/origin/support_mcore_bridg…
Jintao-Huang Nov 3, 2025
9a7a21d
update
Jintao-Huang Nov 3, 2025
50be186
fix
Jintao-Huang Nov 3, 2025
6d393a9
update
Jintao-Huang Nov 4, 2025
c8c0174
Merge remote-tracking branch 'refs/remotes/origin/support_mcore_bridg…
Jintao-Huang Nov 4, 2025
9cd0e92
update
Jintao-Huang Nov 4, 2025
d5362b6
update
Jintao-Huang Nov 4, 2025
6ac9d0f
update
Jintao-Huang Nov 4, 2025
cdd3417
fix
Jintao-Huang Nov 4, 2025
5d62d8e
update
Jintao-Huang Nov 4, 2025
991ed4a
fix
Jintao-Huang Nov 4, 2025
5d2ae4c
update
Jintao-Huang Nov 4, 2025
075b8df
update
Jintao-Huang Nov 4, 2025
0dc3621
update
Jintao-Huang Nov 4, 2025
1e1bda9
update
Jintao-Huang Nov 4, 2025
1 change: 1 addition & 0 deletions README.md
@@ -75,6 +75,7 @@ You can contact us and communicate with us by adding our group:


## 🎉 News
- 🎁 2025.11.04: Support for [Mcore-Bridge](docs/source_en/Megatron-SWIFT/Mcore-Bridge.md), making Megatron training as simple and easy to use as transformers.
- 🎁 2025.10.28: Ray is now supported; see the documentation [here](docs/source_en/Instruction/Ray.md).
- 🎁 2025.10.28: Support [using yaml](examples/yaml) to configure command-line parameters.
- 🎁 2025.09.29: Support padding_free for embedding/reranker/seq_cls tasks, use `--padding_free true --task_type embedding/reranker/generative_reranker/seq_cls` to begin!
1 change: 1 addition & 0 deletions README_CN.md
@@ -71,6 +71,7 @@
- **Model quantization**: Supports quantized export with AWQ, GPTQ, FP8, and BNB; exported models support inference acceleration with vLLM/SGLang/LmDeploy as well as continued training.

## 🎉 News
- 🎁 2025.11.04: Support for [Mcore-Bridge](docs/source/Megatron-SWIFT/Mcore-Bridge.md), making Megatron training as simple and easy to use as transformers.
- 🎁 2025.10.28: Ray is now [supported](docs/source/Instruction/ray的支持.md).
- 🎁 2025.10.28: Support for [using yaml](examples/yaml) to configure command-line parameters.
- 🎁 2025.09.29: Support padding_free for embedding/reranker/seq_cls tasks; use `--padding_free true --task_type embedding/reranker/generative_reranker/seq_cls` to start training!
2 changes: 2 additions & 0 deletions docs/source/Instruction/命令行参数.md
@@ -701,6 +701,7 @@ App arguments inherit from [部署参数](#部署参数) and [Web-UI参数](#Web-UI参数)
- mcore_adapters: List of adapter paths for mcore-format models; defaults to an empty list.
- thread_count: Number of model shards when `--to_mcore true` is set. Defaults to None, in which case it is set automatically based on model size so that the largest shard is smaller than 10GB.
- 🔥test_convert_precision: Test the precision error when converting weights between HF and Megatron formats. Defaults to False.
- test_convert_dtype: The dtype used for the conversion precision test; defaults to 'float32'.
- 🔥push_to_hub: Whether to push to the hub; defaults to False. See the example [here](https://github.com/modelscope/ms-swift/blob/main/examples/export/push_to_hub.sh).
- hub_model_id: The model_id to push to; defaults to None.
- hub_private_repo: Whether the repo is private; defaults to False.
@@ -764,6 +765,7 @@ In addition to the model-specific arguments of qwen2_5_vl and qwen2_audio, qwen2_5_omni also
- SPATIAL_MERGE_SIZE: Defaults to 2.
- IMAGE_MIN_TOKEN_NUM: Defaults to `4`, the minimum number of image tokens per image.
- 🔥IMAGE_MAX_TOKEN_NUM: Defaults to `16384`, the maximum number of image tokens per image (used to avoid OOM).
- Tip: the equivalent maximum image pixel count is `IMAGE_MAX_TOKEN_NUM * 32 * 32`.
- VIDEO_MIN_TOKEN_NUM: Defaults to `128`, the minimum number of video tokens per frame.
- 🔥VIDEO_MAX_TOKEN_NUM: Defaults to `768`, the maximum number of video tokens per frame (used to avoid OOM).
- MAX_RATIO: Defaults to 200.
275 changes: 275 additions & 0 deletions docs/source/Megatron-SWIFT/Mcore-Bridge.md
@@ -0,0 +1,275 @@
# Mcore Bridge

Megatron is known for its excellent training speed and rich set of parallelism techniques, but that also comes with a high barrier to entry. Mcore-Bridge was created to make Megatron training as simple and easy to use as transformers. With Mcore-Bridge, users can:
1. Load model weights in safetensors format directly and train efficiently with Megatron, then save the trained weights directly as safetensors with no extra conversion step.
2. Convert LoRA incremental weights in both directions.
3. Synchronize `Megatron->vLLM` weights for algorithms such as GRPO/GKD.
4. Convert ultra-large-scale models across multiple machines.

Mcore-Bridge is compatible with Dense, MoE, and multimodal architectures. After training, the converted model can be deployed directly with mainstream inference frameworks such as transformers, vLLM, and SGLang.

## Seamless Training
Mcore-Bridge currently supports parallelism techniques such as TP/PP/EP/ETP/VPP and all model architectures supported by Megatron-SWIFT; see the [supported models documentation](../Instruction/支持的模型和数据集.md). The following sections demonstrate Mcore-Bridge's seamless training capability, covering Dense models and MoE models respectively.

### Dense Models
The following is a training example for the multimodal model Qwen3-VL:
```shell
# 2 * 76GiB
PYTORCH_CUDA_ALLOC_CONF='expandable_segments:True' \
NPROC_PER_NODE=2 \
IMAGE_MAX_TOKEN_NUM=1024 \
VIDEO_MAX_TOKEN_NUM=128 \
FPS_MAX_FRAMES=16 \
CUDA_VISIBLE_DEVICES=0,1 \
megatron sft \
--model Qwen/Qwen3-VL-8B-Instruct \
--load_safetensors true \
--save_safetensors true \
--dataset 'AI-ModelScope/LaTeX_OCR:human_handwrite#5000' \
--load_from_cache_file true \
--tensor_model_parallel_size 2 \
--sequence_parallel true \
--packing true \
--freeze_llm false \
--freeze_vit true \
--freeze_aligner true \
--split_dataset_ratio 0.01 \
--micro_batch_size 1 \
--global_batch_size 4 \
--recompute_granularity full \
--recompute_method uniform \
--recompute_num_layers 1 \
--finetune true \
--cross_entropy_loss_fusion true \
--lr 1e-5 \
--lr_warmup_fraction 0.05 \
--min_lr 1e-6 \
--max_epochs 1 \
--save megatron_output/Qwen3-VL-8B-Instruct \
--save_interval 200 \
--vit_gradient_checkpointing false \
--max_length 2048 \
--num_workers 4 \
--no_save_optim true \
--no_save_rng true \
--dataset_num_proc 8
```

Then we run inference on the validation split:
```shell
PYTORCH_CUDA_ALLOC_CONF='expandable_segments:True' \
IMAGE_MAX_TOKEN_NUM=1024 \
VIDEO_MAX_TOKEN_NUM=128 \
FPS_MAX_FRAMES=16 \
CUDA_VISIBLE_DEVICES=0 \
swift infer \
--model megatron_output/Qwen3-VL-8B-Instruct/vx-xxx/checkpoint-xxx \
--load_data_args true \
--stream true
```

### MoE Models
The following is a CoT training example for the text-only model Qwen3-Moe:

```shell
# 8 * 76GiB, 3s/it
PYTORCH_CUDA_ALLOC_CONF='expandable_segments:True' \
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
NPROC_PER_NODE=8 \
megatron sft \
--model Qwen/Qwen3-30B-A3B-Instruct-2507 \
--load_safetensors true \
--save_safetensors true \
--dataset 'swift/Chinese-Qwen3-235B-Thinking-2507-Distill-data-110k-SFT#20000' \
--load_from_cache_file true \
--split_dataset_ratio 0.01 \
--moe_permute_fusion true \
--pipeline_model_parallel_size 2 \
--decoder_first_pipeline_num_layers 25 \
--tensor_model_parallel_size 4 \
--expert_model_parallel_size 4 \
--moe_grouped_gemm true \
--moe_shared_expert_overlap true \
--moe_aux_loss_coeff 1e-6 \
--micro_batch_size 1 \
--global_batch_size 4 \
--recompute_granularity full \
--recompute_method uniform \
--recompute_num_layers 1 \
--max_epochs 1 \
--finetune true \
--cross_entropy_loss_fusion true \
--lr 1e-5 \
--lr_warmup_fraction 0.05 \
--min_lr 1e-6 \
--save megatron_output/Qwen3-30B-A3B-Instruct-2507 \
--eval_interval 500 \
--save_interval 500 \
--max_length 8192 \
--packing true \
--num_workers 8 \
--dataset_num_proc 8 \
--no_save_optim true \
--no_save_rng true \
--sequence_parallel true \
--moe_expert_capacity_factor 2 \
--attention_backend flash
```

Run inference with the trained weights:
```shell
CUDA_VISIBLE_DEVICES=0 \
swift infer \
--model megatron_output/Qwen3-30B-A3B-Instruct-2507/vx-xxx/checkpoint-xxx \
--stream true \
--max_new_tokens 1024
```

## LoRA Export

In addition to importing and exporting full parameters, Mcore-Bridge also supports importing and exporting LoRA incremental weights on their own.

The following is a self-cognition LoRA training example for the text-only model Qwen3-Moe:
- If you want to export the merged weights rather than the LoRA incremental weights, set `--merge_lora true`.
- Note: Because the transformers and Megatron model structures are not always identical (for example, the expert part of Qwen3-VL-Moe in transformers is implemented as Parameters rather than Linear layers), some models cannot be converted (Qwen3-VL-Moe is still convertible if LoRA is trained only on linear_proj and linear_qkv). Most models do support LoRA conversion, for example: Qwen3-Moe, Qwen3-Omni-Moe, GLM4.5-V, etc.
```shell
# 50GiB
PYTORCH_CUDA_ALLOC_CONF='expandable_segments:True' \
NPROC_PER_NODE=2 \
CUDA_VISIBLE_DEVICES=0,1 \
megatron sft \
--model Qwen/Qwen3-30B-A3B-Instruct-2507 \
--load_safetensors true \
--save_safetensors true \
--merge_lora false \
--dataset 'swift/Chinese-Qwen3-235B-2507-Distill-data-110k-SFT#2000' \
'swift/self-cognition#1000' \
--load_from_cache_file true \
--train_type lora \
--lora_rank 8 \
--lora_alpha 32 \
--target_modules all-linear \
--split_dataset_ratio 0.01 \
--moe_permute_fusion true \
--expert_model_parallel_size 2 \
--moe_grouped_gemm true \
--moe_shared_expert_overlap true \
--moe_aux_loss_coeff 1e-3 \
--micro_batch_size 8 \
--global_batch_size 16 \
--recompute_granularity full \
--recompute_method uniform \
--recompute_num_layers 1 \
--max_epochs 1 \
--finetune true \
--cross_entropy_loss_fusion true \
--lr 1e-4 \
--lr_warmup_fraction 0.05 \
--min_lr 1e-5 \
--save megatron_output/Qwen3-30B-A3B-Instruct-2507 \
--eval_interval 200 \
--save_interval 200 \
--max_length 2048 \
--num_workers 8 \
--dataset_num_proc 8 \
--no_save_optim true \
--no_save_rng true \
--sequence_parallel true \
--moe_expert_capacity_factor 2 \
--attention_backend flash \
--model_author swift \
--model_name swift-robot
```

Run inference with the exported LoRA weights:
```shell
CUDA_VISIBLE_DEVICES=0 \
swift infer \
--model Qwen/Qwen3-30B-A3B-Instruct-2507 \
--adapters megatron_output/Qwen3-30B-A3B-Instruct-2507/vx-xxx/checkpoint-xxx \
--stream true
```

## Export and Conversion Precision Testing

In addition to converting and saving safetensors during training, Mcore-Bridge provides the `megatron export` command for standalone weight export. `megatron export` can also test the conversion precision while converting weights, which is very helpful for verifying correctness when integrating a new model. In general, models already integrated into Megatron-SWIFT do not exhibit precision misalignment, so you can safely set `--test_convert_precision false`.

Full-parameter weights:
```shell
# safetensors -> torch_dist
CUDA_VISIBLE_DEVICES=0,1,2,3 \
NPROC_PER_NODE=4 \
megatron export \
--model Qwen/Qwen3-30B-A3B-Instruct-2507 \
--save Qwen3-30B-A3B-Instruct-2507-mcore \
--to_mcore true \
--tensor_model_parallel_size 2 \
--expert_model_parallel_size 2 \
--pipeline_model_parallel_size 2 \
--test_convert_precision true
```

```shell
# torch_dist -> safetensors
CUDA_VISIBLE_DEVICES=0,1,2,3 \
NPROC_PER_NODE=4 \
megatron export \
--load Qwen3-30B-A3B-Instruct-2507-mcore \
--save Qwen3-30B-A3B-Instruct-2507-hf \
--to_hf true \
--tensor_model_parallel_size 2 \
--expert_model_parallel_size 2 \
--pipeline_model_parallel_size 2 \
--test_convert_precision true
```
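Conceptually, `--test_convert_precision` compares the converted weights against the originals after upcasting to `test_convert_dtype` and reports the numerical drift. The following pure-Python sketch — with hypothetical helper names, not the actual Megatron-SWIFT implementation — illustrates the comparison:

```python
def max_abs_error(weights_a, weights_b):
    """Largest element-wise absolute difference between two flat weight lists."""
    return max(abs(a - b) for a, b in zip(weights_a, weights_b))

# Toy stand-ins for one converted parameter pair; a real test would iterate
# over every parameter of both checkpoints after upcasting to test_convert_dtype.
hf_weights = [0.10, -0.25, 0.33]
mcore_weights = [0.10, -0.25, 0.33000001]

assert max_abs_error(hf_weights, mcore_weights) < 1e-6
```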

LoRA weights:
```shell
# torch_dist -> safetensors
# If you need to merge the LoRA weights and test precision alignment after merging, simply set `--merge_lora true`
# You can also replace `--model safetensors-path` with `--load torch-dist-path`; the two forms are equivalent, and mcore-bridge handles them automatically.
CUDA_VISIBLE_DEVICES=0,1,2,3 \
NPROC_PER_NODE=4 \
megatron export \
--model Qwen/Qwen3-30B-A3B-Instruct-2507 \
--adapter_load megatron_output/Qwen3-30B-A3B-Instruct-2507/vx-xxx \
--save megatron_output/Qwen3-30B-A3B-Instruct-2507/vx-xxx-lora \
--merge_lora false \
--to_hf true \
--tensor_model_parallel_size 2 \
--expert_model_parallel_size 2 \
--pipeline_model_parallel_size 2 \
--test_convert_precision true
```

```shell
# safetensors -> torch_dist
CUDA_VISIBLE_DEVICES=0,1,2,3 \
NPROC_PER_NODE=4 \
megatron export \
--model Qwen/Qwen3-30B-A3B-Instruct-2507 \
--adapters megatron_output/Qwen3-30B-A3B-Instruct-2507/vx-xxx-lora \
--save megatron_output/Qwen3-30B-A3B-Instruct-2507/vx-xxx-mcore \
--merge_lora false \
--to_mcore true \
--tensor_model_parallel_size 2 \
--expert_model_parallel_size 2 \
--pipeline_model_parallel_size 2 \
--test_convert_precision true
```

Merge-LoRA:
```shell
# torch_dist -> torch_dist
CUDA_VISIBLE_DEVICES=0,1,2,3 \
NPROC_PER_NODE=4 \
megatron export \
--model Qwen/Qwen3-30B-A3B-Instruct-2507 \
--adapter_load megatron_output/Qwen3-30B-A3B-Instruct-2507/vx-xxx \
--save megatron_output/Qwen3-30B-A3B-Instruct-2507/vx-xxx-merged \
--merge_lora true \
--to_mcore true \
--tensor_model_parallel_size 2 \
--expert_model_parallel_size 2 \
--pipeline_model_parallel_size 2
```
19 changes: 19 additions & 0 deletions docs/source/Megatron-SWIFT/命令行参数.md
@@ -232,6 +232,7 @@ LoRA training:
- 🔥target_modules: Suffixes of the LoRA modules to target; for example, you can set `--target_modules linear_qkv linear_proj`. Defaults to `['all-linear']`, which targets all linear layers.
- Note: 'all-linear' behaves differently for LLMs and multimodal LLMs. For an LLM, it automatically finds all linear layers except lm_head and attaches the tuner; **for a multimodal LLM, it attaches the tuner only to the LLM part by default, and this behavior can be controlled with `freeze_llm`, `freeze_vit`, and `freeze_aligner`**.
- Note: To target all routers, additionally set `--target_modules all-router ...`, for example: `--target_modules all-router all-linear`.
- The Linear layer suffixes differ between transformers and Megatron: in Megatron, `linear_proj` corresponds to `o_proj`, `linear_qkv` to the concatenation of `q_proj, k_proj, v_proj`, `linear_fc1` to the concatenation of `gate_proj` and `up_proj`, and `linear_fc2` to `down_proj`.
- 🔥target_regex: Regex for selecting LoRA modules; defaults to `None`. If provided, the target_modules argument is ignored.
- 🔥modules_to_save: After the tuner is attached, additionally selects original model modules to train and save. Defaults to `[]`. For example, `--modules_to_save word_embeddings output_layer` unfreezes the `word_embeddings` and `output_layer` layers during LoRA training, and the weights of both parts are saved.
- 🔥lora_rank: Defaults to `8`.
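The suffix correspondence between Megatron and transformers linear layers can be sketched as a plain lookup table (illustrative only — not code from the library):

```python
# Megatron linear-layer suffix -> corresponding transformers projection name(s).
# Illustrative mapping based on the note above; not the library's actual code.
MEGATRON_TO_HF = {
    "linear_proj": ["o_proj"],
    "linear_qkv": ["q_proj", "k_proj", "v_proj"],  # fused QKV projection
    "linear_fc1": ["gate_proj", "up_proj"],        # fused gate/up projection
    "linear_fc2": ["down_proj"],
}

def hf_equivalents(megatron_suffix):
    """Return the transformers suffixes fused into one Megatron linear layer."""
    return MEGATRON_TO_HF[megatron_suffix]

assert hf_equivalents("linear_qkv") == ["q_proj", "k_proj", "v_proj"]
```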
@@ -263,6 +264,13 @@ LoRA training:
**RM arguments**:
- center_rewards_coefficient: Coefficient used to incentivize the reward model to output zero-mean rewards; see this [paper](https://huggingface.co/papers/2312.09244). Recommended value: 0.01.

**Mcore-Bridge arguments**
- 🔥load_safetensors: Defaults to False; whether to load weights directly from safetensors.
- 🔥save_safetensors: Defaults to False; whether to save weights directly as safetensors. Note: if this is set to True, optimizer state, RNG state, and other content needed for checkpoint resumption will not be saved.
- model: The model_id or model_path of the safetensors weights. Defaults to None.
- adapters: The adapter_id or adapter_path of LoRA incremental weights in safetensors format. Defaults to `[]`.
- merge_lora: Whether to save the merged weights. Defaults to None; if `save_safetensors` is True, this defaults to `True`, otherwise to False. That is, by default LoRA is merged when saving in safetensors format and not merged when saving in torch_dist format.
- max_shard_size: Maximum file size for safetensors shards; defaults to '5GB'.
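The `merge_lora` default described above reduces to a simple rule; a minimal sketch of the documented behavior (pseudologic, not the actual source):

```python
def resolve_merge_lora(merge_lora, save_safetensors):
    """Default: merge LoRA when exporting safetensors; keep adapters separate for torch_dist."""
    if merge_lora is None:
        return bool(save_safetensors)
    return merge_lora

assert resolve_merge_lora(None, True) is True    # safetensors export -> merge
assert resolve_merge_lora(None, False) is False  # torch_dist export -> keep adapters
assert resolve_merge_lora(False, True) is False  # an explicit flag always wins
```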

## Training Arguments

@@ -299,3 +307,14 @@ Megatron training arguments inherit from the Megatron arguments and basic arguments (**shared with ms-swift
- 🔥rlhf_type: Defaults to 'dpo'. Currently 'dpo', 'kto', and 'rm' are supported.
- loss_scale: Overrides loss_scale in the [basic arguments](../Instruction/命令行参数.md). Defaults to 'last_round'.
- calculate_per_token_loss: Overrides the Megatron argument; defaults to False.


## Export Arguments
This section describes the arguments of `megatron export` (requires "ms-swift>=3.10"). To use the `swift export` command instead, see the [ms-swift command-line arguments documentation](../Instruction/命令行参数.md#导出参数). Compared with `swift export`, `megatron export` supports distributed and multi-node export. The Megatron export arguments inherit from the Megatron arguments and basic arguments.
- 🔥to_mcore: Convert HF-format weights to Megatron format. Defaults to False.
- 🔥to_hf: Convert Megatron-format weights to HF format. Defaults to False.
- 🔥merge_lora: Defaults to None; if `to_hf` is True, this defaults to `True`, otherwise to False. That is, by default LoRA is merged when saving in safetensors format and not merged when saving in torch_dist format. The merged weights are saved in the `--save` directory.
- Note: Because the transformers and Megatron model structures are not always identical (for example, the expert part of Qwen3-VL-Moe in transformers is implemented as Parameters rather than Linear layers), some models cannot be converted (Qwen3-VL-Moe is still convertible if LoRA is trained only on linear_proj and linear_qkv). Most models do support LoRA conversion, for example: Qwen3-Moe, Qwen3-Omni-Moe, GLM4.5-V, etc.
- 🔥test_convert_precision: Test the precision error when converting weights between HF and Megatron formats. Defaults to False.
- test_convert_dtype: The dtype used for the conversion precision test; defaults to 'float32'.
- exist_ok: If `args.save` already exists, do not raise an exception; overwrite it instead. Defaults to False.
1 change: 1 addition & 0 deletions docs/source/index.rst
@@ -41,6 +41,7 @@ Swift DOCUMENTATION
Megatron-SWIFT/命令行参数.md
Megatron-SWIFT/LoRA训练.md
Megatron-SWIFT/多模态模型.md
Megatron-SWIFT/Mcore-Bridge.md

.. toctree::
:maxdepth: 2
2 changes: 2 additions & 0 deletions docs/source_en/Instruction/Command-line-parameters.md
@@ -719,6 +719,7 @@ Export Arguments include the [basic arguments](#base-arguments) and [merge argum
- mcore_adapters: List of paths to mcore format model adapters, default is empty list.
- thread_count: The number of model slices when `--to_mcore true` is set. Defaults to None, and is automatically configured based on the model size, ensuring that the largest slice is less than 10GB.
- 🔥test_convert_precision: Test the precision error when converting weights between HF and Megatron formats. Default is False.
- test_convert_dtype: The dtype used for conversion precision testing, defaults to 'float32'.
- 🔥push_to_hub: Whether to push to the hub, with the default being False. Examples can be found [here](https://github.com/modelscope/ms-swift/blob/main/examples/export/push_to_hub.sh).
- hub_model_id: Model ID for pushing, default is None.
- hub_private_repo: Whether it is a private repo, default is False.
@@ -786,6 +787,7 @@ The parameter meanings are the same as in the `qwen_vl_utils>=0.0.14` library
- SPATIAL_MERGE_SIZE: default 2.
- IMAGE_MIN_TOKEN_NUM: default `4`, denotes the minimum number of image tokens per image.
- 🔥IMAGE_MAX_TOKEN_NUM: default `16384`, denotes the maximum number of image tokens per image. (used to avoid OOM)
- Note: The equivalent maximum image pixel count is `IMAGE_MAX_TOKEN_NUM * 32 * 32`.
- VIDEO_MIN_TOKEN_NUM: default `128`, denotes the minimum number of video tokens per frame.
- 🔥VIDEO_MAX_TOKEN_NUM: default `768`, denotes the maximum number of video tokens per frame. (used to avoid OOM)
- MAX_RATIO: default 200.
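As a quick check of the pixel note above, the token budget translates to a pixel budget as follows (a hypothetical helper; `32` is the effective patch edge implied by the note):

```python
def max_image_pixels(image_max_token_num, patch_size=32):
    """Equivalent maximum pixel budget: each image token covers a patch_size x patch_size area."""
    return image_max_token_num * patch_size * patch_size

assert max_image_pixels(16384) == 16_777_216  # default budget, ~16.8 megapixels
```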