[feat] support fine-tuning of reranker models #4671


Merged · 5 commits · Jun 24, 2025
3 changes: 2 additions & 1 deletion README.md
@@ -74,6 +74,7 @@ You can contact us and communicate with us by adding our group:


## 🎉 News
- 🎁 2025.06.23: Fine-tuning of reranker models is supported. Training scripts can be found here: [Reranker](https://github.com/modelscope/ms-swift/blob/main/examples/train/reranker/train_reranker.sh).
Collaborator comment: Please add this to the Chinese README as well.

- 🎁 2025.06.18: Support for accelerating the ms-swift [inference](https://github.com/modelscope/ms-swift/blob/main/examples/infer/sglang), deployment, evaluation, and UI modules using the [sglang](https://github.com/sgl-project/sglang) inference acceleration engine. Simply set `--infer_backend sglang` to enable it.
- 🎁 2025.06.15: Support for GKD training on both pure text large models and multimodal models. Training scripts can be found here: [Pure Text](https://github.com/modelscope/ms-swift/blob/main/examples/train/rlhf/gkd), [Multimodal](https://github.com/modelscope/ms-swift/blob/main/examples/train/multimodal/rlhf/gkd).
- 🎁 2025.06.11: Support for using Megatron parallelism techniques for RLHF training. The training script can be found [here](https://github.com/modelscope/ms-swift/tree/main/examples/train/megatron/rlhf).
@@ -295,7 +296,7 @@ Supported Training Methods:
| ORPO Training | ✅ | [✅](https://github.com/modelscope/ms-swift/blob/main/examples/train/rlhf/orpo.sh) | ✅ | [✅](https://github.com/modelscope/ms-swift/blob/main/examples/train/rlhf/orpo.sh) | ✅ | ✅ |
| Classification Model Training | ✅ | [✅](https://github.com/modelscope/ms-swift/blob/main/examples/train/seq_cls/qwen2_5/sft.sh) | ✅ | ✅ | ✅ | [✅](https://github.com/modelscope/ms-swift/blob/main/examples/train/seq_cls/qwen2_vl/sft.sh) |
| Embedding Model Training | ✅ | [✅](https://github.com/modelscope/ms-swift/blob/main/examples/train/embedding/train_gte.sh) | ✅ | ✅ | ✅ | [✅](https://github.com/modelscope/ms-swift/blob/main/examples/train/embedding/train_gme.sh) |
| Reranker Model Training | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |


Pre-training:
2 changes: 2 additions & 0 deletions README_CN.md
@@ -70,6 +70,7 @@
- **Model Quantization**: Supports quantized export with AWQ, GPTQ, and BNB; the exported models support accelerated inference with vLLM/SGLang/LmDeploy and can be trained further.

## 🎉 News
- 🎁 2025.06.23: Reranker model training is supported. For the training script, see [here](https://github.com/modelscope/ms-swift/blob/main/examples/train/reranker/train_reranker.sh).
- 🎁 2025.06.18: The [sglang](https://github.com/sgl-project/sglang) inference acceleration engine can now be used to accelerate ms-swift's [inference](https://github.com/modelscope/ms-swift/blob/main/examples/infer/sglang)/deployment/evaluation/UI modules; simply set `--infer_backend sglang`.
- 🎁 2025.06.15: GKD training is supported for both plain-text LLMs and multimodal models. Training scripts: [plain text](https://github.com/modelscope/ms-swift/blob/main/examples/train/rlhf/gkd), [multimodal](https://github.com/modelscope/ms-swift/blob/main/examples/train/multimodal/rlhf/gkd).
- 🎁 2025.06.11: RLHF training with Megatron parallelism is supported. For the training script, see [here](https://github.com/modelscope/ms-swift/tree/main/examples/train/megatron/rlhf).
@@ -284,6 +285,7 @@ print(f'response: {resp_list[0].choices[0].message.content}')
| ORPO Training | ✅ | [✅](https://github.com/modelscope/ms-swift/blob/main/examples/train/rlhf/orpo.sh) | ✅ | [✅](https://github.com/modelscope/ms-swift/blob/main/examples/train/rlhf/orpo.sh) | ✅ | ✅ |
| Classification Model Training | ✅ | [✅](https://github.com/modelscope/ms-swift/blob/main/examples/train/seq_cls/qwen2_5/sft.sh) | ✅ | ✅ | ✅ | [✅](https://github.com/modelscope/ms-swift/blob/main/examples/train/seq_cls/qwen2_vl/sft.sh) |
| Embedding Model Training | ✅ | [✅](https://github.com/modelscope/ms-swift/blob/main/examples/train/embedding/train_gte.sh) | ✅ | ✅ | ✅ | [✅](https://github.com/modelscope/ms-swift/blob/main/examples/train/embedding/train_gme.sh) |
| Reranker Model Training | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |


Pre-training:
104 changes: 104 additions & 0 deletions docs/source/BestPractices/Reranker训练.md
@@ -0,0 +1,104 @@
# Reranker Training

SWIFT supports Reranker model training. The currently supported models are:

1. modernbert reranker model
- [ModelScope](https://www.modelscope.cn/models/iic/gte-reranker-modernbert-base) [Hugging Face](https://huggingface.co/Alibaba-NLP/gte-reranker-modernbert-base)
2. qwen3-reranker model
- 0.6B: [ModelScope](https://www.modelscope.cn/models/Qwen/Qwen3-Reranker-0.6B) [Hugging Face](https://huggingface.co/Qwen/Qwen3-Reranker-0.6B)
- 4B: [ModelScope](https://www.modelscope.cn/models/Qwen/Qwen3-Reranker-4B) [Hugging Face](https://huggingface.co/Qwen/Qwen3-Reranker-4B)
- 8B: [ModelScope](https://www.modelscope.cn/models/Qwen/Qwen3-Reranker-8B) [Hugging Face](https://huggingface.co/Qwen/Qwen3-Reranker-8B)

## Implementation Methods

SWIFT currently supports two implementations of Reranker models, which differ significantly in architecture and loss computation:

### 1. Classification Reranker

**Applicable Models:** modernbert reranker models (e.g., gte-reranker-modernbert-base)

**Core Principles:**
- Based on a sequence-classification architecture, with a classification head added on top of the pre-trained model
- Input: a query-document pair; Output: a single relevance score


### 2. Generative Reranker

**Applicable Models:** qwen3-reranker models (0.6B/4B/8B)

**Core Principles:**
- Based on a generative language model architecture (CausalLM)
- Input: a query-document pair; Output: the probability of specific tokens (e.g., "yes"/"no")
- Classification is performed by comparing the logits of these specific tokens at the final position

## Loss Function Types

SWIFT supports multiple loss functions for training Reranker models:

### Pointwise Loss Functions
Pointwise methods turn ranking into a binary classification problem and treat each query-document pair independently:

- **Core idea:** binary classification of each query-document pair to determine whether the document is relevant to the query
- **Loss function:** binary cross-entropy
- **Use cases:** simple and efficient, suitable for large-scale training data

Environment variables:
- `GENERATIVE_RERANKER_POSITIVE_TOKEN`: positive token (default: "yes")
- `GENERATIVE_RERANKER_NEGATIVE_TOKEN`: negative token (default: "no")

### Listwise Loss Functions
Listwise methods turn ranking into a multi-class classification problem, selecting the positive example from multiple candidate documents:

- **Core idea:** multi-class classification over each query's candidate document group (1 positive + n negatives) to identify the positive document
- **Loss function:** multi-class cross-entropy
- **Use cases:** learns the relative ordering between documents, which better matches the real needs of information retrieval

Environment variables:
- `LISTWISE_RERANKER_TEMPERATURE`: softmax temperature (default: 1.0)
- `LISTWISE_RERANKER_MIN_GROUP_SIZE`: minimum group size (default: 2)
- `LISTWISE_GENERATIVE_RERANKER_TEMPERATURE`: listwise temperature (default: 1.0)
- `LISTWISE_GENERATIVE_RERANKER_MIN_GROUP_SIZE`: minimum group size (default: 2)

**Listwise vs Pointwise:**
- **Pointwise:** judges relevance independently; simple to train, but ignores the relative relationships between documents
- **Listwise:** learns relative ordering; better performance and a closer fit to the nature of ranking tasks

## Evaluation Metrics

SWIFT provides dedicated information-retrieval evaluation metrics for Reranker training:

### MRR (Mean Reciprocal Rank)
- **Definition:** the average of the reciprocal ranks over all queries
- **Calculation:** MRR = (1/|Q|) × Σ(1/rank_i), where rank_i is the rank of the positive document for the i-th query
- **Range:** [0, 1]; higher is better
- **Use cases:** focuses on the position of the positive document in the ranked results

### NDCG (Normalized Discounted Cumulative Gain)
- **Definition:** normalized discounted cumulative gain
- **Calculation:** NDCG = DCG / IDCG, which accounts for the effect of ranking position on relevance
- **Range:** [0, 1]; higher is better
- **Use cases:** an overall measure of ranking quality, more sensitive to relevance at top positions

**Notes on metric computation:**
- Metrics are computed per query group; each query group starts with the positive document, followed by negative documents
- Data format: `[1,0,0,1,0,0,0]` represents 2 queries: query1=[1,0,0], query2=[1,0,0,0]
- Query boundaries are detected automatically, metrics are computed for each query separately, and the results are averaged

The loss source code can be found [here](https://github.com/modelscope/ms-swift/blob/main/swift/plugin/loss.py).

## Dataset Format

```json lines
{"query": "query", "positive": ["relevant_doc1", "relevant_doc2", ...], "negative": ["irrelevant_doc1", "irrelevant_doc2", ...]}
```

> See [MTEB/scidocs-reranking](https://www.modelscope.cn/datasets/MTEB/scidocs-reranking) for a reference dataset.

## Training Scripts

SWIFT provides four training script templates:

- [Pointwise Classification Reranker](https://github.com/tastelikefeet/swift/blob/main/examples/train/reranker/train_reranker.sh)
- [Pointwise Generative Reranker](https://github.com/tastelikefeet/swift/blob/main/examples/train/reranker/train_generative_reranker.sh)
- [Listwise Classification Reranker](https://github.com/tastelikefeet/swift/blob/main/examples/train/reranker/train_reranker_listwise.sh)
- [Listwise Generative Reranker](https://github.com/tastelikefeet/swift/blob/main/examples/train/reranker/train_generative_reranker_listwise.sh)
4 changes: 4 additions & 0 deletions docs/source/Customization/自定义数据集.md
@@ -119,6 +119,10 @@ query-response format:

Please refer to the [Embedding training documentation](../BestPractices/Embedding训练.md#数据集格式).

### Reranker

Please refer to the [Reranker training documentation](../BestPractices/Reranker训练.md#数据集格式).

### Multimodal

For multimodal datasets, the format is the same as for the tasks above. The difference is the addition of the `images`, `videos`, and `audios` keys, which hold the URLs or paths (absolute paths are recommended) of the multimodal resources. The `<image>`, `<video>`, and `<audio>` tags mark where images/videos/audios are inserted, and ms-swift supports multiple images/videos/audios. These special tokens are replaced during preprocessing; see [here](https://github.com/modelscope/ms-swift/blob/main/swift/llm/template/template/qwen.py#L198). The four examples below show the data formats for plain text as well as for data containing images, videos, and audio.
4 changes: 4 additions & 0 deletions docs/source/Instruction/支持的模型和数据集.md
@@ -214,6 +214,9 @@
|[Qwen/Qwen3-Embedding-0.6B](https://modelscope.cn/models/Qwen/Qwen3-Embedding-0.6B)|qwen3_emb|qwen3_emb|-|&#x2718;|-|[Qwen/Qwen3-Embedding-0.6B](https://huggingface.co/Qwen/Qwen3-Embedding-0.6B)|
|[Qwen/Qwen3-Embedding-4B](https://modelscope.cn/models/Qwen/Qwen3-Embedding-4B)|qwen3_emb|qwen3_emb|-|&#x2718;|-|[Qwen/Qwen3-Embedding-4B](https://huggingface.co/Qwen/Qwen3-Embedding-4B)|
|[Qwen/Qwen3-Embedding-8B](https://modelscope.cn/models/Qwen/Qwen3-Embedding-8B)|qwen3_emb|qwen3_emb|-|&#x2718;|-|[Qwen/Qwen3-Embedding-8B](https://huggingface.co/Qwen/Qwen3-Embedding-8B)|
|[Qwen/Qwen3-Reranker-0.6B](https://modelscope.cn/models/Qwen/Qwen3-Reranker-0.6B)|qwen3_reranker|qwen3_reranker|-|&#x2718;|-|[Qwen/Qwen3-Reranker-0.6B](https://huggingface.co/Qwen/Qwen3-Reranker-0.6B)|
|[Qwen/Qwen3-Reranker-4B](https://modelscope.cn/models/Qwen/Qwen3-Reranker-4B)|qwen3_reranker|qwen3_reranker|-|&#x2718;|-|[Qwen/Qwen3-Reranker-4B](https://huggingface.co/Qwen/Qwen3-Reranker-4B)|
|[Qwen/Qwen3-Reranker-8B](https://modelscope.cn/models/Qwen/Qwen3-Reranker-8B)|qwen3_reranker|qwen3_reranker|-|&#x2718;|-|[Qwen/Qwen3-Reranker-8B](https://huggingface.co/Qwen/Qwen3-Reranker-8B)|
|[iic/gte_Qwen2-1.5B-instruct](https://modelscope.cn/models/iic/gte_Qwen2-1.5B-instruct)|qwen2_gte|dummy|-|&#x2718;|-|[Alibaba-NLP/gte-Qwen2-1.5B-instruct](https://huggingface.co/Alibaba-NLP/gte-Qwen2-1.5B-instruct)|
|[iic/gte_Qwen2-7B-instruct](https://modelscope.cn/models/iic/gte_Qwen2-7B-instruct)|qwen2_gte|dummy|-|&#x2718;|-|[Alibaba-NLP/gte-Qwen2-7B-instruct](https://huggingface.co/Alibaba-NLP/gte-Qwen2-7B-instruct)|
|[codefuse-ai/CodeFuse-QWen-14B](https://modelscope.cn/models/codefuse-ai/CodeFuse-QWen-14B)|codefuse_qwen|codefuse|-|&#x2718;|coding|[codefuse-ai/CodeFuse-QWen-14B](https://huggingface.co/codefuse-ai/CodeFuse-QWen-14B)|
@@ -552,6 +555,7 @@
|[answerdotai/ModernBERT-base](https://modelscope.cn/models/answerdotai/ModernBERT-base)|modern_bert|dummy|transformers>=4.48|&#x2718;|bert|[answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base)|
|[answerdotai/ModernBERT-large](https://modelscope.cn/models/answerdotai/ModernBERT-large)|modern_bert|dummy|transformers>=4.48|&#x2718;|bert|[answerdotai/ModernBERT-large](https://huggingface.co/answerdotai/ModernBERT-large)|
|[iic/gte-modernbert-base](https://modelscope.cn/models/iic/gte-modernbert-base)|modern_bert_gte|dummy|transformers>=4.48|&#x2718;|bert, embedding|[Alibaba-NLP/gte-modernbert-base](https://huggingface.co/Alibaba-NLP/gte-modernbert-base)|
|[iic/gte-reranker-modernbert-base](https://modelscope.cn/models/iic/gte-reranker-modernbert-base)|modern_bert_gte_reranker|bert|transformers>=4.48|&#x2718;|bert, reranker|[Alibaba-NLP/gte-reranker-modernbert-base](https://huggingface.co/Alibaba-NLP/gte-reranker-modernbert-base)|
|[iic/nlp_structbert_backbone_base_std](https://modelscope.cn/models/iic/nlp_structbert_backbone_base_std)|bert|dummy|-|&#x2718;|bert|-|
|[Shanghai_AI_Laboratory/internlm2-1_8b-reward](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-1_8b-reward)|internlm2_reward|internlm2_reward|transformers>=4.38|&#x2718;|-|[internlm/internlm2-1_8b-reward](https://huggingface.co/internlm/internlm2-1_8b-reward)|
|[Shanghai_AI_Laboratory/internlm2-7b-reward](https://modelscope.cn/models/Shanghai_AI_Laboratory/internlm2-7b-reward)|internlm2_reward|internlm2_reward|transformers>=4.38|&#x2718;|-|[internlm/internlm2-7b-reward](https://huggingface.co/internlm/internlm2-7b-reward)|
103 changes: 103 additions & 0 deletions docs/source_en/BestPractices/Reranker.md
@@ -0,0 +1,103 @@
# Reranker Training

SWIFT supports Reranker model training. Currently supported models include:

1. modernbert reranker model
- [ModelScope](https://www.modelscope.cn/models/iic/gte-reranker-modernbert-base) [Hugging Face](https://huggingface.co/Alibaba-NLP/gte-reranker-modernbert-base)
2. qwen3-reranker model
- 0.6B: [ModelScope](https://www.modelscope.cn/models/Qwen/Qwen3-Reranker-0.6B) [Hugging Face](https://huggingface.co/Qwen/Qwen3-Reranker-0.6B)
- 4B: [ModelScope](https://www.modelscope.cn/models/Qwen/Qwen3-Reranker-4B) [Hugging Face](https://huggingface.co/Qwen/Qwen3-Reranker-4B)
- 8B: [ModelScope](https://www.modelscope.cn/models/Qwen/Qwen3-Reranker-8B) [Hugging Face](https://huggingface.co/Qwen/Qwen3-Reranker-8B)

## Implementation Methods

SWIFT currently supports two implementation methods for Reranker models, which have significant differences in architecture and loss function computation:

### 1. Classification Reranker

**Applicable Models:** modernbert reranker models (e.g., gte-reranker-modernbert-base)

**Core Principles:**
- Based on sequence classification architecture, adding a classification head on top of pre-trained models
- Input: query-document pairs, Output: single relevance score
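
A minimal scoring sketch for this setup, assuming a single-logit sequence-classification head (as in gte-reranker-modernbert-base) and plain `transformers` usage rather than the exact ms-swift code path; the query and documents are made-up examples:

```python
# Sketch: scoring query-document pairs with a classification reranker.
# Assumes a single-logit sequence-classification head; not the ms-swift training code.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "Alibaba-NLP/gte-reranker-modernbert-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id).eval()

query = "what is a reranker"
docs = ["A reranker scores query-document pairs by relevance.",
        "Bananas are rich in potassium."]
inputs = tokenizer([query] * len(docs), docs, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    scores = model(**inputs).logits.squeeze(-1)  # one relevance logit per pair
print(scores.sigmoid())  # higher = more relevant
```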

### 2. Generative Reranker

**Applicable Models:** qwen3-reranker models (0.6B/4B/8B)

**Core Principles:**
- Based on generative language model architecture (CausalLM)
- Input: query-document pairs, Output: probability of specific tokens (e.g., "yes"/"no")
- Classification is performed by comparing logits of specific tokens at the final position
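
The sketch below illustrates only the yes/no logit comparison; the prompt shown is a simplified stand-in and is not the exact template used by Qwen3-Reranker or ms-swift:

```python
# Sketch: generative reranking by comparing the "yes"/"no" logits at the last position.
# The prompt is illustrative only; the real Qwen3-Reranker template is more elaborate.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "Qwen/Qwen3-Reranker-0.6B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).eval()

yes_id = tokenizer.convert_tokens_to_ids("yes")  # cf. GENERATIVE_RERANKER_POSITIVE_TOKEN
no_id = tokenizer.convert_tokens_to_ids("no")    # cf. GENERATIVE_RERANKER_NEGATIVE_TOKEN

prompt = ("Judge whether the document answers the query. Answer only yes or no.\n"
          "Query: what is a reranker\n"
          "Document: A reranker scores query-document pairs by relevance.\n"
          "Answer:")
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    last_logits = model(**inputs).logits[:, -1, :]  # logits at the final position
relevance = torch.softmax(last_logits[:, [no_id, yes_id]], dim=-1)[:, 1]
print(relevance)  # probability mass assigned to the positive token
```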

## Loss Function Types

SWIFT supports multiple loss functions for training Reranker models:

### Pointwise Loss Functions
Pointwise methods transform the ranking problem into a binary classification problem, processing each query-document pair independently:

- **Core Idea:** Binary classification for each query-document pair to determine document relevance to the query
- **Loss Function:** Binary cross-entropy
- **Use Cases:** Simple and efficient, suitable for large-scale data training

Environment variable configuration:
- `GENERATIVE_RERANKER_POSITIVE_TOKEN`: Positive token (default: "yes")
- `GENERATIVE_RERANKER_NEGATIVE_TOKEN`: Negative token (default: "no")
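
As an illustration (not the exact implementation in `swift/plugin/loss.py`), the pointwise objective reduces to binary cross-entropy over one relevance logit per pair; for the generative variant, that logit comes from the positive/negative token comparison described above. The scores and labels below are made-up:

```python
# Sketch of the pointwise objective: binary cross-entropy over per-pair relevance logits.
import torch
import torch.nn.functional as F

logits = torch.tensor([2.1, -0.7, 0.3])  # model scores for three query-document pairs
labels = torch.tensor([1.0, 0.0, 0.0])   # 1 = relevant (positive), 0 = irrelevant (negative)
loss = F.binary_cross_entropy_with_logits(logits, labels)
print(loss)
```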

### Listwise Loss Functions
Listwise methods transform the ranking problem into a multi-class classification problem, selecting the positive example from multiple candidate documents:

- **Core Idea:** Multi-class classification over each query's candidate document group (1 positive + n negative examples) to identify the positive document
- **Loss Function:** Multi-class cross-entropy
- **Use Cases:** Learning relative ranking relationships between documents, better aligned with the actual needs of information retrieval

Environment variable configuration:
- `LISTWISE_RERANKER_TEMPERATURE`: Softmax temperature parameter (default: 1.0)
- `LISTWISE_RERANKER_MIN_GROUP_SIZE`: Minimum group size (default: 2)
- `LISTWISE_GENERATIVE_RERANKER_TEMPERATURE`: Listwise temperature parameter (default: 1.0)
- `LISTWISE_GENERATIVE_RERANKER_MIN_GROUP_SIZE`: Minimum group size (default: 2)
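
Conceptually, the listwise objective is a softmax cross-entropy over each group's scores with the positive document as the target class. The sketch below assumes the positive document sits at index 0 of its group and omits ms-swift's batching and grouping details:

```python
# Sketch of the listwise objective: softmax cross-entropy within each candidate group.
import torch
import torch.nn.functional as F

group_scores = torch.tensor([[1.8, 0.2, -0.5, 0.1]])  # one group: positive first, then 3 negatives
temperature = 1.0                                      # cf. LISTWISE_RERANKER_TEMPERATURE
targets = torch.zeros(group_scores.size(0), dtype=torch.long)  # positive document is class 0
loss = F.cross_entropy(group_scores / temperature, targets)
print(loss)
```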

**Listwise vs Pointwise:**
- **Pointwise:** Independent relevance judgment, simple training, but ignores relative relationships between documents
- **Listwise:** Learning relative ranking, better performance, more suitable for the essential needs of ranking tasks

## Evaluation Metrics

SWIFT provides dedicated information-retrieval evaluation metrics for Reranker training:

### MRR (Mean Reciprocal Rank)
- **Definition:** Average of reciprocal ranks across all queries
- **Calculation:** MRR = (1/|Q|) × Σ(1/rank_i), where rank_i is the rank of the positive document for the i-th query
- **Range:** [0, 1], higher is better
- **Use Cases:** Focus on the position of positive documents in ranking results
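
A small worked example with hypothetical ranks:

```python
# MRR over three queries whose positive documents were ranked 1st, 3rd, and 2nd.
ranks = [1, 3, 2]
mrr = sum(1.0 / r for r in ranks) / len(ranks)
print(mrr)  # (1 + 1/3 + 1/2) / 3 ≈ 0.611
```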

### NDCG (Normalized Discounted Cumulative Gain)
- **Definition:** Normalized discounted cumulative gain
- **Calculation:** NDCG = DCG / IDCG, considering the impact of ranking position on relevance
- **Range:** [0, 1], higher is better
- **Use Cases:** Comprehensive evaluation of ranking quality, more sensitive to relevance at top positions
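
A minimal sketch for the binary-relevance case, using gain = relevance; the exact DCG variant used in ms-swift may differ:

```python
# NDCG for a single ranked list with binary relevance labels.
import math

def ndcg(relevances):
    dcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))  # rank = i + 1, discount = log2(rank + 1)
    ideal = sorted(relevances, reverse=True)
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

print(ndcg([0, 1, 0, 0]))  # positive ranked 2nd -> 1 / log2(3) ≈ 0.631
```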

**Metric Calculation Notes:**
- Metrics are calculated based on query grouping, with each query group starting with a positive document followed by negative documents
- Data format: `[1,0,0,1,0,0,0]` represents 2 queries: query1=[1,0,0], query2=[1,0,0,0]
- Automatically identifies query boundaries and calculates metrics for each query separately, then takes the average
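
A sketch of this grouping convention (the helper name here is illustrative, not an ms-swift API):

```python
# Split a flat label sequence into query groups: each group starts at a positive (1) label.
def split_into_query_groups(labels):
    groups, current = [], []
    for label in labels:
        if label == 1 and current:  # a new positive label opens the next query group
            groups.append(current)
            current = []
        current.append(label)
    if current:
        groups.append(current)
    return groups

print(split_into_query_groups([1, 0, 0, 1, 0, 0, 0]))  # [[1, 0, 0], [1, 0, 0, 0]]
```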

The loss function source code can be found [here](https://github.com/modelscope/ms-swift/blob/main/swift/plugin/loss.py).

## Dataset Format

```json lines
{"query": "query", "positive": ["relevant_doc1", "relevant_doc2", ...], "negative": ["irrelevant_doc1", "irrelevant_doc2", ...]}
```

> Reference: [MTEB/scidocs-reranking](https://www.modelscope.cn/datasets/MTEB/scidocs-reranking)

## Training Scripts

SWIFT provides four training script templates:

- [Pointwise Classification Reranker](https://github.com/tastelikefeet/swift/blob/main/examples/train/reranker/train_reranker.sh)
- [Pointwise Generative Reranker](https://github.com/tastelikefeet/swift/blob/main/examples/train/reranker/train_generative_reranker.sh)
- [Listwise Classification Reranker](https://github.com/tastelikefeet/swift/blob/main/examples/train/reranker/train_reranker_listwise.sh)
- [Listwise Generative Reranker](https://github.com/tastelikefeet/swift/blob/main/examples/train/reranker/train_generative_reranker_listwise.sh)
6 changes: 5 additions & 1 deletion docs/source_en/Customization/Custom-dataset.md
@@ -125,7 +125,11 @@ If `seq_kd` is enabled, the final round of the 'assistant' part is not required

### Embedding

Please refer to the [Embedding training document](../BestPractices/Embedding.md#dataset-format).

### Reranker

Please refer to the [Reranker training document](../BestPractices/Reranker.md#dataset-format).

### Multimodal
