
Commit ba2733a

jeejeelee authored and ywang96 committed
[Doc] Slight improvement to M2 and beyond (vllm-project#27554)
Signed-off-by: Jee Jee Li <[email protected]>
Co-authored-by: Roger Wang <[email protected]>
1 parent 389549b commit ba2733a

File tree

4 files changed: +5 -18 lines

docs/features/reasoning_outputs.md
docs/models/supported_models.md
tests/models/registry.py
vllm/model_executor/models/minimax_m2.py

docs/features/reasoning_outputs.md

Lines changed: 4 additions & 3 deletions
@@ -14,11 +14,12 @@ vLLM currently supports the following reasoning models:
 | [DeepSeek-V3.1](https://huggingface.co/collections/deepseek-ai/deepseek-v31-68a491bed32bd77e7fca048f) | `deepseek_v3` | `json`, `regex` ||
 | [ERNIE-4.5-VL series](https://huggingface.co/baidu/ERNIE-4.5-VL-28B-A3B-PT) | `ernie45` | `json`, `regex` ||
 | [ERNIE-4.5-21B-A3B-Thinking](https://huggingface.co/baidu/ERNIE-4.5-21B-A3B-Thinking) | `ernie45` | `json`, `regex` ||
-| [QwQ-32B](https://huggingface.co/Qwen/QwQ-32B) | `deepseek_r1` | `json`, `regex` ||
+| [GLM-4.5 series](https://huggingface.co/collections/zai-org/glm-45-687c621d34bda8c9e4bf503b) | `glm45` | `json`, `regex` ||
+| [Hunyuan A13B series](https://huggingface.co/collections/tencent/hunyuan-a13b-685ec38e5b46321e3ea7c4be) | `hunyuan_a13b` | `json`, `regex` ||
 | [IBM Granite 3.2 language models](https://huggingface.co/collections/ibm-granite/granite-32-language-models-67b3bc8c13508f6d064cff9a) | `granite` |||
+| [MiniMax-M2](https://huggingface.co/MiniMaxAI/MiniMax-M2) | `minimax_m2_append_think` | `json`, `regex` ||
 | [Qwen3 series](https://huggingface.co/collections/Qwen/qwen3-67dd247413f0e2e4f653967f) | `qwen3` | `json`, `regex` ||
-| [Hunyuan A13B series](https://huggingface.co/collections/tencent/hunyuan-a13b-685ec38e5b46321e3ea7c4be) | `hunyuan_a13b` | `json`, `regex` ||
-| [GLM-4.5 series](https://huggingface.co/collections/zai-org/glm-45-687c621d34bda8c9e4bf503b) | `glm45` | `json`, `regex` ||
+| [QwQ-32B](https://huggingface.co/Qwen/QwQ-32B) | `deepseek_r1` | `json`, `regex` ||

 !!! note
     IBM Granite 3.2 and DeepSeek-V3.1 reasoning is disabled by default; to enable it, you must also pass `thinking=True` in your `chat_template_kwargs`.
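
This hunk alphabetizes the table and adds MiniMax-M2 with the `minimax_m2_append_think` parser. As a minimal sketch of how a parsed reasoning model is consumed, assuming a server started with `vllm serve MiniMaxAI/MiniMax-M2 --reasoning-parser minimax_m2_append_think` (the port and prompt below are illustrative, not from this commit):

# Sketch, not part of this commit: reading split reasoning output from a
# vLLM OpenAI-compatible server. Base URL and prompt are assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="MiniMaxAI/MiniMax-M2",
    messages=[{"role": "user", "content": "What is 9 * 7?"}],
)
msg = resp.choices[0].message
# With a reasoning parser configured, the chain of thought arrives in
# `reasoning_content` and the final answer in `content`.
print("reasoning:", getattr(msg, "reasoning_content", None))
print("answer:", msg.content)

Per the note above, IBM Granite 3.2 and DeepSeek-V3.1 would additionally need `extra_body={"chat_template_kwargs": {"thinking": True}}` in the request.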

docs/models/supported_models.md

Lines changed: 1 addition & 0 deletions
@@ -390,6 +390,7 @@ th {
 | `MiMoForCausalLM` | MiMo | `XiaomiMiMo/MiMo-7B-RL`, etc. | ✅︎ | ✅︎ |
 | `MiniCPMForCausalLM` | MiniCPM | `openbmb/MiniCPM-2B-sft-bf16`, `openbmb/MiniCPM-2B-dpo-bf16`, `openbmb/MiniCPM-S-1B-sft`, etc. | ✅︎ | ✅︎ |
 | `MiniCPM3ForCausalLM` | MiniCPM3 | `openbmb/MiniCPM3-4B`, etc. | ✅︎ | ✅︎ |
+| `MiniMaxM2ForCausalLM` | MiniMax-M2 | `MiniMaxAI/MiniMax-M2`, etc. | | ✅︎ |
 | `MistralForCausalLM` | Mistral, Mistral-Instruct | `mistralai/Mistral-7B-v0.1`, `mistralai/Mistral-7B-Instruct-v0.1`, etc. | ✅︎ | ✅︎ |
 | `MixtralForCausalLM` | Mixtral-8x7B, Mixtral-8x7B-Instruct | `mistralai/Mixtral-8x7B-v0.1`, `mistralai/Mixtral-8x7B-Instruct-v0.1`, `mistral-community/Mixtral-8x22B-v0.1`, etc. | ✅︎ | ✅︎ |
 | `MPTForCausalLM` | MPT, MPT-Instruct, MPT-Chat, MPT-StoryWriter | `mosaicml/mpt-7b`, `mosaicml/mpt-7b-storywriter`, `mosaicml/mpt-30b`, etc. | | ✅︎ |
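
The empty LoRA column in the new row indicates LoRA is not yet listed as supported. For context, a hedged offline-inference sketch for the newly listed architecture; only `trust_remote_code=True` is taken from this commit's registry change, and the multi-GPU sizing is an assumption (MiniMax-M2 is a large MoE model):

# Illustrative only; adjust parallelism to your hardware.
from vllm import LLM, SamplingParams

llm = LLM(
    model="MiniMaxAI/MiniMax-M2",
    trust_remote_code=True,  # mirrors the registry entry below
    tensor_parallel_size=8,  # assumption: the model does not fit on one GPU
)
outputs = llm.generate(
    ["Summarize vLLM in one sentence."],
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)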

tests/models/registry.py

Lines changed: 0 additions & 1 deletion
@@ -344,7 +344,6 @@ def check_available_online(
     "MiniMaxM2ForCausalLM": _HfExamplesInfo(
         "MiniMaxAI/MiniMax-M2",
         trust_remote_code=True,
-        is_available_online=False,
     ),
     "MistralForCausalLM": _HfExamplesInfo("mistralai/Mistral-7B-Instruct-v0.1"),
     "MixtralForCausalLM": _HfExamplesInfo(

vllm/model_executor/models/minimax_m2.py

Lines changed: 0 additions & 14 deletions
@@ -551,20 +551,6 @@ def compute_logits(
         logits = self.logits_processor(self.lm_head, hidden_states)
         return logits

-    def make_empty_intermediate_tensors(
-        self, batch_size: int, dtype: torch.dtype, device: torch.device
-    ) -> IntermediateTensors:
-        return IntermediateTensors(
-            {
-                "hidden_states": torch.zeros(
-                    (batch_size, self.config.hidden_size), dtype=dtype, device=device
-                ),
-                "residual": torch.zeros(
-                    (batch_size, self.config.hidden_size), dtype=dtype, device=device
-                ),
-            }
-        )
-
     def load_weights(self, weights: Iterable[tuple[str, torch.Tensor]]) -> set[str]:
         loader = AutoWeightsLoader(self)
         return loader.load_weights(weights)
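
The deleted override built zero-filled `hidden_states`/`residual` tensors by hand; vLLM's model utils provide a factory for exactly this shape, which would make the hand-written method redundant. A sketch under that assumption (the helper lives in `vllm.model_executor.models.utils`; its use in MiniMax-M2 is inferred, not shown in this diff):

from vllm.model_executor.models.utils import make_empty_intermediate_tensors_factory

class MiniMaxM2ModelSketch:
    """Hypothetical stand-in showing where the factory-made callable lives."""

    def __init__(self, hidden_size: int) -> None:
        # The factory returns a callable (batch_size, dtype, device) ->
        # IntermediateTensors producing the same zero tensors the deleted
        # method created by hand.
        self.make_empty_intermediate_tensors = make_empty_intermediate_tensors_factory(
            ["hidden_states", "residual"], hidden_size
        )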
