diff --git a/docs/en/get_started/installation.md b/docs/en/get_started/installation.md index c00111c2ab..8877d510cc 100644 --- a/docs/en/get_started/installation.md +++ b/docs/en/get_started/installation.md @@ -23,7 +23,7 @@ pip install lmdeploy The default prebuilt package is compiled on **CUDA 12**. If CUDA 11+ (>=11.3) is required, you can install lmdeploy by: ```shell -export LMDEPLOY_VERSION=0.6.4 +export LMDEPLOY_VERSION=0.6.5 export PYTHON_VERSION=38 pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118 ``` diff --git a/docs/en/supported_models/supported_models.md b/docs/en/supported_models/supported_models.md index dd8ceb4ffa..cb9805bb0b 100644 --- a/docs/en/supported_models/supported_models.md +++ b/docs/en/supported_models/supported_models.md @@ -4,104 +4,107 @@ The following tables detail the models supported by LMDeploy's TurboMind engine ## TurboMind on CUDA Platform -| Model | Size | Type | FP16/BF16 | KV INT8 | KV INT4 | W4A16 | -| :-------------------: | :------------: | :--: | :-------: | :-----: | :-----: | :---: | -| Llama | 7B - 65B | LLM | Yes | Yes | Yes | Yes | -| Llama2 | 7B - 70B | LLM | Yes | Yes | Yes | Yes | -| Llama3 | 8B, 70B | LLM | Yes | Yes | Yes | Yes | -| Llama3.1 | 8B, 70B | LLM | Yes | Yes | Yes | Yes | -| Llama3.2 | 1B, 3B | LLM | Yes | Yes\* | Yes\* | Yes | -| InternLM | 7B - 20B | LLM | Yes | Yes | Yes | Yes | -| InternLM2 | 7B - 20B | LLM | Yes | Yes | Yes | Yes | -| InternLM2.5 | 7B | LLM | Yes | Yes | Yes | Yes | -| InternLM-XComposer2 | 7B, 4khd-7B | MLLM | Yes | Yes | Yes | Yes | -| InternLM-XComposer2.5 | 7B | MLLM | Yes | Yes | Yes | Yes | -| Qwen | 1.8B - 72B | LLM | Yes | Yes | Yes | Yes | -| Qwen1.5 | 1.8B - 110B | LLM | Yes | Yes | Yes | Yes | -| Qwen2 | 0.5B - 72B | LLM | Yes | Yes\* | Yes\* | Yes | -| Qwen2-MoE | 57BA14B | LLM | Yes | Yes | Yes | Yes | -| Qwen2.5 | 0.5B - 72B | LLM | Yes | Yes | Yes | Yes | -| Mistral | 7B | LLM | Yes | Yes | Yes | No | -| Mixtral | 8x7B, 8x22B | LLM | Yes | Yes | Yes | Yes | -| DeepSeek-V2 | 16B, 236B | LLM | Yes | Yes | Yes | No | -| DeepSeek-V2.5 | 236B | LLM | Yes | Yes | Yes | No | -| Qwen-VL | 7B | MLLM | Yes | Yes | Yes | Yes | -| DeepSeek-VL | 7B | MLLM | Yes | Yes | Yes | Yes | -| Baichuan | 7B | LLM | Yes | Yes | Yes | Yes | -| Baichuan2 | 7B | LLM | Yes | Yes | Yes | Yes | -| Code Llama | 7B - 34B | LLM | Yes | Yes | Yes | No | -| YI | 6B - 34B | LLM | Yes | Yes | Yes | Yes | -| LLaVA(1.5,1.6) | 7B - 34B | MLLM | Yes | Yes | Yes | Yes | -| InternVL | v1.1 - v1.5 | MLLM | Yes | Yes | Yes | Yes | -| InternVL2 | 1-2B, 8B - 76B | MLLM | Yes | Yes\* | Yes\* | Yes | -| ChemVLM | 8B - 26B | MLLM | Yes | Yes | Yes | Yes | -| MiniCPM-Llama3-V-2_5 | - | MLLM | Yes | Yes | Yes | Yes | -| MiniCPM-V-2_6 | - | MLLM | Yes | Yes | Yes | Yes | -| MiniGeminiLlama | 7B | MLLM | Yes | - | - | Yes | -| GLM4 | 9B | LLM | Yes | Yes | Yes | Yes | -| CodeGeeX4 | 9B | LLM | Yes | Yes | Yes | - | -| Molmo | 7B-D,72B | MLLM | Yes | Yes | Yes | No | +| Model | Size | Type | FP16/BF16 | KV INT8 | KV INT4 | W4A16 | +| :------------------------------: | :--------------: | :--: | :-------: | :-----: | :-----: | :---: | +| Llama | 7B - 65B | LLM | Yes | Yes | Yes | Yes | +| Llama2 | 7B - 70B | LLM | Yes | Yes | Yes | Yes | +| Llama3 | 8B, 70B | LLM | Yes | Yes | Yes | Yes | +| Llama3.1 | 8B, 70B | LLM | Yes | Yes | Yes | Yes | +| 
Llama3.2\[2\] | 1B, 3B | LLM | Yes | Yes\* | Yes\* | Yes |
+| InternLM | 7B - 20B | LLM | Yes | Yes | Yes | Yes |
+| InternLM2 | 7B - 20B | LLM | Yes | Yes | Yes | Yes |
+| InternLM2.5 | 7B | LLM | Yes | Yes | Yes | Yes |
+| InternLM-XComposer2 | 7B, 4khd-7B | MLLM | Yes | Yes | Yes | Yes |
+| InternLM-XComposer2.5 | 7B | MLLM | Yes | Yes | Yes | Yes |
+| Qwen | 1.8B - 72B | LLM | Yes | Yes | Yes | Yes |
+| Qwen1.5\[1\] | 1.8B - 110B | LLM | Yes | Yes | Yes | Yes |
+| Qwen2\[2\] | 0.5B - 72B | LLM | Yes | Yes\* | Yes\* | Yes |
+| Qwen2-MoE | 57BA14B | LLM | Yes | Yes | Yes | Yes |
+| Qwen2.5\[2\] | 0.5B - 72B | LLM | Yes | Yes\* | Yes\* | Yes |
+| Mistral\[1\] | 7B | LLM | Yes | Yes | Yes | No |
+| Mixtral | 8x7B, 8x22B | LLM | Yes | Yes | Yes | Yes |
+| DeepSeek-V2 | 16B, 236B | LLM | Yes | Yes | Yes | No |
+| DeepSeek-V2.5 | 236B | LLM | Yes | Yes | Yes | No |
+| Qwen-VL | 7B | MLLM | Yes | Yes | Yes | Yes |
+| DeepSeek-VL | 7B | MLLM | Yes | Yes | Yes | Yes |
+| Baichuan | 7B | LLM | Yes | Yes | Yes | Yes |
+| Baichuan2 | 7B | LLM | Yes | Yes | Yes | Yes |
+| Code Llama | 7B - 34B | LLM | Yes | Yes | Yes | No |
+| YI | 6B - 34B | LLM | Yes | Yes | Yes | Yes |
+| LLaVA(1.5,1.6) | 7B - 34B | MLLM | Yes | Yes | Yes | Yes |
+| InternVL | v1.1 - v1.5 | MLLM | Yes | Yes | Yes | Yes |
+| InternVL2\[2\] | 1 - 2B, 8B - 76B | MLLM | Yes | Yes\* | Yes\* | Yes |
+| InternVL2.5(MPO)\[2\] | 1 - 78B | MLLM | Yes | Yes\* | Yes\* | Yes |
+| ChemVLM | 8B - 26B | MLLM | Yes | Yes | Yes | Yes |
+| MiniCPM-Llama3-V-2_5 | - | MLLM | Yes | Yes | Yes | Yes |
+| MiniCPM-V-2_6 | - | MLLM | Yes | Yes | Yes | Yes |
+| MiniGeminiLlama | 7B | MLLM | Yes | - | - | Yes |
+| GLM4 | 9B | LLM | Yes | Yes | Yes | Yes |
+| CodeGeeX4 | 9B | LLM | Yes | Yes | Yes | - |
+| Molmo | 7B-D,72B | MLLM | Yes | Yes | Yes | No |

"-" means not verified yet.

```{note}
-* The TurboMind engine doesn't support window attention. Therefore, for models that have applied window attention and have the corresponding switch "use_sliding_window" enabled, such as Mistral, Qwen1.5 and etc., please choose the PyTorch engine for inference.
-* When the head_dim of a model is not 128, such as llama3.2-1B, qwen2-0.5B and internvl2-1B, turbomind doesn't support its kv cache 4/8 bit quantization and inference
+* [1] The TurboMind engine doesn't support window attention. Therefore, for models that use window attention and have the corresponding switch "use_sliding_window" enabled, such as Mistral and Qwen1.5, please choose the PyTorch engine for inference.
+* [2] When the head_dim of a model is not 128, as with llama3.2-1B, qwen2-0.5B and internvl2-1B, TurboMind doesn't support 4-bit/8-bit KV cache quantization or inference for it.
```

## PyTorchEngine on CUDA Platform

-| Model | Size | Type | FP16/BF16 | KV INT8 | KV INT4 | W8A8 | W4A16 |
-| :------------: | :---------: | :--: | :-------: | :-----: | :-----: | :--: | :---: |
-| Llama | 7B - 65B | LLM | Yes | Yes | Yes | Yes | Yes |
-| Llama2 | 7B - 70B | LLM | Yes | Yes | Yes | Yes | Yes |
-| Llama3 | 8B, 70B | LLM | Yes | Yes | Yes | Yes | Yes |
-| Llama3.1 | 8B, 70B | LLM | Yes | Yes | Yes | Yes | Yes |
-| Llama3.2 | 1B, 3B | LLM | Yes | Yes | Yes | Yes | Yes |
-| Llama3.2-VL | 11B, 90B | MLLM | Yes | Yes | Yes | - | - |
-| InternLM | 7B - 20B | LLM | Yes | Yes | Yes | Yes | Yes |
-| InternLM2 | 7B - 20B | LLM | Yes | Yes | Yes | Yes | Yes |
-| InternLM2.5 | 7B | LLM | Yes | Yes | Yes | Yes | Yes |
-| Baichuan2 | 7B | LLM | Yes | Yes | Yes | Yes | No |
-| Baichuan2 | 13B | LLM | Yes | Yes | Yes | No | No |
-| ChatGLM2 | 6B | LLM | Yes | Yes | Yes | No | No |
-| Falcon | 7B - 180B | LLM | Yes | Yes | Yes | No | No |
-| YI | 6B - 34B | LLM | Yes | Yes | Yes | Yes | Yes |
-| Mistral | 7B | LLM | Yes | Yes | Yes | Yes | Yes |
-| Mixtral | 8x7B, 8x22B | LLM | Yes | Yes | Yes | No | No |
-| QWen | 1.8B - 72B | LLM | Yes | Yes | Yes | Yes | Yes |
-| QWen1.5 | 0.5B - 110B | LLM | Yes | Yes | Yes | Yes | Yes |
-| QWen1.5-MoE | A2.7B | LLM | Yes | Yes | Yes | No | No |
-| QWen2 | 0.5B - 72B | LLM | Yes | Yes | No | Yes | Yes |
-| Qwen2.5 | 0.5B - 72B | LLM | Yes | Yes | No | Yes | Yes |
-| QWen2-VL | 2B, 7B | MLLM | Yes | Yes | No | No | No |
-| DeepSeek-MoE | 16B | LLM | Yes | No | No | No | No |
-| DeepSeek-V2 | 16B, 236B | LLM | Yes | No | No | No | No |
-| DeepSeek-V2.5 | 236B | LLM | Yes | No | No | No | No |
-| MiniCPM3 | 4B | LLM | Yes | Yes | Yes | No | No |
-| MiniCPM-V-2_6 | 8B | LLM | Yes | No | No | No | Yes |
-| Gemma | 2B-7B | LLM | Yes | Yes | Yes | No | No |
-| Dbrx | 132B | LLM | Yes | Yes | Yes | No | No |
-| StarCoder2 | 3B-15B | LLM | Yes | Yes | Yes | No | No |
-| Phi-3-mini | 3.8B | LLM | Yes | Yes | Yes | Yes | Yes |
-| Phi-3-vision | 4.2B | MLLM | Yes | Yes | Yes | - | - |
-| CogVLM-Chat | 17B | MLLM | Yes | Yes | Yes | - | - |
-| CogVLM2-Chat | 19B | MLLM | Yes | Yes | Yes | - | - |
-| LLaVA(1.5,1.6) | 7B-34B | MLLM | Yes | Yes | Yes | - | - |
-| InternVL(v1.5) | 2B-26B | MLLM | Yes | Yes | Yes | No | Yes |
-| InternVL2 | 1B-40B | MLLM | Yes | Yes | Yes | - | - |
-| Mono-InternVL | 2B | MLLM | Yes\* | Yes | Yes | - | - |
-| ChemVLM | 8B-26B | MLLM | Yes | Yes | No | - | - |
-| Gemma2 | 9B-27B | LLM | Yes | Yes | Yes | - | - |
-| GLM4 | 9B | LLM | Yes | Yes | Yes | No | No |
-| GLM-4V | 9B | MLLM | Yes | Yes | Yes | No | No |
-| CodeGeeX4 | 9B | LLM | Yes | Yes | Yes | - | - |
-| Phi-3.5-mini | 3.8B | LLM | Yes | Yes | No | - | - |
-| Phi-3.5-MoE | 16x3.8B | LLM | Yes | Yes | No | - | - |
-| Phi-3.5-vision | 4.2B | MLLM | Yes | Yes | No | - | - |
+| Model | Size | Type | FP16/BF16 | KV INT8 | KV INT4 | W8A8 | W4A16 |
+| :----------------------------: | :---------: | :--: | :-------: | :-----: | :-----: | :--: | :---: |
+| Llama | 7B - 65B | LLM | Yes | Yes | Yes | Yes | Yes |
+| Llama2 | 7B - 70B | LLM | Yes | Yes | Yes | Yes | Yes |
+| Llama3 | 8B, 70B | LLM | Yes | Yes | Yes | Yes | Yes |
+| Llama3.1 | 8B, 70B | LLM | Yes | Yes | Yes | Yes | Yes |
+| Llama3.2 | 1B, 3B | LLM | Yes | Yes | Yes | Yes | Yes |
+| Llama3.2-VL | 11B, 90B | MLLM | Yes | Yes | Yes | - | - |
+| InternLM | 
7B - 20B | LLM | Yes | Yes | Yes | Yes | Yes |
+| InternLM2 | 7B - 20B | LLM | Yes | Yes | Yes | Yes | Yes |
+| InternLM2.5 | 7B | LLM | Yes | Yes | Yes | Yes | Yes |
+| Baichuan2 | 7B | LLM | Yes | Yes | Yes | Yes | No |
+| Baichuan2 | 13B | LLM | Yes | Yes | Yes | No | No |
+| ChatGLM2 | 6B | LLM | Yes | Yes | Yes | No | No |
+| Falcon | 7B - 180B | LLM | Yes | Yes | Yes | No | No |
+| YI | 6B - 34B | LLM | Yes | Yes | Yes | Yes | Yes |
+| Mistral | 7B | LLM | Yes | Yes | Yes | Yes | Yes |
+| Mixtral | 8x7B, 8x22B | LLM | Yes | Yes | Yes | No | No |
+| QWen | 1.8B - 72B | LLM | Yes | Yes | Yes | Yes | Yes |
+| QWen1.5 | 0.5B - 110B | LLM | Yes | Yes | Yes | Yes | Yes |
+| QWen1.5-MoE | A2.7B | LLM | Yes | Yes | Yes | No | No |
+| QWen2 | 0.5B - 72B | LLM | Yes | Yes | No | Yes | Yes |
+| Qwen2.5 | 0.5B - 72B | LLM | Yes | Yes | No | Yes | Yes |
+| QWen2-VL | 2B, 7B | MLLM | Yes | Yes | No | No | Yes |
+| DeepSeek-MoE | 16B | LLM | Yes | No | No | No | No |
+| DeepSeek-V2 | 16B, 236B | LLM | Yes | No | No | No | No |
+| DeepSeek-V2.5 | 236B | LLM | Yes | No | No | No | No |
+| MiniCPM3 | 4B | LLM | Yes | Yes | Yes | No | No |
+| MiniCPM-V-2_6 | 8B | LLM | Yes | No | No | No | Yes |
+| Gemma | 2B-7B | LLM | Yes | Yes | Yes | No | No |
+| Dbrx | 132B | LLM | Yes | Yes | Yes | No | No |
+| StarCoder2 | 3B-15B | LLM | Yes | Yes | Yes | No | No |
+| Phi-3-mini | 3.8B | LLM | Yes | Yes | Yes | Yes | Yes |
+| Phi-3-vision | 4.2B | MLLM | Yes | Yes | Yes | - | - |
+| CogVLM-Chat | 17B | MLLM | Yes | Yes | Yes | - | - |
+| CogVLM2-Chat | 19B | MLLM | Yes | Yes | Yes | - | - |
+| LLaVA(1.5,1.6)\[2\] | 7B-34B | MLLM | No | No | No | No | No |
+| InternVL(v1.5) | 2B-26B | MLLM | Yes | Yes | Yes | No | Yes |
+| InternVL2 | 1B-76B | MLLM | Yes | Yes | Yes | - | - |
+| InternVL2.5(MPO) | 1B-78B | MLLM | Yes | Yes | Yes | - | - |
+| Mono-InternVL\[1\] | 2B | MLLM | Yes | Yes | Yes | - | - |
+| ChemVLM | 8B-26B | MLLM | Yes | Yes | No | - | - |
+| Gemma2 | 9B-27B | LLM | Yes | Yes | Yes | - | - |
+| GLM4 | 9B | LLM | Yes | Yes | Yes | No | No |
+| GLM-4V | 9B | MLLM | Yes | Yes | Yes | No | Yes |
+| CodeGeeX4 | 9B | LLM | Yes | Yes | Yes | - | - |
+| Phi-3.5-mini | 3.8B | LLM | Yes | Yes | No | - | - |
+| Phi-3.5-MoE | 16x3.8B | LLM | Yes | Yes | No | - | - |
+| Phi-3.5-vision | 4.2B | MLLM | Yes | Yes | No | - | - |

```{note}
-* Currently Mono-InternVL does not support FP16 due to numerical instability. Please use BF16 instead.
+* [1] Currently Mono-InternVL does not support FP16 due to numerical instability. Please use BF16 instead.
+* [2] The PyTorch engine removed support for the original llava models after v0.6.4. Please use the corresponding transformers models instead, which can be found at https://huggingface.co/llava-hf
```

## PyTorchEngine on Huawei Ascend Platform

diff --git a/docs/zh_cn/get_started/installation.md b/docs/zh_cn/get_started/installation.md
index 0213fa6d15..501f8a13e8 100644
--- a/docs/zh_cn/get_started/installation.md
+++ b/docs/zh_cn/get_started/installation.md
@@ -23,7 +23,7 @@ pip install lmdeploy
默认的预构建包是在 **CUDA 12** 上编译的。如果需要 CUDA 11+ (>=11.3),你可以使用以下命令安装 lmdeploy:

```shell
-export LMDEPLOY_VERSION=0.6.4
+export LMDEPLOY_VERSION=0.6.5
export PYTHON_VERSION=38
pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118
```

diff --git a/docs/zh_cn/supported_models/supported_models.md b/docs/zh_cn/supported_models/supported_models.md
index 3ec3688e1b..83b7a9ca6f 100644
--- a/docs/zh_cn/supported_models/supported_models.md
+++ b/docs/zh_cn/supported_models/supported_models.md
@@ -4,104 +4,107 @@

## TurboMind CUDA 平台

-| Model | Size | Type | FP16/BF16 | KV INT8 | KV INT4 | W4A16 |
-| :-------------------: | :------------: | :--: | :-------: | :-----: | :-----: | :---: |
-| Llama | 7B - 65B | LLM | Yes | Yes | Yes | Yes |
-| Llama2 | 7B - 70B | LLM | Yes | Yes | Yes | Yes |
-| Llama3 | 8B, 70B | LLM | Yes | Yes | Yes | Yes |
-| Llama3.1 | 8B, 70B | LLM | Yes | Yes | Yes | Yes |
-| Llama3.2 | 1B, 3B | LLM | Yes | Yes\* | Yes\* | Yes |
-| InternLM | 7B - 20B | LLM | Yes | Yes | Yes | Yes |
-| InternLM2 | 7B - 20B | LLM | Yes | Yes | Yes | Yes |
-| InternLM2.5 | 7B | LLM | Yes | Yes | Yes | Yes |
-| InternLM-XComposer2 | 7B, 4khd-7B | MLLM | Yes | Yes | Yes | Yes |
-| InternLM-XComposer2.5 | 7B | MLLM | Yes | Yes | Yes | Yes |
-| Qwen | 1.8B - 72B | LLM | Yes | Yes | Yes | Yes |
-| Qwen1.5 | 1.8B - 110B | LLM | Yes | Yes | Yes | Yes |
-| Qwen2 | 0.5B - 72B | LLM | Yes | Yes\* | Yes\* | Yes |
-| Qwen2-MoE | 57BA14B | LLM | Yes | Yes | Yes | Yes |
-| Qwen2.5 | 0.5B - 72B | LLM | Yes | Yes | Yes | Yes |
-| Mistral | 7B | LLM | Yes | Yes | Yes | No |
-| Mixtral | 8x7B, 8x22B | LLM | Yes | Yes | Yes | Yes |
-| DeepSeek-V2 | 16B, 236B | LLM | Yes | Yes | Yes | No |
-| DeepSeek-V2.5 | 236B | LLM | Yes | Yes | Yes | No |
-| Qwen-VL | 7B | MLLM | Yes | Yes | Yes | Yes |
-| DeepSeek-VL | 7B | MLLM | Yes | Yes | Yes | Yes |
-| Baichuan | 7B | LLM | Yes | Yes | Yes | Yes |
-| Baichuan2 | 7B | LLM | Yes | Yes | Yes | Yes |
-| Code Llama | 7B - 34B | LLM | Yes | Yes | Yes | No |
-| YI | 6B - 34B | LLM | Yes | Yes | Yes | Yes |
-| LLaVA(1.5,1.6) | 7B - 34B | MLLM | Yes | Yes | Yes | Yes |
-| InternVL | v1.1 - v1.5 | MLLM | Yes | Yes | Yes | Yes |
-| InternVL2 | 1-2B, 8B - 76B | MLLM | Yes | Yes\* | Yes\* | Yes |
-| ChemVLM | 8B - 26B | MLLM | Yes | Yes | Yes | Yes |
-| MiniCPM-Llama3-V-2_5 | - | MLLM | Yes | Yes | Yes | Yes |
-| MiniCPM-V-2_6 | - | MLLM | Yes | Yes | Yes | Yes |
-| MiniGeminiLlama | 7B | MLLM | Yes | - | - | Yes |
-| GLM4 | 9B | LLM | Yes | Yes | Yes | Yes |
-| CodeGeeX4 | 9B | LLM | Yes | Yes | Yes | - |
-| Molmo | 7B-D,72B | MLLM | Yes | Yes | Yes | No |
+| Model | Size | Type | FP16/BF16 | KV INT8 | KV INT4 | W4A16 |
+| :------------------------------: | :------------: | :--: | :-------: | :-----: | :-----: | :---: |
+| Llama | 7B - 65B | LLM | Yes | Yes | Yes | Yes |
+| Llama2 | 7B - 70B | LLM | Yes | Yes | Yes | Yes |
+| Llama3 | 8B, 70B | LLM | Yes | Yes | Yes | Yes |
+| Llama3.1 | 8B, 
70B | LLM | Yes | Yes | Yes | Yes |
+| Llama3.2\[2\] | 1B, 3B | LLM | Yes | Yes\* | Yes\* | Yes |
+| InternLM | 7B - 20B | LLM | Yes | Yes | Yes | Yes |
+| InternLM2 | 7B - 20B | LLM | Yes | Yes | Yes | Yes |
+| InternLM2.5 | 7B | LLM | Yes | Yes | Yes | Yes |
+| InternLM-XComposer2 | 7B, 4khd-7B | MLLM | Yes | Yes | Yes | Yes |
+| InternLM-XComposer2.5 | 7B | MLLM | Yes | Yes | Yes | Yes |
+| Qwen | 1.8B - 72B | LLM | Yes | Yes | Yes | Yes |
+| Qwen1.5\[1\] | 1.8B - 110B | LLM | Yes | Yes | Yes | Yes |
+| Qwen2\[2\] | 0.5B - 72B | LLM | Yes | Yes\* | Yes\* | Yes |
+| Qwen2-MoE | 57BA14B | LLM | Yes | Yes | Yes | Yes |
+| Qwen2.5\[2\] | 0.5B - 72B | LLM | Yes | Yes\* | Yes\* | Yes |
+| Mistral\[1\] | 7B | LLM | Yes | Yes | Yes | No |
+| Mixtral | 8x7B, 8x22B | LLM | Yes | Yes | Yes | Yes |
+| DeepSeek-V2 | 16B, 236B | LLM | Yes | Yes | Yes | No |
+| DeepSeek-V2.5 | 236B | LLM | Yes | Yes | Yes | No |
+| Qwen-VL | 7B | MLLM | Yes | Yes | Yes | Yes |
+| DeepSeek-VL | 7B | MLLM | Yes | Yes | Yes | Yes |
+| Baichuan | 7B | LLM | Yes | Yes | Yes | Yes |
+| Baichuan2 | 7B | LLM | Yes | Yes | Yes | Yes |
+| Code Llama | 7B - 34B | LLM | Yes | Yes | Yes | No |
+| YI | 6B - 34B | LLM | Yes | Yes | Yes | Yes |
+| LLaVA(1.5,1.6) | 7B - 34B | MLLM | Yes | Yes | Yes | Yes |
+| InternVL | v1.1 - v1.5 | MLLM | Yes | Yes | Yes | Yes |
+| InternVL2\[2\] | 1-2B, 8B - 76B | MLLM | Yes | Yes\* | Yes\* | Yes |
+| InternVL2.5(MPO)\[2\] | 1 - 78B | MLLM | Yes | Yes\* | Yes\* | Yes |
+| ChemVLM | 8B - 26B | MLLM | Yes | Yes | Yes | Yes |
+| MiniCPM-Llama3-V-2_5 | - | MLLM | Yes | Yes | Yes | Yes |
+| MiniCPM-V-2_6 | - | MLLM | Yes | Yes | Yes | Yes |
+| MiniGeminiLlama | 7B | MLLM | Yes | - | - | Yes |
+| GLM4 | 9B | LLM | Yes | Yes | Yes | Yes |
+| CodeGeeX4 | 9B | LLM | Yes | Yes | Yes | - |
+| Molmo | 7B-D,72B | MLLM | Yes | Yes | Yes | No |

“-” 表示还没有验证。

```{note}
-* turbomind 引擎不支持 window attention。所以,对于应用了 window attention,并开启了对应的开关"use_sliding_window"的模型,比如 Mistral、Qwen1.5 等,在推理时,请选择 pytorch engine
-* 当模型的 head_dim 非 128 时,turbomind 不支持它的 kv cache 4/8 bit 量化和推理。比如,llama3.2-1B,qwen2-0.5B,internvl2-1B 等等
+* [1] turbomind 引擎不支持 window attention。所以,对于应用了 window attention,并开启了对应的开关"use_sliding_window"的模型,比如 Mistral、Qwen1.5 等,在推理时,请选择 pytorch engine。
+* [2] 当模型的 head_dim 非 128 时,turbomind 不支持它的 kv cache 4/8 bit 量化和推理。比如 llama3.2-1B、qwen2-0.5B、internvl2-1B 等。
```

## PyTorchEngine CUDA 平台

-| Model | Size | Type | FP16/BF16 | KV INT8 | KV INT4 | W8A8 | W4A16 |
-| :------------: | :---------: | :--: | :-------: | :-----: | :-----: | :--: | :---: |
-| Llama | 7B - 65B | LLM | Yes | Yes | Yes | Yes | Yes |
-| Llama2 | 7B - 70B | LLM | Yes | Yes | Yes | Yes | Yes |
-| Llama3 | 8B, 70B | LLM | Yes | Yes | Yes | Yes | Yes |
-| Llama3.1 | 8B, 70B | LLM | Yes | Yes | Yes | Yes | Yes |
-| Llama3.2 | 1B, 3B | LLM | Yes | Yes | Yes | Yes | Yes |
-| Llama3.2-VL | 11B, 90B | MLLM | Yes | Yes | Yes | - | - |
-| InternLM | 7B - 20B | LLM | Yes | Yes | Yes | Yes | Yes |
-| InternLM2 | 7B - 20B | LLM | Yes | Yes | Yes | Yes | Yes |
-| InternLM2.5 | 7B | LLM | Yes | Yes | Yes | Yes | Yes |
-| Baichuan2 | 7B | LLM | Yes | Yes | Yes | Yes | No |
-| Baichuan2 | 13B | LLM | Yes | Yes | Yes | No | No |
-| ChatGLM2 | 6B | LLM | Yes | Yes | Yes | No | No |
-| Falcon | 7B - 180B | LLM | Yes | Yes | Yes | No | No |
-| YI | 6B - 34B | LLM | Yes | Yes | Yes | Yes | Yes |
-| Mistral | 7B | LLM | Yes | Yes | Yes | Yes | Yes |
-| Mixtral | 8x7B, 8x22B | LLM | Yes | Yes | Yes | No | No |
-| QWen | 1.8B - 72B | LLM | Yes | Yes | Yes | Yes | Yes |
-| 
QWen1.5 | 0.5B - 110B | LLM | Yes | Yes | Yes | Yes | Yes | -| QWen1.5-MoE | A2.7B | LLM | Yes | Yes | Yes | No | No | -| QWen2 | 0.5B - 72B | LLM | Yes | Yes | No | Yes | Yes | -| Qwen2.5 | 0.5B - 72B | LLM | Yes | Yes | No | Yes | Yes | -| QWen2-VL | 2B, 7B | MLLM | Yes | Yes | No | No | No | -| DeepSeek-MoE | 16B | LLM | Yes | No | No | No | No | -| DeepSeek-V2 | 16B, 236B | LLM | Yes | No | No | No | No | -| DeepSeek-V2.5 | 236B | LLM | Yes | No | No | No | No | -| MiniCPM3 | 4B | LLM | Yes | Yes | Yes | No | No | -| MiniCPM-V-2_6 | 8B | LLM | Yes | No | No | No | Yes | -| Gemma | 2B-7B | LLM | Yes | Yes | Yes | No | No | -| Dbrx | 132B | LLM | Yes | Yes | Yes | No | No | -| StarCoder2 | 3B-15B | LLM | Yes | Yes | Yes | No | No | -| Phi-3-mini | 3.8B | LLM | Yes | Yes | Yes | Yes | Yes | -| Phi-3-vision | 4.2B | MLLM | Yes | Yes | Yes | - | - | -| CogVLM-Chat | 17B | MLLM | Yes | Yes | Yes | - | - | -| CogVLM2-Chat | 19B | MLLM | Yes | Yes | Yes | - | - | -| LLaVA(1.5,1.6) | 7B-34B | MLLM | Yes | Yes | Yes | - | - | -| InternVL(v1.5) | 2B-26B | MLLM | Yes | Yes | Yes | No | Yes | -| InternVL2 | 1B-40B | MLLM | Yes | Yes | Yes | - | - | -| Mono-InternVL | 2B | MLLM | Yes\* | Yes | Yes | - | - | -| ChemVLM | 8B-26B | MLLM | Yes | Yes | No | - | - | -| Gemma2 | 9B-27B | LLM | Yes | Yes | Yes | - | - | -| GLM4 | 9B | LLM | Yes | Yes | Yes | No | No | -| GLM-4V | 9B | MLLM | Yes | Yes | Yes | No | No | -| CodeGeeX4 | 9B | LLM | Yes | Yes | Yes | - | - | -| Phi-3.5-mini | 3.8B | LLM | Yes | Yes | No | - | - | -| Phi-3.5-MoE | 16x3.8B | LLM | Yes | Yes | No | - | - | -| Phi-3.5-vision | 4.2B | MLLM | Yes | Yes | No | - | - | +| Model | Size | Type | FP16/BF16 | KV INT8 | KV INT4 | W8A8 | W4A16 | +| :----------------------------: | :---------: | :--: | :-------: | :-----: | :-----: | :--: | :---: | +| Llama | 7B - 65B | LLM | Yes | Yes | Yes | Yes | Yes | +| Llama2 | 7B - 70B | LLM | Yes | Yes | Yes | Yes | Yes | +| Llama3 | 8B, 70B | LLM | Yes | Yes | Yes | Yes | Yes | +| Llama3.1 | 8B, 70B | LLM | Yes | Yes | Yes | Yes | Yes | +| Llama3.2 | 1B, 3B | LLM | Yes | Yes | Yes | Yes | Yes | +| Llama3.2-VL | 11B, 90B | MLLM | Yes | Yes | Yes | - | - | +| InternLM | 7B - 20B | LLM | Yes | Yes | Yes | Yes | Yes | +| InternLM2 | 7B - 20B | LLM | Yes | Yes | Yes | Yes | Yes | +| InternLM2.5 | 7B | LLM | Yes | Yes | Yes | Yes | Yes | +| Baichuan2 | 7B | LLM | Yes | Yes | Yes | Yes | No | +| Baichuan2 | 13B | LLM | Yes | Yes | Yes | No | No | +| ChatGLM2 | 6B | LLM | Yes | Yes | Yes | No | No | +| Falcon | 7B - 180B | LLM | Yes | Yes | Yes | No | No | +| YI | 6B - 34B | LLM | Yes | Yes | Yes | Yes | Yes | +| Mistral | 7B | LLM | Yes | Yes | Yes | Yes | Yes | +| Mixtral | 8x7B, 8x22B | LLM | Yes | Yes | Yes | No | No | +| QWen | 1.8B - 72B | LLM | Yes | Yes | Yes | Yes | Yes | +| QWen1.5 | 0.5B - 110B | LLM | Yes | Yes | Yes | Yes | Yes | +| QWen1.5-MoE | A2.7B | LLM | Yes | Yes | Yes | No | No | +| QWen2 | 0.5B - 72B | LLM | Yes | Yes | No | Yes | Yes | +| Qwen2.5 | 0.5B - 72B | LLM | Yes | Yes | No | Yes | Yes | +| QWen2-VL | 2B, 7B | MLLM | Yes | Yes | No | No | Yes | +| DeepSeek-MoE | 16B | LLM | Yes | No | No | No | No | +| DeepSeek-V2 | 16B, 236B | LLM | Yes | No | No | No | No | +| DeepSeek-V2.5 | 236B | LLM | Yes | No | No | No | No | +| MiniCPM3 | 4B | LLM | Yes | Yes | Yes | No | No | +| MiniCPM-V-2_6 | 8B | LLM | Yes | No | No | No | Yes | +| Gemma | 2B-7B | LLM | Yes | Yes | Yes | No | No | +| Dbrx | 132B | LLM | Yes | Yes | Yes | No | No | +| StarCoder2 | 3B-15B | LLM | Yes | Yes | Yes | No | 
No |
+| Phi-3-mini | 3.8B | LLM | Yes | Yes | Yes | Yes | Yes |
+| Phi-3-vision | 4.2B | MLLM | Yes | Yes | Yes | - | - |
+| CogVLM-Chat | 17B | MLLM | Yes | Yes | Yes | - | - |
+| CogVLM2-Chat | 19B | MLLM | Yes | Yes | Yes | - | - |
+| LLaVA(1.5,1.6)\[2\] | 7B-34B | MLLM | No | No | No | No | No |
+| InternVL(v1.5) | 2B-26B | MLLM | Yes | Yes | Yes | No | Yes |
+| InternVL2 | 1B-76B | MLLM | Yes | Yes | Yes | - | - |
+| InternVL2.5(MPO) | 1B-78B | MLLM | Yes | Yes | Yes | - | - |
+| Mono-InternVL\[1\] | 2B | MLLM | Yes | Yes | Yes | - | - |
+| ChemVLM | 8B-26B | MLLM | Yes | Yes | No | - | - |
+| Gemma2 | 9B-27B | LLM | Yes | Yes | Yes | - | - |
+| GLM4 | 9B | LLM | Yes | Yes | Yes | No | No |
+| GLM-4V | 9B | MLLM | Yes | Yes | Yes | No | Yes |
+| CodeGeeX4 | 9B | LLM | Yes | Yes | Yes | - | - |
+| Phi-3.5-mini | 3.8B | LLM | Yes | Yes | No | - | - |
+| Phi-3.5-MoE | 16x3.8B | LLM | Yes | Yes | No | - | - |
+| Phi-3.5-vision | 4.2B | MLLM | Yes | Yes | No | - | - |

```{note}
-* 目前,Mono-InternVL不支持FP16,因为数值不稳定。请改用BF16。
+* [1] 目前,Mono-InternVL不支持FP16,因为数值不稳定。请改用BF16。
+* [2] 自 0.6.4 之后,PyTorch 引擎移除了对 llava 模型原始格式的支持。我们建议使用它们对应的 transformers 格式的模型。这些模型可以在 https://huggingface.co/llava-hf 中找到。
```

## PyTorchEngine 华为昇腾平台

diff --git a/lmdeploy/version.py b/lmdeploy/version.py
index f705fcb332..0b4b8a5379 100644
--- a/lmdeploy/version.py
+++ b/lmdeploy/version.py
@@ -1,7 +1,7 @@
# Copyright (c) OpenMMLab. All rights reserved.
from typing import Tuple

-__version__ = '0.6.4'
+__version__ = '0.6.5'
short_version = __version__
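As a quick aid for footnote \[2\] in the tables above: before enabling KV cache quantization on TurboMind, a user can inspect the model config to see whether the head_dim restriction applies. The sketch below is illustrative only and not part of this patch; it assumes a Hugging Face-style config, and `Qwen/Qwen2-0.5B` is just an example model ID taken from the note.

```python
# Minimal sketch: check whether a model's head_dim rules out TurboMind
# KV cache 4/8-bit quantization (see footnote [2] above). Assumes a
# Hugging Face-style config; the model ID is an illustrative example.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("Qwen/Qwen2-0.5B")
# Some configs expose head_dim directly; otherwise derive it.
head_dim = getattr(cfg, "head_dim", None) or cfg.hidden_size // cfg.num_attention_heads
if head_dim != 128:
    # Per the note: keep the KV cache in FP16/BF16 on TurboMind,
    # or fall back to the PyTorch engine.
    print(f"head_dim = {head_dim}: TurboMind KV INT4/INT8 is not supported")
```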