diff --git a/docs/en/get_started/installation.md b/docs/en/get_started/installation.md
index c00111c2ab..8877d510cc 100644
--- a/docs/en/get_started/installation.md
+++ b/docs/en/get_started/installation.md
@@ -23,7 +23,7 @@ pip install lmdeploy
The default prebuilt package is compiled on **CUDA 12**. If CUDA 11+ (>=11.3) is required, you can install lmdeploy with:
```shell
-export LMDEPLOY_VERSION=0.6.4
+export LMDEPLOY_VERSION=0.6.5
export PYTHON_VERSION=38
pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118
```
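+
+To confirm which version is active after installation, a quick sanity check (a minimal sketch; it only assumes the package exposes `__version__`, which `lmdeploy/version.py` defines):
+
+```python
+import lmdeploy
+
+# Expect the version pinned above, e.g. 0.6.5
+print(lmdeploy.__version__)
+```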
diff --git a/docs/en/supported_models/supported_models.md b/docs/en/supported_models/supported_models.md
index dd8ceb4ffa..cb9805bb0b 100644
--- a/docs/en/supported_models/supported_models.md
+++ b/docs/en/supported_models/supported_models.md
@@ -4,104 +4,107 @@ The following tables detail the models supported by LMDeploy's TurboMind engine
## TurboMind on CUDA Platform
-| Model | Size | Type | FP16/BF16 | KV INT8 | KV INT4 | W4A16 |
-| :-------------------: | :------------: | :--: | :-------: | :-----: | :-----: | :---: |
-| Llama | 7B - 65B | LLM | Yes | Yes | Yes | Yes |
-| Llama2 | 7B - 70B | LLM | Yes | Yes | Yes | Yes |
-| Llama3 | 8B, 70B | LLM | Yes | Yes | Yes | Yes |
-| Llama3.1 | 8B, 70B | LLM | Yes | Yes | Yes | Yes |
-| Llama3.2 | 1B, 3B | LLM | Yes | Yes\* | Yes\* | Yes |
-| InternLM | 7B - 20B | LLM | Yes | Yes | Yes | Yes |
-| InternLM2 | 7B - 20B | LLM | Yes | Yes | Yes | Yes |
-| InternLM2.5 | 7B | LLM | Yes | Yes | Yes | Yes |
-| InternLM-XComposer2 | 7B, 4khd-7B | MLLM | Yes | Yes | Yes | Yes |
-| InternLM-XComposer2.5 | 7B | MLLM | Yes | Yes | Yes | Yes |
-| Qwen | 1.8B - 72B | LLM | Yes | Yes | Yes | Yes |
-| Qwen1.5 | 1.8B - 110B | LLM | Yes | Yes | Yes | Yes |
-| Qwen2 | 0.5B - 72B | LLM | Yes | Yes\* | Yes\* | Yes |
-| Qwen2-MoE | 57BA14B | LLM | Yes | Yes | Yes | Yes |
-| Qwen2.5 | 0.5B - 72B | LLM | Yes | Yes | Yes | Yes |
-| Mistral | 7B | LLM | Yes | Yes | Yes | No |
-| Mixtral | 8x7B, 8x22B | LLM | Yes | Yes | Yes | Yes |
-| DeepSeek-V2 | 16B, 236B | LLM | Yes | Yes | Yes | No |
-| DeepSeek-V2.5 | 236B | LLM | Yes | Yes | Yes | No |
-| Qwen-VL | 7B | MLLM | Yes | Yes | Yes | Yes |
-| DeepSeek-VL | 7B | MLLM | Yes | Yes | Yes | Yes |
-| Baichuan | 7B | LLM | Yes | Yes | Yes | Yes |
-| Baichuan2 | 7B | LLM | Yes | Yes | Yes | Yes |
-| Code Llama | 7B - 34B | LLM | Yes | Yes | Yes | No |
-| YI | 6B - 34B | LLM | Yes | Yes | Yes | Yes |
-| LLaVA(1.5,1.6) | 7B - 34B | MLLM | Yes | Yes | Yes | Yes |
-| InternVL | v1.1 - v1.5 | MLLM | Yes | Yes | Yes | Yes |
-| InternVL2 | 1-2B, 8B - 76B | MLLM | Yes | Yes\* | Yes\* | Yes |
-| ChemVLM | 8B - 26B | MLLM | Yes | Yes | Yes | Yes |
-| MiniCPM-Llama3-V-2_5 | - | MLLM | Yes | Yes | Yes | Yes |
-| MiniCPM-V-2_6 | - | MLLM | Yes | Yes | Yes | Yes |
-| MiniGeminiLlama | 7B | MLLM | Yes | - | - | Yes |
-| GLM4 | 9B | LLM | Yes | Yes | Yes | Yes |
-| CodeGeeX4 | 9B | LLM | Yes | Yes | Yes | - |
-| Molmo | 7B-D,72B | MLLM | Yes | Yes | Yes | No |
+| Model | Size | Type | FP16/BF16 | KV INT8 | KV INT4 | W4A16 |
+| :------------------------------: | :--------------: | :--: | :-------: | :-----: | :-----: | :---: |
+| Llama | 7B - 65B | LLM | Yes | Yes | Yes | Yes |
+| Llama2 | 7B - 70B | LLM | Yes | Yes | Yes | Yes |
+| Llama3 | 8B, 70B | LLM | Yes | Yes | Yes | Yes |
+| Llama3.1 | 8B, 70B | LLM | Yes | Yes | Yes | Yes |
+| Llama3.2\[2\] | 1B, 3B | LLM | Yes | Yes\* | Yes\* | Yes |
+| InternLM | 7B - 20B | LLM | Yes | Yes | Yes | Yes |
+| InternLM2 | 7B - 20B | LLM | Yes | Yes | Yes | Yes |
+| InternLM2.5 | 7B | LLM | Yes | Yes | Yes | Yes |
+| InternLM-XComposer2 | 7B, 4khd-7B | MLLM | Yes | Yes | Yes | Yes |
+| InternLM-XComposer2.5 | 7B | MLLM | Yes | Yes | Yes | Yes |
+| Qwen | 1.8B - 72B | LLM | Yes | Yes | Yes | Yes |
+| Qwen1.5\[1\] | 1.8B - 110B | LLM | Yes | Yes | Yes | Yes |
+| Qwen2\[2\] | 0.5B - 72B | LLM | Yes | Yes\* | Yes\* | Yes |
+| Qwen2-MoE                        | 57B-A14B         | LLM  | Yes       | Yes     | Yes     | Yes   |
+| Qwen2.5\[2\] | 0.5B - 72B | LLM | Yes | Yes\* | Yes\* | Yes |
+| Mistral\[1\] | 7B | LLM | Yes | Yes | Yes | No |
+| Mixtral | 8x7B, 8x22B | LLM | Yes | Yes | Yes | Yes |
+| DeepSeek-V2 | 16B, 236B | LLM | Yes | Yes | Yes | No |
+| DeepSeek-V2.5 | 236B | LLM | Yes | Yes | Yes | No |
+| Qwen-VL | 7B | MLLM | Yes | Yes | Yes | Yes |
+| DeepSeek-VL | 7B | MLLM | Yes | Yes | Yes | Yes |
+| Baichuan | 7B | LLM | Yes | Yes | Yes | Yes |
+| Baichuan2 | 7B | LLM | Yes | Yes | Yes | Yes |
+| Code Llama | 7B - 34B | LLM | Yes | Yes | Yes | No |
+| YI | 6B - 34B | LLM | Yes | Yes | Yes | Yes |
+| LLaVA(1.5,1.6) | 7B - 34B | MLLM | Yes | Yes | Yes | Yes |
+| InternVL | v1.1 - v1.5 | MLLM | Yes | Yes | Yes | Yes |
+| InternVL2\[2\] | 1 - 2B, 8B - 76B | MLLM | Yes | Yes\* | Yes\* | Yes |
+| InternVL2.5(MPO)\[2\]            | 1B - 78B         | MLLM | Yes       | Yes\*   | Yes\*   | Yes   |
+| ChemVLM | 8B - 26B | MLLM | Yes | Yes | Yes | Yes |
+| MiniCPM-Llama3-V-2_5 | - | MLLM | Yes | Yes | Yes | Yes |
+| MiniCPM-V-2_6 | - | MLLM | Yes | Yes | Yes | Yes |
+| MiniGeminiLlama | 7B | MLLM | Yes | - | - | Yes |
+| GLM4 | 9B | LLM | Yes | Yes | Yes | Yes |
+| CodeGeeX4 | 9B | LLM | Yes | Yes | Yes | - |
+| Molmo                            | 7B-D, 72B        | MLLM | Yes       | Yes     | Yes     | No    |
"-" means not verified yet.
```{note}
-* The TurboMind engine doesn't support window attention. Therefore, for models that have applied window attention and have the corresponding switch "use_sliding_window" enabled, such as Mistral, Qwen1.5 and etc., please choose the PyTorch engine for inference.
-* When the head_dim of a model is not 128, such as llama3.2-1B, qwen2-0.5B and internvl2-1B, turbomind doesn't support its kv cache 4/8 bit quantization and inference
+* [1] The TurboMind engine doesn't support window attention. For models that apply window attention and have the corresponding "use_sliding_window" switch enabled, such as Mistral and Qwen1.5, please choose the PyTorch engine for inference.
+* [2] When a model's head_dim is not 128, as in llama3.2-1B, qwen2-0.5B and internvl2-1B, TurboMind doesn't support its KV cache 4/8-bit quantization and inference.
```
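+
+For models flagged with note \[1\], route inference to the PyTorch engine explicitly. A minimal sketch (the Mistral model id is illustrative, not mandated by this page):
+
+```python
+from lmdeploy import pipeline, PytorchEngineConfig
+
+# Window-attention models (Mistral, Qwen1.5, ...) are not supported by
+# TurboMind, so select the PyTorch engine as the backend.
+pipe = pipeline('mistralai/Mistral-7B-Instruct-v0.3',
+                backend_config=PytorchEngineConfig())
+print(pipe(['Hello, please introduce yourself.']))
+```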
## PyTorchEngine on CUDA Platform
-| Model | Size | Type | FP16/BF16 | KV INT8 | KV INT4 | W8A8 | W4A16 |
-| :------------: | :---------: | :--: | :-------: | :-----: | :-----: | :--: | :---: |
-| Llama | 7B - 65B | LLM | Yes | Yes | Yes | Yes | Yes |
-| Llama2 | 7B - 70B | LLM | Yes | Yes | Yes | Yes | Yes |
-| Llama3 | 8B, 70B | LLM | Yes | Yes | Yes | Yes | Yes |
-| Llama3.1 | 8B, 70B | LLM | Yes | Yes | Yes | Yes | Yes |
-| Llama3.2 | 1B, 3B | LLM | Yes | Yes | Yes | Yes | Yes |
-| Llama3.2-VL | 11B, 90B | MLLM | Yes | Yes | Yes | - | - |
-| InternLM | 7B - 20B | LLM | Yes | Yes | Yes | Yes | Yes |
-| InternLM2 | 7B - 20B | LLM | Yes | Yes | Yes | Yes | Yes |
-| InternLM2.5 | 7B | LLM | Yes | Yes | Yes | Yes | Yes |
-| Baichuan2 | 7B | LLM | Yes | Yes | Yes | Yes | No |
-| Baichuan2 | 13B | LLM | Yes | Yes | Yes | No | No |
-| ChatGLM2 | 6B | LLM | Yes | Yes | Yes | No | No |
-| Falcon | 7B - 180B | LLM | Yes | Yes | Yes | No | No |
-| YI | 6B - 34B | LLM | Yes | Yes | Yes | Yes | Yes |
-| Mistral | 7B | LLM | Yes | Yes | Yes | Yes | Yes |
-| Mixtral | 8x7B, 8x22B | LLM | Yes | Yes | Yes | No | No |
-| QWen | 1.8B - 72B | LLM | Yes | Yes | Yes | Yes | Yes |
-| QWen1.5 | 0.5B - 110B | LLM | Yes | Yes | Yes | Yes | Yes |
-| QWen1.5-MoE | A2.7B | LLM | Yes | Yes | Yes | No | No |
-| QWen2 | 0.5B - 72B | LLM | Yes | Yes | No | Yes | Yes |
-| Qwen2.5 | 0.5B - 72B | LLM | Yes | Yes | No | Yes | Yes |
-| QWen2-VL | 2B, 7B | MLLM | Yes | Yes | No | No | No |
-| DeepSeek-MoE | 16B | LLM | Yes | No | No | No | No |
-| DeepSeek-V2 | 16B, 236B | LLM | Yes | No | No | No | No |
-| DeepSeek-V2.5 | 236B | LLM | Yes | No | No | No | No |
-| MiniCPM3 | 4B | LLM | Yes | Yes | Yes | No | No |
-| MiniCPM-V-2_6 | 8B | LLM | Yes | No | No | No | Yes |
-| Gemma | 2B-7B | LLM | Yes | Yes | Yes | No | No |
-| Dbrx | 132B | LLM | Yes | Yes | Yes | No | No |
-| StarCoder2 | 3B-15B | LLM | Yes | Yes | Yes | No | No |
-| Phi-3-mini | 3.8B | LLM | Yes | Yes | Yes | Yes | Yes |
-| Phi-3-vision | 4.2B | MLLM | Yes | Yes | Yes | - | - |
-| CogVLM-Chat | 17B | MLLM | Yes | Yes | Yes | - | - |
-| CogVLM2-Chat | 19B | MLLM | Yes | Yes | Yes | - | - |
-| LLaVA(1.5,1.6) | 7B-34B | MLLM | Yes | Yes | Yes | - | - |
-| InternVL(v1.5) | 2B-26B | MLLM | Yes | Yes | Yes | No | Yes |
-| InternVL2 | 1B-40B | MLLM | Yes | Yes | Yes | - | - |
-| Mono-InternVL | 2B | MLLM | Yes\* | Yes | Yes | - | - |
-| ChemVLM | 8B-26B | MLLM | Yes | Yes | No | - | - |
-| Gemma2 | 9B-27B | LLM | Yes | Yes | Yes | - | - |
-| GLM4 | 9B | LLM | Yes | Yes | Yes | No | No |
-| GLM-4V | 9B | MLLM | Yes | Yes | Yes | No | No |
-| CodeGeeX4 | 9B | LLM | Yes | Yes | Yes | - | - |
-| Phi-3.5-mini | 3.8B | LLM | Yes | Yes | No | - | - |
-| Phi-3.5-MoE | 16x3.8B | LLM | Yes | Yes | No | - | - |
-| Phi-3.5-vision | 4.2B | MLLM | Yes | Yes | No | - | - |
+| Model | Size | Type | FP16/BF16 | KV INT8 | KV INT4 | W8A8 | W4A16 |
+| :----------------------------: | :---------: | :--: | :-------: | :-----: | :-----: | :--: | :---: |
+| Llama | 7B - 65B | LLM | Yes | Yes | Yes | Yes | Yes |
+| Llama2 | 7B - 70B | LLM | Yes | Yes | Yes | Yes | Yes |
+| Llama3 | 8B, 70B | LLM | Yes | Yes | Yes | Yes | Yes |
+| Llama3.1 | 8B, 70B | LLM | Yes | Yes | Yes | Yes | Yes |
+| Llama3.2 | 1B, 3B | LLM | Yes | Yes | Yes | Yes | Yes |
+| Llama3.2-VL | 11B, 90B | MLLM | Yes | Yes | Yes | - | - |
+| InternLM | 7B - 20B | LLM | Yes | Yes | Yes | Yes | Yes |
+| InternLM2 | 7B - 20B | LLM | Yes | Yes | Yes | Yes | Yes |
+| InternLM2.5 | 7B | LLM | Yes | Yes | Yes | Yes | Yes |
+| Baichuan2 | 7B | LLM | Yes | Yes | Yes | Yes | No |
+| Baichuan2 | 13B | LLM | Yes | Yes | Yes | No | No |
+| ChatGLM2 | 6B | LLM | Yes | Yes | Yes | No | No |
+| Falcon | 7B - 180B | LLM | Yes | Yes | Yes | No | No |
+| YI | 6B - 34B | LLM | Yes | Yes | Yes | Yes | Yes |
+| Mistral | 7B | LLM | Yes | Yes | Yes | Yes | Yes |
+| Mixtral | 8x7B, 8x22B | LLM | Yes | Yes | Yes | No | No |
+| QWen | 1.8B - 72B | LLM | Yes | Yes | Yes | Yes | Yes |
+| QWen1.5 | 0.5B - 110B | LLM | Yes | Yes | Yes | Yes | Yes |
+| QWen1.5-MoE | A2.7B | LLM | Yes | Yes | Yes | No | No |
+| QWen2 | 0.5B - 72B | LLM | Yes | Yes | No | Yes | Yes |
+| Qwen2.5 | 0.5B - 72B | LLM | Yes | Yes | No | Yes | Yes |
+| QWen2-VL | 2B, 7B | MLLM | Yes | Yes | No | No | Yes |
+| DeepSeek-MoE | 16B | LLM | Yes | No | No | No | No |
+| DeepSeek-V2 | 16B, 236B | LLM | Yes | No | No | No | No |
+| DeepSeek-V2.5 | 236B | LLM | Yes | No | No | No | No |
+| MiniCPM3 | 4B | LLM | Yes | Yes | Yes | No | No |
+| MiniCPM-V-2_6                  | 8B          | MLLM | Yes       | No      | No      | No   | Yes   |
+| Gemma | 2B-7B | LLM | Yes | Yes | Yes | No | No |
+| Dbrx | 132B | LLM | Yes | Yes | Yes | No | No |
+| StarCoder2 | 3B-15B | LLM | Yes | Yes | Yes | No | No |
+| Phi-3-mini | 3.8B | LLM | Yes | Yes | Yes | Yes | Yes |
+| Phi-3-vision | 4.2B | MLLM | Yes | Yes | Yes | - | - |
+| CogVLM-Chat | 17B | MLLM | Yes | Yes | Yes | - | - |
+| CogVLM2-Chat | 19B | MLLM | Yes | Yes | Yes | - | - |
+| LLaVA(1.5,1.6)\[2\] | 7B-34B | MLLM | No | No | No | No | No |
+| InternVL(v1.5) | 2B-26B | MLLM | Yes | Yes | Yes | No | Yes |
+| InternVL2 | 1B-76B | MLLM | Yes | Yes | Yes | - | - |
+| InternVL2.5(MPO) | 1B-78B | MLLM | Yes | Yes | Yes | - | - |
+| Mono-InternVL\[1\] | 2B | MLLM | Yes | Yes | Yes | - | - |
+| ChemVLM | 8B-26B | MLLM | Yes | Yes | No | - | - |
+| Gemma2 | 9B-27B | LLM | Yes | Yes | Yes | - | - |
+| GLM4 | 9B | LLM | Yes | Yes | Yes | No | No |
+| GLM-4V | 9B | MLLM | Yes | Yes | Yes | No | Yes |
+| CodeGeeX4 | 9B | LLM | Yes | Yes | Yes | - | - |
+| Phi-3.5-mini | 3.8B | LLM | Yes | Yes | No | - | - |
+| Phi-3.5-MoE | 16x3.8B | LLM | Yes | Yes | No | - | - |
+| Phi-3.5-vision | 4.2B | MLLM | Yes | Yes | No | - | - |
```{note}
-* Currently Mono-InternVL does not support FP16 due to numerical instability. Please use BF16 instead.
+* [1] Currently Mono-InternVL does not support FP16 due to numerical instability. Please use BF16 instead.
+* [2] The PyTorch engine removed support for the original llava models after v0.6.4. Please use the corresponding transformers-format models instead, which can be found at https://huggingface.co/llava-hf
```
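+
+Per note \[2\], load llava models from their transformers-format releases. A minimal sketch, assuming the `llava-hf/llava-1.5-7b-hf` checkpoint (any model under https://huggingface.co/llava-hf should work the same way):
+
+```python
+from lmdeploy import pipeline, PytorchEngineConfig
+from lmdeploy.vl import load_image
+
+# A transformers-format llava checkpoint, not the original llava release
+pipe = pipeline('llava-hf/llava-1.5-7b-hf',
+                backend_config=PytorchEngineConfig())
+image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
+print(pipe(('describe this image', image)))
+```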
## PyTorchEngine on Huawei Ascend Platform
diff --git a/docs/zh_cn/get_started/installation.md b/docs/zh_cn/get_started/installation.md
index 0213fa6d15..501f8a13e8 100644
--- a/docs/zh_cn/get_started/installation.md
+++ b/docs/zh_cn/get_started/installation.md
@@ -23,7 +23,7 @@ pip install lmdeploy
The default prebuilt package is compiled on **CUDA 12**. If CUDA 11+ (>=11.3) is required, you can install lmdeploy with:
```shell
-export LMDEPLOY_VERSION=0.6.4
+export LMDEPLOY_VERSION=0.6.5
export PYTHON_VERSION=38
pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118
```
diff --git a/docs/zh_cn/supported_models/supported_models.md b/docs/zh_cn/supported_models/supported_models.md
index 3ec3688e1b..83b7a9ca6f 100644
--- a/docs/zh_cn/supported_models/supported_models.md
+++ b/docs/zh_cn/supported_models/supported_models.md
@@ -4,104 +4,107 @@
## TurboMind on CUDA Platform
-| Model | Size | Type | FP16/BF16 | KV INT8 | KV INT4 | W4A16 |
-| :-------------------: | :------------: | :--: | :-------: | :-----: | :-----: | :---: |
-| Llama | 7B - 65B | LLM | Yes | Yes | Yes | Yes |
-| Llama2 | 7B - 70B | LLM | Yes | Yes | Yes | Yes |
-| Llama3 | 8B, 70B | LLM | Yes | Yes | Yes | Yes |
-| Llama3.1 | 8B, 70B | LLM | Yes | Yes | Yes | Yes |
-| Llama3.2 | 1B, 3B | LLM | Yes | Yes\* | Yes\* | Yes |
-| InternLM | 7B - 20B | LLM | Yes | Yes | Yes | Yes |
-| InternLM2 | 7B - 20B | LLM | Yes | Yes | Yes | Yes |
-| InternLM2.5 | 7B | LLM | Yes | Yes | Yes | Yes |
-| InternLM-XComposer2 | 7B, 4khd-7B | MLLM | Yes | Yes | Yes | Yes |
-| InternLM-XComposer2.5 | 7B | MLLM | Yes | Yes | Yes | Yes |
-| Qwen | 1.8B - 72B | LLM | Yes | Yes | Yes | Yes |
-| Qwen1.5 | 1.8B - 110B | LLM | Yes | Yes | Yes | Yes |
-| Qwen2 | 0.5B - 72B | LLM | Yes | Yes\* | Yes\* | Yes |
-| Qwen2-MoE | 57BA14B | LLM | Yes | Yes | Yes | Yes |
-| Qwen2.5 | 0.5B - 72B | LLM | Yes | Yes | Yes | Yes |
-| Mistral | 7B | LLM | Yes | Yes | Yes | No |
-| Mixtral | 8x7B, 8x22B | LLM | Yes | Yes | Yes | Yes |
-| DeepSeek-V2 | 16B, 236B | LLM | Yes | Yes | Yes | No |
-| DeepSeek-V2.5 | 236B | LLM | Yes | Yes | Yes | No |
-| Qwen-VL | 7B | MLLM | Yes | Yes | Yes | Yes |
-| DeepSeek-VL | 7B | MLLM | Yes | Yes | Yes | Yes |
-| Baichuan | 7B | LLM | Yes | Yes | Yes | Yes |
-| Baichuan2 | 7B | LLM | Yes | Yes | Yes | Yes |
-| Code Llama | 7B - 34B | LLM | Yes | Yes | Yes | No |
-| YI | 6B - 34B | LLM | Yes | Yes | Yes | Yes |
-| LLaVA(1.5,1.6) | 7B - 34B | MLLM | Yes | Yes | Yes | Yes |
-| InternVL | v1.1 - v1.5 | MLLM | Yes | Yes | Yes | Yes |
-| InternVL2 | 1-2B, 8B - 76B | MLLM | Yes | Yes\* | Yes\* | Yes |
-| ChemVLM | 8B - 26B | MLLM | Yes | Yes | Yes | Yes |
-| MiniCPM-Llama3-V-2_5 | - | MLLM | Yes | Yes | Yes | Yes |
-| MiniCPM-V-2_6 | - | MLLM | Yes | Yes | Yes | Yes |
-| MiniGeminiLlama | 7B | MLLM | Yes | - | - | Yes |
-| GLM4 | 9B | LLM | Yes | Yes | Yes | Yes |
-| CodeGeeX4 | 9B | LLM | Yes | Yes | Yes | - |
-| Molmo | 7B-D,72B | MLLM | Yes | Yes | Yes | No |
+| Model | Size | Type | FP16/BF16 | KV INT8 | KV INT4 | W4A16 |
+| :------------------------------: | :------------: | :--: | :-------: | :-----: | :-----: | :---: |
+| Llama | 7B - 65B | LLM | Yes | Yes | Yes | Yes |
+| Llama2 | 7B - 70B | LLM | Yes | Yes | Yes | Yes |
+| Llama3 | 8B, 70B | LLM | Yes | Yes | Yes | Yes |
+| Llama3.1 | 8B, 70B | LLM | Yes | Yes | Yes | Yes |
+| Llama3.2\[2\] | 1B, 3B | LLM | Yes | Yes\* | Yes\* | Yes |
+| InternLM | 7B - 20B | LLM | Yes | Yes | Yes | Yes |
+| InternLM2 | 7B - 20B | LLM | Yes | Yes | Yes | Yes |
+| InternLM2.5 | 7B | LLM | Yes | Yes | Yes | Yes |
+| InternLM-XComposer2 | 7B, 4khd-7B | MLLM | Yes | Yes | Yes | Yes |
+| InternLM-XComposer2.5 | 7B | MLLM | Yes | Yes | Yes | Yes |
+| Qwen | 1.8B - 72B | LLM | Yes | Yes | Yes | Yes |
+| Qwen1.5\[1\] | 1.8B - 110B | LLM | Yes | Yes | Yes | Yes |
+| Qwen2\[2\] | 0.5B - 72B | LLM | Yes | Yes\* | Yes\* | Yes |
+| Qwen2-MoE                        | 57B-A14B       | LLM  | Yes       | Yes     | Yes     | Yes   |
+| Qwen2.5\[2\] | 0.5B - 72B | LLM | Yes | Yes\* | Yes\* | Yes |
+| Mistral\[1\] | 7B | LLM | Yes | Yes | Yes | No |
+| Mixtral | 8x7B, 8x22B | LLM | Yes | Yes | Yes | Yes |
+| DeepSeek-V2 | 16B, 236B | LLM | Yes | Yes | Yes | No |
+| DeepSeek-V2.5 | 236B | LLM | Yes | Yes | Yes | No |
+| Qwen-VL | 7B | MLLM | Yes | Yes | Yes | Yes |
+| DeepSeek-VL | 7B | MLLM | Yes | Yes | Yes | Yes |
+| Baichuan | 7B | LLM | Yes | Yes | Yes | Yes |
+| Baichuan2 | 7B | LLM | Yes | Yes | Yes | Yes |
+| Code Llama | 7B - 34B | LLM | Yes | Yes | Yes | No |
+| YI | 6B - 34B | LLM | Yes | Yes | Yes | Yes |
+| LLaVA(1.5,1.6) | 7B - 34B | MLLM | Yes | Yes | Yes | Yes |
+| InternVL | v1.1 - v1.5 | MLLM | Yes | Yes | Yes | Yes |
+| InternVL2\[2\]                   | 1 - 2B, 8B - 76B | MLLM | Yes       | Yes\*   | Yes\*   | Yes   |
+| InternVL2.5(MPO)\[2\]            | 1B - 78B       | MLLM | Yes       | Yes\*   | Yes\*   | Yes   |
+| ChemVLM | 8B - 26B | MLLM | Yes | Yes | Yes | Yes |
+| MiniCPM-Llama3-V-2_5 | - | MLLM | Yes | Yes | Yes | Yes |
+| MiniCPM-V-2_6 | - | MLLM | Yes | Yes | Yes | Yes |
+| MiniGeminiLlama | 7B | MLLM | Yes | - | - | Yes |
+| GLM4 | 9B | LLM | Yes | Yes | Yes | Yes |
+| CodeGeeX4 | 9B | LLM | Yes | Yes | Yes | - |
+| Molmo                            | 7B-D, 72B      | MLLM | Yes       | Yes     | Yes     | No    |
"-" means not verified yet.
```{note}
-* The turbomind engine doesn't support window attention. So for models that apply window attention and enable the corresponding "use_sliding_window" switch, such as Mistral and Qwen1.5, please choose the pytorch engine for inference
-* When a model's head_dim is not 128, turbomind doesn't support its kv cache 4/8 bit quantization and inference, e.g. llama3.2-1B, qwen2-0.5B and internvl2-1B
+* [1] The TurboMind engine doesn't support window attention. For models that apply window attention and have the corresponding "use_sliding_window" switch enabled, such as Mistral and Qwen1.5, please choose the PyTorch engine for inference.
+* [2] When a model's head_dim is not 128, as in llama3.2-1B, qwen2-0.5B and internvl2-1B, TurboMind doesn't support its KV cache 4/8-bit quantization and inference.
```
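+
+The KV INT8 / KV INT4 columns map to TurboMind's online KV cache quantization, controlled by `quant_policy`. A minimal sketch (the InternLM2.5 model id is an illustrative choice):
+
+```python
+from lmdeploy import pipeline, TurbomindEngineConfig
+
+# quant_policy=8 enables KV INT8; use 4 for KV INT4. Per note [2] above,
+# this requires the model's head_dim to be 128.
+pipe = pipeline('internlm/internlm2_5-7b-chat',
+                backend_config=TurbomindEngineConfig(quant_policy=8))
+print(pipe(['Hi, there.']))
+```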
## PyTorchEngine on CUDA Platform
-| Model | Size | Type | FP16/BF16 | KV INT8 | KV INT4 | W8A8 | W4A16 |
-| :------------: | :---------: | :--: | :-------: | :-----: | :-----: | :--: | :---: |
-| Llama | 7B - 65B | LLM | Yes | Yes | Yes | Yes | Yes |
-| Llama2 | 7B - 70B | LLM | Yes | Yes | Yes | Yes | Yes |
-| Llama3 | 8B, 70B | LLM | Yes | Yes | Yes | Yes | Yes |
-| Llama3.1 | 8B, 70B | LLM | Yes | Yes | Yes | Yes | Yes |
-| Llama3.2 | 1B, 3B | LLM | Yes | Yes | Yes | Yes | Yes |
-| Llama3.2-VL | 11B, 90B | MLLM | Yes | Yes | Yes | - | - |
-| InternLM | 7B - 20B | LLM | Yes | Yes | Yes | Yes | Yes |
-| InternLM2 | 7B - 20B | LLM | Yes | Yes | Yes | Yes | Yes |
-| InternLM2.5 | 7B | LLM | Yes | Yes | Yes | Yes | Yes |
-| Baichuan2 | 7B | LLM | Yes | Yes | Yes | Yes | No |
-| Baichuan2 | 13B | LLM | Yes | Yes | Yes | No | No |
-| ChatGLM2 | 6B | LLM | Yes | Yes | Yes | No | No |
-| Falcon | 7B - 180B | LLM | Yes | Yes | Yes | No | No |
-| YI | 6B - 34B | LLM | Yes | Yes | Yes | Yes | Yes |
-| Mistral | 7B | LLM | Yes | Yes | Yes | Yes | Yes |
-| Mixtral | 8x7B, 8x22B | LLM | Yes | Yes | Yes | No | No |
-| QWen | 1.8B - 72B | LLM | Yes | Yes | Yes | Yes | Yes |
-| QWen1.5 | 0.5B - 110B | LLM | Yes | Yes | Yes | Yes | Yes |
-| QWen1.5-MoE | A2.7B | LLM | Yes | Yes | Yes | No | No |
-| QWen2 | 0.5B - 72B | LLM | Yes | Yes | No | Yes | Yes |
-| Qwen2.5 | 0.5B - 72B | LLM | Yes | Yes | No | Yes | Yes |
-| QWen2-VL | 2B, 7B | MLLM | Yes | Yes | No | No | No |
-| DeepSeek-MoE | 16B | LLM | Yes | No | No | No | No |
-| DeepSeek-V2 | 16B, 236B | LLM | Yes | No | No | No | No |
-| DeepSeek-V2.5 | 236B | LLM | Yes | No | No | No | No |
-| MiniCPM3 | 4B | LLM | Yes | Yes | Yes | No | No |
-| MiniCPM-V-2_6 | 8B | LLM | Yes | No | No | No | Yes |
-| Gemma | 2B-7B | LLM | Yes | Yes | Yes | No | No |
-| Dbrx | 132B | LLM | Yes | Yes | Yes | No | No |
-| StarCoder2 | 3B-15B | LLM | Yes | Yes | Yes | No | No |
-| Phi-3-mini | 3.8B | LLM | Yes | Yes | Yes | Yes | Yes |
-| Phi-3-vision | 4.2B | MLLM | Yes | Yes | Yes | - | - |
-| CogVLM-Chat | 17B | MLLM | Yes | Yes | Yes | - | - |
-| CogVLM2-Chat | 19B | MLLM | Yes | Yes | Yes | - | - |
-| LLaVA(1.5,1.6) | 7B-34B | MLLM | Yes | Yes | Yes | - | - |
-| InternVL(v1.5) | 2B-26B | MLLM | Yes | Yes | Yes | No | Yes |
-| InternVL2 | 1B-40B | MLLM | Yes | Yes | Yes | - | - |
-| Mono-InternVL | 2B | MLLM | Yes\* | Yes | Yes | - | - |
-| ChemVLM | 8B-26B | MLLM | Yes | Yes | No | - | - |
-| Gemma2 | 9B-27B | LLM | Yes | Yes | Yes | - | - |
-| GLM4 | 9B | LLM | Yes | Yes | Yes | No | No |
-| GLM-4V | 9B | MLLM | Yes | Yes | Yes | No | No |
-| CodeGeeX4 | 9B | LLM | Yes | Yes | Yes | - | - |
-| Phi-3.5-mini | 3.8B | LLM | Yes | Yes | No | - | - |
-| Phi-3.5-MoE | 16x3.8B | LLM | Yes | Yes | No | - | - |
-| Phi-3.5-vision | 4.2B | MLLM | Yes | Yes | No | - | - |
+| Model | Size | Type | FP16/BF16 | KV INT8 | KV INT4 | W8A8 | W4A16 |
+| :----------------------------: | :---------: | :--: | :-------: | :-----: | :-----: | :--: | :---: |
+| Llama | 7B - 65B | LLM | Yes | Yes | Yes | Yes | Yes |
+| Llama2 | 7B - 70B | LLM | Yes | Yes | Yes | Yes | Yes |
+| Llama3 | 8B, 70B | LLM | Yes | Yes | Yes | Yes | Yes |
+| Llama3.1 | 8B, 70B | LLM | Yes | Yes | Yes | Yes | Yes |
+| Llama3.2 | 1B, 3B | LLM | Yes | Yes | Yes | Yes | Yes |
+| Llama3.2-VL | 11B, 90B | MLLM | Yes | Yes | Yes | - | - |
+| InternLM | 7B - 20B | LLM | Yes | Yes | Yes | Yes | Yes |
+| InternLM2 | 7B - 20B | LLM | Yes | Yes | Yes | Yes | Yes |
+| InternLM2.5 | 7B | LLM | Yes | Yes | Yes | Yes | Yes |
+| Baichuan2 | 7B | LLM | Yes | Yes | Yes | Yes | No |
+| Baichuan2 | 13B | LLM | Yes | Yes | Yes | No | No |
+| ChatGLM2 | 6B | LLM | Yes | Yes | Yes | No | No |
+| Falcon | 7B - 180B | LLM | Yes | Yes | Yes | No | No |
+| YI | 6B - 34B | LLM | Yes | Yes | Yes | Yes | Yes |
+| Mistral | 7B | LLM | Yes | Yes | Yes | Yes | Yes |
+| Mixtral | 8x7B, 8x22B | LLM | Yes | Yes | Yes | No | No |
+| QWen | 1.8B - 72B | LLM | Yes | Yes | Yes | Yes | Yes |
+| QWen1.5 | 0.5B - 110B | LLM | Yes | Yes | Yes | Yes | Yes |
+| QWen1.5-MoE | A2.7B | LLM | Yes | Yes | Yes | No | No |
+| QWen2 | 0.5B - 72B | LLM | Yes | Yes | No | Yes | Yes |
+| Qwen2.5 | 0.5B - 72B | LLM | Yes | Yes | No | Yes | Yes |
+| QWen2-VL | 2B, 7B | MLLM | Yes | Yes | No | No | Yes |
+| DeepSeek-MoE | 16B | LLM | Yes | No | No | No | No |
+| DeepSeek-V2 | 16B, 236B | LLM | Yes | No | No | No | No |
+| DeepSeek-V2.5 | 236B | LLM | Yes | No | No | No | No |
+| MiniCPM3 | 4B | LLM | Yes | Yes | Yes | No | No |
+| MiniCPM-V-2_6                  | 8B          | MLLM | Yes       | No      | No      | No   | Yes   |
+| Gemma | 2B-7B | LLM | Yes | Yes | Yes | No | No |
+| Dbrx | 132B | LLM | Yes | Yes | Yes | No | No |
+| StarCoder2 | 3B-15B | LLM | Yes | Yes | Yes | No | No |
+| Phi-3-mini | 3.8B | LLM | Yes | Yes | Yes | Yes | Yes |
+| Phi-3-vision | 4.2B | MLLM | Yes | Yes | Yes | - | - |
+| CogVLM-Chat | 17B | MLLM | Yes | Yes | Yes | - | - |
+| CogVLM2-Chat | 19B | MLLM | Yes | Yes | Yes | - | - |
+| LLaVA(1.5,1.6)\[2\] | 7B-34B | MLLM | No | No | No | No | No |
+| InternVL(v1.5) | 2B-26B | MLLM | Yes | Yes | Yes | No | Yes |
+| InternVL2 | 1B-76B | MLLM | Yes | Yes | Yes | - | - |
+| InternVL2.5(MPO) | 1B-78B | MLLM | Yes | Yes | Yes | - | - |
+| Mono-InternVL\[1\]             | 2B          | MLLM | Yes       | Yes     | Yes     | -    | -     |
+| ChemVLM | 8B-26B | MLLM | Yes | Yes | No | - | - |
+| Gemma2 | 9B-27B | LLM | Yes | Yes | Yes | - | - |
+| GLM4 | 9B | LLM | Yes | Yes | Yes | No | No |
+| GLM-4V | 9B | MLLM | Yes | Yes | Yes | No | Yes |
+| CodeGeeX4 | 9B | LLM | Yes | Yes | Yes | - | - |
+| Phi-3.5-mini | 3.8B | LLM | Yes | Yes | No | - | - |
+| Phi-3.5-MoE | 16x3.8B | LLM | Yes | Yes | No | - | - |
+| Phi-3.5-vision | 4.2B | MLLM | Yes | Yes | No | - | - |
```{note}
-* Currently, Mono-InternVL does not support FP16 due to numerical instability. Please use BF16 instead.
+* [1] Currently, Mono-InternVL does not support FP16 due to numerical instability. Please use BF16 instead.
+* [2] After v0.6.4, the PyTorch engine removed support for the original llava models. Please use the corresponding transformers-format models instead, which can be found at https://huggingface.co/llava-hf
```
## PyTorchEngine on Huawei Ascend Platform
diff --git a/lmdeploy/version.py b/lmdeploy/version.py
index f705fcb332..0b4b8a5379 100644
--- a/lmdeploy/version.py
+++ b/lmdeploy/version.py
@@ -1,7 +1,7 @@
# Copyright (c) OpenMMLab. All rights reserved.
from typing import Tuple
-__version__ = '0.6.4'
+__version__ = '0.6.5'
short_version = __version__