bump version to v0.4.2 (#1644)
* bump version to v0.4.2

* update latest news
lvhan028 authored May 27, 2024
1 parent d7bf13f commit 54b7230
Showing 3 changed files with 7 additions and 5 deletions.
5 changes: 3 additions & 2 deletions README.md
@@ -26,7 +26,8 @@ ______________________________________________________________________
<details open>
<summary><b>2024</b></summary>

- \[2024/05\] Support quantization of VLMs, such as InternVL v1.5, LLaVa, InternLMXComposer2.
- \[2024/05\] Balance the vision model across GPUs when deploying VLMs with multiple GPUs
- \[2024/05\] Support 4-bit weight-only quantization and inference for VLMs, such as InternVL v1.5, LLaVa, InternLMXComposer2 (see the Python sketch after this news list)
- \[2024/04\] Support Llama3 and more VLMs, such as InternVL v1.1, v1.2, MiniGemini, InternLMXComposer2.
- \[2024/04\] TurboMind adds online int8/int4 KV cache quantization and inference for all supported devices. Refer [here](docs/en/quantization/kv_quant.md) for a detailed guide
- \[2024/04\] TurboMind's latest upgrade boosts GQA inference, rocketing the [internlm2-20b](https://huggingface.co/internlm/internlm2-20b) model to 16+ RPS, about 1.8x faster than vLLM.
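As a usage sketch for the 2024/05 items above (illustrative only, not part of this commit): the snippet below assumes a hypothetical 4-bit AWQ checkpoint name and a placeholder image URL, and shows the vision model balanced across two GPUs (`tp=2`) together with online int8 KV cache quantization (`quant_policy=8`). The 4-bit weights themselves would normally be produced beforehand with `lmdeploy lite auto_awq`.

```python
# Sketch only: the model ID and image URL below are assumed examples, not from this commit.
from lmdeploy import pipeline, TurbomindEngineConfig
from lmdeploy.vl import load_image

engine_cfg = TurbomindEngineConfig(
    model_format='awq',   # load 4-bit AWQ (weight-only quantized) weights
    tp=2,                 # tensor parallelism: balance the model, incl. its vision part, over 2 GPUs
    quant_policy=8,       # online int8 KV cache quantization
)

pipe = pipeline('OpenGVLab/InternVL-Chat-V1-5-AWQ', backend_config=engine_cfg)  # hypothetical checkpoint
image = load_image('https://example.com/demo.jpg')  # placeholder image URL
print(pipe(('describe this image', image)).text)
```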
@@ -122,6 +123,7 @@ For detailed inference benchmarks in more devices and more settings, please refer
<li>Gemma (2B - 7B)</li>
<li>Dbrx (132B)</li>
<li>Phi-3-mini (3.8B)</li>
<li>StarCoder2 (3B - 15B)</li>
</ul>
</td>
<td>
@@ -133,7 +135,6 @@ For detailed inference benchmarks in more devices and more settings, please refer
<li>DeepSeek-VL (7B)</li>
<li>InternVL-Chat (v1.1-v1.5)</li>
<li>MiniGeminiLlama (7B)</li>
<li>StarCoder2 (3B - 15B)</li>
</ul>
</td>
</tr>
5 changes: 3 additions & 2 deletions README_zh-CN.md
@@ -26,7 +26,8 @@ ______________________________________________________________________
<details open>
<summary><b>2024</b></summary>

- \[2024/05\] Support quantization and inference of VLMs such as InternVL v1.5, LLaVa, InternLMXComposer2.
- \[2024/05\] When deploying VLMs on multiple GPUs, support splitting the vision part of the model evenly across the cards
- \[2024/05\] Support 4-bit weight-only quantization and inference for VLMs such as InternVL v1.5, LLaVa, InternLMXComposer2
- \[2024/04\] Support Llama3 and VLMs such as InternVL v1.1, v1.2, MiniGemini, InternLM-XComposer2
- \[2024/04\] TurboMind supports online int4/int8 KV cache quantization and inference on all supported GPU models. See [here](docs/zh_cn/quantization/kv_quant.md) for details
- \[2024/04\] TurboMind engine upgraded with optimized GQA inference; [internlm2-20b](https://huggingface.co/internlm/internlm2-20b) reaches 16+ RPS, about 1.8x the speed of vLLM
@@ -123,6 +124,7 @@ The LMDeploy TurboMind engine has outstanding inference capability; on models of various sizes
<li>Gemma (2B - 7B)</li>
<li>Dbrx (132B)</li>
<li>Phi-3-mini (3.8B)</li>
<li>StarCoder2 (3B - 15B)</li>
</ul>
</td>
<td>
@@ -134,7 +136,6 @@ The LMDeploy TurboMind engine has outstanding inference capability; on models of various sizes
<li>DeepSeek-VL (7B)</li>
<li>InternVL-Chat (v1.1-v1.5)</li>
<li>MiniGeminiLlama (7B)</li>
<li>StarCoder2 (3B - 15B)</li>
</ul>
</td>
</tr>
2 changes: 1 addition & 1 deletion lmdeploy/version.py
@@ -1,7 +1,7 @@
# Copyright (c) OpenMMLab. All rights reserved.
from typing import Tuple

__version__ = '0.4.1'
__version__ = '0.4.2'
short_version = __version__


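A quick sanity check (illustrative, not part of the commit) that an installed package picks up the bumped version string from `lmdeploy/version.py`:

```python
# Read the version string defined in lmdeploy/version.py and verify the bump.
from lmdeploy.version import __version__, short_version

assert __version__ == '0.4.2', f'expected 0.4.2, got {__version__}'
print(__version__, short_version)
```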
