diff --git a/README.md b/README.md
index 5b6ad47bdf..d160338aa6 100644
--- a/README.md
+++ b/README.md
@@ -167,6 +167,7 @@ For detailed inference benchmarks in more devices and more settings, please refe
 <li>Phi-3.5-vision (4.2B)</li>
 <li>GLM-4V (9B)</li>
 <li>Llama3.2-vision (11B, 90B)</li>
+<li>Molmo (7B-D,72B)</li>
diff --git a/README_ja.md b/README_ja.md
index bdd9ddb02d..fda176229e 100644
--- a/README_ja.md
+++ b/README_ja.md
@@ -163,6 +163,7 @@ LMDeploy TurboMindエンジンは卓越した推論能力を持ち、さまざ
 <li>Phi-3.5-vision (4.2B)</li>
 <li>GLM-4V (9B)</li>
 <li>Llama3.2-vision (11B, 90B)</li>
+<li>Molmo (7B-D,72B)</li>
diff --git a/README_zh-CN.md b/README_zh-CN.md
index 550922d081..6c24b2e500 100644
--- a/README_zh-CN.md
+++ b/README_zh-CN.md
@@ -168,6 +168,7 @@ LMDeploy TurboMind 引擎拥有卓越的推理能力,在各种规模的模型
 <li>Phi-3.5-vision (4.2B)</li>
 <li>GLM-4V (9B)</li>
 <li>Llama3.2-vision (11B, 90B)</li>
+<li>Molmo (7B-D,72B)</li>
diff --git a/docs/en/get_started/installation.md b/docs/en/get_started/installation.md
index b7d03b28a6..b3e8bb8abd 100644
--- a/docs/en/get_started/installation.md
+++ b/docs/en/get_started/installation.md
@@ -23,7 +23,7 @@ pip install lmdeploy
 The default prebuilt package is compiled on **CUDA 12**. If CUDA 11+ (>=11.3) is required, you can install lmdeploy by:
 
 ```shell
-export LMDEPLOY_VERSION=0.6.2
+export LMDEPLOY_VERSION=0.6.3
 export PYTHON_VERSION=38
 pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118
 ```
diff --git a/docs/en/multi_modal/vl_pipeline.md b/docs/en/multi_modal/vl_pipeline.md
index 4881b99071..9632c9e6df 100644
--- a/docs/en/multi_modal/vl_pipeline.md
+++ b/docs/en/multi_modal/vl_pipeline.md
@@ -2,24 +2,14 @@
 
 LMDeploy abstracts the complex inference process of multi-modal Vision-Language Models (VLM) into an easy-to-use pipeline, similar to the Large Language Model (LLM) inference [pipeline](../llm/pipeline.md).
 
-Currently, it supports the following models.
-
-- [Qwen-VL-Chat](https://huggingface.co/Qwen/Qwen-VL-Chat)
-- LLaVA series: [v1.5](https://huggingface.co/collections/liuhaotian/llava-15-653aac15d994e992e2677a7e), [v1.6](https://huggingface.co/collections/liuhaotian/llava-16-65b9e40155f60fd046a5ccf2)
-- [Yi-VL](https://huggingface.co/01-ai/Yi-VL-6B)
-- [DeepSeek-VL](https://huggingface.co/deepseek-ai/deepseek-vl-7b-chat)
-- [InternVL](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5)
-- [Mono-InternVL](https://huggingface.co/OpenGVLab/Mono-InternVL-2B)
-- [MGM](https://huggingface.co/YanweiLi/MGM-7B)
-- [XComposer](https://huggingface.co/internlm/internlm-xcomposer2-vl-7b)
-- [CogVLM](https://github.com/InternLM/lmdeploy/tree/main/docs/en/multi_modal/cogvlm.md)
-
-We genuinely invite the community to contribute new VLM support to LMDeploy. Your involvement is truly appreciated.
+The supported models are listed [here](../supported_models/supported_models.md). We genuinely invite the community to contribute new VLM support to LMDeploy. Your involvement is truly appreciated.
 
 This article showcases the VLM pipeline using the [liuhaotian/llava-v1.6-vicuna-7b](https://huggingface.co/liuhaotian/llava-v1.6-vicuna-7b) model as a case study. You'll learn about the simplest ways to leverage the pipeline and how to gradually unlock more advanced features by adjusting engine parameters and generation arguments, such as tensor parallelism, context window sizing, random sampling, and chat template customization.
 
 Moreover, we will provide practical inference examples tailored to scenarios with multiple images, batch prompts etc.
 
+Using the pipeline interface to infer other VLM models is similar, the main difference being each model's configuration and installation dependencies. You can read [here](https://lmdeploy.readthedocs.io/en/latest/multi_modal/index.html) about the environment installation and configuration for the different models.
+
 ## A 'Hello, world' example
 
 ```python
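
Reviewer note: the vl_pipeline.md hunk above ends exactly where the page's 'Hello, world' fence opens. For anyone verifying that the rewritten intro still flows into that section, the documented pattern is roughly the sketch below; the model id comes from the page itself, while the image URL is only illustrative.

```python
# Sketch of the vl_pipeline 'Hello, world' flow the hunk above leads into.
# The image URL is illustrative; any reachable image works.
from lmdeploy import pipeline
from lmdeploy.vl import load_image

# Build a VLM pipeline from the model featured in the edited page.
pipe = pipeline('liuhaotian/llava-v1.6-vicuna-7b')

# Load an image and run a single (prompt, image) query.
image = load_image('https://raw.githubusercontent.com/'
                   'open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
response = pipe(('describe this image', image))
print(response)
```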
diff --git a/docs/en/supported_models/supported_models.md b/docs/en/supported_models/supported_models.md
index 90ca90388b..a122f10ec8 100644
--- a/docs/en/supported_models/supported_models.md
+++ b/docs/en/supported_models/supported_models.md
@@ -36,6 +36,7 @@ The following tables detail the models supported by LMDeploy's TurboMind engine
 | MiniGeminiLlama | 7B | MLLM | Yes | - | - | Yes |
 | GLM4 | 9B | LLM | Yes | Yes | Yes | Yes |
 | CodeGeeX4 | 9B | LLM | Yes | Yes | Yes | - |
+| Molmo | 7B-D,72B | MLLM | Yes | Yes | Yes | No |
 
 "-" means not verified yet.
diff --git a/docs/zh_cn/get_started/installation.md b/docs/zh_cn/get_started/installation.md
index 3108d64815..12562c51d5 100644
--- a/docs/zh_cn/get_started/installation.md
+++ b/docs/zh_cn/get_started/installation.md
@@ -23,7 +23,7 @@ pip install lmdeploy
 默认的预构建包是在 **CUDA 12** 上编译的。如果需要 CUDA 11+ (>=11.3),你可以使用以下命令安装 lmdeploy:
 
 ```shell
-export LMDEPLOY_VERSION=0.6.2
+export LMDEPLOY_VERSION=0.6.3
 export PYTHON_VERSION=38
 pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118
 ```
diff --git a/docs/zh_cn/multi_modal/vl_pipeline.md b/docs/zh_cn/multi_modal/vl_pipeline.md
index 570598311a..35f647e36c 100644
--- a/docs/zh_cn/multi_modal/vl_pipeline.md
+++ b/docs/zh_cn/multi_modal/vl_pipeline.md
@@ -2,24 +2,14 @@
 
 LMDeploy 把视觉-语言模型(VLM)复杂的推理过程,抽象为简单好用的 pipeline。它的用法与大语言模型(LLM)推理 [pipeline](../llm/pipeline.md) 类似。
 
-目前,VLM pipeline 支持以下模型:
-
-- [Qwen-VL-Chat](https://huggingface.co/Qwen/Qwen-VL-Chat)
-- LLaVA series: [v1.5](https://huggingface.co/collections/liuhaotian/llava-15-653aac15d994e992e2677a7e), [v1.6](https://huggingface.co/collections/liuhaotian/llava-16-65b9e40155f60fd046a5ccf2)
-- [Yi-VL](https://huggingface.co/01-ai/Yi-VL-6B)
-- [DeepSeek-VL](https://huggingface.co/deepseek-ai/deepseek-vl-7b-chat)
-- [InternVL](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5)
-- [Mono-InternVL](https://huggingface.co/OpenGVLab/Mono-InternVL-2B)
-- [MGM](https://huggingface.co/YanweiLi/MGM-7B)
-- [XComposer](https://huggingface.co/internlm/internlm-xcomposer2-vl-7b)
-- [CogVLM](https://github.com/InternLM/lmdeploy/tree/main/docs/zh_cn/multi_modal/cogvlm.md)
-
-我们诚挚邀请社区在 LMDeploy 中添加更多 VLM 模型的支持。
+在[这个列表中](../supported_models/supported_models.md),你可以查阅每个推理引擎支持的 VLM 模型。我们诚挚邀请社区在 LMDeploy 中添加更多 VLM 模型的支持。
 
 本文将以 [liuhaotian/llava-v1.6-vicuna-7b](https://huggingface.co/liuhaotian/llava-v1.6-vicuna-7b) 模型为例,展示 VLM pipeline 的用法。你将了解它的最基础用法,以及如何通过调整引擎参数和生成条件来逐步解锁更多高级特性,如张量并行,上下文窗口大小调整,随机采样,以及对话模板的定制。
 
 此外,我们还提供针对多图、批量提示词等场景的实际推理示例。
 
+使用 pipeline 接口推理其他 VLM 模型,大同小异,主要区别在于模型依赖的配置和安装。你可以阅读[此处](https://lmdeploy.readthedocs.io/zh-cn/latest/multi_modal/),查看不同模型的环境安装和配置方式。
+
 ## "Hello, world" 示例
 
 ```python
diff --git a/docs/zh_cn/supported_models/supported_models.md b/docs/zh_cn/supported_models/supported_models.md
index fecfdee200..f3ffd4311d 100644
--- a/docs/zh_cn/supported_models/supported_models.md
+++ b/docs/zh_cn/supported_models/supported_models.md
@@ -36,6 +36,7 @@
 | MiniGeminiLlama | 7B | MLLM | Yes | - | - | Yes |
 | GLM4 | 9B | LLM | Yes | Yes | Yes | Yes |
 | CodeGeeX4 | 9B | LLM | Yes | Yes | Yes | - |
+| Molmo | 7B-D,72B | MLLM | Yes | Yes | Yes | No |
 
 “-” 表示还没有验证。
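
Reviewer note: both supported-models tables now advertise Molmo (7B-D, 72B) on TurboMind, with W4A16 marked unsupported. A minimal smoke test could look like the sketch below; `allenai/Molmo-7B-D-0924` is an assumed Hugging Face repo id, not taken from this diff, so confirm it against the model card.

```python
# Hypothetical smoke test for the newly listed Molmo support.
# 'allenai/Molmo-7B-D-0924' is an assumed HF repo id, not from the diff.
from lmdeploy import pipeline
from lmdeploy.vl import load_image

pipe = pipeline('allenai/Molmo-7B-D-0924')
image = load_image('tiger.jpeg')  # any local image path or URL
print(pipe(('describe this image', image)))
```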
diff --git a/lmdeploy/version.py b/lmdeploy/version.py
index b9f76b5761..d9f4307a78 100644
--- a/lmdeploy/version.py
+++ b/lmdeploy/version.py
@@ -1,7 +1,7 @@
 # Copyright (c) OpenMMLab. All rights reserved.
 from typing import Tuple
 
-__version__ = '0.6.2'
+__version__ = '0.6.3'
 short_version = __version__
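
Reviewer note: since `__version__` gates the `LMDEPLOY_VERSION` used in the cu118 wheel URL bumped in both installation guides, a quick post-install check is the sketch below.

```python
# Confirm the installed wheel matches the documented LMDEPLOY_VERSION (0.6.3).
import lmdeploy

assert lmdeploy.__version__ == '0.6.3', lmdeploy.__version__
print(lmdeploy.__version__)
```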