docs/en/llm/pipeline.md (+43, -14)
@@ -6,7 +6,7 @@ You can overview the detailed pipeline API in [this](https://lmdeploy.readthedoc

 ## Usage

-**An example using default parameters:**
+### A 'Hello, world' example

 ```python
 from lmdeploy import pipeline
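Note: the hunk above cuts the 'Hello, world' code block off after its first line. For context, here is a minimal sketch of what such a default-parameter pipeline call typically looks like; the model identifier and prompts are illustrative assumptions, not part of the diff:

```python
from lmdeploy import pipeline

# Build a pipeline with default engine and generation parameters.
# The model path below is an assumed example; any supported model works.
pipe = pipeline('internlm/internlm2_5-7b-chat')

# Batch two prompts in one call; one response is returned per prompt.
response = pipe(['Hi, pls intro yourself', 'Shanghai is'])
print(response)
```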
@@ -40,7 +40,7 @@ There have been alterations to the strategy for setting the k/v cache ratio thro

 The allocation strategy for k/v cache is changed to reserve space from the **GPU free memory** proportionally. The ratio `TurbomindEngineConfig.cache_max_entry_count` has been adjusted to 0.8 by default. If OOM error happens, similar to the method mentioned above, please consider reducing the ratio value to decrease the memory usage of the k/v cache.

-**An example showing how to set tensor parallel num**:
+### Set tensor parallelism

 ```python
 from lmdeploy import pipeline, TurbomindEngineConfig
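The diff again truncates the code block after the import. A sketch of how `TurbomindEngineConfig` is typically used to set tensor parallelism and to lower the k/v cache ratio discussed above; the model name and the specific values are illustrative assumptions:

```python
from lmdeploy import pipeline, TurbomindEngineConfig

# tp=2 shards the model across two GPUs. cache_max_entry_count lowers the
# k/v cache share of free GPU memory from the 0.8 default, which is the
# suggested remedy when an OOM error occurs. Values here are examples.
backend_config = TurbomindEngineConfig(tp=2, cache_max_entry_count=0.2)
pipe = pipeline('internlm/internlm2_5-7b-chat', backend_config=backend_config)
response = pipe(['Hi, pls intro yourself'])
print(response)
```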
docs/en/multi_modal/vl_pipeline.md (+17, -20)
@@ -4,7 +4,7 @@ LMDeploy abstracts the complex inference process of multi-modal Vision-Language

 The supported models are listed [here](../supported_models/supported_models.md). We genuinely invite the community to contribute new VLM support to LMDeploy. Your involvement is truly appreciated.

-This article showcases the VLM pipeline using the [liuhaotian/llava-v1.6-vicuna-7b](https://huggingface.co/liuhaotian/llava-v1.6-vicuna-7b) model as a case study.
+This article showcases the VLM pipeline using the [OpenGVLab/InternVL2_5-8B](https://huggingface.co/OpenGVLab/InternVL2_5-8B) model as a case study.
 You'll learn about the simplest ways to leverage the pipeline and how to gradually unlock more advanced features by adjusting engine parameters and generation arguments, such as tensor parallelism, context window sizing, random sampling, and chat template customization.
 Moreover, we will provide practical inference examples tailored to scenarios with multiple images, batch prompts etc.
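Since this hunk only touches the intro prose, here is a sketch of the basic VLM pipeline usage it refers to, assuming the new InternVL2_5-8B case-study model. The `load_image` helper follows LMDeploy's documented pattern, and the image URL is a placeholder assumption:

```python
from lmdeploy import pipeline
from lmdeploy.vl import load_image

# Use the case-study model named in the updated doc.
pipe = pipeline('OpenGVLab/InternVL2_5-8B')

# Pair a text prompt with an image; the URL below is an example placeholder.
image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
response = pipe(('describe this image', image))
print(response)
```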
@@ -16,7 +16,7 @@ Using the pipeline interface to infer other VLM models is similar, with the main