Commit dd4bfae
Add MiniCPM-V-4.0 to MiniCPM-V notebook (#3047)
Co-authored-by: Aleksandr Mokrov <[email protected]>
1 parent 561690c commit dd4bfae

File tree

3 files changed: +155 additions, -44 deletions

notebooks/minicpm-v-multimodal-chatbot/README.md

Lines changed: 3 additions & 4 deletions

@@ -1,8 +1,7 @@
-# Visual-language assistant with MiniCPM-V2 and OpenVINO
+# Visual-language assistant with MiniCPM-V and OpenVINO
 
-MiniCPM-V 2 is a strong multimodal large language model for efficient end-side deployment. MiniCPM-V 2.6 is the latest and most capable model in the MiniCPM-V series. The model is built on SigLip-400M and Qwen2-7B with a total of 8B parameters. It exhibits a significant performance improvement over previous versions, and introduces new features for multi-image and video understanding.
-
-More details about model can be found in [model card](https://huggingface.co/openbmb/MiniCPM-V-2_6) and original [repo](https://github.com/OpenBMB/MiniCPM-V).
+MiniCPM-V 4.0 is the latest efficient model in the MiniCPM-V series. The model is built based on SigLIP2-400M and MiniCPM4-3B with a total of 4.1B parameters. It inherits the strong single-image, multi-image and video understanding performance of MiniCPM-V 2.6 with largely improved efficiency.
+More details about model can be found in [model card](https://huggingface.co/openbmb/MiniCPM-V-4) and original [repo](https://github.com/OpenBMB/MiniCPM-V).
 
 In this tutorial we consider how to convert and optimize MiniCPM-V2.6 model for creating multimodal chatbot. Additionally, we demonstrate how to apply stateful transformation on LLM part and model optimization techniques like weights compression using [NNCF](https://github.com/openvinotoolkit/nncf)
 
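The README's export flow can be sketched in Python. This is a minimal sketch, assuming the directory-naming convention shown elsewhere in this commit (`<model name>-ov`); the `export_cmd` string mirrors the `optimum-cli` command the notebook displays but is only printed here, not executed:

```python
from pathlib import Path

# Model id and output directory, following the notebook's naming convention
model_id = "openbmb/MiniCPM-V-4"
model_dir = Path(model_id.split("/")[-1] + "-ov")

# The export command shown in the notebook output (printed, not executed)
export_cmd = (
    f"optimum-cli export openvino --model {model_id} {model_dir} "
    "--trust-remote-code --weight-format fp16 --task image-text-to-text"
)
print(export_cmd)
```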

notebooks/minicpm-v-multimodal-chatbot/gradio_helper.py

Lines changed: 2 additions & 2 deletions

@@ -24,7 +24,7 @@
     Image.open(requests.get(url, stream=True).raw).save(file_name)
 
 
-def make_demo(model):
+def make_demo(model, mode_name):
     import openvino_genai as ov_genai
     import openvino as ov
 
@@ -119,7 +119,7 @@ def generate_and_signal_complete():
     additional_buttons = {"undo_button": None, "retry_button": None}
     demo = gr.ChatInterface(
         fn=bot_streaming,
-        title="MiniCPMV2 OpenVINO Chatbot",
+        title=f"{mode_name} OpenVINO Chatbot",
         examples=[
             {"text": "What is on the flower?", "files": ["./bee.jpg"]},
             {"text": "How to make this pastry?", "files": ["./baklava.png"]},
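The second hunk threads the selected model's name into the chat UI title. A minimal sketch of the new title logic, isolated from Gradio (the helper name `make_title` is hypothetical; the parameter name `mode_name` is taken verbatim from the commit):

```python
def make_title(mode_name: str) -> str:
    # Mirrors the f-string introduced in gradio_helper.py:
    #   title=f"{mode_name} OpenVINO Chatbot"
    return f"{mode_name} OpenVINO Chatbot"

# The notebook passes the last path component of the HF model id
print(make_title("openbmb/MiniCPM-V-4".split("/")[-1]))
```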

notebooks/minicpm-v-multimodal-chatbot/minicpm-v-multimodal-chatbot.ipynb

Lines changed: 150 additions & 38 deletions

@@ -6,18 +6,19 @@
   "id": "5918b41c-dad7-4f7b-9e39-b3026933dddf",
   "metadata": {},
   "source": [
-   "# Visual-language assistant with MiniCPM-V2 and OpenVINO\n",
+   "# Visual-language assistant with MiniCPM-V and OpenVINO\n",
    "\n",
-   "MiniCPM-V 2 is a strong multimodal large language model for efficient end-side deployment. MiniCPM-V 2.6 is the latest and most capable model in the MiniCPM-V series. The model is built on SigLip-400M and Qwen2-7B with a total of 8B parameters. It exhibits a significant performance improvement over previous versions, and introduces new features for multi-image and video understanding.\n",
+   "MiniCPM-V 4.0 is the latest efficient model in the MiniCPM-V series. The model is built based on SigLIP2-400M and MiniCPM4-3B with a total of 4.1B parameters. It inherits the strong single-image, multi-image and video understanding performance of MiniCPM-V 2.6 with largely improved efficiency.\n",
+   "More details about model can be found in [model card](https://huggingface.co/openbmb/MiniCPM-V-4) and original [repo](https://github.com/OpenBMB/MiniCPM-V).\n",
    "\n",
-   "More details about model can be found in [model card](https://huggingface.co/openbmb/MiniCPM-V-2_6) and original [repo](https://github.com/OpenBMB/MiniCPM-V).\n",
    "\n",
-   "In this tutorial we consider how to convert and optimize MiniCPM-V2 model for creating multimodal chatbot. Additionally, we demonstrate how to apply stateful transformation on LLM part and model optimization techniques like weights compression using [NNCF](https://github.com/openvinotoolkit/nncf)\n",
+   "In this tutorial we consider how to convert and optimize MiniCPM-V model for creating multimodal chatbot. Additionally, we demonstrate how to apply stateful transformation on LLM part and model optimization techniques like weights compression using [NNCF](https://github.com/openvinotoolkit/nncf)\n",
    "\n",
    "#### Table of contents:\n",
    "\n",
    "- [Prerequisites](#Prerequisites)\n",
    "- [Convert model to OpenVINO Intermediate Representation](#Convert-model-to-OpenVINO-Intermediate-Representation)\n",
+   "  - [Select model](#Select-model)\n",
    "  - [Compress Language Model Weights to 4 bits](#Compress-Language-Model-Weights-to-4-bits)\n",
    "- [Prepare model inference pipeline](#Prepare-model-inference-pipeline)\n",
    "  - [Select device](#Select-device)\n",
@@ -47,14 +48,49 @@
  },
  {
   "cell_type": "code",
-  "execution_count": 1,
+  "execution_count": null,
   "id": "0116846d-da6f-4e81-b6be-0a882a3eb872",
   "metadata": {},
-  "outputs": [],
+  "outputs": [
+   {
+    "name": "stdout",
+    "output_type": "stream",
+    "text": [
+     "\n",
+     "[notice] A new release of pip is available: 25.1.1 -> 25.2\n",
+     "[notice] To update, run: pip install --upgrade pip\n",
+     "Note: you may need to restart the kernel to use updated packages.\n",
+     "\n",
+     "[notice] A new release of pip is available: 25.1.1 -> 25.2\n",
+     "[notice] To update, run: pip install --upgrade pip\n",
+     "Note: you may need to restart the kernel to use updated packages.\n",
+     "\n",
+     "[notice] A new release of pip is available: 25.1.1 -> 25.2\n",
+     "[notice] To update, run: pip install --upgrade pip\n",
+     "Note: you may need to restart the kernel to use updated packages.\n",
+     "ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n",
+     "optimum-intel 1.26.0.dev0+7c64417 requires optimum==1.27.*, but you have optimum 2.0.0.dev0 which is incompatible.\n",
+     "\n",
+     "[notice] A new release of pip is available: 25.1.1 -> 25.2\n",
+     "[notice] To update, run: pip install --upgrade pip\n",
+     "Note: you may need to restart the kernel to use updated packages.\n",
+     "\n",
+     "[notice] A new release of pip is available: 25.1.1 -> 25.2\n",
+     "[notice] To update, run: pip install --upgrade pip\n",
+     "Note: you may need to restart the kernel to use updated packages.\n"
+    ]
+   }
+  ],
   "source": [
+   "import platform\n",
+   "\n",
+   "if platform.system() == \"Darwin\":\n",
+   "    %pip install -q \"numpy<2.0.0\"\n",
+   "\n",
    "%pip install -q \"torch>=2.1\" \"torchvision\" \"timm>=0.9.2\" \"transformers>=4.45\" \"Pillow\" \"gradio>=4.40\" \"tqdm\" \"sentencepiece\" \"peft\" \"huggingface-hub>=0.24.0\" --extra-index-url https://download.pytorch.org/whl/cpu\n",
    "%pip install -q \"nncf>=2.14.0\"\n",
-   "%pip install -q \"git+https://github.com/huggingface/optimum-intel.git\" --extra-index-url https://download.pytorch.org/whl/cpu\n",
+   "%pip install -q \"git+https://github.com/openvino-dev-samples/optimum-intel.git@minicpm4v\" --extra-index-url https://download.pytorch.org/whl/cpu\n",
+   "%pip install -q \"git+https://github.com/openvino-dev-samples/optimum.git@minicpm4v\" --extra-index-url https://download.pytorch.org/whl/cpu\n",
    "%pip install -q -U --pre \"openvino>=2025.0\" \"openvino-tokenizers>=2025.0\" \"openvino-genai>=2025.0\" --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/nightly"
   ]
  },
@@ -110,6 +146,59 @@
    "\n",
    "where task is task to export the model for, if not specified, the task will be auto-inferred based on the model. You can find a mapping between tasks and model classes in Optimum TaskManager [documentation](https://huggingface.co/docs/optimum/exporters/task_manager). Additionally, you can specify weights compression using `--weight-format` argument with one of following options: `fp32`, `fp16`, `int8` and `int4`. Fro int8 and int4 [nncf](https://github.com/openvinotoolkit/nncf) will be used for weight compression. More details about model export provided in [Optimum Intel documentation](https://huggingface.co/docs/optimum/intel/openvino/export#export-your-model).\n",
    "\n",
+   "## Select model\n",
+   "[back to top ⬆️](#Table-of-contents:)\n",
+   "\n",
+   "* **MiniCPM-V-4**: MiniCPM-V 4.0 is the latest efficient model in the MiniCPM-V series. The model is built based on SigLIP2-400M and MiniCPM4-3B with a total of 4.1B parameters. It inherits the strong single-image, multi-image and video understanding performance of MiniCPM-V 2.6 with largely improved efficiency.\n",
+   "* **MiniCPM-V-2_6**: MiniCPM-V 2.6 is built on SigLip-400M and Qwen2-7B with a total of 8B parameters. It exhibits a significant performance improvement over MiniCPM-Llama3-V 2.5, and introduces new features for multi-image and video understanding."
+  ]
+ },
+ {
+  "cell_type": "code",
+  "execution_count": 3,
+  "id": "a0851b3c",
+  "metadata": {
+   "test_replace": {
+    "openbmb/MiniCPM-V-4": "katuni4ka/tiny-random-minicpmv-2_6"
+   }
+  },
+  "outputs": [
+   {
+    "data": {
+     "application/vnd.jupyter.widget-view+json": {
+      "model_id": "289c2574f5604076bdcd8eccabc4a14f",
+      "version_major": 2,
+      "version_minor": 0
+     },
+     "text/plain": [
+      "Dropdown(description='Model:', options=('openbmb/MiniCPM-V-4', 'openbmb/MiniCPM-V-2_6'), value='openbmb/MiniCP…"
+     ]
+    },
+    "execution_count": 3,
+    "metadata": {},
+    "output_type": "execute_result"
+   }
+  ],
+  "source": [
+   "import ipywidgets as widgets\n",
+   "\n",
+   "model_ids = [\"openbmb/MiniCPM-V-4\", \"openbmb/MiniCPM-V-2_6\"]\n",
+   "\n",
+   "model_selector = widgets.Dropdown(\n",
+   "    options=model_ids,\n",
+   "    default=model_ids[0],\n",
+   "    description=\"Model:\",\n",
+   ")\n",
+   "\n",
+   "\n",
+   "model_selector"
+  ]
+ },
+ {
+  "cell_type": "markdown",
+  "id": "59dcd94b",
+  "metadata": {},
+  "source": [
    "### Compress Language Model Weights to 4 bits\n",
    "[back to top ⬆️](#Table-of-contents:)\n",
    "\n",
@@ -134,20 +223,60 @@
  },
  {
   "cell_type": "code",
-  "execution_count": 3,
+  "execution_count": null,
   "id": "82e846bb",
-  "metadata": {
-   "test_replace": {
-    "openbmb/MiniCPM-V-2_6": "katuni4ka/tiny-random-minicpmv-2_6"
-   }
-  },
+  "metadata": {},
   "outputs": [
+   {
+    "data": {
+     "text/markdown": [
+      "**Export command:**"
+     ],
+     "text/plain": [
+      "<IPython.core.display.Markdown object>"
+     ]
+    },
+    "metadata": {},
+    "output_type": "display_data"
+   },
+   {
+    "data": {
+     "text/markdown": [
+      "`optimum-cli export openvino --model openbmb/MiniCPM-V-4 MiniCPM-V-4-ov --trust-remote-code --weight-format fp16 --task image-text-to-text`"
+     ],
+     "text/plain": [
+      "<IPython.core.display.Markdown object>"
+     ]
+    },
+    "metadata": {},
+    "output_type": "display_data"
+   },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
-     "INFO:nncf:NNCF initialized successfully. Supported frameworks detected: torch, tensorflow, onnx, openvino\n"
+     "WARNING:nncf:NNCF provides best results with torch==2.7.*, while current torch version is 2.5.1+cpu. If you encounter issues, consider switching to torch==2.7.*\n",
+     "INFO:nncf:Statistics of the bitwidth distribution:\n",
+     "┍━━━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━━━━━━━┯━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┑\n",
+     "│ Weight compression mode │ % all parameters (layers) │ % ratio-defining parameters (layers) │\n",
+     "┝━━━━━━━━━━━━━━━━━━━━━━━━━┿━━━━━━━━━━━━━━━━━━━━━━━━━━━┿━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┥\n",
+     "│ int4_sym                │ 100% (225 / 225)          │ 100% (225 / 225)                     │\n",
+     "┕━━━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━━━━━┷━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┙\n"
     ]
+   },
+   {
+    "data": {
+     "application/vnd.jupyter.widget-view+json": {
+      "model_id": "e5a6ec13d42f41109d029aced33475ff",
+      "version_major": 2,
+      "version_minor": 0
+     },
+     "text/plain": [
+      "Output()"
+     ]
+    },
+    "metadata": {},
+    "output_type": "display_data"
    }
   ],
   "source": [
@@ -174,11 +303,10 @@
    "    shutil.move(ov_int4_model_path.with_suffix(\".bin\"), ov_model_path.with_suffix(\".bin\"))\n",
    "\n",
    "\n",
-   "model_id = \"openbmb/MiniCPM-V-2_6\"\n",
-   "model_dir = Path(model_id.split(\"/\")[-1] + \"-ov\")\n",
+   "model_dir = Path(model_selector.value.split(\"/\")[-1] + \"-ov\")\n",
    "\n",
    "if not model_dir.exists():\n",
-   "    optimum_cli(model_id, model_dir, additional_args={\"trust-remote-code\": \"\", \"weight-format\": \"fp16\", \"task\": \"image-text-to-text\"})\n",
+   "    optimum_cli(model_selector.value, model_dir, additional_args={\"trust-remote-code\": \"\", \"weight-format\": \"fp16\", \"task\": \"image-text-to-text\"})\n",
    "    compress_lm_weights(model_dir)"
   ]
  },
@@ -213,26 +341,10 @@
  },
  {
   "cell_type": "code",
-  "execution_count": 1,
+  "execution_count": null,
   "id": "626fef57",
   "metadata": {},
-  "outputs": [
-   {
-    "data": {
-     "application/vnd.jupyter.widget-view+json": {
-      "model_id": "2362638a795340e6b3effb0805848768",
-      "version_major": 2,
-      "version_minor": 0
-     },
-     "text/plain": [
-      "Dropdown(description='Device:', index=1, options=('CPU', 'AUTO'), value='AUTO')"
-     ]
-    },
-    "execution_count": 1,
-    "metadata": {},
-    "output_type": "execute_result"
-   }
-  ],
+  "outputs": [],
   "source": [
    "from notebook_utils import device_widget\n",
    "\n",
@@ -243,7 +355,7 @@
  },
  {
   "cell_type": "code",
-  "execution_count": 5,
+  "execution_count": null,
   "id": "e7af404b",
   "metadata": {},
   "outputs": [],
@@ -394,7 +506,7 @@
   "source": [
    "from gradio_helper import make_demo\n",
    "\n",
-   "demo = make_demo(ov_model)\n",
+   "demo = make_demo(ov_model, model_selector.value.split(\"/\")[-1])\n",
    "\n",
    "try:\n",
    "    demo.launch(debug=True, height=600)\n",
@@ -422,7 +534,7 @@
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
-  "version": "3.11.4"
+  "version": "3.10.12"
  },
  "openvino_notebooks": {
   "imageUrl": "https://github.com/openvinotoolkit/openvino_notebooks/assets/29454499/7b0919ea-6fe4-4c8f-8395-cb0ee6e87937",
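The notebook changes above derive everything downstream from the dropdown selection: the export target directory and the Gradio demo title both come from `model_selector.value`. A minimal sketch of that flow without ipywidgets (the plain `selected` string is a stand-in for the real `widgets.Dropdown` value):

```python
from pathlib import Path

# Model ids offered by the new dropdown in the notebook
model_ids = ["openbmb/MiniCPM-V-4", "openbmb/MiniCPM-V-2_6"]

# Stand-in for model_selector.value (the notebook reads a widgets.Dropdown)
selected = model_ids[0]

# Output directory, as derived in the conversion cell
model_dir = Path(selected.split("/")[-1] + "-ov")

# Demo title, as composed by make_demo(ov_model, model_selector.value.split("/")[-1])
title = f"{selected.split('/')[-1]} OpenVINO Chatbot"

print(model_dir, "|", title)
```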
