
[Question] Deepseek R1 Distill Qwen 1.5B converted models have very large VRAM requirement. #3112

Open
bhushangawde opened this issue Jan 28, 2025 · 1 comment
Labels: question (Question about the usage)

Comments

@bhushangawde

I tested multiple converted DeepSeek R1 Distill Qwen 1.5B models in the MLCChat app on an iPhone 15 Plus and a Google Pixel 8 Pro. All of them have a very high GPU memory requirement, so loading fails on both iOS and Android.

I tried three models:
https://huggingface.co/mlc-ai/DeepSeek-R1-Distill-Qwen-1.5B-q4f16_1-MLC
https://huggingface.co/mlc-ai/DeepSeek-R1-Distill-Qwen-1.5B-q0f16-MLC
https://huggingface.co/mlc-ai/DeepSeek-R1-Distill-Qwen-1.5B-q4f32_1-MLC

Is there a way to make this run on a smartphone?

FATAL EXCEPTION: Thread-4
Process: ai.mlc.mlcchat, PID: 14195
org.apache.tvm.Base$TVMError: TVMError: Check failed: (output_res.IsOk()) is false: Insufficient GPU memory error: The available single GPU memory is 4352.000 MB, which is less than the sum of model weight size (1059.693 MB) and temporary buffer size (11891.183 MB).

@kynasln

kynasln commented Feb 3, 2025

I successfully deployed this model and ran Q&A on a Huawei Mate 60 phone with 16 GB of memory by setting the context-window-size to 768.
https://huggingface.co/mlc-ai/DeepSeek-R1-Distill-Qwen-1.5B-q4f16_1-MLC
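
For reference, a minimal sketch of how that override could be applied to a locally downloaded copy of the converted model, assuming the usual MLC LLM layout where the model directory contains an mlc-chat-config.json with a context_window_size field (and possibly prefill_chunk_size). The path is hypothetical and the 768 value simply mirrors the comment above, so treat this as an assumption rather than a verified recipe:

# Sketch (not an official recipe): shrink the KV-cache / temporary-buffer footprint
# of a converted MLC model by editing its mlc-chat-config.json before packaging.
# The directory path is an assumption -- point it at your own copy of the model.
import json
from pathlib import Path

config_path = Path("DeepSeek-R1-Distill-Qwen-1.5B-q4f16_1-MLC/mlc-chat-config.json")
config = json.loads(config_path.read_text())

# 768 is the value reported above to work on a 16 GB phone.
config["context_window_size"] = 768

# If the config also carries a prefill_chunk_size, keep it no larger than the
# context window; the "temporary buffer" in the error appears to track this value.
if "prefill_chunk_size" in config:
    config["prefill_chunk_size"] = min(config["prefill_chunk_size"], 768)

config_path.write_text(json.dumps(config, indent=2))
print("updated", config_path)

Note that the prebuilt MLCChat apps read the config that is bundled or downloaded with the model, so the override has to be in place before the app loads it; check the MLC LLM packaging docs for where your build picks it up.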
