
[Question] Deepseek R1 Distill Qwen 1.5B converted models have very large VRAM requirement. #3112

Open
bhushangawde opened this issue Jan 28, 2025 · 1 comment
Labels: question (Question about the usage)

Comments

@bhushangawde

I tested multiple converted DeepSeek R1 Distill Qwen 1.5B models in the MLCChat app on an iPhone 15 Plus and a Google Pixel 8 Pro. All of them have a very high GPU memory requirement, so loading fails on both iOS and Android.

I tried three models:
https://huggingface.co/mlc-ai/DeepSeek-R1-Distill-Qwen-1.5B-q4f16_1-MLC
https://huggingface.co/mlc-ai/DeepSeek-R1-Distill-Qwen-1.5B-q0f16-MLC
https://huggingface.co/mlc-ai/DeepSeek-R1-Distill-Qwen-1.5B-q4f32_1-MLC

Is there a way to make this run on a smartphone?

FATAL EXCEPTION: Thread-4
Process: ai.mlc.mlcchat, PID: 14195
org.apache.tvm.Base$TVMError: TVMError: Check failed: (output_res.IsOk()) is false: Insufficient GPU memory error: The available single GPU memory is 4352.000 MB, which is less than the sum of model weight size (1059.693 MB) and temporary buffer size (11891.183 MB).

@kynasln

kynasln commented Feb 3, 2025

I successfully deployed this model and ran Q&A on a Huawei Mate 60 phone with 16 GB of memory by setting the context-window-size to 768.
https://huggingface.co/mlc-ai/DeepSeek-R1-Distill-Qwen-1.5B-q4f16_1-MLC
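
For reference, a minimal sketch of how that override could be applied to a locally downloaded copy of the converted model, assuming the usual MLC LLM layout where the model directory contains an mlc-chat-config.json with a context_window_size field (and possibly prefill_chunk_size). The path is hypothetical and the 768 value simply mirrors the comment above, so treat this as an assumption rather than a verified recipe:

# Sketch (not an official recipe): shrink the KV-cache / temporary-buffer footprint
# of a converted MLC model by editing its mlc-chat-config.json before packaging.
# The directory path is an assumption -- point it at your own copy of the model.
import json
from pathlib import Path

config_path = Path("DeepSeek-R1-Distill-Qwen-1.5B-q4f16_1-MLC/mlc-chat-config.json")
config = json.loads(config_path.read_text())

# 768 is the value reported above to work on a 16 GB phone.
config["context_window_size"] = 768

# If the config also carries a prefill_chunk_size, keep it no larger than the
# context window; the "temporary buffer" in the error appears to track this value.
if "prefill_chunk_size" in config:
    config["prefill_chunk_size"] = min(config["prefill_chunk_size"], 768)

config_path.write_text(json.dumps(config, indent=2))
print("updated", config_path)

Note that the prebuilt MLCChat apps read the config that is bundled or downloaded with the model, so the override has to be in place before the app loads it; check the MLC LLM packaging docs for where your build picks it up.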
