
[Bugfix][IPEX] Add VLLM_CPU_MOE_PREPACK to allow disabling MoE prepack when CPU does not support it #14681

Merged: 5 commits merged into vllm-project:main from gau-nernst:ipex_moe_prepack on Mar 14, 2025

Conversation

@gau-nernst (Contributor) commented on Mar 12, 2025

On my Ryzen 5600:

vllm serve deepseek-ai/DeepSeek-V2-Lite-Chat --trust-remote-code --max-model-len 4000
curl http://0.0.0.0:8000/v1/completions -H "Content-Type: application/json" -d '{"model": "deepseek-ai/DeepSeek-V2-Lite-Chat", "prompt": "Hello"}'
ERROR 03-12 21:39:46 [engine.py:141]   File "/home/thien/code/vllm_cpu/.venv/lib/python3.11/site-packages/intel_extension_for_pytorch/frontend.py", line 557, in optimize
ERROR 03-12 21:39:46 [engine.py:141]     assert core.onednn_has_bf16_support(), (
ERROR 03-12 21:39:46 [engine.py:141]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-12 21:39:46 [engine.py:141] AssertionError: BF16 weight prepack needs the cpu support avx_ne_convert or avx512bw, avx512vl and avx512dq, but the desired instruction sets are not available. Please set dtype to torch.float or set weights_prepack to False.

Hence, I use the value of core.onednn_has_bf16_support() to decide whether to enable weight prepack.
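For anyone hitting the same assertion, the capability check can be run directly. A minimal sketch using the same IPEX call that the fix below relies on:

import intel_extension_for_pytorch as ipex

# Returns True only when the CPU supports oneDNN BF16 weight prepack
# (avx_ne_convert, or avx512bw + avx512vl + avx512dq, per the error above).
print(ipex._C.onednn_has_bf16_support())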


👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which executes a small, essential subset of tests to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@gau-nernst (Contributor, Author) commented:

@bigPYJ1151 Do you mind taking a look at this PR? Thank you

@@ -104,7 +104,7 @@ def process_weights_after_loading(self, layer: torch.nn.Module) -> None:
         layer.ipex_fusion = ipex.llm.modules.GatedMLPMOE(
             layer.w13_weight,
             layer.w2_weight,
-            use_prepack=True,
+            use_prepack=ipex._C.onednn_has_bf16_support(),
Contributor commented:

Perhaps it would be better to make this configurable via an environment variable, such as VLLM_CPU_MOE_PREPACK, defaulting to False.

You can refer to VLLM_CPU_OMP_THREADS_BIND as an example of how to add an environment variable in vLLM.
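For illustration, a minimal sketch of this pattern, assuming the envs.py registry style that VLLM_CPU_OMP_THREADS_BIND uses; the exact merged code may differ:

import os

# Sketch of the env-var registry pattern in vllm/envs.py: each entry maps
# a variable name to a zero-argument parser.
environment_variables = {
    # Pre-existing flag, shown as the reference example:
    "VLLM_CPU_OMP_THREADS_BIND":
    lambda: os.getenv("VLLM_CPU_OMP_THREADS_BIND", "all"),
    # New flag: "1" enables MoE weight prepack, "0" disables it.
    # Default "1" (True) matches the behavior the PR ultimately kept.
    "VLLM_CPU_MOE_PREPACK":
    lambda: bool(int(os.getenv("VLLM_CPU_MOE_PREPACK", "1"))),
}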

Contributor Author commented:

I have made the change. I set the default to True to maintain the current behavior, unless you want to change the default to False instead? (Then we should probably document this flag somewhere, e.g. the CPU performance docs.)
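For reference, assuming the flag parses "0"/"1" as in the sketch above, a user on a CPU without BF16 prepack support would disable it like so:

VLLM_CPU_MOE_PREPACK=0 vllm serve deepseek-ai/DeepSeek-V2-Lite-Chat --trust-remote-code --max-model-len 4000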

Contributor commented:

No problem.
Please also update the section Related runtime environment variables in docs/source/getting_started/installation/cpu.md, thanks :)
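For illustration, the new entry in that docs section might read something like the following (the merged wording may differ):

- `VLLM_CPU_MOE_PREPACK`: whether to use prepack for MoE layers on CPU; default `1` (enabled). Set to `0` on CPUs that lack BF16 prepack support.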

Signed-off-by: Thien Tran <[email protected]>
@mergify mergify bot added the documentation Improvements or additions to documentation label Mar 13, 2025
@bigPYJ1151 (Contributor) commented:
LGTM, thanks for your fix!
@Isotr0py Please help to take a look, thanks :)

@gau-nernst gau-nernst changed the title [Bugfix][IPEX] use_prepack=False for MoE when it's not supported [Bugfix][IPEX] Add VLLM_CPU_MOE_PREPACK to allow disabling MoE prepack when CPU does not support it Mar 13, 2025
@Isotr0py Isotr0py enabled auto-merge (squash) March 13, 2025 10:42
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Mar 13, 2025
@vllm-bot vllm-bot merged commit 95d680b into vllm-project:main Mar 14, 2025
28 of 31 checks passed
@gau-nernst gau-nernst deleted the ipex_moe_prepack branch March 14, 2025 03:45
richardsliu pushed a commit to richardsliu/vllm that referenced this pull request Mar 14, 2025
[Bugfix][IPEX] Add VLLM_CPU_MOE_PREPACK to allow disabling MoE prepack when CPU does not support it (vllm-project#14681)

Signed-off-by: Thien Tran <[email protected]>
Signed-off-by: Richard Liu <[email protected]>
Labels: documentation (Improvements or additions to documentation), ready (ONLY add when PR is ready to merge/full CI is needed)
4 participants