
[Bugfix][IPEX] Add VLLM_CPU_MOE_PREPACK to allow disabling MoE prepack when CPU does not support it #14681

Merged: 5 commits merged into vllm-project:main from gau-nernst:ipex_moe_prepack on Mar 14, 2025

Conversation

@gau-nernst (Contributor) commented on Mar 12, 2025

On my Ryzen 5600:

vllm serve deepseek-ai/DeepSeek-V2-Lite-Chat --trust-remote-code --max-model-len 4000
curl http://0.0.0.0:8000/v1/completions -H "Content-Type: application/json" -d '{"model": "deepseek-ai/DeepSeek-V2-Lite-Chat", "prompt": "Hello"}'
ERROR 03-12 21:39:46 [engine.py:141]   File "/home/thien/code/vllm_cpu/.venv/lib/python3.11/site-packages/intel_extension_for_pytorch/frontend.py", line 557, in optimize
ERROR 03-12 21:39:46 [engine.py:141]     assert core.onednn_has_bf16_support(), (
ERROR 03-12 21:39:46 [engine.py:141]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 03-12 21:39:46 [engine.py:141] AssertionError: BF16 weight prepack needs the cpu support avx_ne_convert or avx512bw, avx512vl and avx512dq, but the desired instruction sets are not available. Please set dtype to torch.float or set weights_prepack to False.

Hence, I use the value of core.onednn_has_bf16_support() to decide whether to enable weight prepack.
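For anyone hitting the same assertion, the capability check can be run directly. A minimal sketch using the same IPEX call that the fix below relies on:

import intel_extension_for_pytorch as ipex

# Returns True only when the CPU supports oneDNN BF16 weight prepack
# (avx_ne_convert, or avx512bw + avx512vl + avx512dq, per the error above).
print(ipex._C.onednn_has_bf16_support())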


👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which executes a small, essential subset of tests to catch errors quickly. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@gau-nernst (Contributor, Author) commented:

@bigPYJ1151 Do you mind taking a look at this PR? Thank you

@@ -104,7 +104,7 @@ def process_weights_after_loading(self, layer: torch.nn.Module) -> None:
         layer.ipex_fusion = ipex.llm.modules.GatedMLPMOE(
             layer.w13_weight,
             layer.w2_weight,
-            use_prepack=True,
+            use_prepack=ipex._C.onednn_has_bf16_support(),
Contributor commented:

Perhaps it would be better to make this configurable via an environment variable, such as VLLM_CPU_MOE_PREPACK, defaulting to False.

You can refer to VLLM_CPU_OMP_THREADS_BIND as an example of how to add an environment variable in vLLM.
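For illustration, a minimal sketch of this pattern, assuming the envs.py registry style that VLLM_CPU_OMP_THREADS_BIND uses; the exact merged code may differ:

import os

# Sketch of the env-var registry pattern in vllm/envs.py: each entry maps
# a variable name to a zero-argument parser.
environment_variables = {
    # Pre-existing flag, shown as the reference example:
    "VLLM_CPU_OMP_THREADS_BIND":
    lambda: os.getenv("VLLM_CPU_OMP_THREADS_BIND", "all"),
    # New flag: "1" enables MoE weight prepack, "0" disables it.
    # Default "1" (True) matches the behavior the PR ultimately kept.
    "VLLM_CPU_MOE_PREPACK":
    lambda: bool(int(os.getenv("VLLM_CPU_MOE_PREPACK", "1"))),
}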

Contributor Author commented:

I have made the change. I set the default to True to maintain the current behavior, unless you want to change the default to False instead? (Then we should probably document this flag somewhere, e.g. the CPU performance docs.)
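For reference, assuming the flag parses "0"/"1" as in the sketch above, a user on a CPU without BF16 prepack support would disable it like so:

VLLM_CPU_MOE_PREPACK=0 vllm serve deepseek-ai/DeepSeek-V2-Lite-Chat --trust-remote-code --max-model-len 4000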

Contributor commented:

No problem.
Please also update the section Related runtime environment variables in docs/source/getting_started/installation/cpu.md, thanks :)
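For illustration, the new entry in that docs section might read something like the following (the merged wording may differ):

- `VLLM_CPU_MOE_PREPACK`: whether to use prepack for MoE layers on CPU; default `1` (enabled). Set to `0` on CPUs that lack BF16 prepack support.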

Signed-off-by: Thien Tran <[email protected]>
@mergify mergify bot added the documentation Improvements or additions to documentation label Mar 13, 2025
@bigPYJ1151 (Contributor) commented:
LGTM, thanks for your fix!
@Isotr0py Please help to take a look, thanks :)

@gau-nernst gau-nernst changed the title [Bugfix][IPEX] use_prepack=False for MoE when it's not supported [Bugfix][IPEX] Add VLLM_CPU_MOE_PREPACK to allow disabling MoE prepack when CPU does not support it Mar 13, 2025
@Isotr0py Isotr0py enabled auto-merge (squash) March 13, 2025 10:42
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Mar 13, 2025
@vllm-bot vllm-bot merged commit 95d680b into vllm-project:main Mar 14, 2025
28 of 31 checks passed
@gau-nernst gau-nernst deleted the ipex_moe_prepack branch March 14, 2025 03:45
richardsliu pushed a commit to richardsliu/vllm that referenced this pull request Mar 14, 2025
[Bugfix][IPEX] Add VLLM_CPU_MOE_PREPACK to allow disabling MoE prepack when CPU does not support it (vllm-project#14681)

Signed-off-by: Thien Tran <[email protected]>
Signed-off-by: Richard Liu <[email protected]>
Labels: documentation (Improvements or additions to documentation), ready (ONLY add when PR is ready to merge/full CI is needed)
4 participants