[NPU] Support NPUW for text-embedding models #3088
Pull request overview
This PR adds support for NPUW (Neural Processing Unit Workload) optimization for text embedding models, enabling long context support and performance improvements through prefill-chunk handling.
Key changes:
- Added NPU-specific compilation path for text embedding models with dynamic shapes
- Introduced new configuration parameter `emb_pad_to_max_length` to control padding behavior
- Refactored NPU compilation logic to support both LLM and text embedding model types
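As a rough illustration of what a pad-to-max-length switch usually controls, here is a minimal sketch in plain Python. It is not the code from this PR: the helper `pad_tokens`, its parameter names, and the pad token id `0` are all hypothetical, chosen only to show how truncation, optional padding, and the padding side interact.

```python
from typing import List, Tuple


def pad_tokens(
    token_ids: List[int],
    max_length: int,
    pad_to_max_length: bool,
    padding_side: str = "right",
    pad_id: int = 0,  # hypothetical pad token id, for illustration only
) -> Tuple[List[int], List[int]]:
    """Truncate to max_length, optionally pad, and return (ids, attention_mask)."""
    if padding_side not in ("left", "right"):
        raise ValueError(f"unsupported padding_side: {padding_side!r}")

    # Truncation always applies; padding applies only when requested.
    ids = token_ids[:max_length]
    mask = [1] * len(ids)

    if pad_to_max_length:
        n_pad = max_length - len(ids)
        if padding_side == "right":
            ids, mask = ids + [pad_id] * n_pad, mask + [0] * n_pad
        else:
            ids, mask = [pad_id] * n_pad + ids, [0] * n_pad + mask
    return ids, mask
```

With padding enabled every input has the same static length, which is the case a fixed-shape compilation path would expect; with it disabled the sequence keeps its dynamic length.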
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| tools/llm_bench/llm_bench_utils/ov_utils.py | Fixed parameter name and moved padding configuration to support new padding control |
| tools/llm_bench/llm_bench_utils/model_utils.py | Added mapping for new emb_pad_to_max_length parameter |
| tools/llm_bench/benchmark.py | Added command-line argument for embedding padding control |
| src/cpp/src/utils.hpp | Declared new function for NPU text embedding compilation |
| src/cpp/src/utils.cpp | Refactored NPU compilation logic and added text embedding-specific configuration |
| src/cpp/src/rag/text_embedding_pipeline.cpp | Implemented NPU compilation path for dynamic text embedding models |
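The table mentions a new command-line argument and a matching parameter mapping. A sketch of how such a flag could be wired into a kwargs dictionary is shown below. The flag spellings and the helpers `build_parser` and `to_kwargs` are assumptions made for illustration; only the key names `emb_max_length`, `emb_padding_side`, and `emb_pad_to_max_length` appear in this PR, and the actual `benchmark.py` may define its options differently.

```python
import argparse
from typing import Any, Dict


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical flags; the real tool may name or type them differently.
    parser = argparse.ArgumentParser(description="embedding options sketch")
    parser.add_argument("--emb_max_length", type=int, default=None)
    parser.add_argument("--emb_pad_to_max_length", action="store_true")
    parser.add_argument("--emb_padding_side", choices=["left", "right"], default=None)
    return parser


def to_kwargs(args: argparse.Namespace) -> Dict[str, Any]:
    # Use one list of keys for both writing and reading so names cannot drift apart.
    keys = ("emb_max_length", "emb_pad_to_max_length", "emb_padding_side")
    return {key: getattr(args, key) for key in keys}
```

For example, `to_kwargs(build_parser().parse_args(["--emb_max_length", "512", "--emb_pad_to_max_length"]))` yields a dictionary whose keys match the argument names exactly.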
d1b6ce1 to
94495b4
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
1b7e667 to
3120bdf
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
f713558 to
a591398
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
Copilot reviewed 6 out of 6 changed files in this pull request and generated no new comments.
@mengweiguo You applied formatting to the updated files. We have several PRs to enable formatting, and I'm a bit concerned about whether your changes would match our formatting rules.
628bc9e to
7f6ba1e
Reverted. Thanks.
  pooling_type = kwargs.get("emb_pooling_type")
  max_length = kwargs.get("emb_max_length")
- padding_side = kwargs.get("embedding_padding_side")
+ padding_side = kwargs.get("emb_padding_side")
@sbalandi the argument names were not aligned, so the option was silently lost. Consider reviewing the arguments and potentially introducing types.
@mengweiguo thanks!
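The failure mode discussed in this thread, and one way typed options could guard against it, can be sketched as follows. This is an illustration only, not code from the repository: the `EmbeddingOptions` dataclass and its `from_kwargs` helper are hypothetical, and only the three key names are taken from the diff above.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class EmbeddingOptions:
    """Hypothetical typed container for the emb_* options."""

    emb_pooling_type: Optional[str] = None
    emb_max_length: Optional[int] = None
    emb_padding_side: Optional[str] = None

    @classmethod
    def from_kwargs(cls, kwargs: Dict[str, Any]) -> "EmbeddingOptions":
        known = {f.name for f in fields(cls)}
        # Reject misspelled embedding keys instead of silently dropping them.
        unknown = sorted(k for k in kwargs if k.startswith("emb") and k not in known)
        if unknown:
            raise KeyError(f"unknown embedding options: {unknown}")
        return cls(**{k: v for k, v in kwargs.items() if k in known})


# With a plain dict, a mismatched key name simply returns None:
kwargs = {"emb_padding_side": "left"}
lost = kwargs.get("embedding_padding_side")  # None, so the option is lost
opts = EmbeddingOptions.from_kwargs(kwargs)  # keeps emb_padding_side="left"
```

The design choice here is to fail loudly on any unrecognised `emb`-prefixed key, so a rename on one side of the interface surfaces as an error rather than as a default value.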
@mengweiguo we also need tests for NPUW. Could you please implement a simple test at: https://github.com/openvinotoolkit/openvino.genai/blob/master/tests/python_tests/test_rag.py
Sure, will do that.
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
054354b to
0817705
Description
The benefits handled by prefill-chunk in NPUW:
Note:
CVS-177453
Checklist: