-
-
Couldn't load subscription status.
- Fork 10.9k
[perf] Optimize Qwen2-VL Startup Performance with LRU Cache #27533
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[perf] Optimize Qwen2-VL Startup Performance with LRU Cache #27533
Conversation
885be92 to
7d09de0
Compare
💡 Codex Reviewvllm/vllm/model_executor/models/qwen2_vl.py Lines 1037 to 1038 in 885be92
ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request aims to optimize the startup performance of Qwen2-VL models by caching the result of get_image_size_with_most_features(). The change introduces the @lru_cache decorator to this method. While the optimization is valid, the implementation has a critical issue: the lru_cache decorator is used without being imported, which will cause a NameError at runtime. This needs to be fixed by adding the corresponding import statement.
…artup Add @lru_cache decorator to get_image_size_with_most_features() to avoid repeated expensive smart_resize() calculations during profiling. The method is called twice during startup (once for image tokens, once for video tokens). Caching eliminates the duplicate smart_resize computation, which is the primary performance bottleneck. Performance impact: - Avoids 1 redundant smart_resize call (~10ms) - 2x speedup for this specific operation - Simpler implementation (only 1 line added) Follows pattern from qwen2_5_vl.py which uses instance method caching. Signed-off-by: zitian.zhao <[email protected]>
7d09de0 to
44d189b
Compare
|
@codex review |
|
Looking at the code of Probably the overhead is coming from somewhere else. |
|
OK,I‘ll find out the actual overhead. |
Summary
Add
@lru_cachetoget_image_size_with_most_features()to eliminate redundantsmart_resize()calls during profiling, improving startup performance for Qwen2-VL family models.Problem
During profiling/startup,
get_image_size_with_most_features()is called twice:Each call triggers expensive
smart_resize()computation , causing slow startup as noted invllm/multimodal/processing.py:1178-1181.Solution
Add
@lru_cache(maxsize=128) # noqa: B019decorator to cache the result:The second call hits cache instead of recomputing, eliminating duplicate work.
Performance Impact
smart_resize()callsAffected Models
Testing
Files Changed
vllm/model_executor/models/qwen2_vl.py: +1 line