[Frontend] Added chat-style multimodal support to /classify. #27516
Conversation
Adds chat-message support to the ServingClassification class and its initialization. Updates the preprocessing logic to handle chat messages in classification requests.

Signed-off-by: WorldExplored <[email protected]>
Co-Authored-By: vnadathur <[email protected]>
Code Review
This pull request modifies the API server to support multi-modal inputs for the /classify endpoint by enabling chat-style messages input. The changes are logical and align with the goal of achieving feature parity with other endpoints like /chat. My review has identified one high-severity issue in the handling of empty messages lists, which could lead to unexpected behavior for users of this new feature. A code suggestion is provided to address this. Otherwise, the changes are well-implemented.
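The empty-messages issue the review flags can be guarded against at request-validation time. The sketch below is illustrative only: the function and field names are hypothetical, not vLLM's actual API, and it simply shows the shape of the check the review suggests.

```python
# Hypothetical sketch of the empty-messages guard suggested by the review.
# Names are illustrative, not vLLM's actual serving code.

def validate_classification_request(prompt=None, messages=None):
    """Reject requests that provide neither usable input form."""
    if messages is not None:
        if len(messages) == 0:
            # An empty chat transcript cannot be classified; fail loudly
            # instead of silently producing an empty prompt.
            raise ValueError("'messages' must contain at least one entry")
        return {"kind": "chat", "messages": messages}
    if prompt is None:
        raise ValueError("either 'prompt' or 'messages' is required")
    return {"kind": "completion", "prompt": prompt}
```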
DarkLight1337 left a comment:
I also suggest separating this into a separate protocol class, just like EmbeddingChatRequest vs EmbeddingCompletionRequest
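The split the reviewer suggests could look like the following. This is a hedged sketch mirroring the EmbeddingChatRequest / EmbeddingCompletionRequest pattern: vLLM's real protocol classes are pydantic models, but plain dataclasses are used here to keep the sketch dependency-free, and all field names are illustrative.

```python
# Hedged sketch of the protocol split, mirroring the
# EmbeddingChatRequest / EmbeddingCompletionRequest pattern the
# reviewer cites. Field names are illustrative, not vLLM's exact ones.
from dataclasses import dataclass, field
from typing import Union

@dataclass
class ClassificationCompletionRequest:
    model: str
    input: Union[str, list]  # plain-text input(s), as before

@dataclass
class ClassificationChatRequest:
    model: str
    # Chat-style messages; entries may carry multimodal content parts.
    messages: list = field(default_factory=list)

# The endpoint then accepts either variant:
ClassificationRequest = Union[ClassificationChatRequest,
                              ClassificationCompletionRequest]

def is_chat_request(req) -> bool:
    """Dispatch helper: chat requests go through the chat preprocessor."""
    return isinstance(req, ClassificationChatRequest)
```

Separating the two variants keeps each schema's validation simple and matches how the embedding endpoints already distinguish chat-style from completion-style requests.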
Head branch was pushed to by a user without write access
Thanks for the prompt reviews! The issue has been addressed, and auto-merge is disabled. @noooop

Are there any issues with running the vLLM local test now? Looking forward to your feedback on this new feature.
Sorry for the late reply. I'm still having trouble preparing the environment: I can install the vllm package with the command below, but flash_attn is incompatible. When I uninstalled flash_attn and used the xformers backend, a new error occurred, and reinstalling flash_attn also failed. Also, it seems you don't have a VLM model trained for classification? I can upload one to Hugging Face. Can you help me test it?

Please fix the CI failures.
Head branch was pushed to by a user without write access
I don't believe the CI failures are due to my changes; the failing files aren't anything I touched. @noooop
https://buildkite.com/vllm/ci/builds/38782/steps/canvas?sid=019a7ba8-0df5-421e-82e3-a3d47208f960

ValueError: At most 0 video(s) may be provided in one prompt.

google/gemma-3-4b-it does not support video input. Please select a model that supports video input for testing.

This model can be used for testing.

Working on it...
I'm not 100% sure this PR satisfies your use case; it's best to test it locally: tests/entrypoints/pooling/openai/test_vision_classification.py. I find that installing vllm from main goes more smoothly.

Thanks for your patient efforts.

The test passes locally.

vLLM still needs a lot of improvement in multimodal usage. Feel free to raise issues and let us know more about your user scenarios.
We have already cut the release branch, so this won't be in the upcoming release.
issue: #27413
Modified the API server to accept multi-modal, chat-style inputs on /classify:
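A chat-style /classify request might then look like the payload below. This is a hedged sketch assuming the OpenAI-compatible content-parts layout used by the server's other chat endpoints; the model name and image URL are placeholders, not taken from this PR.

```python
# Hedged sketch of a chat-style /classify request body, assuming the
# OpenAI-compatible content-parts layout. Model name and image URL are
# placeholders, not from this PR.
import json

payload = {
    "model": "my-classifier",  # placeholder model name
    "messages": [
        {
            "role": "user",
            "content": [
                # Multimodal part: an image reference the server can fetch.
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/cat.png"}},
                # Text part accompanying the image.
                {"type": "text", "text": "Classify this image."},
            ],
        }
    ],
}

# This JSON body would be POSTed to the server's /classify route.
print(json.dumps(payload, indent=2))
```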
smoke test:
output:
cc @noooop @DarkLight1337