Conversation

@WorldExplored (Contributor) commented Oct 25, 2025

issue: #27413

Modified the API server to accept multi-modal inputs:

  1. Added chat-style multimodal support to /classify: made input optional and wired the chat template configs into ServingClassification, for parity with /pooling and /chat (see the sketch below).
  2. Chat-style and multimodal inputs (e.g., video_url) now work end-to-end without 400 errors.
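
A minimal sketch of the resulting request handling, with hypothetical helper and field names (the actual ServingClassification changes in this PR differ in detail):

from typing import Any


def build_classification_prompt(request: dict[str, Any]) -> dict[str, Any]:
    """Normalize a /classify request body: chat-style vs. plain-text input."""
    messages = request.get("messages")
    if messages:
        # Chat path: pass the messages through so the chat template and
        # multimodal parts (e.g. video_url) are resolved downstream, as for /chat.
        return {"type": "chat", "messages": messages}
    texts = request.get("input")
    if texts is None:
        raise ValueError("either 'messages' or 'input' must be provided")
    if isinstance(texts, str):
        texts = [texts]
    # Completion-style path: plain text inputs, exactly as before this change.
    return {"type": "text", "texts": texts}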

Smoke test:


"""Smoke test for the /classify endpoint (plain-text or chat-style video input)."""

import argparse
import json
import sys
import urllib.error
import urllib.request


def build_payload(
    model: str,
    prompt: str,
    *,
    video_url: str | None,
) -> dict:
    if video_url:
        messages = [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "video_url",
                        "video_url": {"url": video_url},
                    },
                ],
            }
        ]
        return {"model": model, "messages": messages}

    return {"model": model, "input": [prompt]}


def run_smoke_test(args: argparse.Namespace) -> int:
    url = f"http://{args.host}:{args.port}/classify"
    payload = build_payload(
        model=args.model,
        prompt=args.prompt,
        video_url=args.video_url,
    )

    request = urllib.request.Request(
        url=url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "User-Agent": "classification-smoke-test/0.1",
        },
        method="POST",
    )

    try:
        with urllib.request.urlopen(request, timeout=args.timeout) as response:
            print(f"{response.status} {response.reason}")
            body = response.read().decode("utf-8")
            print(body)
            return 0
    except urllib.error.HTTPError as exc:
        print(f"HTTP {exc.code}: {exc.reason}", file=sys.stderr)
        print(exc.read().decode("utf-8"), file=sys.stderr)
    except urllib.error.URLError as exc:
        print(f"Request failed: {exc}", file=sys.stderr)

    return 1


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument("--host", default="localhost")
    parser.add_argument("--port", type=int, default=8000)
    parser.add_argument("--model", required=True)
    parser.add_argument(
        "--prompt",
        # Default prompt (Chinese): "Determine whether this video has quality
        # issues; return 0 if it does, 1 if it does not."
        default="请判断该视频是否存在质量问题,存在返回0,不存在返回1。",
    )
    parser.add_argument("--video-url", dest="video_url")
    parser.add_argument("--timeout", type=float, default=30.0)
    return parser.parse_args()


def main() -> None:
    args = parse_args()
    raise SystemExit(run_smoke_test(args))


if __name__ == "__main__":
    main()

Output:

python vllm/examples/online_serving/classification_smoke_test.py \
  --host 127.0.0.1 \
  --port 8080 \
  --model test-model \
  --prompt "classify this input"

200 OK
{"error":{"message":"The model does not support Classification API","type":"BadRequestError","param":null,"code":400}}

cc @noooop @DarkLight1337

WorldExplored and others added 3 commits October 24, 2025 23:43
Signed-off-by: WorldExplored <[email protected]>
adds support to the ServingClassification class and its initialization.
Updates preprocessing logic to handle chat messages in classification requests.
Co-Authored-By: vnadathur <[email protected]>
Signed-off-by: WorldExplored <[email protected]>
Signed-off-by: WorldExplored <[email protected]>
Co-Authored-By: vnadathur <[email protected]>
@gemini-code-assist (bot, Contributor) left a comment

Code Review

This pull request modifies the API server to support multi-modal inputs for the /classify endpoint by enabling chat-style message input. The changes are logical and align with the goal of achieving feature parity with other endpoints such as /chat. My review identified one high-severity issue in the handling of empty messages lists, which could lead to unexpected behavior for users of this new feature; a code suggestion is provided to address it. Otherwise, the changes are well implemented.
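
For context, a hedged sketch of the kind of guard such an issue calls for (hypothetical names; this is not the suggestion actually made in the review):

def validate_classify_request(body: dict) -> None:
    # Reject an explicitly empty "messages" list instead of silently falling
    # back to text-only handling, and require at least one of the two fields.
    messages = body.get("messages")
    texts = body.get("input")
    if messages is not None and len(messages) == 0:
        raise ValueError("'messages' must contain at least one message")
    if messages is None and texts is None:
        raise ValueError("either 'messages' or 'input' must be provided")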

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Srreyansh Sethi <[email protected]>
Signed-off-by: WorldExplored <[email protected]>
@mergify mergify bot added the ci/build label Oct 26, 2025
@DarkLight1337 (Member) left a comment

I also suggest splitting this out into a separate protocol class, just like EmbeddingChatRequest vs. EmbeddingCompletionRequest.
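
A rough sketch of what such a split could look like, using illustrative pydantic models; the real protocol classes in vllm/entrypoints/openai/protocol.py carry many more fields and validators:

from typing import Union

from pydantic import BaseModel


class ClassificationCompletionRequest(BaseModel):
    # Plain-text input, as /classify accepted before this PR.
    model: str
    input: Union[str, list[str]]


class ClassificationChatRequest(BaseModel):
    # Chat-style messages, which may carry multimodal parts such as video_url.
    model: str
    messages: list[dict]


# The endpoint would then accept either request shape.
ClassificationRequest = Union[ClassificationCompletionRequest, ClassificationChatRequest]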

Signed-off-by: WorldExplored <[email protected]>
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Nov 10, 2025
auto-merge was automatically disabled November 11, 2025 05:41

Head branch was pushed to by a user without write access

@WorldExplored (Contributor, Author) commented:

Thanks for the prompt reviews! The issue has been addressed, and auto-merge is disabled. @noooop

@noooop (Collaborator) commented Nov 11, 2025

@muziyongshixin

Are there any issues with running the vllm local test now?

Look forward to your feedback on this new feature.

@muziyongshixin commented:

> @muziyongshixin
>
> Are there any issues with running the vllm local test now?
>
> Look forward to your feedback on this new feature.

Sorry for the late reply. I'm still having problems preparing the environment. I can follow the commands below to install the vllm package, but flash_attn is incompatible. When I uninstalled flash_attn and used xformers as the backend, a new error occurred, and when I tried to reinstall flash_attn, the process still failed.

conda create -n vllm_main python=3.12 anaconda
conda activate vllm_main

git clone https://github.com/vllm-project/vllm.git vllm_main

cd vllm_main/

pip install uv
pip install numpy==2.2.6

# You may need to manually remove xformers and flashinfer-python from requirements/cuda.txt
VLLM_USE_PRECOMPILED=1 uv pip install -v --editable .

Also, it seems you don't have a VLM model trained for classification, so I can upload one to Hugging Face. Can you help me test it?
I have been really busy these days and may not have enough time to fix the environment issues, so it would be very helpful if you could test it.
The model checkpoint is at: https://huggingface.co/muziyongshixin/Qwen2.5-VL-7B-for-VideoCls

@noooop noooop enabled auto-merge (squash) November 13, 2025 05:19
@noooop (Collaborator) commented Nov 13, 2025

@WorldExplored

Please fix the failed CI.

Signed-off-by: WorldExplored <[email protected]>
Co-Authored-By: vnadathur <[email protected]>
auto-merge was automatically disabled November 13, 2025 23:04

Head branch was pushed to by a user without write access

@WorldExplored (Contributor, Author) commented:

I don't believe the CI failures are caused by my changes; the failing files aren't ones I touched. @noooop

Signed-off-by: WorldExplored <[email protected]>
Co-Authored-By: vnadathur <[email protected]>
@noooop (Collaborator) commented Nov 14, 2025

> I don't believe the CI failures are caused by my changes; the failing files aren't ones I touched. @noooop

https://buildkite.com/vllm/ci/builds/38782/steps/canvas?sid=019a7ba8-0df5-421e-82e3-a3d47208f960

ValueError: At most 0 video(s) may be provided in one prompt.

google/gemma-3-4b-it does not support video input. Please select a model that supports video input for testing.

@muziyongshixin commented:

> > I don't believe the CI failures are caused by my changes; the failing files aren't ones I touched. @noooop
>
> https://buildkite.com/vllm/ci/builds/38782/steps/canvas?sid=019a7ba8-0df5-421e-82e3-a3d47208f960
>
> ValueError: At most 0 video(s) may be provided in one prompt.
>
> google/gemma-3-4b-it does not support video input. Please select a model that supports video input for testing.

This model can be used for testing:
https://huggingface.co/muziyongshixin/Qwen2.5-VL-7B-for-VideoCls
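
A hedged sketch of wiring that checkpoint into a test-server fixture (RemoteOpenAIServer is the existing helper in vLLM's tests/utils.py; the CLI flags, fixture name, and scope here are assumptions, not the PR's actual test setup):

import pytest

from tests.utils import RemoteOpenAIServer

MODEL_NAME = "muziyongshixin/Qwen2.5-VL-7B-for-VideoCls"


@pytest.fixture(scope="module")
def server_vlm_classify():
    # Launch an OpenAI-compatible vLLM server around the video classifier.
    # Flags are illustrative and may need adjusting for the local GPU setup.
    args = ["--task", "classify", "--max-model-len", "8192"]
    with RemoteOpenAIServer(MODEL_NAME, args) as server:
        yield server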

@noooop (Collaborator) commented Nov 14, 2025

> > > I don't believe the CI failures are caused by my changes; the failing files aren't ones I touched. @noooop
> >
> > https://buildkite.com/vllm/ci/builds/38782/steps/canvas?sid=019a7ba8-0df5-421e-82e3-a3d47208f960
> > ValueError: At most 0 video(s) may be provided in one prompt.
> > google/gemma-3-4b-it does not support video input. Please select a model that supports video input for testing.
>
> This model can be used for testing: https://huggingface.co/muziyongshixin/Qwen2.5-VL-7B-for-VideoCls

Working on it...

Signed-off-by: wang.yuqi <[email protected]>
@noooop noooop changed the title [bugfix] modify api server for multi-modal inputs [Frontend] Added chat-style multimodal support to /classify. Nov 14, 2025
@noooop noooop enabled auto-merge (squash) November 14, 2025 06:58
@noooop (Collaborator) commented Nov 14, 2025

> > I don't believe the CI failures are caused by my changes; the failing files aren't ones I touched. @noooop
> >
> > https://buildkite.com/vllm/ci/builds/38782/steps/canvas?sid=019a7ba8-0df5-421e-82e3-a3d47208f960
> > ValueError: At most 0 video(s) may be provided in one prompt.
> > google/gemma-3-4b-it does not support video input. Please select a model that supports video input for testing.
> >
> > This model can be used for testing: https://huggingface.co/muziyongshixin/Qwen2.5-VL-7B-for-VideoCls
>
> Working on it...

I am not 100% sure that this PR satisfies your use case, so it's best for you to test it locally:

tests/entrypoints/pooling/openai/test_vision_classification.py

I find that installing vllm from main is smoother:

conda create -n vllm_main python=3.12 anaconda
conda activate vllm_main

git clone https://github.com/vllm-project/vllm.git vllm_main

cd vllm_main/

pip install uv
pip install numpy==2.2.6

VLLM_USE_PRECOMPILED=1 uv pip install -v --editable .

@muziyongshixin commented Nov 14, 2025

> > > I don't believe the CI failures are caused by my changes; the failing files aren't ones I touched. @noooop
> > >
> > > https://buildkite.com/vllm/ci/builds/38782/steps/canvas?sid=019a7ba8-0df5-421e-82e3-a3d47208f960
> > > ValueError: At most 0 video(s) may be provided in one prompt.
> > > google/gemma-3-4b-it does not support video input. Please select a model that supports video input for testing.
> > >
> > > This model can be used for testing: https://huggingface.co/muziyongshixin/Qwen2.5-VL-7B-for-VideoCls
> >
> > Working on it...
>
> I am not 100% sure that this PR satisfies your use case, so it's best for you to test it locally:
>
> tests/entrypoints/pooling/openai/test_vision_classification.py
>
> I find that installing vllm from main is smoother:
>
> conda create -n vllm_main python=3.12 anaconda
> conda activate vllm_main
>
> git clone https://github.com/vllm-project/vllm.git vllm_main
>
> cd vllm_main/
>
> pip install uv
> pip install numpy==2.2.6
>
> VLLM_USE_PRECOMPILED=1 uv pip install -v --editable .


Thanks for your patient efforts.
If the code below works correctly, then this PR will satisfy my requirements.

# Assumes fixtures and helpers from vLLM's test suite: server_vlm_classify
# (a RemoteOpenAIServer), TEST_VIDEO_URL, requests, and ClassificationResponse.
def test_classify_accepts_chat_video_url(
    server_vlm_classify: RemoteOpenAIServer, model_name: str
) -> None:
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Please classify this video."},
                {"type": "video_url", "video_url": {"url": TEST_VIDEO_URL}},
            ],
        }
    ]

    response = requests.post(
        server_vlm_classify.url_for("classify"),
        json={"model": model_name, "messages": messages},
    )
    response.raise_for_status()

    output = ClassificationResponse.model_validate(response.json())

    assert output.object == "list"
    assert output.model == model_name
    assert len(output.data) == 1
    assert len(output.data[0].probs) == 2
    assert output.usage.prompt_tokens == 4807

@noooop (Collaborator) commented Nov 14, 2025

> If the code below works correctly, then this PR will satisfy my requirements.

The test passes locally.

I think this PR can make it into the upcoming vllm 0.11.1 release.

vllm still needs a lot of improvement in multimodal usage. Feel free to raise issues and let us know more about your user scenarios.

@DarkLight1337 (Member) commented:

We have already cut the release branch, so it won't be in the upcoming release.

Signed-off-by: wang.yuqi <[email protected]>
@noooop noooop merged commit 360bd87 into vllm-project:main Nov 14, 2025
45 checks passed
geodavic pushed a commit to geodavic/vllm that referenced this pull request Nov 16, 2025
…roject#27516)

Signed-off-by: WorldExplored <[email protected]>
Signed-off-by: Srreyansh Sethi <[email protected]>
Signed-off-by: vnadathur <[email protected]>
Signed-off-by: wang.yuqi <[email protected]>
Co-authored-by: vnadathur <[email protected]>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: vnadathur <[email protected]>
Co-authored-by: wang.yuqi <[email protected]>
Co-authored-by: wang.yuqi <[email protected]>
Signed-off-by: George D. Torres <[email protected]>
bwasti pushed a commit to bwasti/vllm that referenced this pull request Nov 17, 2025
…roject#27516)

Signed-off-by: WorldExplored <[email protected]>
Signed-off-by: Srreyansh Sethi <[email protected]>
Signed-off-by: vnadathur <[email protected]>
Signed-off-by: wang.yuqi <[email protected]>
Co-authored-by: vnadathur <[email protected]>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: vnadathur <[email protected]>
Co-authored-by: wang.yuqi <[email protected]>
Co-authored-by: wang.yuqi <[email protected]>
Signed-off-by: Bram Wasti <[email protected]>
Labels: ci/build, frontend, ready