
Conversation

@alhridoy (Contributor) commented Oct 25, 2025

Fixes #25450

Summary

- add a standalone multimodal encoder benchmark (`vllm/benchmarks/encoder.py`) that loads dummy-weight models, builds HF-processor inputs, and measures `get_multimodal_embeddings` latency across configurable batch/image sizes
- wire the runner into the CLI as `vllm bench encoder` and keep a legacy shim at `benchmarks/benchmark_encoder.py`
- support JSON export, optional preprocessing timing, and sane defaults for batch sizes and image resolutions

Signed-off-by: Al-Ekram Elahee Hridoy <[email protected]>
@mergify bot added the `frontend` and `performance` (Performance-related issues) labels Oct 25, 2025
@gemini-code-assist bot left a comment


Code Review

This pull request introduces a valuable standalone benchmark for multimodal encoders, integrating it into the vLLM CLI. The implementation is well-structured and uses appropriate techniques for performance measurement, such as torch.cuda.Event for accurate GPU timing. My review focuses on improving robustness by handling a couple of edge cases in user input that could lead to unexpected crashes. The suggested changes will make the benchmark more resilient to varied inputs.
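
For reference, here is a minimal sketch of the `torch.cuda.Event` timing pattern the review refers to (`encoder_fn` is a hypothetical stand-in, not the PR's actual code):

```python
import torch

def time_gpu_ms(encoder_fn, *args) -> float:
    """Time one GPU call in milliseconds using CUDA events."""
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    encoder_fn(*args)
    end.record()
    # Block until the recorded work has completed before reading the timer.
    torch.cuda.synchronize()
    return start.elapsed_time(end)
```

Unlike wrapping an async CUDA launch in `time.perf_counter()`, event timestamps are recorded on the GPU stream itself, so the measurement reflects actual device execution time.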

Comment on lines 32 to 37
if "x" in size_str:
width_str, height_str = size_str.lower().split("x", maxsplit=1)
width = int(width_str)
height = int(height_str)
else:
width = height = int(size_str)

Severity: high

The current implementation for parsing image sizes does not handle malformed inputs like "224x" or "x224". In such cases, split would produce an empty string, and int('') would raise a ValueError, causing the benchmark to crash. It's better to handle this with a try-except block to provide a more informative error message to the user.

Suggested change
if "x" in size_str:
width_str, height_str = size_str.lower().split("x", maxsplit=1)
width = int(width_str)
height = int(height_str)
else:
width = height = int(size_str)
try:
if "x" in size_str:
width_str, height_str = size_str.lower().split("x", maxsplit=1)
width = int(width_str)
height = int(height_str)
else:
width = height = int(size_str)
except ValueError as exc:
raise ValueError(f"Invalid image size format: {size_str!r}") from exc

Comment on lines 265 to 273
def _summarize(times_ms: Sequence[float]) -> dict[str, float]:
    arr = np.array(times_ms, dtype=np.float64)
    return {
        "avg": float(arr.mean()),
        "p50": float(np.percentile(arr, 50.0)),
        "p90": float(np.percentile(arr, 90.0)),
        "p99": float(np.percentile(arr, 99.0)),
        "stdev": float(arr.std(ddof=0)) if arr.size > 1 else 0.0,
    }

Severity: high

If a user provides --num-iters 0, the _time_encoder function will return an empty list. This causes the _summarize function to crash with an IndexError when np.percentile is called on an empty array. The function should be made more robust to handle an empty input sequence, for instance, by returning zero for all statistics.

def _summarize(times_ms: Sequence[float]) -> dict[str, float]:
    if not times_ms:
        return {
            "avg": 0.0,
            "p50": 0.0,
            "p90": 0.0,
            "p99": 0.0,
            "stdev": 0.0,
        }

    arr = np.array(times_ms, dtype=np.float64)
    return {
        "avg": float(arr.mean()),
        "p50": float(np.percentile(arr, 50.0)),
        "p90": float(np.percentile(arr, 90.0)),
        "p99": float(np.percentile(arr, 99.0)),
        "stdev": float(arr.std(ddof=0)) if arr.size > 1 else 0.0,
    }
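
A quick sanity check of the guarded version (illustrative; assumes `numpy` is imported as `np` and `Sequence` comes from `collections.abc`, as in the snippet above):

```python
print(_summarize([]))
# -> {'avg': 0.0, 'p50': 0.0, 'p90': 0.0, 'p99': 0.0, 'stdev': 0.0}
print(_summarize([1.0, 2.0, 3.0]))
# -> avg 2.0, p50 2.0, p90 2.8, p99 2.98, stdev ~0.816
```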

@alhridoy (Author) commented:

Hi @ywang96, could you take a look at this PR?

@chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.


Comment on lines +10 to +22
class BenchmarkEncoderSubcommand(BenchmarkSubcommandBase):
    """The `encoder` subcommand for vllm bench."""

    name = "encoder"
    help = "Benchmark standalone multimodal encoder forward latency."

    @classmethod
    def add_cli_args(cls, parser: argparse.ArgumentParser) -> None:
        add_cli_args(parser)

    @staticmethod
    def cmd(args: argparse.Namespace) -> None:
        main(args)


P1: Register encoder subcommand with vllm CLI

The new BenchmarkEncoderSubcommand is defined here, but nothing imports this module when the CLI starts. vllm.entrypoints.cli.benchmark.main discovers bench commands via BenchmarkSubcommandBase.__subclasses__() after loading vllm.entrypoints.cli, whose __init__ currently only imports latency/throughput/serve. Because encoder.py is never imported, the class is absent from __subclasses__() and vllm bench will not expose the encoder command at all, defeating the purpose of the new benchmark. Import the module in vllm.entrypoints.cli.__init__ or otherwise ensure it is loaded before command discovery.
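
A minimal, self-contained sketch of the discovery mechanism being described (simplified stand-ins, not vLLM's real classes):

```python
class BenchmarkSubcommandBase:
    """Simplified stand-in for vLLM's benchmark subcommand base."""
    name: str

def discover() -> list[str]:
    # Mirrors discovery via __subclasses__(): only classes whose defining
    # module has actually been imported are visible here.
    return [cls.name for cls in BenchmarkSubcommandBase.__subclasses__()]

print(discover())  # [] -- the encoder module was never imported

class BenchmarkEncoderSubcommand(BenchmarkSubcommandBase):
    name = "encoder"

print(discover())  # ['encoder'] -- visible once the class is defined
```

This is why a side-effect import of the encoder module in `vllm.entrypoints.cli.__init__` (alongside latency/throughput/serve) is enough to make the command appear.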


Signed-off-by: Al-Ekram Elahee Hridoy <[email protected]>
@@ -0,0 +1,26 @@
# SPDX-License-Identifier: Apache-2.0
A Member commented:

No need to add this deprecation since it's a new script

    tokenization_kwargs=inputs.tokenization_kwargs,
)
preprocess_ms = None
if include_preproc:
A Member commented:

I think we should just always include this, since we can get this metric essentially for free.
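
A minimal sketch of what always-on preprocessing timing could look like (illustrative only; `processor` and `prompt` are hypothetical stand-ins for the benchmark's real objects):

```python
import time
from typing import Any, Callable

def time_preprocess(processor: Callable[[Any], Any], prompt: Any) -> tuple[Any, float]:
    """Run preprocessing and always report its wall-clock cost in ms."""
    start = time.perf_counter()
    inputs = processor(prompt)
    return inputs, (time.perf_counter() - start) * 1000.0

# Trivial stand-in processor just to show the call shape:
inputs, ms = time_preprocess(lambda p: p.upper(), "a photo of a cat")
print(inputs, f"{ms:.3f} ms")
```

Since preprocessing runs on the CPU, `time.perf_counter()` is sufficient and the extra bookkeeping is negligible per iteration.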

@DarkLight1337 requested a review from @ywang96 on October 26, 2025 at 07:26

Labels

frontend, performance (Performance-related issues)


Development

Successfully merging this pull request may close these issues.

[Feature]: Standalone Encoder Benchmark
