Add standalone multimodal encoder benchmark #27511
base: main
Conversation
Signed-off-by: Al-Ekram Elahee Hridoy <[email protected]>
Code Review
This pull request introduces a valuable standalone benchmark for multimodal encoders, integrating it into the vLLM CLI. The implementation is well-structured and uses appropriate techniques for performance measurement, such as torch.cuda.Event for accurate GPU timing. My review focuses on improving robustness by handling a couple of edge cases in user input that could lead to unexpected crashes. The suggested changes will make the benchmark more resilient to varied inputs.
vllm/benchmarks/encoder.py
Outdated
```python
if "x" in size_str:
    width_str, height_str = size_str.lower().split("x", maxsplit=1)
    width = int(width_str)
    height = int(height_str)
else:
    width = height = int(size_str)
```
The current implementation for parsing image sizes does not handle malformed inputs like "224x" or "x224". In such cases, split would produce an empty string, and int('') would raise a ValueError, causing the benchmark to crash. It's better to handle this with a try-except block to provide a more informative error message to the user.
Suggested change:

```python
try:
    if "x" in size_str:
        width_str, height_str = size_str.lower().split("x", maxsplit=1)
        width = int(width_str)
        height = int(height_str)
    else:
        width = height = int(size_str)
except ValueError as exc:
    raise ValueError(f"Invalid image size format: {size_str!r}") from exc
```
```python
def _summarize(times_ms: Sequence[float]) -> dict[str, float]:
    arr = np.array(times_ms, dtype=np.float64)
    return {
        "avg": float(arr.mean()),
        "p50": float(np.percentile(arr, 50.0)),
        "p90": float(np.percentile(arr, 90.0)),
        "p99": float(np.percentile(arr, 99.0)),
        "stdev": float(arr.std(ddof=0)) if arr.size > 1 else 0.0,
    }
```
If a user provides --num-iters 0, the _time_encoder function will return an empty list. This causes the _summarize function to crash with an IndexError when np.percentile is called on an empty array. The function should be made more robust to handle an empty input sequence, for instance, by returning zero for all statistics.
```python
def _summarize(times_ms: Sequence[float]) -> dict[str, float]:
    if not times_ms:
        return {
            "avg": 0.0,
            "p50": 0.0,
            "p90": 0.0,
            "p99": 0.0,
            "stdev": 0.0,
        }
    arr = np.array(times_ms, dtype=np.float64)
    return {
        "avg": float(arr.mean()),
        "p50": float(np.percentile(arr, 50.0)),
        "p90": float(np.percentile(arr, 90.0)),
        "p99": float(np.percentile(arr, 99.0)),
        "stdev": float(arr.std(ddof=0)) if arr.size > 1 else 0.0,
    }
```
Hi @ywang96, would you look into this PR?
💡 Codex Review
Here are some automated review suggestions for this pull request.
```python
class BenchmarkEncoderSubcommand(BenchmarkSubcommandBase):
    """The `encoder` subcommand for vllm bench."""

    name = "encoder"
    help = "Benchmark standalone multimodal encoder forward latency."

    @classmethod
    def add_cli_args(cls, parser: argparse.ArgumentParser) -> None:
        add_cli_args(parser)

    @staticmethod
    def cmd(args: argparse.Namespace) -> None:
        main(args)
```
Register encoder subcommand with vllm CLI
The new BenchmarkEncoderSubcommand is defined here, but nothing imports this module when the CLI starts. vllm.entrypoints.cli.benchmark.main discovers bench commands via BenchmarkSubcommandBase.__subclasses__() after loading vllm.entrypoints.cli, whose __init__ currently only imports latency/throughput/serve. Because encoder.py is never imported, the class is absent from __subclasses__() and vllm bench will not expose the encoder command at all, defeating the purpose of the new benchmark. Import the module in vllm.entrypoints.cli.__init__ or otherwise ensure it is loaded before command discovery.
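The discovery mechanism described above can be demonstrated in isolation — `__subclasses__()` only sees classes whose defining modules have actually been executed. The class names below are stand-ins, not vLLM's real ones:

```python
class BenchmarkSubcommandBase:
    """Stand-in for vLLM's bench subcommand base class."""

    name: str = ""


class LatencySubcommand(BenchmarkSubcommandBase):
    name = "latency"


# Simulates discovery before encoder.py is imported: only already-defined
# subclasses are visible.
discovered = [cls.name for cls in BenchmarkSubcommandBase.__subclasses__()]


# Defining the class (e.g. by importing its module in cli/__init__) is what
# makes it appear in __subclasses__() on the next discovery pass.
class EncoderSubcommand(BenchmarkSubcommandBase):
    name = "encoder"


discovered_after = [cls.name for cls in BenchmarkSubcommandBase.__subclasses__()]
```

This is why adding the import to `vllm.entrypoints.cli.__init__` is sufficient: it forces the class definition to run before the discovery loop enumerates subclasses.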
Signed-off-by: Al-Ekram Elahee Hridoy <[email protected]>
```diff
@@ -0,0 +1,26 @@
# SPDX-License-Identifier: Apache-2.0
```
No need to add this deprecation since it's a new script
```python
    tokenization_kwargs=inputs.tokenization_kwargs,
)
preprocess_ms = None
if include_preproc:
```
I think we should just always include this, since we can get this metric essentially for free
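The reviewer's point — that preprocessing timing is essentially free — can be sketched with a wall-clock wrapper. The helper name and callables below are illustrative, not the PR's actual API:

```python
import time


def run_with_preproc_timing(preprocess, encode, inputs):
    """Always time preprocessing: one perf_counter pair costs nanoseconds,
    so there is no reason to gate it behind a flag.

    (Hypothetical helper; `preprocess`/`encode` stand in for the benchmark's
    real preprocessing and encoder-forward steps.)
    """
    start = time.perf_counter()
    processed = preprocess(inputs)
    preprocess_ms = (time.perf_counter() - start) * 1000.0
    return encode(processed), preprocess_ms


# Toy callables in place of real image preprocessing and encoding.
result, pre_ms = run_with_preproc_timing(lambda x: x * 2, lambda x: x + 1, 10)
```

Reporting `preprocess_ms` unconditionally also keeps the output schema stable across runs, which simplifies downstream comparison of benchmark results.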
Fixes #25450
Summary