[Feature]: Standalone Encoder Benchmark

### 🚀 The feature, motivation and pitch

We want to quantitatively measure the performance of the multimodal encoder alone, which requires a benchmark that focuses on the encoder forward pass.

Ideally, this benchmark should:
1. Import the encoder module from the model file and initialize it with dummy weights and a vLLM config.
2. Convert dummy images into encoder inputs via the model's HF processor.
3. Measure the latency of processing a batch of `X` images of size `Y`.

`X` and `Y` should be configurable.
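
To make the request concrete, here is a rough sketch of what the timing harness and its configuration could look like. It is not an existing vLLM script: `BenchConfig`, `make_dummy_batch`, `placeholder_encoder`, `benchmark`, and the CLI flags are all hypothetical names, and the placeholder encoder is pure Python so the sketch runs without a GPU. A real version would replace it with the HF processor plus the encoder module initialized with dummy weights, and would pass a synchronization callback such as `torch.cuda.synchronize` as `sync` so that asynchronous GPU work is included in the measurement.

```python
import argparse
import statistics
import time
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

# Stand-in for a real image tensor: a height x width grid of floats.
Image = List[List[float]]


@dataclass
class BenchConfig:
    batch_size: int = 8                      # "X": images per batch
    image_size: Tuple[int, int] = (32, 32)   # "Y": (height, width)
    warmup_iters: int = 2                    # untimed iterations
    timed_iters: int = 5                     # timed iterations


def make_dummy_batch(cfg: BenchConfig) -> List[Image]:
    """Build `batch_size` constant-valued dummy images of the configured size."""
    h, w = cfg.image_size
    return [[[0.5] * w for _ in range(h)] for _ in range(cfg.batch_size)]


def placeholder_encoder(batch: Sequence[Image]) -> List[float]:
    """Hypothetical stand-in for 'HF processor + encoder forward pass'.

    It only computes the mean pixel value per image; swap in the real
    encoder call here.
    """
    return [sum(map(sum, img)) / (len(img) * len(img[0])) for img in batch]


def benchmark(
    encoder: Callable[[Sequence[Image]], object],
    cfg: BenchConfig,
    sync: Callable[[], None] = lambda: None,
) -> dict:
    """Time `encoder` on one dummy batch; `sync` is called before each clock read."""
    batch = make_dummy_batch(cfg)
    for _ in range(cfg.warmup_iters):
        encoder(batch)
    sync()

    latencies = []
    for _ in range(cfg.timed_iters):
        start = time.perf_counter()
        encoder(batch)
        sync()
        latencies.append(time.perf_counter() - start)

    mean_s = statistics.mean(latencies)
    return {
        "batch_size": cfg.batch_size,
        "image_size": cfg.image_size,
        "mean_s": mean_s,
        "median_s": statistics.median(latencies),
        "images_per_s": cfg.batch_size / mean_s,
    }


def parse_args(argv: Optional[Sequence[str]] = None) -> BenchConfig:
    """Expose X (batch size) and Y (image size) as command-line options."""
    p = argparse.ArgumentParser(description="Standalone encoder benchmark (sketch)")
    p.add_argument("--batch-size", type=int, default=8)
    p.add_argument("--image-size", type=int, nargs=2, default=[32, 32],
                   metavar=("H", "W"))
    p.add_argument("--warmup-iters", type=int, default=2)
    p.add_argument("--timed-iters", type=int, default=5)
    a = p.parse_args(argv)
    return BenchConfig(a.batch_size, tuple(a.image_size),
                       a.warmup_iters, a.timed_iters)


def main(argv: Optional[Sequence[str]] = None) -> None:
    print(benchmark(placeholder_encoder, parse_args(argv)))
```

For example, `main(["--batch-size", "16", "--image-size", "224", "224"])` would time a batch of 16 dummy 224x224 images. Keeping the encoder behind a plain callable is just one possible design; it lets the same harness be reused across different model files.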

### Alternatives

_No response_

### Additional context

_No response_

### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.

[Feature]: Standalone Encoder Benchmark #25450
