Commit 2a309a5

Add documentation about the benchmark configuration (#13)
1 parent 07628c9 commit 2a309a5

1 file changed: +88 -0 lines changed

README.md

Lines changed: 88 additions & 0 deletions
@@ -87,3 +87,91 @@ kubectl cp <latency-profile-generator-pod-name>:benchmark-<timestamp>.json repor
```
kubectl delete -f deploy/deployment.yaml
```

## Configuring the Benchmark

The benchmarking script accepts the following flags. Each one is exposed as an environment variable in `deploy/deployment.yaml`, where you can configure it.

* `--backend`:
  * Type: `str`
  * Default: `"vllm"`
  * Choices: `["vllm", "tgi", "naive_transformers", "tensorrt_llm_triton", "sax", "jetstream"]`
  * Description: Specifies the backend model server to benchmark.
* `--file-prefix`:
  * Type: `str`
  * Default: `"benchmark"`
  * Description: Prefix for output files.
* `--endpoint`:
  * Type: `str`
  * Default: `"generate"`
  * Description: The endpoint to send requests to.
* `--host`:
  * Type: `str`
  * Default: `"localhost"`
  * Description: The host address of the server.
* `--port`:
  * Type: `int`
  * Default: `7080`
  * Description: The port number of the server.
* `--dataset`:
  * Type: `str`
  * Description: Path to the dataset. The default dataset is ShareGPT from Hugging Face.
* `--models`:
  * Type: `str`
  * Description: Comma-separated list of models to benchmark.
* `--traffic-split`:
  * Type: parsed traffic split (comma-separated list of floats that sum to 1.0)
  * Default: `None`
  * Description: Comma-separated list of traffic split proportions for the models, e.g. `'0.9,0.1'`. The values must sum to 1.0.
* `--stream-request`:
  * Action: `store_true`
  * Description: Whether to stream the request. Required for the time-to-first-token (TTFT) metric.
* `--request-timeout`:
  * Type: `float`
  * Default: `3.0 * 60.0 * 60.0` (3 hours, in seconds)
  * Description: Timeout for each individual request.
* `--tokenizer`:
  * Type: `str`
  * Required: `True`
  * Description: Name or path of the tokenizer. To use a model's own tokenizer, you can pass its Hugging Face model ID.
* `--num-prompts`:
  * Type: `int`
  * Default: `1000`
  * Description: Number of prompts to process.
* `--max-input-length`:
  * Type: `int`
  * Default: `1024`
  * Description: Maximum number of input tokens, used to filter the benchmark dataset.
* `--max-output-length`:
  * Type: `int`
  * Default: `1024`
  * Description: Maximum number of output tokens.
* `--request-rate`:
  * Type: `float`
  * Default: `float("inf")`
  * Description: Number of requests per second. If this is `inf`, all requests are sent at time 0. Otherwise, a Poisson process is used to synthesize the request arrival times.
* `--save-json-results`:
  * Action: `store_true`
  * Description: Whether to save benchmark results to a JSON file.
* `--output-bucket`:
  * Type: `str`
  * Default: `None`
  * Description: The Google Cloud Storage bucket to which the JSON results are uploaded. If not provided, no upload occurs.
* `--output-bucket-filepath`:
  * Type: `str`
  * Default: `None`
  * Description: The destination path within the bucket given by `--output-bucket` for the uploaded JSON results. This argument requires `--output-bucket` to be set. If not specified, results are uploaded to the root of the bucket. If the path does not exist, it is created for you.
* `--additional-metadata-metrics-to-save`:
  * Type: `str`
  * Description: Additional metadata about the workload. Must be a dictionary encoded as a string.
* `--scrape-server-metrics`:
  * Action: `store_true`
  * Description: Whether to scrape server metrics.
* `--pm-namespace`:
  * Type: `str`
  * Default: `default`
  * Description: Namespace of the pod monitoring object. Ignored if `--scrape-server-metrics` is not set.
* `--pm-job`:
  * Type: `str`
  * Default: `vllm-podmonitoring`
  * Description: Name of the pod monitoring object. Ignored if `--scrape-server-metrics` is not set.
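
To make the semantics of `--traffic-split` and `--request-rate` more concrete, here is a minimal Python sketch of how such values could be interpreted. It is not the benchmarking script's actual code: the function names `parse_traffic_split` and `arrival_times`, the `seed` parameter, and the tolerance used for the sum check are all assumptions made for illustration.

```python
import math
import random


def parse_traffic_split(arg: str) -> list[float]:
    """Parse a value such as '0.9,0.1' into a list of proportions.

    Hypothetical helper: the real script's parser may differ.
    """
    try:
        split = [float(part) for part in arg.split(",")]
    except ValueError as exc:
        raise ValueError(f"expected comma-separated floats, got {arg!r}") from exc
    if any(p < 0 for p in split):
        raise ValueError("traffic split proportions must be non-negative")
    # Compare with a small absolute tolerance (an assumption) because
    # decimal fractions are not always exactly representable as floats.
    if not math.isclose(sum(split), 1.0, rel_tol=0.0, abs_tol=1e-9):
        raise ValueError(f"traffic split must sum to 1.0, got {sum(split)}")
    return split


def arrival_times(num_prompts: int, request_rate: float, seed: int = 0) -> list[float]:
    """Return the send time, in seconds, of each request.

    An infinite rate sends everything at time 0. A finite rate draws
    exponentially distributed gaps between requests, which is what a
    Poisson arrival process produces.
    """
    if request_rate <= 0:
        raise ValueError("request_rate must be positive")
    if math.isinf(request_rate):
        return [0.0] * num_prompts
    rng = random.Random(seed)  # seeded only to make the sketch reproducible
    times = []
    now = 0.0
    for _ in range(num_prompts):
        times.append(now)
        now += rng.expovariate(request_rate)  # mean gap is 1 / request_rate
    return times


if __name__ == "__main__":
    print(parse_traffic_split("0.9,0.1"))
    print(arrival_times(5, request_rate=2.0))
```

With a finite rate, the mean gap between consecutive requests is `1 / request_rate` seconds, so `--request-rate 2.0` averages two requests per second while individual gaps vary.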
