The benchmarking script accepts the following flags. All of them are exposed as environment variables in the `deploy/deployment.yaml` file, which you can configure.
  * Description: Specifies the backend model server to benchmark.
* `--file-prefix`:
  * Type: `str`
  * Default: `"benchmark"`
  * Description: Prefix for output files.
* `--endpoint`:
  * Type: `str`
  * Default: `"generate"`
  * Description: The endpoint to send requests to.
* `--host`:
  * Type: `str`
  * Default: `"localhost"`
  * Description: The host address of the server.
* `--port`:
  * Type: `int`
  * Default: `7080`
  * Description: The port number of the server.
* `--dataset`:
  * Type: `str`
  * Description: Path to the dataset. The default dataset used is ShareGPT from HuggingFace.
* `--models`:
  * Type: `str`
  * Description: Comma-separated list of models to benchmark.
* `--traffic-split`:
  * Type: parsed traffic split (comma-separated list of floats that sum to 1.0)
  * Default: `None`
  * Description: Comma-separated list of traffic split proportions for the models, e.g. `'0.9,0.1'`. The proportions must sum to 1.0.
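To illustrate the expected shape of this value, here is a minimal sketch of how such a traffic-split string could be parsed and validated. The function name and error message are illustrative, not the script's actual implementation:

```python
# Hypothetical sketch of parsing a traffic-split flag value; the real
# script's parser may differ in details.
import math


def parse_traffic_split(value: str) -> list[float]:
    """Parse a comma-separated string of floats and check they sum to 1.0."""
    proportions = [float(part) for part in value.split(",")]
    if not math.isclose(sum(proportions), 1.0):
        raise ValueError(f"Traffic split must sum to 1.0, got {sum(proportions)}")
    return proportions


print(parse_traffic_split("0.9,0.1"))  # [0.9, 0.1]
```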
* `--stream-request`:
  * Action: `store_true`
  * Description: Whether to stream the request. Needed for the TTFT metric.
* `--request-timeout`:
  * Type: `float`
  * Default: `3.0 * 60.0 * 60.0` (3 hours)
  * Description: Timeout for each individual request.
* `--tokenizer`:
  * Type: `str`
  * Required: `True`
  * Description: Name or path of the tokenizer. You can specify a model's HuggingFace model ID to use that model's tokenizer.
* `--num-prompts`:
  * Type: `int`
  * Default: `1000`
  * Description: Number of prompts to process.
* `--max-input-length`:
  * Type: `int`
  * Default: `1024`
  * Description: Maximum number of input tokens used when filtering the benchmark dataset.
* `--max-output-length`:
  * Type: `int`
  * Default: `1024`
  * Description: Maximum number of output tokens.
* `--request-rate`:
  * Type: `float`
  * Default: `float("inf")`
  * Description: Number of requests per second. If this is `inf`, all requests are sent at time 0. Otherwise, a Poisson process is used to synthesize the request arrival times.
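For intuition on the Poisson arrival behavior: at a rate of λ requests per second, the gaps between consecutive requests are drawn from an exponential distribution with mean 1/λ. The sketch below assumes this standard construction; the names are illustrative, not the script's actual code:

```python
# Sketch of synthesizing Poisson request arrivals (illustrative, not the
# benchmark script's actual implementation).
import random


def arrival_times(num_prompts: int, request_rate: float) -> list[float]:
    """Return an arrival timestamp (in seconds) for each request."""
    if request_rate == float("inf"):
        return [0.0] * num_prompts  # burst: send everything at time 0
    t, times = 0.0, []
    for _ in range(num_prompts):
        times.append(t)
        t += random.expovariate(request_rate)  # exponential gap, mean 1/rate
    return times
```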
* `--save-json-results`:
  * Action: `store_true`
  * Description: Whether to save benchmark results to a JSON file.
* `--output-bucket`:
  * Type: `str`
  * Default: `None`
  * Description: Specifies the Google Cloud Storage bucket to which JSON-format results will be uploaded. If not provided, no upload will occur.
* `--output-bucket-filepath`:
  * Type: `str`
  * Default: `None`
  * Description: Specifies the destination path within the bucket provided by `--output-bucket` for uploading the JSON results. This argument requires `--output-bucket` to be set. If not specified, results are uploaded to the root of the bucket. If the filepath doesn't exist, it will be created for you.
* `--additional-metadata-metrics-to-save`:
  * Type: `str`
  * Description: Additional metadata about the workload. Should be a dictionary in the form of a string.
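As an example of a "dictionary in the form of a string": a JSON object passed as a single string value works for this shape. Whether the script decodes it with `json.loads` or another mechanism is an assumption here, and the keys shown are made up:

```python
# Illustrative only: a dict-as-string metadata value and one plausible way
# to decode it. The keys and the use of json.loads are assumptions.
import json

metadata_arg = '{"gpu_type": "a100", "replicas": 2}'  # string passed to the flag
metadata = json.loads(metadata_arg)
print(metadata["gpu_type"])  # a100
```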
* `--scrape-server-metrics`:
  * Action: `store_true`
  * Description: Whether to scrape server metrics.
* `--pm-namespace`:
  * Type: `str`
  * Default: `default`
  * Description: Namespace of the PodMonitoring object. Ignored if `--scrape-server-metrics` is not set.
* `--pm-job`:
  * Type: `str`
  * Default: `vllm-podmonitoring`
  * Description: Name of the PodMonitoring object. Ignored if `--scrape-server-metrics` is not set.
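Putting a few of these flags together, a run might look like the following. This is a sketch only: the script filename and the model/tokenizer values are assumptions, not taken from this document.

```shell
# Illustrative invocation; script name and values are assumptions.
python benchmark_serving.py \
  --host localhost \
  --port 7080 \
  --tokenizer meta-llama/Llama-2-7b-hf \
  --models meta-llama/Llama-2-7b-hf \
  --num-prompts 1000 \
  --request-rate 5.0 \
  --stream-request \
  --save-json-results
```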