@@ -29,6 +29,14 @@ Where the lines are randomly sampled from a collection of lines from Shakespeare
 
 To run the most basic load test you can run the token_benchmark_ray script.
 
+
+### Caveats and Disclaimers
+
+- The endpoint provider's backend might vary widely, so this is not a reflection of how the software runs on particular hardware.
+- The results may vary with the time of day.
+- The results may vary with the load.
+- The results may not correlate with users’ workloads.
+
 ### OpenAI Compatible APIs
 
 ``` bash
 export OPENAI_API_KEY=secret_abcdefg
@@ -88,7 +96,7 @@ python token_benchmark_ray.py \
 
 ```
 
-### HuggingFace API
+### Hugging Face
 
 ``` bash
 export HUGGINGFACE_API_KEY="YOUR_HUGGINGFACE_API_KEY"
@@ -164,9 +172,9 @@ python token_benchmark_ray.py \
 
 ```
 
-# Sagemaker
+### SageMaker
 
-Sagemaker doesn't return the total number of tokens that are generated by their endpoint, so tokens are counted using the LLama tokenizer.
+SageMaker doesn't return the total number of tokens generated by its endpoint, so tokens are counted using the Llama tokenizer.
 
 ``` bash
@@ -244,7 +252,7 @@ python llm_correctness.py \
 
 ```
 
-### HuggingFacAPI
+### Hugging Face
 
 ``` bash
 export HUGGINGFACE_API_KEY="YOUR_HUGGINGFACE_API_KEY"
@@ -309,9 +317,9 @@ python llm_correctness.py \
 
 ```
 
-### Sagemaker
+### SageMaker
 
-Sagemaker doesn't return the total number of tokens that are generated by their endpoint, so tokens are counted using the LLama tokenizer.
+SageMaker doesn't return the total number of tokens generated by its endpoint, so tokens are counted using the Llama tokenizer.
 
 ``` bash
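For context, the basic load test that the edited README section describes is invoked roughly along these lines. This is a sketch only: the script name and the `OPENAI_API_KEY` export appear in the diff, but every flag shown is an assumption modeled on common llmperf-style options and is not confirmed by the diff itself.

``` bash
# Hypothetical invocation of the load-test script referenced in this diff.
# Flag names are assumptions; check `python token_benchmark_ray.py --help`
# in the repository for the actual options.
export OPENAI_API_KEY=secret_abcdefg

python token_benchmark_ray.py \
  --model "gpt-3.5-turbo" \
  --mean-input-tokens 550 \
  --mean-output-tokens 150 \
  --max-num-completed-requests 2 \
  --num-concurrent-requests 1 \
  --llm-api openai \
  --results-dir "result_outputs"
```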