
Commit 1918e88 ("wip", parent 8e27082)

1 file changed: benchmarking/benchmarking.md (+103, -0)
# Benchmarking Harness

Author: @biswapanda
## Problem statement:

In its current state, benchmarking has a few problems:

1. UX: experimentation, debugging, and iteration are hard.
   Use case: as a user, I want to easily experiment with different configs, get results quickly, and compare them.

2. Reproducibility is hard: we don't store the input configs and results.
   Use case: as a user, I want to be able to reproduce my experiments and share them with others.

3. Benchmarking steps are tightly coupled: if a single step or benchmark config fails, the entire process is aborted or retried.

4. Benchmarking over port-forwarding has non-deterministic latency characteristics.
## Proposed plan:

1. Decouple all steps and then compose them together: prep a model, deploy the k8s CR, benchmark, collect data.

2. Capture the configs for the experiment: deploy (config or a reference to a deployment), benchmark, model, etc.

3. Run benchmarks inside the k8s cluster in a k8s-native approach (see the sketch below).

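As a rough illustration of the k8s-native approach, one benchmark config could run as a Kubernetes Job inside the cluster, which also avoids the port-forwarding latency problem above. This is a minimal sketch; the Job name, image, in-cluster service URL, and genai-perf flags are assumptions for illustration, not a committed interface.

```yaml
# Sketch: run one benchmark config as an in-cluster Job (names/flags are placeholders).
apiVersion: batch/v1
kind: Job
metadata:
  name: bench-llama33-70b-c8          # hypothetical name encoding the config
spec:
  backoffLimit: 0                     # a failed config should not block other configs
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: benchmark
          image: benchmark-harness:latest   # placeholder image with genai-perf installed
          command:                          # illustrative genai-perf invocation
            - genai-perf
            - profile
            - -m
            - RedHat/Llama-3.3-70B-Instruct
            - --url
            - http://llama-frontend:8000    # in-cluster service, no port-forward
            - --endpoint-type
            - chat
            - --concurrency
            - "8"
          volumeMounts:
            - name: results
              mountPath: /results           # artifacts picked up by the teardown step
      volumes:
        - name: results
          emptyDir: {}
```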
## Steps:

The following steps are executed by the harness:

Note: these steps are reusable across different tests (LLM benchmarking, accuracy testing, functional testing, etc.).
Since the steps are reusable across different tests, we can swap the container used for each step.

1. Initialize experiment

   a. (Optional) Deploy the model

   b. Wait for the model to be ready

2. Run the benchmarking test using the configs and a benchmark container (genai-perf, AIPerf, or a 3rd-party tool)

   a. Prepare the configs (a matrix of params: ISL/OSL, concurrency, etc.) and pass them as a config file to the harness container

   b. Run the test for each config

3. Teardown

   a. (Optional) Collect artifacts: push files to upstream storage (s3/minio), as sketched after this list

   b. Collect output results: push benchmark metrics to a data storage layer (s3/minio/database) using a CLI tool

4. Analytics

   a. Generate charts, graphs, and tables from the benchmark metrics

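For step 3, one way to push artifacts and metrics is an upload Job that runs after the benchmark completes. The sketch below is an assumption for illustration using the MinIO client (`mc`); the bucket layout, secret name, and shared results volume are hypothetical, and the actual CLI tool and storage layer are open design choices.

```yaml
# Sketch of step 3 (teardown): upload artifacts/metrics to s3/minio after the benchmark runs.
# Bucket, secret, and volume names are hypothetical placeholders.
apiVersion: batch/v1
kind: Job
metadata:
  name: bench-teardown-upload
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: upload
          image: minio/mc:latest
          command:
            - /bin/sh
            - -c
            - |
              mc alias set store http://minio:9000 "$ACCESS_KEY" "$SECRET_KEY"
              mc cp --recursive /results/ store/benchmarks/blueprint-name/run-001/
          envFrom:
            - secretRef:
                name: minio-credentials      # hypothetical secret providing ACCESS_KEY/SECRET_KEY
          volumeMounts:
            - name: results
              mountPath: /results
      volumes:
        - name: results
          persistentVolumeClaim:
            claimName: bench-results         # hypothetical PVC shared with the benchmark step
```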
## Config

Benchmarking config file:
```yaml
name: "blueprint-name"
model:
  name: "RedHat/Llama-3.3-70B-Instruct"
  path: "/path/to/model"
concurrency: [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]
endpoint: "/v1/chat/completions"
endpoint_type: "chat"
benchmark:
  isl_osl:
    - [8192, 1024]
    - [1024, 1024]
    - [1024, 8192]
```

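In step 2a the harness would expand this matrix into one run per (concurrency, isl_osl) combination, 11 x 3 = 33 runs for the file above. As a hedged illustration (the expanded field names and run naming scheme are assumptions, not a defined schema), a single materialized run config could look like:

```yaml
# Hypothetical expanded config for one of the 33 matrix combinations.
run_id: "blueprint-name-c8-isl8192-osl1024"   # assumed naming scheme
model:
  name: "RedHat/Llama-3.3-70B-Instruct"
  path: "/path/to/model"
endpoint: "/v1/chat/completions"
endpoint_type: "chat"
concurrency: 8
isl: 8192
osl: 1024
```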
## Alternatives:

### Alternative 1: Benchmarking as a first-class citizen in Dynamo

```yaml
kind: DynamoBenchmark
metadata:
  name: vllm-agg-benchmark
spec:
  model:
    modelRef: llama-3-70b-instruct-v1
  config:
    model: "RedHat/Llama-3.3-70B-Instruct"
    path: "/path/to/model"
    concurrency: [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]
    endpoint: "/v1/chat/completions"
    endpoint_type: "chat"
    benchmark:
      isl_osl:
        - [8192, 1024]
        - [1024, 1024]
        - [1024, 8192]
```

### Alternative 2: Benchmarking helm chart + workflow manager

Simpler to manage and deploy.
Reuse Argo Workflows as the workflow manager to orchestrate dependencies and the workflow (see the sketch below).
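To make this alternative concrete, a minimal Argo Workflow sketch chaining the harness steps is shown below. The template names, images, commands, and config paths are assumptions for illustration only, not a proposed interface.

```yaml
# Sketch: Argo Workflow chaining the harness steps (deploy -> benchmark -> collect).
# Images, commands, and paths are hypothetical placeholders.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: llm-benchmark-
spec:
  entrypoint: benchmark-pipeline
  templates:
    - name: benchmark-pipeline
      steps:
        - - name: deploy-model
            template: deploy-model
        - - name: run-benchmark
            template: run-benchmark
        - - name: collect-results
            template: collect-results
    - name: deploy-model
      container:
        image: bitnami/kubectl:latest
        command: ["kubectl", "apply", "-f", "/configs/deployment.yaml"]    # placeholder CR path
    - name: run-benchmark
      container:
        image: benchmark-harness:latest                                    # placeholder benchmark container
        command: ["run-benchmark", "--config", "/configs/benchmark.yaml"]  # hypothetical entrypoint
    - name: collect-results
      container:
        image: minio/mc:latest
        command: ["mc", "cp", "--recursive", "/results/", "store/benchmarks/"]  # assumes a preconfigured alias
```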
