This folder contains resources to run performance benchmarks. Please follow the benchmark guide at https://gateway-api-inference-extension.sigs.k8s.io/performance/benchmark.

## Features

1. **Config driven benchmarks**. Use the `./proto/benchmark.proto` API to write benchmark configurations, without the need to craft complex YAMLs.
2. **Reproducibility**. The tool snapshots all the manifests needed for the benchmark run and marks them immutable (unless the user explicitly overrides this).
3. **Benchmark inheritance**. Extend an existing benchmark configuration by overriding a subset of parameters, instead of rewriting everything from scratch.
4. **Benchmark orchestration**. The tool automatically deploys the benchmark environment into a cluster, waits for and collects the results, and then tears down the environment. The tool deploys the benchmark resources in new namespaces, so each benchmark runs independently.
5. **Auto generated request rates**. The tool can automatically generate request rates for known models and accelerators, covering a wide range of model server load from low latency to fully saturated throughput.
6. **Visualization tools**. The results can be analyzed with a Jupyter notebook.
7. **Model server metrics**. The tool uses the latency profile generator benchmark tool to scrape metrics from Google Cloud Monitoring. It also provides a link to a Google Cloud Monitoring dashboard for detailed analysis.

### Future Improvements

1. The benchmark config and results are stored in protobuf format. The results can be persisted in a database such as Google Cloud Spanner to enable complex query and dashboarding use cases.
2. Support running benchmarks in parallel with user configured parallelism.

## Prerequisites

1. [Install helm](https://helm.sh/docs/intro/quickstart/#install-helm).
2. Install the InferenceModel and InferencePool [CRDs](https://gateway-api-inference-extension.sigs.k8s.io/guides/#install-the-inference-extension-crds).
3. [Enable the Envoy patch policy](https://gateway-api-inference-extension.sigs.k8s.io/guides/#update-envoy-gateway-config-to-enable-patch-policy).
4. Install the [RBACs](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/12bcc9a85dad828b146758ad34a69053dca44fa9/config/manifests/inferencepool.yaml#L78) that allow the EPP to read pods.
5. Create a secret in the default namespace containing the HuggingFace token.

   ```bash
   kubectl create secret generic hf-token --from-literal=token=$HF_TOKEN # Your Hugging Face Token with access to Llama2
   ```

6. [Optional, GCP only] Create a `gmp-test-sa` service account with the `roles/monitoring.viewer` role to read additional model server metrics from Cloud Monitoring.

   ```bash
   gcloud iam service-accounts create gmp-test-sa \
     && gcloud projects add-iam-policy-binding ${BENCHMARK_PROJECT} \
       --member=serviceAccount:gmp-test-sa@${BENCHMARK_PROJECT}.iam.gserviceaccount.com \
       --role=roles/monitoring.viewer
   ```

## Get started

Run all existing benchmarks:

```bash
# Run all benchmarks in the ./catalog/benchmark folder
./scripts/run_all_benchmarks.bash
```

View the benchmark results:

* To view raw results, watch for a new results folder to be created at `./output/{run_id}/`.
* To visualize the results, use the Jupyter notebook (see the sketch below).

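A minimal sketch of opening the notebook locally, assuming Jupyter is installed; the notebook filename below is a placeholder, so substitute the analysis notebook that ships in this folder:

```bash
# Placeholder notebook name; use the actual analysis notebook from this folder.
pip install notebook
jupyter notebook ./benchmark_analysis.ipynb
```
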
## Common usage

### Run all benchmarks in a particular benchmark config file and upload results to GCS

```bash
gcs_bucket='my-bucket' benchmarks=benchmarks ./scripts/run_benchmarks_file.bash
```

### Generate benchmark manifests only

```bash
# All available environment variables.
benchmarks=benchmarks ./scripts/generate_manifests.bash
```

### Run particular benchmarks in a benchmark config file, by matching a benchmark name regex

```bash
# Run all benchmarks with Nvidia H100
gcs_bucket='my-bucket' benchmarks=benchmarks benchmark_name_regex='.*h100.*' ./scripts/run_benchmarks_file.bash
```

### Resume a benchmark run from an existing run_id

You may resume benchmarks from previously generated manifests. The tool skips benchmarks that already have a `results` folder and continues those without results.

```bash
run_id='existing-run-id' benchmarks=benchmarks ./scripts/run_benchmarks_file.bash
```

### Keep the benchmark environment after a benchmark is complete (for debugging)

```bash
# All available environment variables.
skip_tear_down='true' benchmarks=benchmarks ./scripts/run_benchmarks_file.bash
```

## Command references

```bash
# All available environment variables.
regex='my-benchmark-file-name-regex' dry_run='false' gcs_bucket='my-bucket' skip_tear_down='false' benchmark_name_regex='my-benchmark-name-regex' ./scripts/run_all_benchmarks.bash
```

```bash
# All available environment variables.
run_id='existing-run-id' dry_run='false' gcs_bucket='my-bucket' skip_tear_down='false' benchmarks=benchmarks benchmark_name_regex='my-benchmark-name-regex' ./scripts/run_benchmarks_file.bash
```

```bash
# All available environment variables.
run_id='existing-run-id' benchmarks=benchmarks ./scripts/generate_manifests.bash
```

## How does it work?

The tool automates the following steps:

1. Read the benchmark config file in `./catalog/{benchmarks_config_file}`. The file contains a list of benchmarks. The config API is defined in `./proto/benchmark.proto`.
2. Generate a new run_id and a namespace `{benchmark_name}-{run_id}` to run the benchmarks in. If the `run_id` environment variable is provided, it is reused instead of creating a new one. This is useful when resuming a previous benchmark run, or when running multiple sets of benchmarks in parallel (e.g., running benchmarks on different accelerator types in parallel using the same run_id).
3. Based on the config, generate manifests in `./output/{run_id}/{benchmark_name}-{run_id}/manifests`.
4. Apply the manifests to the cluster and wait for the resources to be ready.
5. Once the benchmark finishes, download the benchmark results to `./output/{run_id}/{benchmark}-{run_id}/results` (see the illustrative layout below).
6. [Optional] If a GCS bucket is specified, upload the output folder to that bucket.

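For orientation, the paths above imply an output layout roughly like the following; this is an illustrative sketch only, and the actual folder names depend on your benchmark names and run_id:

```bash
# Illustrative layout only, derived from the paths above.
#
# output/
# └── <run_id>/
#     └── <benchmark_name>-<run_id>/
#         ├── manifests/   # snapshotted manifests applied to the cluster
#         └── results/     # benchmark results downloaded after the run completes
ls -R "./output/${run_id}"
```
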
## Create a new benchmark

You can either add new benchmarks to an existing benchmark config file, or create new benchmark config files. Each benchmark config file contains a list of benchmarks.

An example benchmark with all available parameters is as follows:

```
benchmarks {
  name: "base-benchmark"
  config {
    model_server {
      image: "vllm/vllm-openai@sha256:8672d9356d4f4474695fd69ef56531d9e482517da3b31feb9c975689332a4fb0"
      accelerator: "nvidia-h100-80gb"
      replicas: 1
      vllm {
        tensor_parallelism: "1"
        model: "meta-llama/Llama-2-7b-hf"
      }
    }
    load_balancer {
      gateway {
        envoy {
          epp {
            image: "us-central1-docker.pkg.dev/k8s-staging-images/gateway-api-inference-extension/epp:v0.1.0"
          }
        }
      }
    }
    benchmark_tool {
      image: "us-docker.pkg.dev/gke-inference-gateway-dev/benchmark/benchmark-tool@sha256:1fe4991ec1e9379b261a62631e1321b8ea15772a6d9a74357932771cea7b0500"
      lpg {
        dataset: "sharegpt_v3_unfiltered_cleaned_split"
        models: "meta-llama/Llama-2-7b-hf"
        ip: "to-be-populated-automatically"
        port: "8081"
        benchmark_time_seconds: "60"
        output_length: "1024"
      }
    }
  }
}
```

### Create a benchmark from a base benchmark

It's recommended to create a benchmark from an existing benchmark by overriding a few parameters. This inheritance feature makes it convenient to create a large number of benchmarks. Below is an example that overrides the replica count of a base benchmark:

```
benchmarks {
  name: "new-benchmark"
  base_benchmark_name: "base-benchmark"
  config {
    model_server {
      replicas: 2
    }
  }
}
```

## Environment configurations

The tool has default configurations (such as the cluster name) in `./scripts/env.sh`. You can tweak those for your own needs.

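For example, to see what can be adjusted, inspect the file before a run; the commands below are just a suggested workflow, and the actual variable names are whatever `./scripts/env.sh` defines:

```bash
# List the configurable defaults (e.g., the cluster name).
cat ./scripts/env.sh

# Then edit the values in place for your environment, for example:
#   vim ./scripts/env.sh
```
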
## The benchmark.proto

The `./proto/benchmark.proto` is the core of this tool. It drives the generation of the benchmark manifests, as well as the query and dashboarding of the results.

Why do we need it?

* It is an API that clearly captures the intent, instead of relying on various assumptions.
* It lets the user focus on the core parameters of the benchmark itself, rather than the toil of configuring the environment and crafting the manifests.
* It is the single source of truth that drives the entire lifecycle of the benchmark, including post analysis.

## Contribute

Refer to the [dev guide](./dev.md).