
QPS observability #32


Open
wants to merge 2 commits into main
Conversation

@jjk-g (Collaborator) commented Apr 1, 2025

Adds two QPS observability features:

  • Prometheus request counter, enabling PromQL-based observability
  • Async singleton counter that reports QPS in benchmark_result (sketched below)
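
A minimal sketch of what the async singleton counter could look like; the class and attribute names are illustrative rather than the exact ones in this PR, though the increment method matches the diff excerpt quoted in the review below.

import asyncio
import time

class RequestCounter:
    """Process-wide, async-safe request counter used to derive QPS (illustrative)."""

    _instance = None

    def __new__(cls):
        # Singleton: every coroutine that records a request shares this instance.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._count = 0
            cls._instance._start = time.monotonic()
            cls._instance._lock = asyncio.Lock()
        return cls._instance

    async def increment(self):
        # Serialize updates so concurrent request coroutines do not race.
        async with self._lock:
            self._count += 1

    def qps(self):
        elapsed = time.monotonic() - self._start
        return self._count / elapsed if elapsed > 0 else 0.0

The benchmark can then report Queries/sec as the count divided by elapsed wall time. On the Prometheus side, the same number would come from a rate() query over the request counter, e.g. rate(benchmark_request_count[1m]) (metric name hypothetical; the actual name depends on the PR).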

Tested:

python3 benchmark_serving.py --save-json-results --host=llama3-8b-vllm-service --port=8000 --dataset=ShareGPT_V3_unfiltered_cleaned_split.json --tokenizer=meta-llama/Meta-Llama-3-8B --request-rate=20 --backend=vllm --num-prompts=400 --max-input-length=1024 --max-output-length=1024 --file-prefix=benchmark --models=meta-llama/Meta-Llama-3-8B --scrape-server-metrics
Namespace(backend='vllm', sax_model='', file_prefix='benchmark', endpoint='generate', host='llama3-8b-vllm-service', port=8000, dataset='ShareGPT_V3_unfiltered_cleaned_split.json', models='meta-llama/Meta-Llama-3-8B', traffic_split=None, stream_request=False, request_timeout=10800.0, tokenizer='meta-llama/Meta-Llama-3-8B', best_of=1, use_beam_search=False, num_prompts=400, max_input_length=1024, max_output_length=1024, top_k=32000, request_rate=20.0, seed=1743547367, trust_remote_code=False, machine_cost=None, use_dummy_text=False, save_json_results=True, output_bucket='', output_bucket_filepath=None, save_aggregated_result=False, additional_metadata_metrics_to_save=None, scrape_server_metrics=True, pm_namespace='default', pm_job='vllm-podmonitoring')
Models to benchmark: ['meta-llama/Meta-Llama-3-8B']
No traffic split specified. Defaulting to uniform traffic split.
Starting Prometheus Server on port 9090
====Result for Model: weighted====
Errors: {'ClientConnectorError': 0, 'TimeoutError': 0, 'ContentTypeError': 0, 'ClientOSError': 0, 'ServerDisconnectedError': 0, 'unknown_error': 0}
Total time: 93.23 s
Successful/total requests: 400/400
Requests/min: 257.42
Queries/sec: 20.49
Output_tokens/min: 24535.02
Input_tokens/min: 64095.85
Tokens/min: 88630.87
Average seconds/token (includes waiting time on server): 0.12
Average milliseconds/request (includes waiting time on server): 23495.42
Average milliseconds/output_token (includes waiting time on server): 2225.20
Average input length: 248.99
Average output length: 95.31
====Result for Model: meta-llama/Meta-Llama-3-8B====
Errors: {'ClientConnectorError': 0, 'TimeoutError': 0, 'ContentTypeError': 0, 'ClientOSError': 0, 'ServerDisconnectedError': 0, 'unknown_error': 0}
Total time: 93.23 s
Successful/total requests: 400/400
Requests/min: 257.42
Queries/sec: 20.49
Output_tokens/min: 24535.02
Input_tokens/min: 64095.85
Tokens/min: 88630.87
Average seconds/token (includes waiting time on server): 0.12
Average milliseconds/request (includes waiting time on server): 23495.42
Average milliseconds/output_token (includes waiting time on server): 2225.20
Average input length: 248.99
Average output length: 95.31

jjk-g added 2 commits April 1, 2025 14:35
Adds a singleton counter that allows for calculating QPS.
The review comment below refers to this part of the diff:

        return cls._instance

    async def increment(self):
        async with self._lock:
Since we have to lock to increment the counter each time, does it lead to any slowdowns waiting for this to happen when the QPS is high? Is there a way to check? I'm wondering if this can slow the rate at which we send requests.
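
One way to check, as a sketch rather than anything in the PR: micro-benchmark the locked increment against a bare one. asyncio runs coroutines on a single thread, so the integer increment is already effectively atomic between awaits and the lock mostly adds scheduling overhead; a timing like the one below would quantify it.

import asyncio
import time

async def measure(increment, n=100_000):
    # Time n sequential awaits of the given increment coroutine.
    start = time.perf_counter()
    for _ in range(n):
        await increment()
    return time.perf_counter() - start

async def main():
    lock = asyncio.Lock()
    count = 0

    async def locked():
        nonlocal count
        async with lock:
            count += 1

    async def unlocked():
        nonlocal count
        count += 1

    t_locked = await measure(locked)
    t_unlocked = await measure(unlocked)
    print(f"locked:   {t_locked:.3f}s per 100k increments")
    print(f"unlocked: {t_unlocked:.3f}s per 100k increments")

asyncio.run(main())

At a request rate around 20 QPS the per-increment cost (microseconds) should be negligible next to network latency, but a measurement like this would confirm it.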
