Limit concurrent requests to 28000 #19
base: main
Conversation
benchmark_serving.py
Outdated
```python
prompts_sent += 1

results = await asyncio.gather(*tasks)
async with aiohttp.ClientSession(trust_env=False, connector=aiohttp.TCPConnector(keepalive_timeout=30, enable_cleanup_closed=True, limit=28000), timeout=None, trace_configs=[trace_config]) as clientSession:
```
I am less inclined to hard-code 28k; I'd rather set the limit to 0 (no limit) here, catch the appropriate error, log it, and retry. The added metric and logging will help observability, and the retry is effectively the same outcome (a QPS slowdown).
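A minimal sketch of that alternative, assuming a hypothetical `send_one_request` coroutine, endpoint URL, and retry count that are not part of this PR:

```python
import asyncio
import logging

import aiohttp

MAX_RETRIES = 3  # illustrative only

async def send_one_request(session: aiohttp.ClientSession, url: str, payload: dict) -> dict:
    for attempt in range(MAX_RETRIES):
        try:
            async with session.post(url, json=payload) as resp:
                return await resp.json()
        except aiohttp.ClientConnectorError as e:
            # Likely ephemeral port exhaustion at high QPS; log and retry,
            # which effectively slows the offered QPS.
            logging.warning("connect failed (attempt %d/%d): %s", attempt + 1, MAX_RETRIES, e)
            await asyncio.sleep(1.0)
    raise RuntimeError("request failed after retries")

async def main() -> None:
    # limit=0 removes aiohttp's cap on concurrent connections.
    connector = aiohttp.TCPConnector(limit=0, keepalive_timeout=30, enable_cleanup_closed=True)
    async with aiohttp.ClientSession(trust_env=False, connector=connector) as session:
        await send_one_request(session, "http://localhost:8000/generate", {"prompt": "hi"})
```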
Yes, if we can log failures due to ephemeral port exhaustion, so that we know the experiment is not valid and the user needs to reduce the QPS or num_prompts.
+1 to logging when we exhaust ports. Added that, and also prevented server metrics from being included when we exhaust ports, since the wait time invalidates them. Non-server metrics could still be valuable, since the measured e2e latency includes the time spent waiting to send the request; but if no requests are ever queued on any model server and the bottleneck is this tool, then yes, the experiment data would certainly be invalid.
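A rough sketch of that behavior, with hypothetical names (the counter, report builder, and warning text are illustrative, not the PR's exact code):

```python
port_exhaustion_errors = 0  # incremented wherever aiohttp.ClientConnectorError is caught

def build_report(client_metrics: dict, server_metrics: dict) -> dict:
    # Client-side metrics (e.g. e2e latency) still include time spent waiting
    # to send the request, so they are always reported.
    report = dict(client_metrics)
    if port_exhaustion_errors == 0:
        report.update(server_metrics)
    else:
        report["warning"] = (
            f"{port_exhaustion_errors} requests hit ephemeral port exhaustion; "
            "server metrics omitted. Reduce the QPS or num_prompts."
        )
    return report
```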
This PR fixes the ClientConnectorErrors seen at high QPS due to port exhaustion by upper-bounding the number of concurrent requests to 28000. This is a temporary fix until the approach for high-QPS benchmarking is decided on. 28000 is roughly the number of ephemeral ports available when containerized. Added an active_connections_metric gauge.
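A minimal sketch of how such a gauge could be wired through aiohttp's TraceConfig (matching the trace_configs=[trace_config] argument in the diff above); the prometheus_client usage and metric wiring here are assumptions, not the PR's exact implementation:

```python
import aiohttp
from prometheus_client import Gauge

active_connections_metric = Gauge(
    "benchmark_active_connections",
    "In-flight requests issued by benchmark_serving.py",
)

async def on_request_start(session, trace_config_ctx, params):
    active_connections_metric.inc()

async def on_request_end(session, trace_config_ctx, params):
    active_connections_metric.dec()

trace_config = aiohttp.TraceConfig()
trace_config.on_request_start.append(on_request_start)
trace_config.on_request_end.append(on_request_end)
# In practice, on_request_exception should also decrement the gauge so
# failed requests (e.g. port exhaustion) are not counted as still active.
```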