Commit 9b12146

docs: add documentation for request plane (#4491)
1 parent f40a844 commit 9b12146

2 files changed

+299
-0
lines changed

docs/guides/request_plane.md

Lines changed: 298 additions & 0 deletions
@@ -0,0 +1,298 @@
1+
<!--
2+
SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
3+
SPDX-License-Identifier: Apache-2.0
4+
5+
Licensed under the Apache License, Version 2.0 (the "License");
6+
you may not use this file except in compliance with the License.
7+
You may obtain a copy of the License at
8+
9+
http://www.apache.org/licenses/LICENSE-2.0
10+
11+
Unless required by applicable law or agreed to in writing, software
12+
distributed under the License is distributed on an "AS IS" BASIS,
13+
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
14+
See the License for the specific language governing permissions and
15+
limitations under the License.
16+
-->
17+
18+
# Dynamo Request Planes User Guide
19+
20+
## Overview
21+
22+
Dynamo supports multiple transport mechanisms for its request plane (the communication layer between services). You can choose from three different request plane modes based on your deployment requirements:
23+
24+
- **NATS** (default): Message broker-based request plane
25+
- **TCP**: Direct TCP connection for optimal performance
26+
- **HTTP**: HTTP/2-based request plane
27+
28+
This guide explains how to configure and use the request plane in your Dynamo deployment.
29+
30+
## What is a Request Plane?
31+
32+
The request plane is the transport layer that handles communication between Dynamo services (e.g., frontend to backend, worker to worker). Different request planes offer different trade-offs:
33+
34+
| Request Plane | Suitable For | Characteristics |
35+
|--------------|----------|-----------------|
36+
| **NATS** | Production deployments with KV routing | Requires NATS infrastructure, provides pub/sub patterns, highest flexibility |
37+
| **TCP** | Low-latency direct communication | Direct connections, minimal overhead |
38+
| **HTTP** | Standard deployments, debugging | HTTP/2 protocol, easier observability with standard tools, widely compatible |
39+
40+
## KV Routing and NATS
41+
42+
Dynamo's Key-Value (KV) cache based routing optimizes large language model inference by intelligently directing requests to workers with the most relevant KV cache data. KV-aware routing improves both Time To First Token (TTFT) through better cache locality and Inter-Token Latency (ITL) through intelligent load balancing.
43+
44+
Please refer to the [KV Cache Routing documentation](../router/kv_cache_routing.md) for more details.
45+
46+
There are two modes of KV based routing:
47+
- Exact KV routing (needs NATS): Routing is based on KV events indexed in a radix tree, which is used to score the best match for each request. *This requires NATS* to persist and distribute KV events across routers.
48+
49+
- Approximate KV routing (does not need NATS): KV routing is based on approximate load heuristics. *This does not require NATS*.
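
To make the idea of "scoring the best match" concrete, here is a minimal sketch. It is not Dynamo's router: it uses a flat prefix comparison in place of the radix tree, ignores load entirely, and the names `shared_prefix_len`, `pick_worker`, and the block identifiers are hypothetical.

```python
def shared_prefix_len(request_blocks, cached_blocks):
    """Count how many leading blocks of the request are already cached."""
    count = 0
    for requested, cached in zip(request_blocks, cached_blocks):
        if requested != cached:
            break
        count += 1
    return count


def pick_worker(request_blocks, worker_caches):
    """Return the worker whose cached sequence shares the longest prefix.

    worker_caches maps a (hypothetical) worker id to one cached block
    sequence; a real index would track many sequences per worker.
    """
    return max(
        worker_caches,
        key=lambda worker: shared_prefix_len(request_blocks, worker_caches[worker]),
    )
```

A radix tree serves the same purpose more efficiently, because it stores shared prefixes once instead of comparing the request against every cached sequence.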
50+
51+
## Configuration
52+
53+
### Environment Variable
54+
55+
Set the request plane mode using the `DYN_REQUEST_PLANE` environment variable:
56+
57+
```bash
58+
export DYN_REQUEST_PLANE=<mode>
59+
```
60+
61+
Where `<mode>` is one of:
62+
- `nats` (default)
63+
- `tcp`
64+
- `http`
65+
66+
The value is case-insensitive.
67+
68+
### Default Behavior
69+
70+
If `DYN_REQUEST_PLANE` is not set or contains an invalid value, Dynamo defaults to `nats`.
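
The selection rule above can be sketched in a few lines. This is an illustration only: the actual logic lives in Dynamo's Rust runtime, and `resolve_request_plane`, `VALID_MODES`, and `DEFAULT_MODE` are hypothetical names.

```python
import os

VALID_MODES = ("nats", "tcp", "http")
DEFAULT_MODE = "nats"


def resolve_request_plane(env=None):
    """Return the request plane mode named by DYN_REQUEST_PLANE."""
    env = os.environ if env is None else env
    # The value is case-insensitive, so normalize before comparing.
    mode = env.get("DYN_REQUEST_PLANE", "").lower()
    # Unset or unrecognized values fall back to the default.
    return mode if mode in VALID_MODES else DEFAULT_MODE
```

Passing a plain dictionary instead of `os.environ` makes the rule easy to check, for example `resolve_request_plane({"DYN_REQUEST_PLANE": "TCP"})`.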
71+
72+
## Usage Examples
73+
74+
### Using NATS (Default)
75+
76+
NATS is the default request plane and provides the most flexibility for complex deployments.
77+
78+
**Prerequisites:**
79+
- NATS server must be running and accessible
80+
- Configure NATS connection via standard Dynamo NATS environment variables
81+
82+
```bash
83+
# DYN_REQUEST_PLANE=nats is set explicitly on each command below (optional, as it's the default)
84+
85+
# Run your Dynamo service
86+
DYN_REQUEST_PLANE=nats python -m dynamo.frontend --http-port=8000 &
87+
DYN_REQUEST_PLANE=nats python -m dynamo.vllm --model Qwen/Qwen3-0.6B
88+
```
89+
90+
**When to use NATS:**
91+
- Production deployments with service discovery
92+
- Highly available (HA) routers, which currently require durable messages persisted in the NATS message broker. If you disable NATS completely, exact KV-based routing is not available
93+
- Multiple frontends and backends
94+
- Need for message replay and persistence features
95+
96+
**Limitations:**
97+
- NATS does not support payloads beyond 16MB (use TCP for larger payloads)
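
When planning a deployment, the payload limit can be turned into a simple check against the largest message you expect to send. The sketch below is a planning aid, not part of Dynamo; `suggest_request_plane` is a hypothetical helper, and it assumes "16MB" means 16 MiB.

```python
# Assumption: the documented 16MB limit is interpreted as 16 MiB.
NATS_MAX_PAYLOAD_BYTES = 16 * 1024 * 1024


def suggest_request_plane(largest_payload_bytes, preferred="nats"):
    """Suggest TCP when the preferred plane is NATS and the payload is too large."""
    if preferred == "nats" and largest_payload_bytes > NATS_MAX_PAYLOAD_BYTES:
        return "tcp"
    return preferred
```

The request plane is chosen per deployment rather than per request, so the input should be the worst-case payload size across all services.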
98+
99+
### Using TCP
100+
101+
TCP provides direct, low-latency communication between services.
102+
103+
**Configuration:**
104+
105+
```bash
106+
# Set request plane to TCP
107+
export DYN_REQUEST_PLANE=tcp
108+
109+
# Optional: Configure TCP server host and port
110+
export DYN_TCP_RPC_HOST=0.0.0.0   # Listen on all interfaces
111+
export DYN_TCP_RPC_PORT=9999 # Default port
112+
113+
# Run your Dynamo service
114+
DYN_REQUEST_PLANE=tcp python -m dynamo.frontend --http-port=8000 &
115+
DYN_REQUEST_PLANE=tcp python -m dynamo.vllm --model Qwen/Qwen3-0.6B
116+
```
117+
118+
**When to use TCP:**
119+
- Simple deployments with direct service-to-service communication (e.g. frontend to backend)
120+
- Minimal infrastructure requirements (no NATS needed)
121+
- Low-latency requirements
122+
123+
**TCP Configuration Options:**
124+
125+
Additional TCP-specific environment variables:
126+
- `DYN_TCP_RPC_HOST`: Server host address (default: auto-detected)
127+
- `DYN_TCP_RPC_PORT`: Server port (default: 9999)
128+
- `DYN_TCP_MAX_MESSAGE_SIZE`: Maximum message size for TCP client (default: 32MB)
129+
- `DYN_TCP_REQUEST_TIMEOUT`: Request timeout for TCP client (default: 10 seconds)
130+
- `DYN_TCP_POOL_SIZE`: Connection pool size for TCP client (default: 50)
131+
- `DYN_TCP_CONNECT_TIMEOUT`: Connect timeout for TCP client (default: 3 seconds)
132+
- `DYN_TCP_CHANNEL_BUFFER`: Request channel buffer size for TCP client (default: 100)
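
The defaults listed above can be modeled as a small configuration object with environment overrides. This is a sketch under stated assumptions, not Dynamo's loader: `TcpClientConfig` and `load_tcp_config` are hypothetical names, values are assumed to be plain integers (bytes for sizes, seconds for timeouts), and "32MB" is read as 32 MiB. `DYN_TCP_RPC_HOST` is left out because its default is auto-detected.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class TcpClientConfig:
    port: int = 9999
    max_message_size: int = 32 * 1024 * 1024  # assumed to be bytes
    request_timeout_secs: int = 10
    pool_size: int = 50
    connect_timeout_secs: int = 3
    channel_buffer: int = 100


# Maps each field to the environment variable documented above.
_ENV_NAMES = {
    "port": "DYN_TCP_RPC_PORT",
    "max_message_size": "DYN_TCP_MAX_MESSAGE_SIZE",
    "request_timeout_secs": "DYN_TCP_REQUEST_TIMEOUT",
    "pool_size": "DYN_TCP_POOL_SIZE",
    "connect_timeout_secs": "DYN_TCP_CONNECT_TIMEOUT",
    "channel_buffer": "DYN_TCP_CHANNEL_BUFFER",
}


def load_tcp_config(env=None):
    """Build a config from defaults, overridden by any variables that are set."""
    env = os.environ if env is None else env
    overrides = {
        field: int(env[name]) for field, name in _ENV_NAMES.items() if name in env
    }
    return TcpClientConfig(**overrides)
```

Only variables that are present override a default, so `load_tcp_config({"DYN_TCP_RPC_PORT": "7000"})` changes the port and leaves every other field untouched.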
133+
134+
### Using HTTP
135+
136+
HTTP/2 provides a standards-based request plane that's easy to debug and widely compatible.
137+
138+
**Configuration:**
139+
140+
```bash
141+
# Optional: Configure HTTP server host, port, and root path
142+
export DYN_HTTP_RPC_HOST=0.0.0.0      # Listen on all interfaces
143+
export DYN_HTTP_RPC_PORT=8888 # Default port
144+
export DYN_HTTP_RPC_ROOT_PATH=/v1/rpc # Default path
145+
146+
# Run your Dynamo service
147+
DYN_REQUEST_PLANE=http python -m dynamo.frontend --http-port=8000 &
148+
DYN_REQUEST_PLANE=http python -m dynamo.vllm --model Qwen/Qwen3-0.6B
149+
```
150+
151+
**When to use HTTP:**
152+
- Standard deployments requiring HTTP compatibility
153+
- Debugging scenarios (use curl, browser tools, etc.)
154+
- Integration with HTTP-based infrastructure
155+
- Load balancers and proxies that work with HTTP
156+
157+
**HTTP Configuration Options:**
158+
159+
Additional HTTP-specific environment variables:
160+
- `DYN_HTTP_RPC_HOST`: Server host address (default: auto-detected)
161+
- `DYN_HTTP_RPC_PORT`: Server port (default: 8888)
162+
- `DYN_HTTP_RPC_ROOT_PATH`: Root path for RPC endpoints (default: /v1/rpc)
163+
164+
HTTP/2 client configuration options (`DYN_HTTP2_*`):
165+
- `DYN_HTTP2_MAX_FRAME_SIZE`: Maximum frame size for HTTP client (default: 1MB)
166+
- `DYN_HTTP2_MAX_CONCURRENT_STREAMS`: Maximum concurrent streams for HTTP client (default: 1000)
167+
- `DYN_HTTP2_POOL_MAX_IDLE_PER_HOST`: Maximum idle connections per host for HTTP client (default: 100)
168+
- `DYN_HTTP2_POOL_IDLE_TIMEOUT_SECS`: Idle timeout for HTTP client (default: 90 seconds)
169+
- `DYN_HTTP2_KEEP_ALIVE_INTERVAL_SECS`: Keep-alive interval for HTTP client (default: 30 seconds)
170+
- `DYN_HTTP2_KEEP_ALIVE_TIMEOUT_SECS`: Keep-alive timeout for HTTP client (default: 10 seconds)
171+
- `DYN_HTTP2_ADAPTIVE_WINDOW`: Enable adaptive flow control (default: true)
172+
173+
## Complete Example
174+
175+
For a complete working example that launches a Dynamo deployment with the TCP, HTTP, or NATS request plane, see [`examples/backends/vllm/launch/agg_request_planes.sh`](../../examples/backends/vllm/launch/agg_request_planes.sh).
178+
179+
180+
## Real-World Example
181+
182+
The Dynamo repository includes a complete example demonstrating all three request planes:
183+
184+
**Location:** `examples/backends/vllm/launch/agg_request_planes.sh`
185+
186+
```bash
187+
cd examples/backends/vllm/launch
188+
189+
# Run with TCP
190+
./agg_request_planes.sh --tcp
191+
192+
# Run with HTTP
193+
./agg_request_planes.sh --http
194+
195+
# Run with NATS
196+
./agg_request_planes.sh --nats
197+
```
198+
199+
## Architecture Details
200+
201+
### Network Manager
202+
203+
The request plane implementation is centralized in the Network Manager (`lib/runtime/src/pipeline/network/manager.rs`), which:
204+
205+
1. Reads the `DYN_REQUEST_PLANE` environment variable at startup
206+
2. Creates the appropriate server and client implementations
207+
3. Provides a transport-agnostic interface to the rest of the codebase
208+
4. Manages all network configuration and lifecycle
209+
210+
### Transport Abstraction
211+
212+
All request plane implementations conform to common trait interfaces:
213+
- `RequestPlaneServer`: Server-side interface for receiving requests
214+
- `RequestPlaneClient`: Client-side interface for sending requests
215+
216+
This abstraction means your application code doesn't need to change when switching request planes.
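
The following Python analogy illustrates why a shared interface keeps application code unchanged. The real interfaces are Rust traits; the `send` method, its signature, the two stand-in client classes, and `make_client` are all invented here for illustration.

```python
from typing import Protocol


class RequestPlaneClient(Protocol):
    """Python stand-in for the client-side trait (method name is hypothetical)."""

    def send(self, endpoint: str, payload: bytes) -> bytes: ...


class FakeTcpClient:
    def send(self, endpoint: str, payload: bytes) -> bytes:
        return b"tcp:" + payload  # placeholder for a real TCP round trip


class FakeHttpClient:
    def send(self, endpoint: str, payload: bytes) -> bytes:
        return b"http:" + payload  # placeholder for a real HTTP/2 round trip


def make_client(mode: str) -> RequestPlaneClient:
    """Pick an implementation by mode, as a manager component might."""
    clients = {"tcp": FakeTcpClient, "http": FakeHttpClient}
    return clients[mode]()


def handle(client: RequestPlaneClient, payload: bytes) -> bytes:
    # Application code depends only on the interface, never on the transport.
    return client.send("generate", payload)
```

`handle` is identical for both transports; only the object returned by `make_client` differs.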
217+
218+
### Configuration Loading
219+
220+
Request plane configuration is loaded from environment variables at startup and cached globally. The configuration hierarchy is:
221+
222+
1. **Mode Selection**: `DYN_REQUEST_PLANE` (defaults to `nats`)
223+
2. **Transport-Specific Config**: Mode-specific environment variables (e.g., `DYN_TCP_*`, `DYN_HTTP2_*`)
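
A practical consequence of loading once and caching globally is that changing an environment variable in a running process has no effect until restart. The sketch below demonstrates that behavior with `functools.lru_cache`; it is a Python analogy, and `request_plane_mode` is a hypothetical function, not Dynamo's API.

```python
import os
from functools import lru_cache


@lru_cache(maxsize=None)
def request_plane_mode():
    """Read DYN_REQUEST_PLANE on first call and reuse that value afterwards."""
    return os.environ.get("DYN_REQUEST_PLANE", "nats").lower()
```

After the first call, later changes to `DYN_REQUEST_PLANE` are ignored by `request_plane_mode()`, which mirrors why the migration steps call for stopping and restarting services.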
224+
225+
## Migration Guide
226+
227+
### From NATS to TCP
228+
229+
1. Stop your Dynamo services
230+
2. Set environment variable `DYN_REQUEST_PLANE=tcp`
231+
3. Optionally configure TCP-specific settings (`DYN_TCP_RPC_PORT`, etc.)
232+
4. Restart your services
233+
234+
235+
### From NATS to HTTP
236+
237+
1. Stop your Dynamo services
238+
2. Set environment variable `DYN_REQUEST_PLANE=http`
239+
3. Optionally configure HTTP-specific settings (`DYN_HTTP_RPC_PORT`, etc.)
240+
4. Restart your services
241+
242+
### Testing the Migration
243+
244+
After switching request planes, verify your deployment:
245+
246+
```bash
247+
# Test with a simple request
248+
curl http://localhost:8000/v1/chat/completions \
249+
-H "Content-Type: application/json" \
250+
-d '{
251+
"model": "Qwen/Qwen3-0.6B",
252+
"messages": [{"role": "user", "content": "Hello!"}]
253+
}'
254+
```
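
The same check can be scripted, for example in a smoke test. This sketch uses only the Python standard library; `build_chat_request` and `send_chat_request` are hypothetical helper names, and the response is assumed to be a JSON document.

```python
import json
import urllib.request


def build_chat_request(base_url, model, prompt):
    """Build the POST request that the curl command above sends."""
    body = json.dumps(
        {"model": model, "messages": [{"role": "user", "content": prompt}]}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=30):
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())
```

With the frontend running, `send_chat_request(build_chat_request("http://localhost:8000", "Qwen/Qwen3-0.6B", "Hello!"))` should behave the same regardless of which request plane is configured, since the request plane only affects traffic between Dynamo services.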
255+
256+
## Troubleshooting
257+
258+
### Issue: Services Can't Communicate
259+
260+
**Symptoms:** Requests time out or fail to reach the backend
261+
262+
**Solutions:**
263+
- Verify all services use the same `DYN_REQUEST_PLANE` setting
264+
- Check that server ports are not blocked by Kubernetes network policies or firewalls
265+
- For TCP/HTTP: Ensure host/port configurations are correct and accessible
266+
- For NATS: Verify NATS server is running and accessible
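
A quick way to test reachability from the client side is to attempt a TCP connection to the server's host and port. This is a generic connectivity probe using the standard library, not a Dynamo tool; `port_is_open` is a hypothetical helper.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Run it from the machine or pod that hosts the calling service, for example `port_is_open("backend-host", 9999)` for the default TCP request plane port, where `backend-host` is a placeholder for your own address.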
267+
268+
### Issue: "Invalid request plane mode" Error
269+
270+
**Symptoms:** Service fails to start with configuration error
271+
272+
**Solutions:**
273+
- Check `DYN_REQUEST_PLANE` spelling (valid values: `nats`, `tcp`, `http`)
274+
- Value is case-insensitive but must be one of the three options
275+
- If not set, defaults to `nats`
276+
277+
### Issue: Port Conflicts
278+
279+
**Symptoms:** Server fails to start due to "address already in use"
280+
281+
**Solutions:**
282+
- TCP default port: 9999 (adjust environment variable `DYN_TCP_RPC_PORT`)
283+
- HTTP default port: 8888 (adjust environment variable `DYN_HTTP_RPC_PORT`)
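
Before starting a service, you can check whether the intended port is already taken by trying to bind to it. This is a generic standard-library sketch; `port_is_free` is a hypothetical helper and is not part of Dynamo.

```python
import socket


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can be bound to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Typically "address already in use" or a permissions error.
            return False
    return True
```

If `port_is_free(9999)` returns `False`, pick another value for `DYN_TCP_RPC_PORT`. The result is only a snapshot, since another process can claim the port between the check and the service start.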
284+
285+
## Performance Considerations
286+
287+
### Latency
288+
289+
- **TCP**: Lowest latency due to direct connections and binary serialization
290+
- **HTTP**: Moderate latency with HTTP/2 overhead
291+
- **NATS**: Moderate latency due to NATS JetStream persistence
292+
293+
294+
### Resource Usage
295+
296+
- **TCP**: Minimal infrastructure (no additional services required)
297+
- **HTTP**: Minimal infrastructure (no additional services required)
298+
- **NATS**: Requires running NATS server (additional memory/CPU)

docs/hidden_toctree.rst

Lines changed: 1 addition & 0 deletions
@@ -38,6 +38,7 @@
3838
kvbm/trtllm-setup.md
3939
agents/tool-calling.md
4040
guides/jail_stream_readme.md
41+
guides/request_planes.md
4142

4243
router/kv_cache_routing.md
4344
planner/load_planner.md
