Separate port configuration for API RPC and metrics#210
Conversation
Split the single HTTP server into two independent servers: - API server (--api-port, default 5054): health, states, checkpoints, fork choice - Metrics server (--metrics-port, default 5055): prometheus metrics, pprof Rename --metrics-address to --http-address (shared by both servers). Move /lean/v0/health from metrics router to API router.
🤖 Kimi Code ReviewReview SummaryThis PR splits the monolithic RPC server into separate API and metrics servers. The changes are generally well-structured, but there are several issues to address: 1. Critical: Missing Error Handling (main.rs:170-171)The spawned servers use tokio::spawn(async move {
if let Err(e) = ethlambda_rpc::start_metrics_server(metrics_socket).await {
error!(%e, "Metrics server failed");
}
});2. Security: Default Bind Address (main.rs:51)Binding metrics to if options.http_address == IpAddr::from([0, 0, 0, 0]) {
warn!("HTTP API binding to all interfaces (0.0.0.0) - ensure this is intentional");
}3. Port Collision Risk (main.rs:53-55)The new 4. Code Duplication (rpc/src/lib.rs:15-28)Both async fn start_server(address: SocketAddr, router: Router) -> Result<(), std::io::Error> {
let listener = tokio::net::TcpListener::bind(address).await?;
axum::serve(listener, router).await
}5. API Route Organization (rpc/src/lib.rs:38)The health endpoint
6. Missing Graceful Shutdown (main.rs:170-171)The spawned servers have no graceful shutdown mechanism. Consider adding a shutdown signal handler: let api_handle = tokio::spawn(ethlambda_rpc::start_api_server(api_socket, store));
let metrics_handle = tokio::spawn(ethlambda_rpc::start_metrics_server(metrics_socket));
// Later in shutdown handling
api_handle.abort();
metrics_handle.abort();7. Naming Consistency (main.rs:50)The variable These changes improve the architecture by separating concerns, but need the above fixes for production readiness. Automated review by Kimi (Moonshot AI) · custom prompt |
🤖 Codex Code Review
No consensus-critical logic (fork choice, attestation processing, finalization, state transition, XMSS, SSZ) appears modified in this PR. The routing split is reasonable, but startup/error-handling behavior needs tightening before merge. Automated review by OpenAI Codex · custom prompt |
🤖 Claude Code ReviewHere is my review of PR 210: PR 210: Separate port configuration for API RPC and metricsThe motivation is sound — decoupling the API and metrics surfaces for independent network policy control is a standard production pattern. The implementation is clean and minimal. Bug: Server errors silently dropped
tokio::spawn(ethlambda_rpc::start_metrics_server(metrics_socket));
tokio::spawn(ethlambda_rpc::start_api_server(api_socket, store));The Previously the server was awaited and The minimum fix is to log the error inside the spawned task: tokio::spawn(async move {
ethlambda_rpc::start_metrics_server(metrics_socket)
.await
.inspect_err(|err| error!(%err, "Metrics server failed"))
.ok();
});Same for the API server. Minor: Startup log races bind
Correctness and design: everything else looks good
SummaryOne real bug (silently swallowed server errors) that needs fixing before merge; everything else is in good shape. Automated review by Claude (Anthropic) · custom prompt |
Greptile SummaryThis PR splits the single Axum server into two independent servers — an API server (default
Documentation in Confidence Score: 2/5
|
| Filename | Overview |
|---|---|
| bin/ethlambda/src/main.rs | Adds --api-port / --http-address CLI flags and spawns two servers via tokio::spawn, but drops both JoinHandles — bind failures and runtime errors from either server are silently swallowed. |
| crates/net/rpc/src/lib.rs | Cleanly splits start_rpc_server into start_api_server and start_metrics_server, moves /lean/v0/health to the API router, and keeps debug profiling routes on the metrics server — logic is correct. |
| crates/net/rpc/src/metrics.rs | Removes /lean/v0/health from the Prometheus router; get_health handler is retained as pub(crate) for reuse by the API router — straightforward and correct. |
Sequence Diagram
sequenceDiagram
participant Main as main.rs
participant API as start_api_server(:5054)
participant Metrics as start_metrics_server(:5055)
participant Client as API Client
participant Prom as Prometheus Scraper
Main->>API: tokio::spawn (JoinHandle dropped)
Main->>Metrics: tokio::spawn (JoinHandle dropped)
Main->>Main: wait for ctrl_c
Client->>API: GET /lean/v0/health
API-->>Client: 200 OK {"status":"healthy"}
Client->>API: GET /lean/v0/states/finalized
API-->>Client: 200 SSZ bytes
Client->>API: GET /lean/v0/checkpoints/justified
API-->>Client: 200 JSON checkpoint
Client->>API: GET /lean/v0/fork_choice[/ui]
API-->>Client: 200 JSON / HTML
Prom->>Metrics: GET /metrics
Metrics-->>Prom: 200 Prometheus text
Prom->>Metrics: GET /debug/pprof/allocs[/flamegraph]
Metrics-->>Prom: 200 heap profile
Comments Outside Diff (2)
-
preview-config.nix, line 95 (link)Stale
--metrics-addressflag breaks preview nodesThe
--metrics-addressCLI flag was renamed to--http-addressin this PR, butpreview-config.nixstill passes the old name.clapwill reject the unknown flag and refuse to start, causing all four preview systemd services to fail immediately.Prompt To Fix With AI
This is a comment left during a code review. Path: preview-config.nix Line: 95 Comment: **Stale `--metrics-address` flag breaks preview nodes** The `--metrics-address` CLI flag was renamed to `--http-address` in this PR, but `preview-config.nix` still passes the old name. `clap` will reject the unknown flag and refuse to start, causing all four preview systemd services to fail immediately. How can I resolve this? If you propose a fix, please make it concise.
-
Dockerfile, line 67 (link)New metrics port
5055not exposed in DockerfileThe
EXPOSEdirective still only lists port5054. After this change the metrics server defaults to port5055, so Prometheus scrapers in container-based deployments won't be able to reach it unless the port is explicitly mapped. The comment even says "5054 - Prometheus metrics" which is now incorrect (5054 is the API port, 5055 is metrics).Also update the adjacent comment to reflect the new assignment:
# 5054 - HTTP API # 5055 - Prometheus metricsPrompt To Fix With AI
This is a comment left during a code review. Path: Dockerfile Line: 67 Comment: **New metrics port `5055` not exposed in Dockerfile** The `EXPOSE` directive still only lists port `5054`. After this change the metrics server defaults to port `5055`, so Prometheus scrapers in container-based deployments won't be able to reach it unless the port is explicitly mapped. The comment even says "5054 - Prometheus metrics" which is now incorrect (5054 is the API port, 5055 is metrics). Also update the adjacent comment to reflect the new assignment: ``` # 5054 - HTTP API # 5055 - Prometheus metrics ``` How can I resolve this? If you propose a fix, please make it concise.
Prompt To Fix All With AI
This is a comment left during a code review.
Path: bin/ethlambda/src/main.rs
Line: 170-171
Comment:
**Server failures are silently dropped**
Both `tokio::spawn` calls immediately drop their `JoinHandle`, which means any errors — including a bind failure (e.g., port already in use) — are silently swallowed. If either server fails to start, the node will continue running without logging an error, with one or both HTTP surfaces unavailable. The previous code used `.await.unwrap()`, which would at least panic on bind failure.
Consider retaining the handles and either aborting the process or surfacing errors:
```rust
let metrics_handle = tokio::spawn(async move {
if let Err(err) = ethlambda_rpc::start_metrics_server(metrics_socket).await {
error!(%err, "Metrics server failed");
}
});
let api_handle = tokio::spawn(async move {
if let Err(err) = ethlambda_rpc::start_api_server(api_socket, store).await {
error!(%err, "API server failed");
}
});
info!("Node initialized");
tokio::signal::ctrl_c().await.ok();
println!("Shutting down...");
metrics_handle.abort();
api_handle.abort();
```
At minimum, the `Result` from each future should be logged or cause the process to exit, so operators learn of a failed bind instead of wondering why metrics/API are unreachable.
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: preview-config.nix
Line: 95
Comment:
**Stale `--metrics-address` flag breaks preview nodes**
The `--metrics-address` CLI flag was renamed to `--http-address` in this PR, but `preview-config.nix` still passes the old name. `clap` will reject the unknown flag and refuse to start, causing all four preview systemd services to fail immediately.
```suggestion
"--http-address" "0.0.0.0"
```
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: Dockerfile
Line: 67
Comment:
**New metrics port `5055` not exposed in Dockerfile**
The `EXPOSE` directive still only lists port `5054`. After this change the metrics server defaults to port `5055`, so Prometheus scrapers in container-based deployments won't be able to reach it unless the port is explicitly mapped. The comment even says "5054 - Prometheus metrics" which is now incorrect (5054 is the API port, 5055 is metrics).
```suggestion
EXPOSE 9000/tcp 9000/udp 9001/udp 5052 5054 5055
```
Also update the adjacent comment to reflect the new assignment:
```
# 5054 - HTTP API
# 5055 - Prometheus metrics
```
How can I resolve this? If you propose a fix, please make it concise.Last reviewed commit: 0da3ed0
bin/ethlambda/src/main.rs
Outdated
| tokio::spawn(ethlambda_rpc::start_metrics_server(metrics_socket)); | ||
| tokio::spawn(ethlambda_rpc::start_api_server(api_socket, store)); |
There was a problem hiding this comment.
Server failures are silently dropped
Both tokio::spawn calls immediately drop their JoinHandle, which means any errors — including a bind failure (e.g., port already in use) — are silently swallowed. If either server fails to start, the node will continue running without logging an error, with one or both HTTP surfaces unavailable. The previous code used .await.unwrap(), which would at least panic on bind failure.
Consider retaining the handles and either aborting the process or surfacing errors:
let metrics_handle = tokio::spawn(async move {
if let Err(err) = ethlambda_rpc::start_metrics_server(metrics_socket).await {
error!(%err, "Metrics server failed");
}
});
let api_handle = tokio::spawn(async move {
if let Err(err) = ethlambda_rpc::start_api_server(api_socket, store).await {
error!(%err, "API server failed");
}
});
info!("Node initialized");
tokio::signal::ctrl_c().await.ok();
println!("Shutting down...");
metrics_handle.abort();
api_handle.abort();At minimum, the Result from each future should be logged or cause the process to exit, so operators learn of a failed bind instead of wondering why metrics/API are unreachable.
Prompt To Fix With AI
This is a comment left during a code review.
Path: bin/ethlambda/src/main.rs
Line: 170-171
Comment:
**Server failures are silently dropped**
Both `tokio::spawn` calls immediately drop their `JoinHandle`, which means any errors — including a bind failure (e.g., port already in use) — are silently swallowed. If either server fails to start, the node will continue running without logging an error, with one or both HTTP surfaces unavailable. The previous code used `.await.unwrap()`, which would at least panic on bind failure.
Consider retaining the handles and either aborting the process or surfacing errors:
```rust
let metrics_handle = tokio::spawn(async move {
if let Err(err) = ethlambda_rpc::start_metrics_server(metrics_socket).await {
error!(%err, "Metrics server failed");
}
});
let api_handle = tokio::spawn(async move {
if let Err(err) = ethlambda_rpc::start_api_server(api_socket, store).await {
error!(%err, "API server failed");
}
});
info!("Node initialized");
tokio::signal::ctrl_c().await.ok();
println!("Shutting down...");
metrics_handle.abort();
api_handle.abort();
```
At minimum, the `Result` from each future should be logged or cause the process to exit, so operators learn of a failed bind instead of wondering why metrics/API are unreachable.
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
Fixed — both spawns now wrap the future and log errors with error!().
- Log errors from spawned API/metrics servers instead of silently dropping JoinHandles - Update preview-config.nix: rename --metrics-address to --http-address, add --api-port, assign metrics ports to 8085-8088 range - Expose port 5055 in Dockerfile and update port comments - Update docs to reflect new flag names and default ports
|
All three issues addressed in 0570730:
Also updated |
Match Lighthouse's port convention for drop-in compatibility with existing monitoring and deployment setups: - --api-port default: 5054 -> 5052 - --metrics-port default: 5055 -> 5054
|
Update Claude's context with this information. Some parts mention we only support setting the metrics port. |
Update crate tree description and add new "HTTP Servers (API + Metrics)" section covering CLI flags (--http-address, --api-port, --metrics-port), API server routes (:5052), metrics server routes (:5054), and startup behavior. Reflects the server split from PR #210.
|
Done in f0ac667 — updated CLAUDE.md with the new dual-server architecture: CLI flags ( |
Reflect PR #210 changes: ethlambda now runs separate API (--api-port, default 5052) and metrics (--metrics-port, default 5054) HTTP servers with a shared bind address (--http-address). Updated validator config schema, port allocation guide, troubleshooting, client reference, and known issues sections.
… docs Update SKILL.md and checkpoint-sync.md to reflect that ethlambda now serves API endpoints on --api-port (default 5052) and metrics on --metrics-port (default 5054). Checkpoint sync URLs must use the API port, not the metrics port.
Closes #206
Motivation
The API RPC endpoints and Prometheus metrics were served from a single Axum server on one port (
--metrics-port 5054). This made it impossible to expose metrics to a monitoring stack (e.g., Prometheus scraper) without also exposing the API, or to restrict API access without blocking metrics collection. In production and devnet deployments, these often need different network policies.Description
Split the single HTTP server into two independent servers, each with its own port:
API Server (
--api-port, default5054)GET /lean/v0/health— Health check (moved from metrics)GET /lean/v0/states/finalized— Finalized state (SSZ)GET /lean/v0/checkpoints/justified— Justified checkpoint (JSON)GET /lean/v0/fork_choice— Fork choice tree (JSON)GET /lean/v0/fork_choice/ui— Interactive fork choice UI (HTML)Metrics Server (
--metrics-port, default5055)GET /metrics— Prometheus metricsGET /debug/pprof/allocs— Heap profilingGET /debug/pprof/allocs/flamegraph— Heap flamegraphCLI Flag Changes
--metrics-address127.0.0.1--http-address(shared by both servers)--metrics-port50545055--api-port5054Architecture
Files Changed
bin/ethlambda/src/main.rs— Renamed--metrics-addressto--http-address, added--api-portflag, changed--metrics-portdefault to5055, spawn both servers viatokio::spawnwith error loggingcrates/net/rpc/src/lib.rs— Splitstart_rpc_server()intostart_api_server(address, store)andstart_metrics_server(address), moved/lean/v0/healthroute to the API routercrates/net/rpc/src/metrics.rs— Removed/lean/v0/healthroute (now served by API router)preview-config.nix— Updated CLI flags (--http-address,--api-port), split port ranges: API on 8081-8084, metrics on 8085-8088Dockerfile— AddedEXPOSE 5055, updated port commentsdocs/metrics.md— Updated default endpoint URL to port 5055docs/fork_choice_visualization.md— Updated flag references to--api-portDesign Decisions
/lean/v0/healthis an API endpoint (used by load balancers, checkpoint sync clients), not an ops/metrics endpoint, so it belongs with the API.--http-address: Both servers bind to the same address. Per-server addresses would add complexity without clear benefit today.5054for API (preserves existing behavior for API consumers) and5055for metrics (avoids port conflict).How to Test
Breaking Changes
--metrics-addressrenamed to--http-address--metrics-portdefault changed from5054to5055/lean/v0/healthno longer available on the metrics portAny deployment scripts or docker-compose files referencing
--metrics-addressor scraping metrics on port5054will need updating.