Summary
The runtime broker's control channel repeatedly disconnects after the hub has been running for approximately 4 hours, causing no_runtime_broker errors on all agent start attempts. The broker does not automatically re-establish the connection — a full hub restart is required to recover.
Environment
- Instance: scion-next (https://next.demo.scion-ai.dev)
- Hub binary: upstream main at b2eaa59 (cross-compiled, deployed 2026-06-03)
- Runtime broker: isHubManaged=false, Docker runtime, listening on localhost:9800
Expected behavior
The runtime broker control channel should either stay connected indefinitely or automatically reconnect when disconnected, without requiring a hub restart.
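The reconnect half of this expectation can be sketched as a loop that retries with capped exponential backoff and never gives up. This is only an illustration of the desired behavior, not the hub's or broker's actual code: `backoff_delays`, `reconnect_forever`, and the `connect` callback are hypothetical names, and the 1 s base, 60 s cap, and 20% jitter are assumed values.

```python
import random
import time
from typing import Callable, Iterator, Optional


def backoff_delays(base: float = 1.0, cap: float = 60.0,
                   jitter: float = 0.2) -> Iterator[float]:
    """Yield reconnect delays forever: doubling from `base` up to `cap`,
    each scaled by a random factor in [1 - jitter, 1 + jitter]."""
    attempt = 0
    while True:
        delay = min(cap, base * (2 ** attempt))
        yield delay * (1 + random.uniform(-jitter, jitter))
        if delay < cap:
            attempt += 1  # stop growing the exponent once the cap is reached


def reconnect_forever(connect: Callable[[], bool],
                      sleep: Callable[[float], None] = time.sleep,
                      delays: Optional[Iterator[float]] = None) -> int:
    """Call the hypothetical `connect()` until it returns True.

    Returns the number of attempts made. There is deliberately no
    maximum attempt count: the loop only ends on success.
    """
    if delays is None:
        delays = backoff_delays()
    attempts = 0
    while True:
        attempts += 1
        if connect():
            return attempts
        sleep(next(delays))
```

The point of the sketch is the absence of a retry limit: once the delay reaches the cap it stays there, so a broker that comes back after a long outage is still picked up without a hub restart.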
Actual behavior
After ~4h15m of uptime the control channel disconnects. The hub marks the broker offline (onlineProviders=0). Reconnect attempts fail. All new agent start requests return:
no_runtime_broker: Default runtime broker is unavailable and no alternatives found (status: 422)
Log sequence (from scion-next journal)
- 14:28 - hub started (PID 2606172), runtime broker registered at localhost:9800
- 18:40–18:43 - broker healthy, onlineProviders=1, agents launching successfully
- 18:43:34 - "Broker control channel disconnected" / "Broker disconnected, marking offline"
- 18:46:33 - brief reconnect, then disconnected again
- 18:51:22 - final disconnect, no recovery
- 20:16 - manual hub restart → broker re-registered → fixed
Key log messages:
INFO "Broker control channel disconnected" brokerID=a6487539-...
INFO "Broker disconnected, marking offline" brokerID=a6487539-...
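To count or correlate these events across a longer journal export, the two messages can be parsed with a small helper. The sketch below assumes every line has exactly the shape shown above (level, quoted message, then space-separated key=value fields); `parse_line` is a hypothetical helper, and real journal lines will likely carry a timestamp prefix that would need stripping first. The truncated broker ID is kept as it appears in the report.

```python
import re
from collections import Counter
from typing import Dict, Optional

# Assumed shape: LEVEL "message" key=value key=value ...
LINE_RE = re.compile(r'^(?P<level>[A-Z]+)\s+"(?P<msg>[^"]*)"\s*(?P<fields>.*)$')


def parse_line(line: str) -> Optional[Dict[str, str]]:
    """Split one log line into level, msg, and its key=value fields.

    Returns None for lines that do not match the assumed shape.
    """
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    fields = dict(kv.split("=", 1) for kv in m.group("fields").split() if "=" in kv)
    return {"level": m.group("level"), "msg": m.group("msg"), **fields}


sample = [
    'INFO "Broker control channel disconnected" brokerID=a6487539-...',
    'INFO "Broker disconnected, marking offline" brokerID=a6487539-...',
]
events = [e for e in map(parse_line, sample) if e is not None]

# Tally messages per broker to see how often each one dropped.
per_broker = Counter((e.get("brokerID"), e["msg"]) for e in events)
```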
Timing note
The first disconnect occurred about 4h15m after hub startup (14:28 to 18:43:34), which may indicate a connection TTL, OAuth token expiry, or keep-alive timeout. The pattern of three disconnects with brief recoveries before a permanent offline state suggests a retry backoff that eventually gives up.
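The intervals can be checked directly from the journal timestamps. The snippet below is a quick calculation, not part of any tooling. The hub start is only logged to the minute, so its seconds are assumed to be :00, and the 18:46:33 entry is treated as the time of the second disconnect.

```python
from datetime import datetime

FMT = "%H:%M:%S"

# Hub start is logged as 14:28; the seconds are an assumption.
hub_start = datetime.strptime("14:28:00", FMT)

# Timestamps of the three disconnect entries in the log sequence.
disconnects = [datetime.strptime(t, FMT)
               for t in ("18:43:34", "18:46:33", "18:51:22")]

# Hub uptime when the control channel first dropped.
uptime_at_first = disconnects[0] - hub_start

# Seconds between consecutive disconnects.
gaps = [int((b - a).total_seconds())
        for a, b in zip(disconnects, disconnects[1:])]

print(uptime_at_first)  # 4:15:34
print(gaps)             # [179, 289]
```

The gaps of roughly 3 and 5 minutes grow between attempts, which is at least compatible with a backoff schedule, though three data points are too few to confirm one.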
Workaround
Restart scion-hub. The runtime broker re-registers on startup and resumes normally.
Additional context
- Not observed on the previous binary (branch grove-rename2). May be a regression in how the runtime broker control channel lifecycle is managed in recent upstream/main code.
- The scion-sagan instance (which uses isHubManaged=true) did not exhibit this pattern.
- Likely related to: the 422 no_runtime_broker errors currently blocking new agent creation on the scion-gteam instance.