fix(gateway): adopt external gateway instead of killing it (fixes #818) #820
kagura-agent wants to merge 1 commit into ValueCell-ai:main
…ueCell-ai#818)

Reverse the order of operations in `findExistingGatewayProcess()` to try a WebSocket health check before killing non-owned processes. If the probe succeeds, the gateway is adopted, preventing a kill-restart loop when systemd (or another service manager) owns the process.

Also add a Linux systemd check in `terminateOrphanedProcessIds()`, mirroring the existing macOS `launchctl` handling: it stops the user service before sending SIGTERM so the process is not auto-restarted.

Co-Authored-By: Claude Opus 4.6
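The Linux half of this change (stop the systemd user service, then send SIGTERM) might look roughly like the sketch below. It is an illustration under stated assumptions, not the code in `supervisor.ts`: the unit name `example-gateway.service`, the injected `CommandRunner` type, and the `...Sketch` function names are hypothetical, and the existing macOS `launchctl` branch is left out.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

// Runs an external command and rejects on a non-zero exit.
// Injected so the ordering can be exercised without a real systemd session.
type CommandRunner = (file: string, args: string[]) => Promise<void>;

const execFileAsync = promisify(execFile);

const defaultRunner: CommandRunner = async (file, args) => {
  await execFileAsync(file, args);
};

// Hypothetical unit name: the PR does not state which unit runs the gateway.
const GATEWAY_UNIT = "example-gateway.service";

// Hypothetical stand-in for stopSystemdGatewayService(); reports whether the stop succeeded.
async function stopSystemdGatewayServiceSketch(
  run: CommandRunner = defaultRunner,
): Promise<boolean> {
  try {
    await run("systemctl", ["--user", "stop", GATEWAY_UNIT]);
    return true;
  } catch {
    // systemctl missing, no user manager, or unit not loaded: fall back to signals.
    return false;
  }
}

// Hypothetical stand-in for terminateOrphanedProcessIds().
async function terminateOrphanedProcessIdsSketch(
  pids: number[],
  platform: string,
  run: CommandRunner = defaultRunner,
  kill: (pid: number, signal: string) => void = (pid, signal) => {
    process.kill(pid, signal);
  },
): Promise<void> {
  if (platform === "linux") {
    // Stop the unit first so systemd does not restart the gateway after SIGTERM.
    await stopSystemdGatewayServiceSketch(run);
  }
  // The existing macOS launchctl branch is omitted from this sketch.
  for (const pid of pids) {
    try {
      kill(pid, "SIGTERM");
    } catch {
      // The process is already gone (for example, `systemctl stop` ended it).
    }
  }
}
```

Injecting the command runner and the kill function keeps the stop-then-signal ordering testable without a systemd session. If `systemctl` is unavailable or the unit is unknown, the sketch falls back to plain SIGTERM, which matches the intent of only adding a check rather than a hard dependency on systemd.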
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 04c0a935e8
```typescript
const ready = await probeGatewayReady(port, 5000);
if (ready) {
  logger.info(`Adopting existing external gateway on port ${port}`);
  return { port };
}
```
Validate external gateway identity before adopting listener
This adoption branch trusts any process on the port that answers the WebSocket readiness probe, but that probe is unauthenticated and only checks for a `connect.challenge` event. Once the listener is adopted, the startup flow proceeds to `connect()` and sends the gateway auth token in the handshake. In practice, a local process that binds the port and mimics the challenge can now receive credentials, whereas the previous behavior killed non-owned listeners first. Please require a stronger identity check (for example, verified ownership/service identity or an authenticated probe) before returning `{ port }` here.
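One way to realize the "authenticated probe" suggested above is a challenge in the opposite direction: the client sends a fresh random nonce, and accepts the listener only if it answers with an HMAC of that nonce keyed by the shared gateway token, which a process without the token cannot compute. This is a hypothetical extension, not part of the gateway protocol described here (which only emits `connect.challenge`); the proof label and the helper names are assumptions, and the WebSocket transport is omitted. A minimal sketch of the cryptographic check using Node's built-in `crypto` module:

```typescript
import { createHmac, randomBytes, timingSafeEqual } from "node:crypto";

// Hypothetical domain-separation label for the identity proof.
const PROOF_LABEL = "gateway-identity-proof:";

// Fresh per-probe nonce (32 random bytes, hex-encoded) so old proofs cannot be replayed.
function newProbeNonce(): string {
  return randomBytes(32).toString("hex");
}

// What a genuine gateway holding the shared token would send back for a nonce.
function computeGatewayProof(token: string, nonce: string): string {
  return createHmac("sha256", token).update(PROOF_LABEL + nonce).digest("hex");
}

// Client side: adopt the listener only if it proved knowledge of the token.
function verifyGatewayProof(token: string, nonce: string, proof: string): boolean {
  const expected = Buffer.from(computeGatewayProof(token, nonce), "hex");
  const received = Buffer.from(proof, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return received.length === expected.length && timingSafeEqual(received, expected);
}
```

The token itself never crosses the wire during this exchange, so a process that merely mimics `connect.challenge` cannot pass, and `timingSafeEqual` avoids leaking how much of a forged proof matched. Verifying service identity instead (for example, checking that the listening PID belongs to the expected systemd unit) would be an alternative that needs no protocol change.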
Problem
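When both ClawX (with `gatewayAutoStart: true`) and a system service manager (systemd on Linux, launchd on macOS) are managing the gateway, ClawX enters an infinite kill-restart loop: it kills the gateway the service manager started, the service manager restarts it, and ClawX kills it again.

Root Cause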
`findExistingGatewayProcess()` in `supervisor.ts` kills any non-owned process before checking whether it is a healthy gateway. Because the WebSocket probe runs only after the kill, it always finds nothing.

Fix
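A minimal sketch of the probe-then-kill ordering, assuming the supervisor's lookups are injected as plain functions. `SupervisorDeps`, `findNonOwnedListenerPid`, and `findExistingGatewayProcessSketch` are hypothetical names; only the `probeGatewayReady(port, 5000)` call and the log message come from the diff quoted in the review above.

```typescript
// Hypothetical dependencies, injected so the ordering can be shown and tested in isolation.
interface SupervisorDeps {
  // PID of a process ClawX did not spawn that is listening on the port, or null if none.
  findNonOwnedListenerPid(port: number): Promise<number | null>;
  probeGatewayReady(port: number, timeoutMs: number): Promise<boolean>;
  terminate(pids: number[]): Promise<void>;
  log(message: string): void;
}

async function findExistingGatewayProcessSketch(
  port: number,
  deps: SupervisorDeps,
): Promise<{ port: number } | null> {
  const pid = await deps.findNonOwnedListenerPid(port);
  if (pid === null) return null; // port is free: the caller starts its own gateway

  // Probe BEFORE any kill, so a healthy gateway run by systemd/launchd survives.
  const ready = await deps.probeGatewayReady(port, 5000);
  if (ready) {
    deps.log(`Adopting existing external gateway on port ${port}`);
    return { port };
  }

  // Probe failed: a broken or non-gateway process holds the port, so clear it.
  await deps.terminate([pid]);
  return null;
}
```

In the pre-fix order the terminate step ran before the probe, so the probe could only ever see an empty port. Here a healthy external gateway is returned as `{ port }`, and only a listener that fails the probe is terminated.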
- Probe before kill: try the WebSocket health check first. If the external gateway responds, adopt it (return `{ port }`) instead of killing it. Only terminate the process if the probe fails (a broken or non-gateway process).
- Add Linux systemd support: added `stopSystemdGatewayService()` (parallel to the existing macOS `unloadLaunchctlGatewayService()`), which stops the systemd user service before killing orphaned processes. This prevents the restart loop even when the gateway is unhealthy.

Changes
- `electron/gateway/supervisor.ts`: reordered `findExistingGatewayProcess()` to probe-then-kill; added `stopSystemdGatewayService()`.
- `tests/unit/gateway-supervisor.test.ts`: added tests for the adoption path and the systemd stop behavior; updated existing tests with a `probeGatewayReady` mock.

Testing
`@testing-library/dom` peer dep)

Fixes #818