fix(gateway): adopt external gateway instead of killing it (fixes #818) #820

Open
kagura-agent wants to merge 1 commit into ValueCell-ai:main from kagura-agent:fix/adopt-external-gateway

@kagura-agent
Contributor

Problem

When both ClawX (with gatewayAutoStart: true) and a system service manager (systemd on Linux, launchd on macOS) are managing the gateway, ClawX enters an infinite kill-restart loop:

  1. ClawX detects a non-owned process on the gateway port
  2. ClawX kills it (treats it as "orphaned")
  3. systemd detects the killed process → restarts it (5s later)
  4. ClawX detects it again → kills again → loop (163+ cycles reported)

Root Cause

findExistingGatewayProcess() in supervisor.ts kills any non-owned process before checking whether it is a healthy gateway. Because the WebSocket probe runs only after the kill, it never finds a live gateway to adopt.

Fix

  1. Probe before kill: Try WebSocket health check first. If the external gateway responds, adopt it (return { port }) instead of killing it. Only terminate if the probe fails (broken/non-gateway process).

  2. Add Linux systemd support: Added stopSystemdGatewayService() (parallel to existing macOS unloadLaunchctlGatewayService()) that stops the systemd user service before killing orphaned processes, preventing the restart loop even when the gateway is unhealthy.
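
The ordering described above can be sketched as follows. This is a minimal illustration, not the code in supervisor.ts: `GatewayDeps`, `Resolution`, and `resolveExternalGateway` are hypothetical names, and the probe, service-stop, and terminate steps are injected so the control flow can be exercised without a real gateway. Only the 5000 ms probe timeout is taken from the diff.

```typescript
// Hypothetical dependency bundle: these names are illustrative and are not
// taken from supervisor.ts. Injecting them keeps the ordering testable.
interface GatewayDeps {
  probe: (port: number, timeoutMs: number) => Promise<boolean>;
  stopServiceManager: () => Promise<void>;
  terminate: (pids: number[]) => Promise<void>;
}

type Resolution =
  | { action: "adopted"; port: number }
  | { action: "terminated"; pids: number[] };

async function resolveExternalGateway(
  port: number,
  pids: number[],
  deps: GatewayDeps,
): Promise<Resolution> {
  // Step 1: probe before touching the process.
  const ready = await deps.probe(port, 5000);
  if (ready) {
    // A responsive gateway is adopted and never killed.
    return { action: "adopted", port };
  }

  // Step 2: the listener is broken or not a gateway. Stop the service
  // manager first so it cannot restart the process, then terminate.
  await deps.stopServiceManager();
  await deps.terminate(pids);
  return { action: "terminated", pids };
}
```

With this ordering, a healthy gateway owned by systemd or launchd is left running, and the kill path is reached only when the probe fails.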

Changes

  • electron/gateway/supervisor.ts: Reordered findExistingGatewayProcess() to probe-then-kill; added stopSystemdGatewayService()
  • tests/unit/gateway-supervisor.test.ts: Added tests for adoption path and systemd stop behavior; updated existing tests with probeGatewayReady mock
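
For orientation, a helper in the spirit of stopSystemdGatewayService() might look roughly like the sketch below. It is an assumption-laden illustration rather than the PR's implementation: the unit name `openclaw-gateway.service`, the function name `stopSystemdUserService`, and the injectable `Runner` are all hypothetical. The only external call it relies on is `systemctl --user stop <unit>`.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

// Injectable command runner (hypothetical) so the helper can be tested
// without invoking systemctl.
type Runner = (cmd: string, args: string[]) => Promise<unknown>;

const defaultRunner: Runner = (cmd, args) => execFileAsync(cmd, args);

// Assumed unit name; the real service name may differ.
const ASSUMED_UNIT = "openclaw-gateway.service";

async function stopSystemdUserService(
  unit: string = ASSUMED_UNIT,
  platform: string = process.platform,
  run: Runner = defaultRunner,
): Promise<boolean> {
  // systemd is Linux-only; other platforms are a no-op.
  if (platform !== "linux") return false;
  try {
    // Stop the user-level unit so systemd does not restart the process
    // after it is terminated.
    await run("systemctl", ["--user", "stop", unit]);
    return true;
  } catch {
    // systemctl missing, unit not loaded, or stop failed: report and let
    // the caller continue with plain process termination.
    return false;
  }
}
```

Returning a boolean instead of throwing lets the caller fall through to the existing termination path when no systemd unit is present.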

Testing

  • All 5 gateway supervisor tests pass ✅
  • Full test suite: 61/70 suites pass (9 pre-existing failures from missing @testing-library/dom peer dep)

Fixes #818

…ueCell-ai#818)

Reverse the order of operations in findExistingGatewayProcess() to try a
WebSocket health check before killing non-owned processes. If the probe
succeeds the gateway is adopted, preventing a kill-restart loop when
systemd (or another service manager) owns the process.

Also add a Linux systemd check in terminateOrphanedProcessIds(), mirroring
the existing macOS launchctl handling — stops the user service before
sending SIGTERM so the process is not auto-restarted.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Contributor

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 04c0a935e8

Comment on lines +293 to +297
const ready = await probeGatewayReady(port, 5000);
if (ready) {
  logger.info(`Adopting existing external gateway on port ${port}`);
  return { port };
}

P1: Validate external gateway identity before adopting listener

This adoption branch trusts any process on the port that answers the WebSocket readiness probe, but that probe is unauthenticated and only checks for a connect.challenge event; once adopted, the startup flow proceeds to connect() and sends the gateway auth token in the handshake. In practice, a local process that binds the port and mimics the challenge can now receive credentials, whereas the previous behavior killed non-owned listeners first. Please require a stronger identity check (for example, verified ownership/service identity or authenticated probe) before returning { port } here.
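
One possible shape for an authenticated probe is a challenge-response check in which the listener proves it knows the gateway token without the token ever being sent. The sketch below is purely illustrative: it assumes the gateway could be extended to answer such a challenge, which the current protocol is not stated to support, and `proveTokenKnowledge` and `verifyGatewayProof` are hypothetical names.

```typescript
import { createHmac, randomBytes, timingSafeEqual } from "node:crypto";

// What a genuine gateway would compute (hypothetical protocol extension):
// an HMAC of the client's nonce, keyed with the shared auth token.
function proveTokenKnowledge(token: string, nonce: Buffer): Buffer {
  return createHmac("sha256", token).update(nonce).digest();
}

// Client-side check run before adoption. The token itself never leaves
// the client; only the nonce is sent and only the proof is received.
function verifyGatewayProof(token: string, nonce: Buffer, proof: Buffer): boolean {
  const expected = proveTokenKnowledge(token, nonce);
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return proof.length === expected.length && timingSafeEqual(proof, expected);
}

// Example round trip with a fresh nonce per probe to prevent replay.
const nonce = randomBytes(32);
const genuineProof = proveTokenKnowledge("shared-token", nonce);
const impostorProof = proveTokenKnowledge("guessed-token", nonce);

const genuineAccepted = verifyGatewayProof("shared-token", nonce, genuineProof);
const impostorAccepted = verifyGatewayProof("shared-token", nonce, impostorProof);
```

Under these assumptions, a local process that merely mimics the connect.challenge event could not produce a valid proof, so it would not be adopted and would not receive the token.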

Development

Successfully merging this pull request may close these issues.

[Bug]: If openclaw is installed, the ClawX desktop app and the systemd user service both manage openclaw-gateway, which restarts openclaw-gateway in an infinite loop
