fix(openclaw-plugin): add defensive re-spawn for OpenViking subproces… #1053
Merged
qin-ctx merged 1 commit into volcengine:main on Mar 28, 2026
…s after Gateway restart
qin-ctx approved these changes on Mar 28, 2026
This was referenced Mar 29, 2026
Description
Fix OpenViking subprocess not recovering after Gateway force-restart, causing all memory-dependent requests to time out indefinitely. Always log subprocess exit events (including code=0) for better diagnostics. Add diagnostic logging to ov_archive_expand tool invocations.
Related Issue
Reproduction: input a long text, restart the Gateway, then issue a query. The query times out with no response.
Changes Made
Always log subprocess exit: Removed the code !== 0 guard in the exit handler so exits with code=0 are also logged, preventing silent process disappearance
Defensive re-spawn: When isSpawner=false in local mode, check if a valid process actually exists (via cache + health check); if not, trigger a fresh spawn using the same env/config as the primary spawn path
ov_archive_expand diagnostics: Added structured logging for tool invocations (archiveId, sessionId), successful expansions (message count, char count), and failures (error details)
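The two logging changes above can be sketched roughly as follows. This is an illustrative sketch only, not the plugin's code: describeExit, makeExitHandler, describeExpand, the Logger type, and the message formats are all hypothetical. The only external fact assumed is that a Node.js child process emits an exit event with (code, signal), either of which may be null.

```typescript
// Hypothetical helpers for illustration; names and log formats are assumptions.
type Logger = (message: string) => void;

/** Formats a subprocess exit event. Every exit is described, including code 0. */
function describeExit(code: number | null, signal: string | null): string {
  const level = code === 0 ? "info" : "warn";
  return `[${level}] OpenViking subprocess exited (code=${code}, signal=${signal})`;
}

/**
 * Builds an exit handler without a `code !== 0` guard, so a clean exit
 * is logged too instead of the process disappearing silently.
 */
function makeExitHandler(log: Logger): (code: number | null, signal: string | null) => void {
  return (code, signal) => log(describeExit(code, signal));
}

interface ExpandedMessage {
  content: string;
}

/** Formats a successful archive expansion with message and character counts. */
function describeExpand(archiveId: string, sessionId: string, messages: ExpandedMessage[]): string {
  const chars = messages.reduce((sum, m) => sum + m.content.length, 0);
  return `ov_archive_expand ok archiveId=${archiveId} sessionId=${sessionId} messages=${messages.length} chars=${chars}`;
}
```

A handler built this way could be attached with something like child.on("exit", makeExitHandler(logger)), where child and logger stand in for whatever the plugin actually uses.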
Additional Notes
Root cause: The plugin is registered multiple times per Gateway startup (gateway subsystem + per-session). Only the first start() call to find a pending entry in localClientPendingPromises becomes the spawner. After a force-restart, race conditions in module loading can cause all start() calls to miss the pending entry, leaving isSpawner=false for every call. The original else branch silently swallowed health-check failures, so the system entered a permanently broken state. The defensive re-spawn acts as a self-healing fallback that detects "no valid process" and recovers automatically.
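A minimal sketch of the self-healing fallback described above, assuming the decision can be reduced to three inputs: whether this call is the spawner, whether a cached process exists, and whether it passes a health check. All names here (decideStartAction, startLocal, LocalDeps) are hypothetical and do not reflect the plugin's real API. The sketch also omits any guard against several non-spawner calls re-spawning at the same time.

```typescript
// Hypothetical stand-ins for illustration; not the plugin's actual code.
type StartAction = "spawn" | "reuse" | "respawn";

/** Pure decision: what should a start() call do in local mode? */
function decideStartAction(
  isSpawner: boolean,
  hasCachedProcess: boolean,
  healthy: boolean,
): StartAction {
  if (isSpawner) return "spawn";
  // Non-spawner: reuse only a process that both exists and is healthy.
  return hasCachedProcess && healthy ? "reuse" : "respawn";
}

interface LocalDeps<P> {
  getCached(): P | undefined;           // cache lookup
  isHealthy(proc: P): Promise<boolean>; // health check against the process
  spawn(): Promise<P>;                  // same env/config as the primary spawn path
  log(message: string): void;
}

async function startLocal<P>(isSpawner: boolean, deps: LocalDeps<P>): Promise<P> {
  if (isSpawner) return deps.spawn();

  const cached = deps.getCached();
  let healthy = false;
  if (cached !== undefined) {
    try {
      healthy = await deps.isHealthy(cached);
    } catch {
      // A failed health check is treated as "unhealthy", not swallowed as success.
      healthy = false;
    }
  }

  const action = decideStartAction(false, cached !== undefined, healthy);
  if (action === "reuse" && cached !== undefined) return cached;

  deps.log("no valid OpenViking process found; re-spawning");
  return deps.spawn();
}
```

The point of the design is that a non-spawner call no longer trusts that some other call has spawned the process. If every start() call ends up with isSpawner=false, the first one to notice the missing or unhealthy process recovers it.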