fix(copilot-sdk): post-completion idle watchdog to bound SDK hang after final tool result#41572
Conversation
… after final tool result When the Copilot SDK's sendAndWait never resolves after the agent's final tool call returns success, the process would hang until the full step timeout fires (burning the entire remaining step budget). This adds a post-completion idle watchdog (GH_AW_SDK_IDLE_MS, default 5 min) to copilot_sdk_session.cjs: - The watchdog is armed whenever the session enters the "work complete" state: output has been collected AND no tool calls are still pending. - It is disarmed when a new tool call starts (the agent is still mid-turn). - If no new SDK events arrive within the idle window, the watchdog force-disconnects the session; the catch block treats that as a successful completion (same semantics as the existing idle-timeout path). This bounds the post-completion hang from full-step-timeout (~20 min) to at most the watchdog idle period (~5 min), without affecting normal completions where sendAndWait resolves on its own. Closes #41551 Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
…ce-condition guard Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
|
@copilot Review the copilot SDK documentation to ensure we are not missing any events. |
Reviewed the
Also updated the watchdog arm/fire conditions to include Events intentionally left in
Two new regression tests added: watchdog disarmed during |
There was a problem hiding this comment.
Pull request overview
Adds a “post-completion” idle watchdog to the Copilot SDK session runner to prevent hangs where sendAndWait never resolves after the agent has effectively finished, reducing wasted runner time and step timeouts.
Changes:
- Arm/rearm a post-completion watchdog timer when
hasOutput === trueandpendingToolCalls.size === 0, force-disconnecting the session if no further events arrive within the configured window. - Treat watchdog-triggered disconnects as successful completion when output is present and there are no pending tool calls.
- Add regression tests covering watchdog firing, disarming behavior, and normal-completion behavior.
Show a summary per file
| File | Description |
|---|---|
| actions/setup/js/copilot_sdk_session.cjs | Adds the post-completion idle watchdog and success-handling path to bound SDK hangs after final tool completion. |
| actions/setup/js/copilot_sdk_driver.test.cjs | Adds regression tests validating watchdog behavior and ensuring completion semantics are preserved. |
Review details
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Files reviewed: 2/2 changed files
- Comments generated: 2
- Review effort level: Low
| // sendAndWait resolves before watchdog fires (watchdog idle = 20ms in test). | ||
| return { data: { content: "completed normally" } }; | ||
| }), | ||
| disconnect, | ||
| }; | ||
| class FakeCopilotClient { | ||
| start = vi.fn().mockResolvedValue(undefined); | ||
| createSession = vi.fn().mockResolvedValue(session); | ||
| stop = stop; | ||
| } | ||
|
|
||
| const prevIdleMs = process.env.GH_AW_SDK_IDLE_MS; | ||
| process.env.GH_AW_SDK_IDLE_MS = "20"; |
|
✅ PR Code Quality Reviewer completed the code quality review. |
|
✅ Test Quality Sentinel completed test quality analysis. TQS analysis already completed in this workflow run (run-id 28213036855): comment posted and APPROVE review submitted for PR #41572 (score 90/100, 4 design tests, 0 violations). Both add_comment and submit_pull_request_review limits were consumed in the prior agent session. |
|
✅ Design Decision Gate 🏗️ completed the design decision gate check. No ADR enforcement needed: PR does not have the 'implementation' label and has ≤100 new lines of code in business logic directories (default_business_additions=0, file_count=2). |
|
🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅ |
🧪 Test Quality Sentinel Report✅ Test Quality Score: 90/100 — Excellent
📊 Metrics & Test Classification (4 tests analyzed)
Go: 0 ( Score components: behavioral_ratio=40/40, edge_case_ratio=30/30, duplication_penalty=0, inflation_penalty=10 → Score = 90 📝 Per-Test Analysis
Mocking assessment: All Inflation note: The 4.46:1 ratio exceeds the 2:1 threshold and incurs a 10-point deduction. The scaffolding cost is justified — each async timing test requires Promise-coordinated fake SDKs, event sequences, and env-var isolation. Verbosity is structural, not duplicative. Verdict
|
There was a problem hiding this comment.
Skills-Based Review 🧠
Applied /diagnose, /tdd, /zoom-out, and /grill-with-docs — commenting with minor-to-medium observations. No blocking issues; the fix is correct and well-tested.
📋 Key Themes & Highlights
Key Themes
- Test spec precision: One assertion uses
.toHaveBeenCalled()where.toHaveBeenCalledTimes(2)is stated in the comment — a small but exploitable gap in the regression spec. - Env var naming:
GH_AW_SDK_IDLE_MSis ambiguous alongside the existingSDK_IDLE_TIMEOUT_PATTERNpath;GH_AW_SDK_POST_COMPLETION_IDLE_MSwould reduce operator confusion. - Test boilerplate: The env var save/restore pattern is copy-pasted 4× — a
beforeEach/afterEachblock would tighten this. - Comment accuracy: The "race condition guard" label and one test name do not reflect the actual mechanics.
- Sliding-window semantics: The watchdog re-arms on every qualifying event; worth documenting whether this is intentional (it probably is) so future maintainers don't change it inadvertently.
Positive Highlights
- ✅ Correct placement of the watchdog in the event handler and solid
finally-block cleanup — no timer leaks. - ✅ The
!catastrophicToolDenialsErrorguard in the catch path correctly preserves the existing denial-failure semantics. - ✅ 4 targeted regression tests that each isolate a distinct state transition — arm, disarm, no-premature-arm, and normal-resolution.
- ✅ Exported
SDK_POST_COMPLETION_IDLE_MS_DEFAULTconstant makes the default visible and testable without reading the source. - ✅ Clear PR description with a worked timeline example that makes the fix easy to validate.
🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · 74.2 AIC · ⌖ 11.4 AIC · ⊞ 6.5K
| expect(result.hasOutput).toBe(true); | ||
| expect(result.output).toContain("Issue filed successfully"); | ||
| // disconnect is called twice: once by the watchdog and once in finally. | ||
| expect(disconnectWithSignal).toHaveBeenCalled(); |
There was a problem hiding this comment.
[/tdd] The comment says disconnect is called twice (watchdog + finally), but the assertion only checks .toHaveBeenCalled() — it does not verify the exact count.
This leaves the spec ambiguous: if the watchdog somehow called disconnect three times, the test would still pass. Pin the count to match the stated expectation.
💡 Suggested fix
// disconnect is called twice: once by the watchdog and once in finally.
expect(disconnectWithSignal).toHaveBeenCalledTimes(2);Using toHaveBeenCalledTimes(2) turns the comment into a machine-checked specification, preventing a future over-disconnect from going undetected.
| postCompletionWatchdog = setTimeout(() => { | ||
| postCompletionWatchdog = null; | ||
| // Re-check conditions at fire time: a new tool call could have started | ||
| // between arming the watchdog and the timer firing (race condition guard). |
There was a problem hiding this comment.
[/diagnose] The phrase "race condition guard" is inaccurate in a single-threaded JavaScript runtime — there is no concurrent mutation here.
The guard is catching stale state at timer-fire time: between when the watchdog was armed and when the macrotask fires, new events could have changed hasOutput or pendingToolCalls. "Stale-state guard" is the accurate description, and makes the intent clearer to future readers.
💡 Suggested comment
// Stale-state guard: events processed between arming the watchdog and the
// timer firing could have changed hasOutput / pendingToolCalls.
if (!hasOutput || pendingToolCalls.size !== 0 || !session) return;| // block treats the result as a successful completion. This bounds the damage | ||
| // from the SDK driver bug where sendAndWait never resolves after the final | ||
| // tool result is returned. | ||
| const postCompletionIdleMs = getEnvPositiveIntOrDefault("GH_AW_SDK_IDLE_MS", SDK_POST_COMPLETION_IDLE_MS_DEFAULT); |
There was a problem hiding this comment.
[/grill-with-docs] GH_AW_SDK_IDLE_MS is ambiguous alongside the existing SDK_IDLE_TIMEOUT_PATTERN mechanism — an operator could easily conflate the two and misconfigure the wrong timeout.
Consider renaming to GH_AW_SDK_POST_COMPLETION_IDLE_MS to match the exported constant name (SDK_POST_COMPLETION_IDLE_MS_DEFAULT) and to self-document the specific hang scenario it addresses.
💡 Context
The existing idle-timeout path is triggered by the SDK emitting a Timeout after Nms waiting for session.idle error — a different failure mode. A new operator reading the env var docs would have no way to distinguish which timeout GH_AW_SDK_IDLE_MS controls without reading the source.
| } | ||
| }); | ||
|
|
||
| it("post-completion watchdog does not fire when tool calls are still pending", async () => { |
There was a problem hiding this comment.
[/tdd] The test name "does not fire when tool calls are still pending" doesn't match the actual scenario.
The scenario emits an assistant.message first (which arms the watchdog), then starts a new tool call (which disarms it), completes it (which re-arms it), and finally resolves sendAndWait normally before the timer fires. The scenario being tested is watchdog disarmed on new tool start, re-armed on completion, cleared on normal resolution.
A more accurate name would help future readers understand the contract at a glance, e.g.: "post-completion watchdog is disarmed by a new tool call and cleared on normal sendAndWait resolution".
|
|
||
| const prevIdleMs = process.env.GH_AW_SDK_IDLE_MS; | ||
| // Use a very short idle timeout so the watchdog fires quickly in tests. | ||
| process.env.GH_AW_SDK_IDLE_MS = "20"; |
There was a problem hiding this comment.
[/tdd] The GH_AW_SDK_IDLE_MS save/restore pattern is copy-pasted across all 4 new tests. A beforeEach/afterEach pair would eliminate the repetition and make it impossible to forget cleanup in one test.
💡 Suggested refactor
let prevSdkIdleMs;
beforeEach(() => {
prevSdkIdleMs = process.env.GH_AW_SDK_IDLE_MS;
process.env.GH_AW_SDK_IDLE_MS = "20";
});
afterEach(() => {
if (prevSdkIdleMs === undefined) delete process.env.GH_AW_SDK_IDLE_MS;
else process.env.GH_AW_SDK_IDLE_MS = prevSdkIdleMs;
});Place these inside the relevant describe block so they scope only to the watchdog tests.
| postCompletionWatchdog = null; | ||
| // Re-check conditions at fire time: a new tool call could have started | ||
| // between arming the watchdog and the timer firing (race condition guard). | ||
| if (!hasOutput || pendingToolCalls.size !== 0 || !session) return; |
There was a problem hiding this comment.
[/diagnose] The !session guard is always false at this point — the watchdog can only be armed inside session.on(...), which is called after session is assigned. The guard is dead code.
Consider removing it to keep the condition accurate, or replace it with a comment explaining why it was considered (and why it was safe to drop).
| // tool call was just started, or no output yet). | ||
| // The watchdog fires only if sendAndWait never resolves on its own after | ||
| // the final tool result is returned — the common SDK post-completion hang. | ||
| if (hasOutput && pendingToolCalls.size === 0) { |
There was a problem hiding this comment.
[/zoom-out] The watchdog uses a sliding-window strategy: it re-arms on every event that satisfies hasOutput && pendingToolCalls.size === 0, not just on the initial transition into that state.
This means if the SDK keeps emitting qualifying events (e.g., repeated assistant.message chunks) after the final tool call, the window keeps resetting and the watchdog never fires — potentially defeating its purpose for a chatty SDK.
If the intent is "N ms of silence after the final tool result", the behaviour is correct and just needs a clearer comment. If the intent is "fire N ms after first entering the complete state", the logic needs a one-shot arm:
💡 One-shot arm alternative
// Arm once when transitioning into the post-completion state.
if (hasOutput && pendingToolCalls.size === 0 && !postCompletionWatchdog) {
postCompletionWatchdog = setTimeout(() => { ... }, postCompletionIdleMs);
}
// Disarm when no longer complete.
if (!(hasOutput && pendingToolCalls.size === 0) && postCompletionWatchdog) {
clearTimeout(postCompletionWatchdog);
postCompletionWatchdog = null;
}The PR description says the SDK "fails to resolve after the final tool result", which implies silence after that point — so sliding-window is likely fine. Just worth making the choice explicit in a comment.
There was a problem hiding this comment.
REQUEST_CHANGES — Two issues that need fixing before merge; two additional cleanup items.
🔴 Blocking issues
1. Double-disconnect can override the successful return (HIGH)
The watchdog fires void session.disconnect() and the finally block unconditionally calls await session.disconnect() again. If the SDK's disconnect() rejects on a second call, the finally-block rejection propagates to the caller instead of the return { exitCode: 0 } from the catch block. In JavaScript, an exception thrown from finally overrides any return or throw in try/catch. The feature works correctly only if the SDK happens to be idempotent on disconnect — an assumption that is not documented or tested.
See inline comment on line 418 for a concrete fix.
2. Watchdog re-arms on any SDK event — defeats the timeout on keepalive-emitting sessions (MEDIUM)
The watchdog is rearmed (clearTimeout + fresh setTimeout) on every event where hasOutput && pendingToolCalls.size === 0. In the exact hung-session scenario this watchdog is designed to fix, if the SDK emits any periodic keepalive or status event while sendAndWait is stuck, the idle window resets indefinitely. The watchdog will never fire.
See inline comment on line 331 for two alternative designs.
🟡 Cleanup items (non-blocking)
- Test assertion too weak (line 409, test file):
toHaveBeenCalled()should betoHaveBeenCalledTimes(2)since the comment explicitly says disconnect fires twice. - Dead
!sessionguard (line 335, session file):sessionis a const that is never nulled; the guard is unreachable.
🔎 Code quality review by PR Code Quality Reviewer · 127.1 AIC · ⌖ 8.86 AIC · ⊞ 5.2K
Comments that could not be inline-anchored
actions/setup/js/copilot_sdk_session.cjs:418
Double-disconnect: watchdog and finally both call session.disconnect() — if the SDK is not idempotent, the finally-block exception overrides the successful catch-block return.
<details>
<summary>💡 Details and suggested fix</summary>
When the watchdog fires (line 338), it calls void session.disconnect() (fire-and-forget). The finally block then unconditionally calls await session.disconnect() a second time. In JavaScript, a finally block always runs even after a return i…
actions/setup/js/copilot_sdk_session.cjs:331
Watchdog re-arms on every SDK event — SDK keepalives or status messages emitted while sendAndWait is hung will prevent the watchdog from ever firing.
<details>
<summary>💡 Details</summary>
The watchdog is rearmed (clearTimeout + fresh setTimeout) on every SDK event where hasOutput && pendingToolCalls.size === 0 — not only on tool or message events. This implements an idle timer: "no events for N ms."
The problem is the hang scenario this watchdog is designed to fix: `sendAnd…
actions/setup/js/copilot_sdk_driver.test.cjs:409
Assertion is weaker than the comment claims — toHaveBeenCalled() does not verify the expected two-call behavior.
<details>
<summary>💡 Suggested fix</summary>
The inline comment on the previous line says:
// disconnect is called twice: once by the watchdog and once in finally.
expect(disconnectWithSignal).toHaveBeenCalled();toHaveBeenCalled() passes with one or more calls. If the double-disconnect is intentional behavior (and it is — see line 418 in `copilot_sdk_sessi…
actions/setup/js/copilot_sdk_session.cjs:335
!session guard is dead code — session is a const that is never set to null.
<details>
<summary>💡 Details</summary>
session is assigned once via const session = await client.createSession(...) and is never reassigned or nulled anywhere in runWithCopilotSDK. The !session branch in the watchdog callback guard:
if (!hasOutput || pendingToolCalls.size !== 0 || !session) return;...will never be true. This is confusing to future readers who might wonder what lifec…
|
@copilot Run pr-finisher skill. |
…val in flaky test Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Done. Addressed the two open review threads:
Local validation passes. Commit: |
|
✅ All tools validated successfully! Agent Container Smoke Test confirms agent container is ready. |
|
🚀 Smoke Pi MISSION COMPLETE! Pi delivered. 🥧 Smoke test completed; created issue. PR auto-comment unsupported in current event context, label skipped because tests failed. |
|
🚀 Smoke Gemini MISSION COMPLETE! Gemini has spoken. ✨ |
|
🚀 Smoke Antigravity MISSION COMPLETE! Antigravity has spoken. ✨ |
|
🎬 THE END — Smoke Claude MISSION: ACCOMPLISHED! The hero saves the day! ✨ |
|
📰 BREAKING: Smoke Copilot is now investigating this pull request. Sources say the story is developing... |
|
📰 BREAKING: Smoke Copilot - AOAI (apikey) is now investigating this pull request. Sources say the story is developing... |
|
📰 BREAKING: Smoke Copilot - AOAI (Entra) is now investigating this pull request. Sources say the story is developing... |
Agent Container Tool Check
Result: 12/12 tools available ✅ Overall Status: PASS
|
Smoke Test Results
Overall Status: FAIL Warning Firewall blocked 1 domainThe following domain was blocked by the firewall during workflow execution:
network:
allowed:
- defaults
- "localhost"See Network Configuration for more information.
|
|
Smoke Test: Codex - 28215703546 Warning Firewall blocked 6 domainsThe following domains were blocked by the firewall during workflow execution:
network:
allowed:
- defaults
- "accounts.google.com"
- "android.clients.google.com"
- "clients2.google.com"
- "contentautofill.googleapis.com"
- "safebrowsingohttpgateway.googleapis.com"
- "www.google.com"See Network Configuration for more information.
|
Comment MemoryNote This comment is managed by comment memory.It stores persistent context for this thread in the code block at the top of this comment. Warning Firewall blocked 6 domainsThe following domains were blocked by the firewall during workflow execution:
network:
allowed:
- defaults
- "accounts.google.com"
- "android.clients.google.com"
- "clients2.google.com"
- "contentautofill.googleapis.com"
- "safebrowsingohttpgateway.googleapis.com"
- "www.google.com"See Network Configuration for more information.
|
💥 Smoke Test: Claude — Run 28215703568Core #1-12: 1✅ 2✅ 3✅ 4✅ 5❌(Tavily) 6✅ 7✅ 8✅ 9✅ 10✅ 11✅ 12✅ Overall: PARTIAL — Tavily MCP registered 0 tools. Warning Firewall blocked 6 domainsThe following domains were blocked by the firewall during workflow execution:
network:
allowed:
- defaults
- "accounts.google.com"
- "android.clients.google.com"
- "clients2.google.com"
- "contentautofill.googleapis.com"
- "safebrowsingohttpgateway.googleapis.com"
- "www.google.com"See Network Configuration for more information.
|
There was a problem hiding this comment.
💥 Automated smoke test review - all systems nominal!
Warning
Firewall blocked 6 domains
The following domains were blocked by the firewall during workflow execution:
accounts.google.comandroid.clients.google.comclients2.google.comcontentautofill.googleapis.comsafebrowsingohttpgateway.googleapis.comwww.google.com
To allow these domains, add them to the
network.allowedlist in your workflow frontmatter:
network:
allowed:
- defaults
- "accounts.google.com"
- "android.clients.google.com"
- "clients2.google.com"
- "contentautofill.googleapis.com"
- "safebrowsingohttpgateway.googleapis.com"
- "www.google.com"See Network Configuration for more information.
💥 [THE END] — Illustrated by Smoke Claude · 67.9 AIC · ⌖ 25.5 AIC · ⊞ 8.7K
| // Default idle period for the post-completion watchdog: 5 minutes. | ||
| // When the agent has produced output and all tracked tool calls have completed, | ||
| // the driver arms a watchdog timer. If no new SDK events arrive within this | ||
| // window, the driver force-disconnects the session and treats it as a successful |
There was a problem hiding this comment.
Consider documenting the default timeout value here for maintainability.
| // (default 5 minutes), the watchdog force-disconnects the session and the catch | ||
| // block treats the result as a successful completion. This bounds the damage | ||
| // from the SDK driver bug where sendAndWait never resolves after the final | ||
| // tool result is returned. |
There was a problem hiding this comment.
Nice watchdog guard — a brief comment on the race-condition rationale would help future readers.
Comment MemoryNote This comment is managed by comment memory.It stores persistent context for this thread in the code block at the top of this comment. Warning Firewall blocked 6 domainsThe following domains were blocked by the firewall during workflow execution:
network:
allowed:
- defaults
- "accounts.google.com"
- "android.clients.google.com"
- "clients2.google.com"
- "contentautofill.googleapis.com"
- "safebrowsingohttpgateway.googleapis.com"
- "www.google.com"See Network Configuration for more information.
|
There was a problem hiding this comment.
Smoke review done. Two inline taps.
Warning
Firewall blocked 6 domains
The following domains were blocked by the firewall during workflow execution:
accounts.google.comandroid.clients.google.comclients2.google.comcontentautofill.googleapis.comsafebrowsingohttpgateway.googleapis.comwww.google.com
To allow these domains, add them to the
network.allowedlist in your workflow frontmatter:
network:
allowed:
- defaults
- "accounts.google.com"
- "android.clients.google.com"
- "clients2.google.com"
- "contentautofill.googleapis.com"
- "safebrowsingohttpgateway.googleapis.com"
- "www.google.com"See Network Configuration for more information.
📰 BREAKING: Report filed by Smoke Copilot · 362.2 AIC · ⌖ 24.2 AIC · ⊞ 18.9K
|
Smoke tests: ✅1, ✅2, ❌3, ❌4, ✅5, ✅6, ✅7, ✅8. Overall: FAIL. Warning Firewall blocked 6 domainsThe following domains were blocked by the firewall during workflow execution:
network:
allowed:
- defaults
- "accounts.google.com"
- "android.clients.google.com"
- "clients2.google.com"
- "contentautofill.googleapis.com"
- "safebrowsingohttpgateway.googleapis.com"
- "www.google.com"See Network Configuration for more information.
|
There was a problem hiding this comment.
Warning
Firewall blocked 6 domains
The following domains were blocked by the firewall during workflow execution:
accounts.google.comandroid.clients.google.comclients2.google.comcontentautofill.googleapis.comsafebrowsingohttpgateway.googleapis.comwww.google.com
To allow these domains, add them to the
network.allowedlist in your workflow frontmatter:
network:
allowed:
- defaults
- "accounts.google.com"
- "android.clients.google.com"
- "clients2.google.com"
- "contentautofill.googleapis.com"
- "safebrowsingohttpgateway.googleapis.com"
- "www.google.com"See Network Configuration for more information.
📰 BREAKING: Report filed by Smoke Copilot - AOAI (apikey) · 295.7 AIC · ⌖ 6.44 AIC · ⊞ 17.9K
After the agent's final tool call returns success, the SDK's
sendAndWaitcan fail to resolve — leaving the process hung until the step timeout kills the runner and burning the full remaining step budget (up to ~20 min) with no additional work produced.Changes
copilot_sdk_session.cjs— post-completion idle watchdogsetTimeout(default 5 min, override viaGH_AW_SDK_IDLE_MS) whenever the session enters "work complete" state:hasOutput === true && pendingToolCalls.size === 0.tool.execution_startevent (agent still mid-turn) and cleared infinallywhensendAndWaitresolves normally.session.disconnect()and setspostCompletionWatchdogTriggered; the catch block treats this as a successful completion — same semantics as the existingSDK_IDLE_TIMEOUT_PATTERNpath.SDK_POST_COMPLETION_IDLE_MS_DEFAULTfor external reference.copilot_sdk_driver.test.cjs— 4 new regression testsexitCode 0, output preserveddisconnectsendAndWaitresolves before timer fires✨ PR Review Safe Output Test - Run 28215703568
Warning
Firewall blocked 6 domains
The following domains were blocked by the firewall during workflow execution:
accounts.google.comandroid.clients.google.comclients2.google.comcontentautofill.googleapis.comsafebrowsingohttpgateway.googleapis.comwww.google.comSee Network Configuration for more information.