fix(copilot-sdk): post-completion idle watchdog to bound SDK hang after final tool result by Copilot · Pull Request #41572 · github/gh-aw

Copilot · 2026-06-26T01:37:45Z

After the agent's final tool call returns success, the SDK's sendAndWait can fail to resolve — leaving the process hung until the step timeout kills the runner and burning the full remaining step budget (up to ~20 min) with no additional work produced.

Changes

`copilot_sdk_session.cjs` — post-completion idle watchdog

Arms a setTimeout (default 5 min, override via GH_AW_SDK_IDLE_MS) whenever the session enters "work complete" state: hasOutput === true && pendingToolCalls.size === 0.
Disarms on every new tool.execution_start event (agent still mid-turn) and is cleared in finally when sendAndWait resolves normally.
On fire: calls session.disconnect() and sets postCompletionWatchdogTriggered; the catch block treats this as a successful completion — same semantics as the existing SDK_IDLE_TIMEOUT_PATTERN path.
Exports SDK_POST_COMPLETION_IDLE_MS_DEFAULT for external reference.

T+12m45s  create_issue → safeoutputs returns success
T+12m45s  agent emits final markdown summary
T+17m45s  [sdk-driver] warning: post-completion idle watchdog fired after 300000ms — force-disconnecting session
T+17m45s  [sdk-driver] warning: post-completion watchdog triggered disconnect — treating as completed
          [copilot-harness] attempt 1: process exit event exitCode=0   ← instead of T+20m timeout
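
The arm/disarm rules listed above can be sketched roughly as follows. This is an illustration only, not the code in copilot_sdk_session.cjs: `createPostCompletionWatchdog`, `getState`, `onFire`, and the injectable `setTimer`/`clearTimer` parameters are hypothetical names introduced here so the behaviour can be exercised without real timers. In the driver, the fire callback would correspond to the `session.disconnect()` call.

```javascript
// Hypothetical sketch of the post-completion idle watchdog (names are assumptions).
function createPostCompletionWatchdog({
  idleMs,
  getState, // returns { hasOutput: boolean, pendingToolCalls: Set }
  onFire, // e.g. force-disconnect the session
  setTimer = setTimeout,
  clearTimer = clearTimeout,
}) {
  let timer = null;

  // "Work complete": output collected and no tool call still pending.
  const isWorkComplete = () => {
    const { hasOutput, pendingToolCalls } = getState();
    return hasOutput && pendingToolCalls.size === 0;
  };

  function disarm() {
    if (timer !== null) {
      clearTimer(timer);
      timer = null;
    }
  }

  // Call after every processed SDK event, and call disarm() in finally.
  function update() {
    disarm();
    if (!isWorkComplete()) return;
    timer = setTimer(() => {
      timer = null;
      // Re-check at fire time: the state may have changed since arming.
      if (isWorkComplete()) onFire();
    }, idleMs);
  }

  return { update, disarm, isArmed: () => timer !== null };
}
```

Because `update()` always clears the previous timer before re-arming, this sketch has the same sliding-window behaviour the PR describes: the idle period restarts on each event processed while the session is in the "work complete" state.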

`copilot_sdk_driver.test.cjs` — 4 new regression tests

Test	Asserts
watchdog fires on hang	`exitCode 0`, output preserved
watchdog disarmed by new tool call	normal completion, single `disconnect`
watchdog not armed before first output	no spurious disconnect
`sendAndWait` resolves before timer fires	watchdog cleared, no double-disconnect

✨ PR Review Safe Output Test - Run 28215703568

Warning

Firewall blocked 6 domains

The following domains were blocked by the firewall during workflow execution:

accounts.google.com
android.clients.google.com
clients2.google.com
contentautofill.googleapis.com
safebrowsingohttpgateway.googleapis.com
www.google.com

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "accounts.google.com"
    - "android.clients.google.com"
    - "clients2.google.com"
    - "contentautofill.googleapis.com"
    - "safebrowsingohttpgateway.googleapis.com"
    - "www.google.com"

See Network Configuration for more information.

💥 [THE END] — Illustrated by Smoke Claude · 67.9 AIC · ⌖ 25.5 AIC · ⊞ 8.7K · ◷

… after final tool result

When the Copilot SDK's sendAndWait never resolves after the agent's final tool call returns success, the process would hang until the full step timeout fires (burning the entire remaining step budget).

This adds a post-completion idle watchdog (GH_AW_SDK_IDLE_MS, default 5 min) to copilot_sdk_session.cjs:

- The watchdog is armed whenever the session enters the "work complete" state: output has been collected AND no tool calls are still pending.
- It is disarmed when a new tool call starts (the agent is still mid-turn).
- If no new SDK events arrive within the idle window, the watchdog force-disconnects the session; the catch block treats that as a successful completion (same semantics as the existing idle-timeout path).

This bounds the post-completion hang from full-step-timeout (~20 min) to at most the watchdog idle period (~5 min), without affecting normal completions where sendAndWait resolves on its own.

Closes #41551

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

…ce-condition guard

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

pelikhan · 2026-06-26T01:57:21Z

@copilot Review the copilot SDK documentation to ensure we are not missing any events.

Copilot · 2026-06-26T02:14:28Z

@copilot Review the copilot SDK documentation to ensure we are not missing any events.

Reviewed the @github/copilot-sdk SessionEvent union in session-events.d.ts. Found several events not yet handled in copilot_sdk_session.cjs:

SDK event	Was missing	Action taken
`assistant.turn_start`	✅	Sets `inAssistantTurn = true` — disarms the post-completion watchdog while the agent is mid-LLM-inference (before it dispatches any tool calls in the new turn)
`assistant.turn_end`	✅	Sets `inAssistantTurn = false`, allowing the watchdog to re-arm after the turn
`session.task_complete`	✅	Logged to JSONL with `success` + `summary` fields
`subagent.started`	✅	Logged to JSONL
`subagent.completed`	✅	Logged to JSONL
`subagent.failed`	✅	Logged to JSONL

Also updated the watchdog arm/fire conditions to include !inAssistantTurn, so a slow LLM inference turn that starts with zero pending tool calls no longer risks a premature force-disconnect.
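
As a rough illustration of the updated condition, the turn tracking and the arm check might look like the sketch below. The function names `applyTurnEvent` and `canArmWatchdog` and the plain state object are assumptions made for this example; the driver keeps this state in local variables inside its event handler.

```javascript
// Hypothetical helpers; only the event type strings come from the SDK event list above.
function applyTurnEvent(state, event) {
  switch (event.type) {
    case "assistant.turn_start":
      // The agent is mid-inference: the watchdog must stay disarmed.
      return { ...state, inAssistantTurn: true };
    case "assistant.turn_end":
      // Turn finished: the watchdog may re-arm if the other conditions hold.
      return { ...state, inAssistantTurn: false };
    default:
      return state;
  }
}

// Arm (and fire) only when output exists, nothing is pending, and no turn is in flight.
function canArmWatchdog({ hasOutput, pendingToolCalls, inAssistantTurn }) {
  return hasOutput && pendingToolCalls.size === 0 && !inAssistantTurn;
}
```

With this shape, a turn that starts with zero pending tool calls flips `canArmWatchdog` to false until the matching `assistant.turn_end` arrives.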

Events intentionally left in default (skip) after review:

tool.execution_partial_result / tool.execution_progress — always ephemeral: true, filtered before the switch
session.idle — always ephemeral: true, filtered before the switch
external_tool.requested / external_tool.completed — not used in gh-aw's driver (no external tools registered); external_tool.completed is always ephemeral so it cannot be used to remove entries from pendingToolCalls
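
To make the "filtered before the switch" point concrete, here is a minimal sketch of how ephemeral events could be skipped ahead of pending-tool tracking. It is not the driver's handler: `trackToolEvent` is a hypothetical name, and both the `tool.execution_complete` event type and the `data.toolCallId` field are assumptions about the event shape that should be checked against session-events.d.ts.

```javascript
// Hypothetical sketch; event shape { type, ephemeral, data: { toolCallId } } is assumed.
function trackToolEvent(pendingToolCalls, event) {
  // Ephemeral events never reach the switch, so they cannot change tracking state.
  if (event.ephemeral) return false;

  switch (event.type) {
    case "tool.execution_start":
      pendingToolCalls.add(event.data.toolCallId);
      return true;
    case "tool.execution_complete": // assumed name of the completion event
      pendingToolCalls.delete(event.data.toolCallId);
      return true;
    default:
      return false; // skipped, same as the default branch described above
  }
}
```

This also shows why an always-ephemeral event such as external_tool.completed cannot be relied on to remove entries from pendingToolCalls: it is dropped before any case is evaluated.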

Two new regression tests added: watchdog disarmed during turn_start→turn_end cycle, and session.task_complete written to JSONL. All 11,123 tests pass.

Copilot

Pull request overview

Adds a “post-completion” idle watchdog to the Copilot SDK session runner to prevent hangs where sendAndWait never resolves after the agent has effectively finished, reducing wasted runner time and step timeouts.

Changes:

Arm/rearm a post-completion watchdog timer when hasOutput === true and pendingToolCalls.size === 0, force-disconnecting the session if no further events arrive within the configured window.
Treat watchdog-triggered disconnects as successful completion when output is present and there are no pending tool calls.
Add regression tests covering watchdog firing, disarming behavior, and normal-completion behavior.

File	Description
actions/setup/js/copilot_sdk_session.cjs	Adds the post-completion idle watchdog and success-handling path to bound SDK hangs after final tool completion.
actions/setup/js/copilot_sdk_driver.test.cjs	Adds regression tests validating watchdog behavior and ensuring completion semantics are preserved.

Review details

Files reviewed: 2/2 changed files
Comments generated: 2
Review effort level: Low

+          // sendAndWait resolves before watchdog fires (watchdog idle = 20ms in test).
+          return { data: { content: "completed normally" } };
+        }),
+        disconnect,
+      };
+      class FakeCopilotClient {
+        start = vi.fn().mockResolvedValue(undefined);
+        createSession = vi.fn().mockResolvedValue(session);
+        stop = stop;
+      }
+
+      const prevIdleMs = process.env.GH_AW_SDK_IDLE_MS;
+      process.env.GH_AW_SDK_IDLE_MS = "20";


github-actions · 2026-06-26T02:26:36Z

✅ PR Code Quality Reviewer completed the code quality review.

github-actions · 2026-06-26T02:26:49Z

✅ Test Quality Sentinel completed test quality analysis.

TQS analysis already completed in this workflow run (run-id 28213036855): comment posted and APPROVE review submitted for PR #41572 (score 90/100, 4 design tests, 0 violations). Both add_comment and submit_pull_request_review limits were consumed in the prior agent session.

github-actions · 2026-06-26T02:26:56Z

✅ Design Decision Gate 🏗️ completed the design decision gate check.

No ADR enforcement needed: PR does not have the 'implementation' label and has ≤100 new lines of code in business logic directories (default_business_additions=0, file_count=2).

github-actions · 2026-06-26T02:27:09Z

🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅

github-actions · 2026-06-26T02:35:04Z

🧪 Test Quality Sentinel Report

✅ Test Quality Score: 90/100 — Excellent

Analyzed 4 test(s) in actions/setup/js/copilot_sdk_driver.test.cjs: 4 design, 0 implementation, 0 guideline violation(s). Test inflation detected (281 test lines / 63 production lines ≈ 4.46:1); 10-point deduction applied.

📊 Metrics & Test Classification (4 tests analyzed)

Metric	Value
New/modified tests analyzed	4
✅ Design tests (behavioral contracts)	4 (100%)
⚠️ Implementation tests (low value)	0 (0%)
Tests with error/edge cases	4 (100%)
Duplicate test clusters	0
Test inflation detected	YES — 281 test lines / 63 production lines ≈ 4.46:1 (threshold: 2:1)
🚨 Coding-guideline violations	0

Test	File	Classification	Issues Detected
`post-completion idle watchdog fires and treats session as completed`	`copilot_sdk_driver.test.cjs`	✅ Design	—
`post-completion watchdog does not fire when tool calls are still pending`	`copilot_sdk_driver.test.cjs`	✅ Design	—
`post-completion watchdog does not trigger when output not yet collected`	`copilot_sdk_driver.test.cjs`	✅ Design	—
`post-completion watchdog does not treat success as failure when sendAndWait resolves before timer fires`	`copilot_sdk_driver.test.cjs`	✅ Design	—

Go: 0 (*_test.go); JavaScript: 4 (*.test.cjs). No other languages detected.

Score components: behavioral_ratio=40/40, edge_case_ratio=30/30, duplication_penalty=0, inflation_penalty=10 → Score = 90

📝 Per-Test Analysis

post-completion idle watchdog fires and treats session as completed — ✅ Design test. Simulates the production regression (sendAndWait hanging after the final tool result) by having sendAndWait block until the watchdog calls session.disconnect(). Asserts exitCode=0, hasOutput=true, output content, and disconnect call count. High value — deleting this test would allow the core SDK hang regression to go undetected. Mocks target external Copilot SDK session objects (legitimate).

post-completion watchdog does not fire when tool calls are still pending — ✅ Design test. Tests the watchdog disarm invariant: a new tool.execution_start must disarm the watchdog even after a prior arm. Asserts disconnect called exactly once (finally cleanup, not watchdog). High value — deleting would allow premature disconnects mid-session.

post-completion watchdog does not trigger when output not yet collected — ✅ Design test. Verifies the guard condition: watchdog only arms when hasOutput=true. Tool calls completing before any assistant.message must not arm the watchdog. Asserts normal completion and single disconnect. High value — covers the guard precondition preventing false triggers on noisy early events.

post-completion watchdog does not treat success as failure when sendAndWait resolves before timer fires — ✅ Design test. Regression guard for the happy path: watchdog armed but unneeded must not cause double-disconnect or wrong exit codes. Asserts disconnect called exactly once. High value — ensures the watchdog does not degrade normal completions.

Mocking assessment: All vi.fn() calls target the external Copilot SDK session interface (disconnect, stop, sendAndWait, createSession, start). Mocking external SDK client objects is correct for testing driver-layer behavior. Call-count assertions (toHaveBeenCalledTimes(1), toHaveBeenCalled) verify behavioral side-effects (watchdog fires / does not fire), not internal function routing — appropriate use of mocks.

Inflation note: The 4.46:1 ratio exceeds the 2:1 threshold and incurs a 10-point deduction. The scaffolding cost is justified — each async timing test requires Promise-coordinated fake SDKs, event sequences, and env-var isolation. Verbosity is structural, not duplicative.

Verdict

✅ Check passed. 0% implementation tests (threshold: 30%). All 4 new tests verify observable behavioral contracts of the post-completion idle watchdog — covering the hang-fix scenario, watchdog disarm invariant, no-output guard, and happy-path non-interference. No coding-guideline violations detected.

🧪 Test quality analysis by Test Quality Sentinel · 95.5 AIC · ⌖ 16.9 AIC · ⊞ 8.4K · ◷

github-actions

✅ Test Quality Sentinel: 90/100. Test quality is excellent — 0% of new tests are implementation tests (threshold: 30%). All 4 new tests verify behavioral contracts of the post-completion idle watchdog.

github-actions

Skills-Based Review 🧠

Applied /diagnose, /tdd, /zoom-out, and /grill-with-docs — commenting with minor-to-medium observations. No blocking issues; the fix is correct and well-tested.

📋 Key Themes & Highlights

Key Themes

Test spec precision: One assertion uses .toHaveBeenCalled() where .toHaveBeenCalledTimes(2) is stated in the comment — a small but exploitable gap in the regression spec.
Env var naming: GH_AW_SDK_IDLE_MS is ambiguous alongside the existing SDK_IDLE_TIMEOUT_PATTERN path; GH_AW_SDK_POST_COMPLETION_IDLE_MS would reduce operator confusion.
Test boilerplate: The env var save/restore pattern is copy-pasted 4× — a beforeEach/afterEach block would tighten this.
Comment accuracy: The "race condition guard" label and one test name do not reflect the actual mechanics.
Sliding-window semantics: The watchdog re-arms on every qualifying event; worth documenting whether this is intentional (it probably is) so future maintainers don't change it inadvertently.

Positive Highlights

✅ Correct placement of the watchdog in the event handler and solid finally-block cleanup — no timer leaks.
✅ The !catastrophicToolDenialsError guard in the catch path correctly preserves the existing denial-failure semantics.
✅ 4 targeted regression tests that each isolate a distinct state transition — arm, disarm, no-premature-arm, and normal-resolution.
✅ Exported SDK_POST_COMPLETION_IDLE_MS_DEFAULT constant makes the default visible and testable without reading the source.
✅ Clear PR description with a worked timeline example that makes the fix easy to validate.

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · 74.2 AIC · ⌖ 11.4 AIC · ⊞ 6.5K

github-actions · 2026-06-26T02:35:09Z

+        expect(result.hasOutput).toBe(true);
+        expect(result.output).toContain("Issue filed successfully");
+        // disconnect is called twice: once by the watchdog and once in finally.
+        expect(disconnectWithSignal).toHaveBeenCalled();


[/tdd] The comment says disconnect is called twice (watchdog + finally), but the assertion only checks .toHaveBeenCalled() — it does not verify the exact count.

This leaves the spec ambiguous: if the watchdog somehow called disconnect three times, the test would still pass. Pin the count to match the stated expectation.

💡 Suggested fix

// disconnect is called twice: once by the watchdog and once in finally.
expect(disconnectWithSignal).toHaveBeenCalledTimes(2);

Using toHaveBeenCalledTimes(2) turns the comment into a machine-checked specification, preventing a future over-disconnect from going undetected.

github-actions · 2026-06-26T02:35:09Z

+        postCompletionWatchdog = setTimeout(() => {
+          postCompletionWatchdog = null;
+          // Re-check conditions at fire time: a new tool call could have started
+          // between arming the watchdog and the timer firing (race condition guard).


[/diagnose] The phrase "race condition guard" is inaccurate in a single-threaded JavaScript runtime — there is no concurrent mutation here.

The guard is catching stale state at timer-fire time: between when the watchdog was armed and when the macrotask fires, new events could have changed hasOutput or pendingToolCalls. "Stale-state guard" is the accurate description, and makes the intent clearer to future readers.

💡 Suggested comment

// Stale-state guard: events processed between arming the watchdog and the
// timer firing could have changed hasOutput / pendingToolCalls.
if (!hasOutput || pendingToolCalls.size !== 0 || !session) return;

github-actions · 2026-06-26T02:35:09Z

+  // block treats the result as a successful completion.  This bounds the damage
+  // from the SDK driver bug where sendAndWait never resolves after the final
+  // tool result is returned.
+  const postCompletionIdleMs = getEnvPositiveIntOrDefault("GH_AW_SDK_IDLE_MS", SDK_POST_COMPLETION_IDLE_MS_DEFAULT);


[/grill-with-docs] GH_AW_SDK_IDLE_MS is ambiguous alongside the existing SDK_IDLE_TIMEOUT_PATTERN mechanism — an operator could easily conflate the two and misconfigure the wrong timeout.

Consider renaming to GH_AW_SDK_POST_COMPLETION_IDLE_MS to match the exported constant name (SDK_POST_COMPLETION_IDLE_MS_DEFAULT) and to self-document the specific hang scenario it addresses.

💡 Context

The existing idle-timeout path is triggered by the SDK emitting a Timeout after Nms waiting for session.idle error — a different failure mode. A new operator reading the env var docs would have no way to distinguish which timeout GH_AW_SDK_IDLE_MS controls without reading the source.
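
If the rename were adopted, one way to keep existing configurations working would be to read the new name first and fall back to the old one. The sketch below assumes that approach; `parsePositiveInt` and `resolvePostCompletionIdleMs` are hypothetical helpers (not the repo's `getEnvPositiveIntOrDefault`), and the fallback order is a design assumption rather than something proposed in this PR.

```javascript
// 5 minutes, matching the default described in the PR.
const POST_COMPLETION_IDLE_MS_DEFAULT = 5 * 60 * 1000;

// Returns a positive integer parsed from raw, or undefined if raw is unset or invalid.
function parsePositiveInt(raw) {
  if (raw === undefined || raw === "") return undefined;
  const value = Number(raw);
  return Number.isInteger(value) && value > 0 ? value : undefined;
}

// Prefer the self-describing name; accept the original name as a legacy fallback.
function resolvePostCompletionIdleMs(env = process.env) {
  return (
    parsePositiveInt(env.GH_AW_SDK_POST_COMPLETION_IDLE_MS) ??
    parsePositiveInt(env.GH_AW_SDK_IDLE_MS) ??
    POST_COMPLETION_IDLE_MS_DEFAULT
  );
}
```

Invalid or non-positive values fall through to the next source instead of producing a zero-length timer, which would force-disconnect the session immediately.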

github-actions · 2026-06-26T02:35:09Z

+      }
+    });
+
+    it("post-completion watchdog does not fire when tool calls are still pending", async () => {


[/tdd] The test name "does not fire when tool calls are still pending" doesn't match the actual scenario.

The scenario emits an assistant.message first (which arms the watchdog), then starts a new tool call (which disarms it), completes it (which re-arms it), and finally resolves sendAndWait normally before the timer fires. The scenario being tested is watchdog disarmed on new tool start, re-armed on completion, cleared on normal resolution.

A more accurate name would help future readers understand the contract at a glance, e.g.: "post-completion watchdog is disarmed by a new tool call and cleared on normal sendAndWait resolution".

github-actions · 2026-06-26T02:35:09Z

+
+      const prevIdleMs = process.env.GH_AW_SDK_IDLE_MS;
+      // Use a very short idle timeout so the watchdog fires quickly in tests.
+      process.env.GH_AW_SDK_IDLE_MS = "20";


[/tdd] The GH_AW_SDK_IDLE_MS save/restore pattern is copy-pasted across all 4 new tests. A beforeEach/afterEach pair would eliminate the repetition and make it impossible to forget cleanup in one test.

💡 Suggested refactor

let prevSdkIdleMs;
beforeEach(() => {
  prevSdkIdleMs = process.env.GH_AW_SDK_IDLE_MS;
  process.env.GH_AW_SDK_IDLE_MS = "20";
});
afterEach(() => {
  if (prevSdkIdleMs === undefined) delete process.env.GH_AW_SDK_IDLE_MS;
  else process.env.GH_AW_SDK_IDLE_MS = prevSdkIdleMs;
});

Place these inside the relevant describe block so they scope only to the watchdog tests.
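
An alternative that also removes the repetition is a small helper returning a restore function, which works for both sync and async tests. `overrideEnv` is a hypothetical helper written for this comment, not something that exists in the test file.

```javascript
// Hypothetical helper: set an env var and get back a function that undoes it.
function overrideEnv(name, value, env = process.env) {
  const previous = env[name];
  env[name] = value;
  return function restore() {
    // Deleting (rather than assigning undefined) matters for process.env,
    // which would otherwise store the string "undefined".
    if (previous === undefined) delete env[name];
    else env[name] = previous;
  };
}

// Possible usage inside the watchdog describe block:
//   let restoreIdleMs;
//   beforeEach(() => { restoreIdleMs = overrideEnv("GH_AW_SDK_IDLE_MS", "20"); });
//   afterEach(() => restoreIdleMs());
```

The third parameter exists only so the helper can be exercised against a plain object; tests would normally rely on the `process.env` default.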

github-actions · 2026-06-26T02:35:09Z

+          postCompletionWatchdog = null;
+          // Re-check conditions at fire time: a new tool call could have started
+          // between arming the watchdog and the timer firing (race condition guard).
+          if (!hasOutput || pendingToolCalls.size !== 0 || !session) return;


[/diagnose] The !session guard is always false at this point — the watchdog can only be armed inside session.on(...), which is called after session is assigned. The guard is dead code.

Consider removing it to keep the condition accurate, or replace it with a comment explaining why it was considered (and why it was safe to drop).

github-actions · 2026-06-26T02:35:09Z

+      //   tool call was just started, or no output yet).
+      // The watchdog fires only if sendAndWait never resolves on its own after
+      // the final tool result is returned — the common SDK post-completion hang.
+      if (hasOutput && pendingToolCalls.size === 0) {


[/zoom-out] The watchdog uses a sliding-window strategy: it re-arms on every event that satisfies hasOutput && pendingToolCalls.size === 0, not just on the initial transition into that state.

This means if the SDK keeps emitting qualifying events (e.g., repeated assistant.message chunks) after the final tool call, the window keeps resetting and the watchdog never fires — potentially defeating its purpose for a chatty SDK.

If the intent is "N ms of silence after the final tool result", the behaviour is correct and just needs a clearer comment. If the intent is "fire N ms after first entering the complete state", the logic needs a one-shot arm:

💡 One-shot arm alternative

// Arm once when transitioning into the post-completion state.
if (hasOutput && pendingToolCalls.size === 0 && !postCompletionWatchdog) {
  postCompletionWatchdog = setTimeout(() => { ... }, postCompletionIdleMs);
}
// Disarm when no longer complete.
if (!(hasOutput && pendingToolCalls.size === 0) && postCompletionWatchdog) {
  clearTimeout(postCompletionWatchdog);
  postCompletionWatchdog = null;
}

The PR description says the SDK "fails to resolve after the final tool result", which implies silence after that point — so sliding-window is likely fine. Just worth making the choice explicit in a comment.
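
To compare the two strategies numerically, the following sketch computes when the watchdog would fire for a given sequence of qualifying events. `watchdogFireTime` and its `mode` values are hypothetical, and the model assumes every listed event arrives while the session is in the complete state.

```javascript
// eventTimesMs: ascending timestamps (ms) of events seen while "work complete" holds.
// Returns the time at which the watchdog fires, or null if it is never armed.
function watchdogFireTime(eventTimesMs, idleMs, mode) {
  if (eventTimesMs.length === 0) return null;

  // One-shot: the deadline is fixed when the complete state is first entered.
  if (mode === "one-shot") return eventTimesMs[0] + idleMs;

  // Sliding window: each event that arrives before the deadline pushes it out.
  let deadline = eventTimesMs[0] + idleMs;
  for (const t of eventTimesMs.slice(1)) {
    if (t >= deadline) break; // the timer has already fired
    deadline = t + idleMs;
  }
  return deadline;
}
```

For example, with a 300000 ms idle period and a qualifying event every 60 s for ten events, the one-shot deadline stays at 300000 ms while the sliding deadline moves to 840000 ms, which is the trade-off discussed above.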

github-actions

REQUEST_CHANGES — Two issues that need fixing before merge; two additional cleanup items.

🔴 Blocking issues

1. Double-disconnect can override the successful return (HIGH)

The watchdog fires void session.disconnect() and the finally block unconditionally calls await session.disconnect() again. If the SDK's disconnect() rejects on a second call, the finally-block rejection propagates to the caller instead of the return { exitCode: 0 } from the catch block. In JavaScript, an exception thrown from finally overrides any return or throw in try/catch. The feature works correctly only if the SDK happens to be idempotent on disconnect — an assumption that is not documented or tested.

See inline comment on line 418 for a concrete fix.
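
The `finally` behaviour described in this issue can be reproduced in isolation, and one possible mitigation is to make the disconnect call idempotent on the driver side. The sketch below is an assumption-labelled illustration, not necessarily the fix proposed in the inline comment: `returnThenThrowInFinally` and `makeDisconnectOnce` are hypothetical names.

```javascript
// Demonstrates the language rule: a throw inside finally replaces the pending return.
function returnThenThrowInFinally() {
  try {
    return { exitCode: 0 };
  } finally {
    throw new Error("disconnect failed");
  }
}

// Wraps a disconnect function so the underlying call happens at most once.
// Every caller receives the same promise, whether it resolves or rejects.
function makeDisconnectOnce(disconnect) {
  let promise = null;
  return () => {
    if (promise === null) {
      // The async wrapper turns a synchronous throw into a rejection.
      promise = (async () => disconnect())();
    }
    return promise;
  };
}

// Possible usage (sketch):
//   const disconnectOnce = makeDisconnectOnce(() => session.disconnect());
//   watchdog:  void disconnectOnce().catch(() => {});
//   finally:   await disconnectOnce().catch(() => {});
```

Catching the rejection in the `finally` path is what keeps a failed disconnect from overriding the `exitCode: 0` result; the once-wrapper only removes the dependence on the SDK tolerating a second call.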

2. Watchdog re-arms on any SDK event — defeats the timeout on keepalive-emitting sessions (MEDIUM)

The watchdog is rearmed (clearTimeout + fresh setTimeout) on every event where hasOutput && pendingToolCalls.size === 0. In the exact hung-session scenario this watchdog is designed to fix, if the SDK emits any periodic keepalive or status event while sendAndWait is stuck, the idle window resets indefinitely. The watchdog will never fire.

See inline comment on line 331 for two alternative designs.

🟡 Cleanup items (non-blocking)

Test assertion too weak (line 409, test file): toHaveBeenCalled() should be toHaveBeenCalledTimes(2) since the comment explicitly says disconnect fires twice.
Dead !session guard (line 335, session file): session is a const that is never nulled; the guard is unreachable.

🔎 Code quality review by PR Code Quality Reviewer · 127.1 AIC · ⌖ 8.86 AIC · ⊞ 5.2K

Comments that could not be inline-anchored

actions/setup/js/copilot_sdk_session.cjs:418

Double-disconnect: watchdog and finally both call session.disconnect() — if the SDK is not idempotent, the finally-block exception overrides the successful catch-block return.

💡 Details and suggested fix

When the watchdog fires (line 338), it calls void session.disconnect() (fire-and-forget). The finally block then unconditionally calls await session.disconnect() a second time. In JavaScript, a finally block always runs even after a return i…

actions/setup/js/copilot_sdk_session.cjs:331

Watchdog re-arms on every SDK event — SDK keepalives or status messages emitted while sendAndWait is hung will prevent the watchdog from ever firing.

💡 Details

The watchdog is rearmed (clearTimeout + fresh setTimeout) on every SDK event where hasOutput && pendingToolCalls.size === 0 — not only on tool or message events. This implements an idle timer: "no events for N ms."

The problem is the hang scenario this watchdog is designed to fix: `sendAnd…

actions/setup/js/copilot_sdk_driver.test.cjs:409

Assertion is weaker than the comment claims — toHaveBeenCalled() does not verify the expected two-call behavior.

💡 Suggested fix

The inline comment on the previous line says:

// disconnect is called twice: once by the watchdog and once in finally.
expect(disconnectWithSignal).toHaveBeenCalled();

toHaveBeenCalled() passes with one or more calls. If the double-disconnect is intentional behavior (and it is — see line 418 in `copilot_sdk_sessi…

actions/setup/js/copilot_sdk_session.cjs:335

!session guard is dead code — session is a const that is never set to null.

💡 Details

session is assigned once via const session = await client.createSession(...) and is never reassigned or nulled anywhere in runWithCopilotSDK. The !session branch in the watchdog callback guard:

if (!hasOutput || pendingToolCalls.size !== 0 || !session) return;

...will never be true. This is confusing to future readers who might wonder what lifec…

pelikhan · 2026-06-26T02:51:47Z

@copilot Run pr-finisher skill.

…val in flaky test

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot · 2026-06-26T03:02:23Z

@copilot Run pr-finisher skill.

Done. Addressed the two open review threads:

Removed the unused disconnect mock from the "watchdog fires" test.
Increased GH_AW_SDK_IDLE_MS from 20 to 500 in the "resolves before timer fires" test so normal completion always finishes before the watchdog on any CI runner.

Local validation passes. Commit: fix(tests): remove unused disconnect mock and increase watchdog interval in flaky test

github-actions · 2026-06-26T03:47:07Z

✅ All tools validated successfully! Agent Container Smoke Test confirms agent container is ready.

github-actions · 2026-06-26T03:47:10Z

🚀 Smoke Pi MISSION COMPLETE! Pi delivered. 🥧

Smoke test completed; created issue. PR auto-comment unsupported in current event context, label skipped because tests failed.

github-actions · 2026-06-26T03:47:16Z

🚀 Smoke Gemini MISSION COMPLETE! Gemini has spoken. ✨

github-actions · 2026-06-26T03:47:19Z

🚀 Smoke Antigravity MISSION COMPLETE! Antigravity has spoken. ✨

github-actions · 2026-06-26T03:47:39Z

🎬 THE END — Smoke Claude MISSION: ACCOMPLISHED! The hero saves the day! ✨

github-actions · 2026-06-26T03:47:51Z

📰 BREAKING: Smoke Copilot is now investigating this pull request. Sources say the story is developing...

github-actions · 2026-06-26T03:47:54Z

📰 BREAKING: Smoke Copilot - AOAI (apikey) is now investigating this pull request. Sources say the story is developing...

github-actions · 2026-06-26T03:48:02Z

📰 BREAKING: Smoke Copilot - AOAI (Entra) is now investigating this pull request. Sources say the story is developing...

github-actions · 2026-06-26T03:51:10Z

Agent Container Tool Check

Tool	Status	Version
bash	✅	5.2.21
sh	✅	available
git	✅	2.54.0
jq	✅	1.7
yq	✅	4.53.3
curl	✅	8.5.0
gh	✅	2.95.0
node	✅	22.23.0
python3	✅	3.11.15 (PyPy 7.3.23)
go	✅	1.24.13
java	✅	21.0.11
dotnet	✅	10.0.301

Result: 12/12 tools available ✅

Overall Status: PASS

🔧 Tool validation by Agent Container Smoke Test · 17.4 AIC · ⌖ 6.3 AIC · ⊞ 5.4K · ◷

github-actions · 2026-06-26T03:52:19Z

Smoke Test Results

GitHub MCP: ✅
Web Fetch: ✅
File Writing: ✅
Bash Tool: ✅
Build gh-aw: ❌

Overall Status: FAIL

Warning

Firewall blocked 1 domain

The following domain was blocked by the firewall during workflow execution:

localhost

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "localhost"

See Network Configuration for more information.

✨ Smoke Gemini — Powered by Gemini · ◷

github-actions · 2026-06-26T03:55:00Z

Smoke Test: Codex - 28215703546
Merged PRs: #41578, #41575
Results: ✅ GitHub PR reads, ✅ Serena, ✅ Playwright, ❌ web-fetch unavailable, ✅ temp file, ✅ build, ✅ comment-memory, ✅ cache memory, ⚪ issue field skipped
Overall: FAIL

🔮 The oracle has spoken through Smoke Codex · 4.04 AIC · ⌖ 4.49 AIC · ⊞ 12.5K · ◷

github-actions · 2026-06-26T03:55:01Z

Comment Memory

Quiet terminal
Tools whisper through the smoke test
Git waits, caches hum

Note

This comment is managed by comment memory.

It stores persistent context for this thread in the code block at the top of this comment.
Edit only the text inside the backtick fences; workflow metadata and the footer are regenerated automatically.

Learn more about comment memory

🔮 The oracle has spoken through Smoke Codex · 4.04 AIC · ⌖ 4.49 AIC · ⊞ 12.5K · ◷

github-actions · 2026-06-26T03:55:56Z

💥 Smoke Test: Claude — Run 28215703568

Core #1-12: 1✅ 2✅ 3✅ 4✅ 5❌(Tavily) 6✅ 7✅ 8✅ 9✅ 10✅ 11✅ 12✅
PR Review #13-19: 13✅ 14✅ 15✅ 16✅ 17✅ 18✅ 19⚠️(skipped)

Overall: PARTIAL — Tavily MCP registered 0 tools.

💥 [THE END] — Illustrated by Smoke Claude · 67.9 AIC · ⌖ 25.5 AIC · ⊞ 8.7K · ◷

github-actions

💥 Automated smoke test review - all systems nominal!

💥 [THE END] — Illustrated by Smoke Claude · 67.9 AIC · ⌖ 25.5 AIC · ⊞ 8.7K

github-actions · 2026-06-26T03:55:58Z

+// Default idle period for the post-completion watchdog: 5 minutes.
+// When the agent has produced output and all tracked tool calls have completed,
+// the driver arms a watchdog timer.  If no new SDK events arrive within this
+// window, the driver force-disconnects the session and treats it as a successful


Consider documenting the default timeout value here for maintainability.

github-actions · 2026-06-26T03:55:58Z

+  // (default 5 minutes), the watchdog force-disconnects the session and the catch
+  // block treats the result as a successful completion.  This bounds the damage
+  // from the SDK driver bug where sendAndWait never resolves after the final
+  // tool result is returned.


Nice watchdog guard — a brief comment on the race-condition rationale would help future readers.

github-actions · 2026-06-26T03:57:06Z

Comment Memory

Stone bot tap the keys
Green sparks jump on GitHub page
Smoke run done by dawn

Note

This comment is managed by comment memory.

It stores persistent context for this thread in the code block at the top of this comment.
Edit only the text inside the backtick fences; workflow metadata and the footer are regenerated automatically.

Learn more about comment memory

📰 BREAKING: Report filed by Smoke Copilot · 362.2 AIC · ⌖ 24.2 AIC · ⊞ 18.9K · ◷

github-actions

Smoke review done. Two inline taps.

github-actions · 2026-06-26T04:08:13Z

Smoke tests: ✅1, ✅2, ❌3, ❌4, ✅5, ✅6, ✅7, ✅8. Overall: FAIL. @github-actions

📰 BREAKING: Report filed by Smoke Copilot - AOAI (apikey) · 295.7 AIC · ⌖ 6.44 AIC · ⊞ 17.9K · ◷

github-actions

Initial plan

6d82129

Copilot AI assigned Copilot and pelikhan Jun 26, 2026

Copilot started work on behalf of pelikhan June 26, 2026 01:37

Copilot AI and others added 2 commits June 26, 2026 01:51

fix: address code review — log watchdog disconnect errors, comment race-condition guard

7045e88

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Fix Copilot CLI process hanging after safe-output~~ fix(copilot-sdk): post-completion idle watchdog to bound SDK hang after final tool result Jun 26, 2026

Copilot AI requested a review from pelikhan June 26, 2026 01:53

Copilot finished work on behalf of pelikhan June 26, 2026 01:53

Copilot started work on behalf of pelikhan June 26, 2026 01:57

Copilot finished work on behalf of pelikhan June 26, 2026 02:15

pelikhan marked this pull request as ready for review June 26, 2026 02:21

Copilot AI review requested due to automatic review settings June 26, 2026 02:21

Copilot started reviewing on behalf of pelikhan June 26, 2026 02:21

Copilot AI reviewed Jun 26, 2026

github-actions Bot approved these changes Jun 26, 2026

github-actions Bot reviewed Jun 26, 2026

github-actions Bot previously requested changes Jun 26, 2026

Copilot started work on behalf of pelikhan June 26, 2026 02:52

fix(tests): remove unused disconnect mock and increase watchdog interval in flaky test

c302ba6

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

github-actions Bot temporarily deployed to aoai-model June 26, 2026 03:48 Inactive

github-actions Bot mentioned this pull request Jun 26, 2026

[aw] Smoke Antigravity produced no safe outputs #41514

Closed

github-actions Bot mentioned this pull request Jun 26, 2026

Smoke Test: Claude - 28215703568 #41590

Closed

github-actions Bot reviewed Jun 26, 2026

github-actions Bot mentioned this pull request Jun 26, 2026

[aw] Smoke Codex is missing required data #41591

Closed

pelikhan merged commit d5a2522 into main Jun 26, 2026
183 checks passed

pelikhan deleted the copilot/fix-cli-process-hang-issue branch June 26, 2026 03:56

github-actions Bot reviewed Jun 26, 2026

github-actions Bot temporarily deployed to aoai-model June 26, 2026 03:57 Inactive

github-actions Bot temporarily deployed to aoai-model June 26, 2026 03:59 Inactive

github-actions Bot mentioned this pull request Jun 26, 2026

Smoke Test: Copilot - AOAI (Entra) - 28215816300 #41594

Closed

github-actions Bot temporarily deployed to aoai-model June 26, 2026 03:59 Inactive

github-actions Bot reviewed Jun 26, 2026

github-actions Bot mentioned this pull request Jun 26, 2026

[code-simplifier] simplify: remove redundant branching and align wildcard guard #41602

Open

github-actions Bot commented Jun 26, 2026

🧪 Test Quality Sentinel Report

Verdict

github-actions Bot left a comment

Skills-Based Review 🧠

Key Themes

Positive Highlights

github-actions Bot left a comment

1. Double-disconnect can override the successful return (HIGH)

2. Watchdog re-arms on any SDK event — defeats the timeout on keepalive-emitting sessions (MEDIUM)

Comments that could not be inline-anchored
