Keep detection in warn mode on parser/agent failures so non-reviewable safe outputs are blocked by Copilot · Pull Request #41547 · github/gh-aw

Copilot · 2026-06-25T21:37:44Z

In the referenced Issue Monster run, threat detection execution failed, but the detection conclusion was escalated to failure instead of staying in warn mode. That kept the warn-mode safe-output policy from activating, so assign_to_agent proceeded. It should have been blocked, with only cautionary commenting remaining.

Detection conclusion semantics
- Updated actions/setup/js/parse_threat_detection_results.cjs so GH_AW_DETECTION_CONTINUE_ON_ERROR=true always yields conclusion=warning for detection-side failures (agent_failure / parse_error), instead of conditionally escalating to failure.
Safe-output policy alignment
- This restores expected coupling with safe-output handling: non-reviewable outputs (including assign_to_agent) are gated by warn-mode threat policy, while comments retain the threat caution path.
Behavioral expectation update
- Updated actions/setup/js/parse_threat_detection_results.test.cjs to reflect warn-mode outcome for detection execution failure.
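
The coupling between the detection conclusion and safe-output handling described above can be sketched roughly as follows. This is a hypothetical illustration, not the gh-aw implementation: the function `applyWarnModePolicy`, the `REVIEWABLE_TYPES` set, the `add_comment` type name, and the `threatCaution` flag are all assumptions made for the example. Only `assign_to_agent` and the `warning` conclusion come from this PR.

```javascript
// Hypothetical sketch of warn-mode safe-output gating; names are illustrative only.
// Assumption: comments are the "reviewable" output type that keeps the caution path.
const REVIEWABLE_TYPES = new Set(["add_comment"]);

function applyWarnModePolicy(conclusion, outputs) {
  if (conclusion !== "warning") {
    // The warn-mode policy only applies to a "warning" conclusion.
    // Other conclusions are out of scope for this sketch and pass through untouched.
    return { allowed: outputs, blocked: [] };
  }
  const allowed = [];
  const blocked = [];
  for (const output of outputs) {
    if (REVIEWABLE_TYPES.has(output.type)) {
      // Reviewable outputs are kept, but marked so a caution can be attached.
      allowed.push({ ...output, threatCaution: true });
    } else {
      // Non-reviewable outputs (for example assign_to_agent) are held back.
      blocked.push(output);
    }
  }
  return { allowed, blocked };
}

const result = applyWarnModePolicy("warning", [
  { type: "add_comment", body: "Triage notes" },
  { type: "assign_to_agent", agent: "copilot" },
]);
console.log("allowed:", result.allowed.map(o => o.type));
console.log("blocked:", result.blocked.map(o => o.type));
```

Under these assumptions, the sketch also shows why the escalation mattered: a conclusion of failure never reaches the branch that blocks non-reviewable outputs.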

// before: warn mode could still escalate to failure
if (isWarnMode && !mustFail) {
  conclusion = "warning";
} else {
  conclusion = "failure";
}

// after: warn mode is consistently warning
if (isWarnMode) {
  conclusion = "warning";
} else {
  conclusion = "failure";
}
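
To make the difference between the two snippets concrete, the following sketch wraps each branch in a small function and prints the full truth table. The function names `conclusionBefore` and `conclusionAfter` are hypothetical helpers introduced for this comparison; the conditions themselves are copied from the snippets above.

```javascript
// Hypothetical helpers wrapping the before/after conditions for comparison.
function conclusionBefore(isWarnMode, mustFail) {
  return isWarnMode && !mustFail ? "warning" : "failure";
}

function conclusionAfter(isWarnMode) {
  return isWarnMode ? "warning" : "failure";
}

// Enumerate every combination of the two flags.
const rows = [];
for (const isWarnMode of [true, false]) {
  for (const mustFail of [true, false]) {
    rows.push({
      isWarnMode,
      mustFail,
      before: conclusionBefore(isWarnMode, mustFail),
      after: conclusionAfter(isWarnMode),
    });
  }
}

// Rows where the change alters the outcome.
const changed = rows.filter(row => row.before !== row.after);
console.table(rows);
console.log("changed rows:", changed.length);
```

The only combination whose outcome changes is warn mode with `mustFail` set, which previously escalated to failure and now stays a warning. Strict mode is unaffected.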

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

github-actions · 2026-06-25T22:38:23Z

✅ Design Decision Gate 🏗️ completed the design decision gate check.

No ADR enforcement needed: PR #41547 does not have the 'implementation' label and has 0 new lines of code in business logic directories (threshold: 100).

github-actions · 2026-06-25T22:38:23Z

✅ PR Code Quality Reviewer completed the code quality review.

github-actions · 2026-06-25T22:38:23Z

🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅

github-actions · 2026-06-25T22:38:35Z

✅ Test Quality Sentinel completed test quality analysis.

Test Quality Sentinel analysis for PR #41547 was already completed in the prior invocation of this workflow run (28204877295). Score: 90/100 APPROVE. Comment and review were already submitted successfully.

Copilot

Pull request overview

Adjusts threat-detection result parsing so that “warn mode” (GH_AW_DETECTION_CONTINUE_ON_ERROR=true) consistently produces a warning conclusion for detection-side failures, preserving downstream safe-output gating behavior (blocking non-reviewable outputs while avoiding hard failures).

Changes:

Updated setDetectionFailure() to always emit conclusion=warning in warn mode (removing the prior conditional escalation to failure for agent_failure / parse_error).
Updated the corresponding main() test to expect warn-mode behavior for detection execution failures (no setFailed()).

Show a summary per file

| File | Description |
| --- | --- |
| actions/setup/js/parse_threat_detection_results.cjs | Ensures warn mode always outputs `conclusion=warning` (and does not fail the job) for detection-side failures. |
| actions/setup/js/parse_threat_detection_results.test.cjs | Updates assertions to match warn-mode semantics for detection execution failure. |

Copilot's findings

Files reviewed: 2/2 changed files
Comments generated: 0

github-actions

Skills-Based Review 🧠

Applied /diagnose and /tdd — approving with two non-blocking suggestions.

📋 Key Themes & Highlights

Key Themes

Dead variable (/diagnose): detectionExecutionOutcome is still read from env but no longer used in any conditional — only in a diagnostic log. A comment or inlining would prevent future confusion.
Missing symmetric test (/tdd): There is no test for GH_AW_DETECTION_CONTINUE_ON_ERROR=false + DETECTION_AGENTIC_EXECUTION_OUTCOME=failure, so the "strict mode stays strict" invariant has no explicit guard after this change.

Positive Highlights

✅ Minimal and correct fix — removing mustFail is the right call; the isWarnMode flag is now the sole gatekeeper, which matches the documented contract of continueOnError.
✅ Test rename is accurate — the old name "should fail even in warn mode" described the buggy behaviour; the new name reads as a spec.
✅ Clean PR description — the before/after code snippet makes the intent immediately clear.

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · 52.7 AIC · ⌖ 9.66 AIC · ⊞ 6.5K

github-actions · 2026-06-25T22:45:25Z

    });

-    it("should fail when detection execution failed even in warn mode", async () => {
+    it("should warn when detection execution failed in warn mode", async () => {


[/tdd] The updated test covers warn mode + execution failure → warning, but there is no symmetric test for error mode (GH_AW_DETECTION_CONTINUE_ON_ERROR=false) combined with DETECTION_AGENTIC_EXECUTION_OUTCOME=failure. Adding it would document that the fix is bounded to warn mode and that strict mode still produces failure.

💡 Suggested test to add

it("should fail when detection execution failed in error mode", async () => {
  process.env.GH_AW_DETECTION_CONTINUE_ON_ERROR = "false";
  process.env.DETECTION_AGENTIC_EXECUTION_OUTCOME = "failure";
  mockExistsSync.mockReturnValue(false);

  await mod.main();

  expect(mockCore.setOutput).toHaveBeenCalledWith("conclusion", "failure");
  expect(mockCore.setOutput).toHaveBeenCalledWith("success", "false");
  expect(mockCore.setOutput).toHaveBeenCalledWith("reason", "agent_failure");
  expect(mockCore.exportVariable).toHaveBeenCalledWith("GH_AW_DETECTION_CONCLUSION", "failure");
  expect(mockCore.setFailed).toHaveBeenCalledWith(expect.stringContaining("Detection log file not found"));
});

This pins that DETECTION_AGENTIC_EXECUTION_OUTCOME no longer overrides warn mode and confirms strict mode is unchanged — a regression guard for both sides of the logic.

github-actions · 2026-06-25T22:46:13Z

🧪 Test Quality Sentinel Report

✅ Test Quality Score: 90/100 — Excellent

Analyzed 1 test (1 design, 0 implementation, 0 guideline violations). The modified test correctly tracks the behavioral change: agent/parser failures now stay in warn mode instead of escalating to a hard failure.

📊 Metrics & Test Classification (1 test analyzed)

| Metric | Value |
| --- | --- |
| New/modified tests analyzed | 1 |
| ✅ Design tests (behavioral contracts) | 1 (100%) |
| ⚠️ Implementation tests (low value) | 0 (0%) |
| Tests with error/edge cases | 1 (100%) |
| Duplicate test clusters | 0 |
| Test inflation detected | Yes (borderline; see note) |
| 🚨 Coding-guideline violations | 0 |

| Test | File | Classification | Issues Detected |
| --- | --- | --- | --- |
| `should warn when detection execution failed in warn mode` | `actions/setup/js/parse_threat_detection_results.test.cjs:830` | ✅ Design | Borderline inflation (see note) |

Go: 0 (*_test.go); JavaScript: 1 (*.test.cjs).

Inflation note: Test file shows 4 lines added vs 1 line added in the production file (4:1 ratio, threshold 2:1). However, this is test maintenance — the production change was primarily a deletion of a special-case guard (-3 lines), and the test updates 4 assertions to match the new behavior. Not genuine test padding; the -10 pt inflation penalty is applied mechanically but does not represent a quality concern.

Verdict

✅ Check passed. 0% implementation tests (threshold: 30%). The single modified test enforces the behavioral contract that a DETECTION_AGENTIC_EXECUTION_OUTCOME=failure outcome in warn mode (continueOnError=true) must produce conclusion=warning and must NOT call core.setFailed(). Mocking is limited to legitimate external I/O targets (core, fs.existsSync). No guideline violations detected.
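
The behavioral contract named in the verdict can be exercised without a test framework. The sketch below is an assumption-heavy stand-in: `setDetectionFailureSketch`, `createMockCore`, and `wasCalled` are hypothetical names, and the real `setDetectionFailure()` signature is not shown in this PR. The output names, the exported variable, and the message text are taken from the review comments above.

```javascript
// Hand-rolled recorder standing in for the mocked `core` object (hypothetical helper).
function createMockCore() {
  const calls = [];
  const record = name => (...args) => {
    calls.push({ name, args });
  };
  return {
    calls,
    setOutput: record("setOutput"),
    exportVariable: record("exportVariable"),
    setFailed: record("setFailed"),
  };
}

// Hypothetical stand-in for setDetectionFailure(); the real signature may differ.
function setDetectionFailureSketch(core, isWarnMode, reason, message) {
  const conclusion = isWarnMode ? "warning" : "failure";
  core.setOutput("conclusion", conclusion);
  core.setOutput("success", "false");
  core.setOutput("reason", reason);
  core.exportVariable("GH_AW_DETECTION_CONCLUSION", conclusion);
  if (!isWarnMode) {
    // Only strict mode fails the job.
    core.setFailed(message);
  }
}

// True if the recorder saw a call with exactly these arguments.
function wasCalled(core, name, ...args) {
  return core.calls.some(
    call => call.name === name && JSON.stringify(call.args) === JSON.stringify(args)
  );
}

const warnCore = createMockCore();
setDetectionFailureSketch(warnCore, true, "agent_failure", "Detection log file not found");

const strictCore = createMockCore();
setDetectionFailureSketch(strictCore, false, "agent_failure", "Detection log file not found");

console.log("warn mode emits warning:", wasCalled(warnCore, "setOutput", "conclusion", "warning"));
console.log("warn mode calls setFailed:", warnCore.calls.some(c => c.name === "setFailed"));
console.log("strict mode calls setFailed:", strictCore.calls.some(c => c.name === "setFailed"));
```

Checking both modes side by side mirrors the symmetric test suggested earlier in the thread: the warn-mode case asserts the absence of `setFailed`, and the strict-mode case asserts its presence.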

🧪 Test quality analysis by Test Quality Sentinel · 65.5 AIC · ⌖ 13 AIC · ⊞ 8.4K · ◷

github-actions

✅ Test Quality Sentinel: 90/100. Test quality is acceptable — 0% of new tests are implementation tests (threshold: 30%).

github-actions

Correct fix. Removing mustFail makes isWarnMode the single authoritative gate for setDetectionFailure, which is the right design — the old escalation-to-failure was bypassing warn-mode safe-output policy rather than honoring it. Test update accurately reflects the new behavior; reason=agent_failure + success=false still distinguish scanner

🔎 Code quality review by PR Code Quality Reviewer · 64.6 AIC · ⌖ 7.11 AIC · ⊞ 5.2K

github-actions · 2026-06-25T22:46:24Z

@@ -453,7 +453,6 @@ async function main() {
  const runDetection = process.env.RUN_DETECTION;
  const continueOnError = process.env.GH_AW_DETECTION_CONTINUE_ON_ERROR !== "false";
  const detectionExecutionOutcome = process.env.DETECTION_AGENTIC_EXECUTION_OUTCOME || "";


detectionExecutionOutcome is now read but has no behavioral effect — it only appears in a core.info log statement after this change.

💡 Suggested clarification

Without a comment, a future reader may wonder whether detectionExecutionFailed was accidentally dropped, or whether this variable can be removed entirely. A short note prevents that confusion:

// retained for diagnostic logging only; no longer gates conclusion (see setDetectionFailure)
const detectionExecutionOutcome = process.env.DETECTION_AGENTIC_EXECUTION_OUTCOME || "";

Downstream consumers can still distinguish the failure type via reason=agent_failure and success=false outputs, so nothing is lost behaviorally — this is purely a readability concern.
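
Since warn mode is now the only gate, it is worth noting how the `continueOnError` line in the diff above derives it: the comparison is `!== "false"`, so only the exact string `false` selects strict mode. The sketch below isolates that expression in a hypothetical helper, `isWarnMode`, which takes an env-like object as a parameter so it can be tried against several values. The helper name and the sample values are assumptions for illustration.

```javascript
// Hypothetical helper isolating the expression from the diff:
//   process.env.GH_AW_DETECTION_CONTINUE_ON_ERROR !== "false"
function isWarnMode(env) {
  return env.GH_AW_DETECTION_CONTINUE_ON_ERROR !== "false";
}

// Sample values chosen for illustration; undefined models an unset variable.
const samples = [undefined, "true", "false", "", "FALSE", "0"];

for (const value of samples) {
  const env = value === undefined ? {} : { GH_AW_DETECTION_CONTINUE_ON_ERROR: value };
  console.log(String(JSON.stringify(value)), "->", isWarnMode(env) ? "warn mode" : "strict mode");
}
```

With this expression, an unset variable, an empty string, and differently cased or numeric values all resolve to warn mode, because none of them is strictly equal to the string `false`.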

pelikhan · 2026-06-25T23:30:45Z

@copilot run pr-finisher skill

Adjust detection parse outcome for warn mode

2bc4144

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot AI assigned Copilot and pelikhan Jun 25, 2026

Copilot AI changed the title ~~Fix warn-mode threat detection conclusion to block assign-to-agent~~ Keep detection in warn mode on parser/agent failures so non-reviewable safe outputs are blocked Jun 25, 2026

Copilot AI requested a review from pelikhan June 25, 2026 21:38

Copilot created this pull request from a session on behalf of pelikhan June 25, 2026 21:38 View session

Copilot finished work on behalf of pelikhan June 25, 2026 21:38

pelikhan marked this pull request as ready for review June 25, 2026 22:37

Copilot AI review requested due to automatic review settings June 25, 2026 22:37

Copilot started reviewing on behalf of pelikhan June 25, 2026 22:37 View session

Copilot AI reviewed Jun 25, 2026

github-actions Bot mentioned this pull request Jun 25, 2026

[aw] No-Op Runs #39849

Open

github-actions Bot approved these changes Jun 25, 2026

Copilot started work on behalf of pelikhan June 25, 2026 23:31 View session

pelikhan merged commit 7144868 into main Jun 25, 2026
61 of 70 checks passed

pelikhan deleted the copilot/investigate-safeoutputs-issue branch June 25, 2026 23:33

Copilot stopped work on behalf of pelikhan due to an error June 25, 2026 23:34
The session was cancelled by the user.

This was referenced Jun 26, 2026

Smoke Test: Pi - 28209735681 #41562

Closed

Smoke Test: Claude - 28209735611 #41565

Closed

[gh-aw-upgrade] gh-aw upgrade available: v0.80.9 → v0.81.4 elastic/ai-github-actions#1429

Open

3 participants
