Keep detection in warn mode on parser/agent failures so non-reviewable safe outputs are blocked #41547

Merged
pelikhan merged 1 commit into main from copilot/investigate-safeoutputs-issue
Jun 25, 2026
Conversation

Copilot AI commented Jun 25, 2026

Contributor

In the referenced Issue Monster run, threat detection execution failed, but the detection conclusion was escalated to failure instead of staying in warn mode. That prevented the warn-mode safe-output policy from activating, so assign_to_agent proceeded when it should have been blocked; only cautionary commenting should have remained.

  • Detection conclusion semantics

    • Updated actions/setup/js/parse_threat_detection_results.cjs so GH_AW_DETECTION_CONTINUE_ON_ERROR=true always yields conclusion=warning for detection-side failures (agent_failure / parse_error), instead of conditionally escalating to failure.
  • Safe-output policy alignment

    • This restores expected coupling with safe-output handling: non-reviewable outputs (including assign_to_agent) are gated by warn-mode threat policy, while comments retain the threat caution path.
  • Behavioral expectation update

    • Updated actions/setup/js/parse_threat_detection_results.test.cjs to reflect warn-mode outcome for detection execution failure.
// before: warn mode could still escalate to failure
if (isWarnMode && !mustFail) {
  conclusion = "warning";
} else {
  conclusion = "failure";
}

// after: warn mode is consistently warning
if (isWarnMode) {
  conclusion = "warning";
} else {
  conclusion = "failure";
}
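
The effect of the change can be sketched as a pair of pure functions. This is a minimal illustration, not the code in parse_threat_detection_results.cjs: `resolveConclusionBefore` and `resolveConclusionAfter` are hypothetical names, and `mustFail` stands in for the removed escalation on agent_failure / parse_error.

```javascript
// Hypothetical helper names; only the branching mirrors the snippet above.

// Before: a detection-side failure could override warn mode.
function resolveConclusionBefore(isWarnMode, mustFail) {
  return isWarnMode && !mustFail ? "warning" : "failure";
}

// After: warn mode alone decides the conclusion.
function resolveConclusionAfter(isWarnMode) {
  return isWarnMode ? "warning" : "failure";
}

// Warn mode combined with a detection-side failure (mustFail = true):
console.log(resolveConclusionBefore(true, true)); // "failure"
console.log(resolveConclusionAfter(true)); // "warning"

// Strict mode is unaffected by the change:
console.log(resolveConclusionBefore(false, true)); // "failure"
console.log(resolveConclusionAfter(false)); // "failure"
```

The two functions differ only for the warn-mode case with `mustFail` set, which is the case the Issue Monster run hit.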

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI changed the title from "Fix warn-mode threat detection conclusion to block assign-to-agent" to "Keep detection in warn mode on parser/agent failures so non-reviewable safe outputs are blocked" Jun 25, 2026
Copilot AI requested a review from pelikhan June 25, 2026 21:38
@pelikhan pelikhan marked this pull request as ready for review June 25, 2026 22:37
Copilot AI review requested due to automatic review settings June 25, 2026 22:37
@github-actions

github-actions Bot commented Jun 25, 2026

Contributor

Design Decision Gate 🏗️ completed the design decision gate check.

No ADR enforcement needed: PR #41547 does not have the 'implementation' label and has 0 new lines of code in business logic directories (threshold: 100).

@github-actions

github-actions Bot commented Jun 25, 2026

Contributor

PR Code Quality Reviewer completed the code quality review.

@github-actions

github-actions Bot commented Jun 25, 2026

Contributor

🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅

@github-actions

github-actions Bot commented Jun 25, 2026

Contributor

Test Quality Sentinel completed test quality analysis.

Test Quality Sentinel analysis for PR #41547 was already completed in the prior invocation of this workflow run (28204877295). Score: 90/100 APPROVE. Comment and review were already submitted successfully.

Copilot AI left a comment

Contributor


Pull request overview

Adjusts threat-detection result parsing so that “warn mode” (GH_AW_DETECTION_CONTINUE_ON_ERROR=true) consistently produces a warning conclusion for detection-side failures, preserving downstream safe-output gating behavior (blocking non-reviewable outputs while avoiding hard failures).
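
One way to picture the downstream gating described here, as a sketch only: `applyWarnModePolicy`, the `REVIEWABLE_TYPES` set, the `add_comment` type name, and the `action` labels are assumptions made for illustration and are not taken from the repository. Only `assign_to_agent` and the warn-mode behavior (block non-reviewable outputs, keep comments with a caution) come from this PR.

```javascript
// Illustrative only: every name below except "assign_to_agent" is an assumption.
const REVIEWABLE_TYPES = new Set(["add_comment"]); // assumed reviewable output type

// Models what happens to safe outputs once detection concluded with "warning".
function applyWarnModePolicy(outputs) {
  return outputs.map(output =>
    REVIEWABLE_TYPES.has(output.type)
      ? { ...output, action: "allow_with_caution" } // comments keep the threat caution path
      : { ...output, action: "block" } // non-reviewable outputs, e.g. assign_to_agent
  );
}

const gated = applyWarnModePolicy([
  { type: "assign_to_agent" },
  { type: "add_comment" },
]);

for (const output of gated) {
  console.log(`${output.type}: ${output.action}`);
}
```

The sketch covers only the warn-mode branch, since that is the branch the escalated conclusion failed to reach.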

Changes:

  • Updated setDetectionFailure() to always emit conclusion=warning in warn mode (removing the prior conditional escalation to failure for agent_failure / parse_error).
  • Updated the corresponding main() test to expect warn-mode behavior for detection execution failures (no setFailed()).
Show a summary per file

| File | Description |
| --- | --- |
| actions/setup/js/parse_threat_detection_results.cjs | Ensures warn mode always outputs conclusion=warning (and does not fail the job) for detection-side failures. |
| actions/setup/js/parse_threat_detection_results.test.cjs | Updates assertions to match warn-mode semantics for detection execution failure. |

Copilot's findings

  • Files reviewed: 2/2 changed files
  • Comments generated: 0

@github-actions github-actions Bot mentioned this pull request Jun 25, 2026

@github-actions github-actions Bot left a comment

Contributor


Skills-Based Review 🧠

Applied /diagnose and /tdd — approving with two non-blocking suggestions.

📋 Key Themes & Highlights

Key Themes

  • Dead variable (/diagnose): detectionExecutionOutcome is still read from env but no longer used in any conditional — only in a diagnostic log. A comment or inlining would prevent future confusion.
  • Missing symmetric test (/tdd): There is no test for GH_AW_DETECTION_CONTINUE_ON_ERROR=false + DETECTION_AGENTIC_EXECUTION_OUTCOME=failure, so the "strict mode stays strict" invariant has no explicit guard after this change.

Positive Highlights

  • Minimal and correct fix — removing mustFail is the right call; the isWarnMode flag is now the sole gatekeeper, which matches the documented contract of continueOnError.
  • Test rename is accurate — the old name "should fail even in warn mode" described the buggy behaviour; the new name reads as a spec.
  • Clean PR description — the before/after code snippet makes the intent immediately clear.

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · 52.7 AIC · ⌖ 9.66 AIC · ⊞ 6.5K

Comment thread actions/setup/js/parse_threat_detection_results.cjs
 });

-it("should fail when detection execution failed even in warn mode", async () => {
+it("should warn when detection execution failed in warn mode", async () => {

Contributor


[/tdd] The updated test covers warn mode + execution failure → warning, but there is no symmetric test for error mode (GH_AW_DETECTION_CONTINUE_ON_ERROR=false) combined with DETECTION_AGENTIC_EXECUTION_OUTCOME=failure. Adding it would document that the fix is bounded to warn mode and that strict mode still produces failure.

💡 Suggested test to add
it("should fail when detection execution failed in error mode", async () => {
  process.env.GH_AW_DETECTION_CONTINUE_ON_ERROR = "false";
  process.env.DETECTION_AGENTIC_EXECUTION_OUTCOME = "failure";
  mockExistsSync.mockReturnValue(false);

  await mod.main();

  expect(mockCore.setOutput).toHaveBeenCalledWith("conclusion", "failure");
  expect(mockCore.setOutput).toHaveBeenCalledWith("success", "false");
  expect(mockCore.setOutput).toHaveBeenCalledWith("reason", "agent_failure");
  expect(mockCore.exportVariable).toHaveBeenCalledWith("GH_AW_DETECTION_CONCLUSION", "failure");
  expect(mockCore.setFailed).toHaveBeenCalledWith(expect.stringContaining("Detection log file not found"));
});

This pins that DETECTION_AGENTIC_EXECUTION_OUTCOME no longer overrides warn mode and confirms strict mode is unchanged — a regression guard for both sides of the logic.

@github-actions

Contributor

🧪 Test Quality Sentinel Report

Test Quality Score: 90/100 — Excellent

Analyzed 1 test (1 design, 0 implementation, 0 guideline violations). The modified test correctly tracks the behavioral change: agent/parser failures now stay in warn mode instead of escalating to a hard failure.

📊 Metrics & Test Classification (1 test analyzed)
| Metric | Value |
| --- | --- |
| New/modified tests analyzed | 1 |
| ✅ Design tests (behavioral contracts) | 1 (100%) |
| ⚠️ Implementation tests (low value) | 0 (0%) |
| Tests with error/edge cases | 1 (100%) |
| Duplicate test clusters | 0 |
| Test inflation detected | Yes (borderline; see note) |
| 🚨 Coding-guideline violations | 0 |

| Test | File | Classification | Issues Detected |
| --- | --- | --- | --- |
| should warn when detection execution failed in warn mode | actions/setup/js/parse_threat_detection_results.test.cjs:830 | ✅ Design | Borderline inflation (see note) |

Go: 0 (*_test.go); JavaScript: 1 (*.test.cjs).

Inflation note: Test file shows 4 lines added vs 1 line added in the production file (4:1 ratio, threshold 2:1). However, this is test maintenance — the production change was primarily a deletion of a special-case guard (-3 lines), and the test updates 4 assertions to match the new behavior. Not genuine test padding; the -10 pt inflation penalty is applied mechanically but does not represent a quality concern.

Verdict

Check passed. 0% implementation tests (threshold: 30%). The single modified test enforces the behavioral contract that a DETECTION_AGENTIC_EXECUTION_OUTCOME=failure outcome in warn mode (continueOnError=true) must produce conclusion=warning and must NOT call core.setFailed(). Mocking is limited to legitimate external I/O targets (core, fs.existsSync). No guideline violations detected.

🧪 Test quality analysis by Test Quality Sentinel · 65.5 AIC · ⌖ 13 AIC · ⊞ 8.4K ·

@github-actions github-actions Bot left a comment

Contributor


✅ Test Quality Sentinel: 90/100. Test quality is acceptable — 0% of new tests are implementation tests (threshold: 30%).

@github-actions github-actions Bot left a comment

Contributor


Correct fix. Removing mustFail makes isWarnMode the single authoritative gate for setDetectionFailure, which is the right design — the old escalation-to-failure was bypassing warn-mode safe-output policy rather than honoring it. Test update accurately reflects the new behavior; reason=agent_failure + success=false still distinguish scanner

🔎 Code quality review by PR Code Quality Reviewer · 64.6 AIC · ⌖ 7.11 AIC · ⊞ 5.2K

@@ -453,7 +453,6 @@ async function main() {
const runDetection = process.env.RUN_DETECTION;
const continueOnError = process.env.GH_AW_DETECTION_CONTINUE_ON_ERROR !== "false";
const detectionExecutionOutcome = process.env.DETECTION_AGENTIC_EXECUTION_OUTCOME || "";

Contributor


detectionExecutionOutcome is now read but has no behavioral effect — it only appears in a core.info log statement after this change.

💡 Suggested clarification

Without a comment, a future reader may wonder whether detectionExecutionFailed was accidentally dropped, or whether this variable can be removed entirely. A short note prevents that confusion:

// retained for diagnostic logging only; no longer gates conclusion (see setDetectionFailure)
const detectionExecutionOutcome = process.env.DETECTION_AGENTIC_EXECUTION_OUTCOME || "";

Downstream consumers can still distinguish the failure type via reason=agent_failure and success=false outputs, so nothing is lost behaviorally — this is purely a readability concern.
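
A related detail in the hunk above is how `continueOnError` is derived: because the line compares against the literal string "false", warn mode is the default whenever the variable is unset or holds any other value. A minimal sketch of that comparison, with `parseContinueOnError` as a hypothetical wrapper name:

```javascript
// Hypothetical wrapper around the comparison used in main().
function parseContinueOnError(value) {
  return value !== "false";
}

console.log(parseContinueOnError(undefined)); // true (variable unset)
console.log(parseContinueOnError("true")); // true
console.log(parseContinueOnError("")); // true
console.log(parseContinueOnError("FALSE")); // true (comparison is case-sensitive)
console.log(parseContinueOnError("false")); // false (the only way to select strict mode)
```

Only the exact string "false" selects strict mode, so a test that wants to pin strict-mode behavior has to set that value explicitly.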

@pelikhan

Collaborator

@copilot run pr-finisher skill

@pelikhan pelikhan merged commit 7144868 into main Jun 25, 2026
61 of 70 checks passed
@pelikhan pelikhan deleted the copilot/investigate-safeoutputs-issue branch June 25, 2026 23:33
Copilot stopped work on behalf of pelikhan due to an error June 25, 2026 23:34

3 participants