Normalize Pi threat-detection models before Copilot fallback by Copilot · Pull Request #41545 · github/gh-aw

Copilot · 2026-06-25T21:28:10Z

Issue Monster’s detection job was passing the Pi-style model identifier copilot/gpt-5.4 into the Copilot CLI fallback path. That string is valid for Pi engine config, but Copilot detection expects the bare model id so the API proxy can resolve it correctly.

Root cause
- Threat detection normalizes pi to the copilot detector, but it preserved the original Pi provider/model string in COPILOT_MODEL.
- This caused Copilot detection to receive copilot/gpt-5.4 instead of gpt-5.4, which surfaced as an unsupported/retired model error in the failing job.

Change
- Preserve the original engine identity during threat-detection engine resolution.
- When detection falls back from pi to copilot, strip the provider prefix from the configured model before passing it to the Copilot detector.
- Leave existing behavior unchanged for native Copilot detection and already-bare model ids.

Regression coverage
- Add coverage for Pi workflows using copilot/gpt-5.4 so the detection path now emits a Copilot-compatible model id.
- Document the engine-precedence path to make the Pi→Copilot normalization explicit.

Example of the normalization applied in the fallback path:

if engineSetting == "copilot" && originalEngineID == "pi" {
	detectionEngineConfig.Model = extractPiModelID(detectionEngineConfig.Model)
}

This turns:

engine:
  id: pi
  model: copilot/gpt-5.4

into an effective Copilot detection model of:

COPILOT_MODEL: gpt-5.4
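
The prefix stripping itself could look roughly like the sketch below. This is an illustration only, not the repository's extractPiModelID: the helper name stripProviderPrefix is hypothetical, and splitting on the first "/" is an assumption about how Pi-style provider/model strings are shaped.

```go
package main

import "strings"

// stripProviderPrefix is a hypothetical stand-in for extractPiModelID.
// Assumption: a Pi-style model looks like "provider/model", and the provider
// is everything before the first "/". Bare and empty ids pass through as-is.
func stripProviderPrefix(model string) string {
	if _, id, found := strings.Cut(model, "/"); found {
		return id
	}
	return model
}
```

Under these assumptions, stripProviderPrefix("copilot/gpt-5.4") yields gpt-5.4, while an already-bare gpt-5.4 or an empty string is returned unchanged, which matches the "leave existing behavior unchanged" goal stated above.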

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

github-actions · 2026-06-25T22:38:40Z

✅ Design Decision Gate 🏗️ completed the design decision gate check.

No ADR enforcement needed: PR #41545 does not have the 'implementation' label and has ≤100 new lines of code in business logic directories (33 additions across 2 files).

github-actions · 2026-06-25T22:38:45Z

❌ Test Quality Sentinel failed during test quality analysis.

github-actions · 2026-06-25T22:38:45Z

🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅

github-actions · 2026-06-25T22:38:56Z

✅ PR Code Quality Reviewer completed the code quality review.

Copilot

Pull request overview

This pull request fixes a model-identifier mismatch in threat detection when workflows configure the Pi engine but threat detection normalizes Pi → Copilot. In that fallback case, Pi-style provider/model strings (e.g. copilot/gpt-5.4) are now converted to the bare model ID (gpt-5.4) before being passed to the Copilot CLI, preventing “unsupported/retired model” errors.

Changes:

- Preserve the original configured engine identity (before Pi normalization) so the compiler can detect when the Copilot detector is being used as a Pi fallback.
- When the effective detection engine is Copilot but the original engine was Pi, strip the provider prefix from the configured model via extractPiModelID.
- Add a regression test ensuring copilot/gpt-5.4 becomes gpt-5.4 in the detection job’s COPILOT_MODEL.

File	Description
pkg/workflow/threat_detection_inline_engine.go	Tracks original engine ID and normalizes Pi-style `provider/model` to a Copilot-compatible model ID in the Pi→Copilot detection fallback path.
pkg/workflow/threat_detection_test.go	Adds regression coverage asserting the detection steps emit `COPILOT_MODEL: gpt-5.4` for a Pi workflow configured with `copilot/gpt-5.4`.

Copilot's findings

Files reviewed: 2/2 changed files
Comments generated: 0

github-actions · 2026-06-25T22:45:45Z

🧪 Test Quality Sentinel Report

⚠️ Test Quality Score: 70/100 — Acceptable

Analyzed 1 test (new table row in a table-driven function): 1 design, 0 implementation, 0 guideline violations.

📊 Metrics & Test Classification (1 test analyzed)

Metric	Value
New/modified tests analyzed	1
✅ Design tests (behavioral contracts)	1 (100%)
⚠️ Implementation tests (low value)	0 (0%)
Tests with error/edge cases	0 (0%)
Duplicate test clusters	0
Test inflation detected	No (ratio 0.83 — 15 test lines / 18 production lines)
🚨 Coding-guideline violations	0

Test	File	Classification	Issues Detected
`TestCopilotDetectionDefaultModel` — row: "pi engine threat detection normalizes provider-scoped model for copilot fallback"	`pkg/workflow/threat_detection_test.go:1269`	✅ Design	No error/edge-case row

Go: 1 (*_test.go); JavaScript: 0. Other languages detected but not scored.

⚠️ Flagged Tests — Requires Review (1 observation)

TestCopilotDetectionDefaultModel — new row (pkg/workflow/threat_detection_test.go:1272) — ⚠️ Happy-path only: the new table row confirms that copilot/gpt-5.4 is normalised to gpt-5.4, but does not cover the complementary cases exercised by the production code:

- A bare model name (e.g. gpt-4) passed from a Pi engine: extractPiModelID is expected to leave it unchanged; a row asserting expectedModel: "gpt-4" would guard against accidental stripping.
- An empty model string: the production path falls back to the default env-var expression; a row with Model: "" would confirm the fallback still fires.

Suggested additions (2 new table rows in the same test function):

{
    name: "pi engine with bare model name leaves model unchanged",
    data: &WorkflowData{
        AI: "pi",
        EngineConfig: &EngineConfig{ID: "pi", Model: "gpt-4"},
        SafeOutputs:  &SafeOutputsConfig{ThreatDetection: &ThreatDetectionConfig{}},
    },
    shouldContainModel: true,
    expectedModel:      "gpt-4",
},
{
    name: "pi engine with empty model falls back to default env var",
    data: &WorkflowData{
        AI: "pi",
        EngineConfig: &EngineConfig{ID: "pi"},
        SafeOutputs:  &SafeOutputsConfig{ThreatDetection: &ThreatDetectionConfig{}},
    },
    shouldContainModel: true,
    expectedModel: "${{ vars.GH_AW_DEFAULT_MODEL_DETECTION_COPILOT || vars.GH_AW_DEFAULT_MODEL_COPILOT || 'claude-sonnet-4.6' }}",
},

These are observations, not blockers — the existing row already guards the primary regression path.
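
The empty-model case in the second suggested row can be pictured with a small sketch. The helper name effectiveCopilotModel is hypothetical, and the default expression is copied from the suggested row above rather than from the production code, so treat both as assumptions.

```go
package main

// defaultDetectionModelExpr is the expression quoted in the suggested test row.
// Assumption: the production default matches this string exactly.
const defaultDetectionModelExpr = "${{ vars.GH_AW_DEFAULT_MODEL_DETECTION_COPILOT || vars.GH_AW_DEFAULT_MODEL_COPILOT || 'claude-sonnet-4.6' }}"

// effectiveCopilotModel is a hypothetical helper: an empty configured model
// falls back to the default expression, anything else is used as configured.
func effectiveCopilotModel(configured string) string {
	if configured == "" {
		return defaultDetectionModelExpr
	}
	return configured
}
```

A row with Model: "" would then be expected to produce the default expression, and a row with Model: "gpt-4" would produce gpt-4.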

Verdict

✅ Check passed. 0% implementation tests (threshold: 30%). The single new table row tests an observable behavioral contract (generated step contains the normalised model ID). Score is 70/100 because no error or boundary paths are covered by the new row.

🧪 Test quality analysis by Test Quality Sentinel · 49.7 AIC · ⌖ 23.8 AIC · ⊞ 8.4K · ◷

github-actions

✅ Test Quality Sentinel: 70/100. Test quality is acceptable — 0% of new tests are implementation tests (threshold: 30%).

github-actions

Skills-Based Review 🧠

Applied /diagnose and /tdd — two targeted observations, no blocking issues.

📋 Key Themes & Highlights

Key Themes

- Precedence divergence: originalEngineID resolution doesn't quite mirror getThreatDetectionEngineID. data.EngineConfig.ID always overrides data.AI in the new code, while the existing helper only uses it as a fallback when data.AI is empty. Harmless for the current bug (both fields are "pi"), but the comment claiming they match is inaccurate and could mislead a future reader.
- Coverage gap on ThreatDetection.EngineConfig path: The third originalEngineID branch (explicit threat-detection.engine.id: pi) has no regression test. The explicit-override path should be as hardened as the top-level path.

Positive Highlights

✅ Clean reuse of extractPiModelID — no new logic for stripping the prefix
✅ Backward-compatible: bare/empty model IDs pass through unchanged
✅ Good PR description with root-cause analysis and a concrete before/after example
✅ Regression test added for the exact failing scenario
✅ Normalization is correctly placed after APITarget inheritance and after default-model resolution, so ordering is sound

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · 53.6 AIC · ⌖ 7.07 AIC · ⊞ 6.5K

github-actions · 2026-06-25T22:46:15Z

+	// the legacy top-level AI field.
+	originalEngineID := data.AI
+	if data.EngineConfig != nil && data.EngineConfig.ID != "" {
+		originalEngineID = data.EngineConfig.ID


[/diagnose] The data.EngineConfig.ID override here lacks the originalEngineID == "" guard present in getThreatDetectionEngineID, so the two precedence rules diverge.

In getThreatDetectionEngineID (threat_detection_external.go:124), data.AI wins if non-empty, with data.EngineConfig.ID used only as a fallback. Here data.EngineConfig.ID always overrides data.AI — the inverse. The comment above claims precedence matches runtime resolution, but it doesn't.

For the reported bug both fields are "pi" so the inconsistency is invisible, but it becomes a latent correctness hazard if the two fields ever diverge.

💡 Suggested fix

Add the same empty-check guard used in getThreatDetectionEngineID:

originalEngineID := data.AI
if originalEngineID == "" && data.EngineConfig != nil && data.EngineConfig.ID != "" {
	originalEngineID = data.EngineConfig.ID
}

This brings the precedence in line with the comment and with the existing engine-resolution logic.

github-actions · 2026-06-25T22:46:15Z

+			shouldContainModel: true,
+			expectedModel:      "gpt-5.4",
+		},
 		{


[/tdd] The third branch of originalEngineID resolution — where ThreatDetection.EngineConfig.ID is set explicitly to "pi" — has no test coverage.

A user who writes threat-detection: { engine: { id: pi, model: "copilot/gpt-5.4" } } in frontmatter would exercise this path (hasThreatDetectionEngineConfig && data.SafeOutputs.ThreatDetection.EngineConfig.ID != ""), but no test currently validates it.

💡 Suggested test case to add after this one

{
    name: "pi engine via explicit threat-detection engine config normalizes provider-scoped model",
    data: &WorkflowData{
        AI: "claude",
        SafeOutputs: &SafeOutputsConfig{
            ThreatDetection: &ThreatDetectionConfig{
                EngineConfig: &EngineConfig{
                    ID:    "pi",
                    Model: "copilot/gpt-5.4",
                },
            },
        },
    },
    shouldContainModel: true,
    expectedModel:      "gpt-5.4",
},

This ensures the explicit threat-detection.engine.id: pi path is as robust as the top-level engine path.

github-actions

Two non-blocking observations on the new originalEngineID logic. The primary fix (stripping the provider/ prefix before handing the model to Copilot CLI) is correct for all realistic Pi workflows.

### Findings

Precedence inversion vs `getThreatDetectionEngineID` (line 42–43)

The new code unconditionally overrides data.AI with data.EngineConfig.ID. getThreatDetectionEngineID does the opposite: data.AI wins and data.EngineConfig.ID is a fallback only when data.AI is empty. In the edge case where the two fields disagree (e.g. a workflow authored with both ai: pi and a conflicting engine.id), originalEngineID resolves to the wrong value and the normalization guard never fires. The inline comment also documents the wrong precedence, which risks cargo-culting it deeper.

Test gap: TD-specific engine config path not covered (test line 1285)

The new test sets both data.AI and data.EngineConfig.ID to "pi". The third branch of originalEngineID — where SafeOutputs.ThreatDetection.EngineConfig.ID = "pi" triggers the normalization — has no dedicated test case.

🔎 Code quality review by PR Code Quality Reviewer · 88.4 AIC · ⌖ 9.37 AIC · ⊞ 5.2K

github-actions · 2026-06-25T22:48:53Z

+	// the legacy top-level AI field.
+	originalEngineID := data.AI
+	if data.EngineConfig != nil && data.EngineConfig.ID != "" {
+		originalEngineID = data.EngineConfig.ID


Precedence mismatch vs getThreatDetectionEngineID: this assignment makes data.EngineConfig.ID override data.AI, but getThreatDetectionEngineID (in threat_detection_external.go) uses data.AI as primary and data.EngineConfig.ID only as a fallback when data.AI is empty — the inverse priority. If the two fields diverge, originalEngineID resolves incorrectly and the pi→copilot model normalization silently skips.

💡 Details and suggested fix

getThreatDetectionEngineID (non-TD-config path):

engineID = data.AI // primary
if engineID == "" && data.EngineConfig != nil && data.EngineConfig.ID != "" {
	engineID = data.EngineConfig.ID // fallback only when data.AI is empty
}

But the new code at line 42–43 does:

originalEngineID := data.AI
if data.EngineConfig != nil && data.EngineConfig.ID != "" {
	originalEngineID = data.EngineConfig.ID // unconditional override, opposite priority
}

The comment on lines 37–40 claims this matches runtime resolution (EngineConfig.ID > data.AI), but that is not what getThreatDetectionEngineID implements. One of these is wrong, and the comment risks cargo-culting the incorrect precedence into future code.

Impact: if data.AI = "pi" and data.EngineConfig.ID is any non-"pi" value, getThreatDetectionEngineID returns "copilot" via pi normalization, but originalEngineID resolves to the non-"pi" EngineConfig value. The guard engineSetting == "copilot" && originalEngineID == "pi" never fires, so copilot/gpt-5.4 leaks to the Copilot CLI unchanged.

Fix: mirror getThreatDetectionEngineID's fallback-only semantics:

originalEngineID := data.AI
if originalEngineID == "" && data.EngineConfig != nil && data.EngineConfig.ID != "" {
	originalEngineID = data.EngineConfig.ID // fallback only when data.AI is empty
}

Also update the comment on line 37 to match actual behaviour.
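
To make the divergence concrete, here is a minimal, self-contained sketch that places the two precedence rules quoted above side by side. The WorkflowData and EngineConfig types are trimmed-down stand-ins, and the function names overrideEngineID and fallbackEngineID are hypothetical; only the branch conditions are taken from the snippets in this comment.

```go
package main

// Minimal hypothetical stand-ins for the repository's types.
type EngineConfig struct{ ID string }

type WorkflowData struct {
	AI           string
	EngineConfig *EngineConfig
}

// overrideEngineID follows the rule in the new code as quoted above:
// a non-empty EngineConfig.ID always replaces data.AI.
func overrideEngineID(d *WorkflowData) string {
	id := d.AI
	if d.EngineConfig != nil && d.EngineConfig.ID != "" {
		id = d.EngineConfig.ID
	}
	return id
}

// fallbackEngineID follows the suggested fix: EngineConfig.ID is consulted
// only when data.AI is empty.
func fallbackEngineID(d *WorkflowData) string {
	id := d.AI
	if id == "" && d.EngineConfig != nil && d.EngineConfig.ID != "" {
		id = d.EngineConfig.ID
	}
	return id
}
```

With AI set to "pi" and EngineConfig.ID set to some other value such as "claude", overrideEngineID returns "claude" while fallbackEngineID returns "pi", so only the fallback variant would let the originalEngineID == "pi" guard fire. When both fields are "pi", the two functions agree, which is why the reported bug does not expose the difference.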

github-actions · 2026-06-25T22:48:53Z

+				},
+			},
+			shouldContainModel: true,
+			expectedModel:      "gpt-5.4",


Test only covers the aligned AI/EngineConfig.ID = "pi" case: the new case sets both data.AI and data.EngineConfig.ID to "pi", so neither the second nor third branch of originalEngineID (lines 45–46) is meaningfully exercised in isolation. A regression in the threat-detection-specific engine config path would be invisible.

💡 Suggested additional test case

A case where normalization is driven by a threat-detection-specific engine config (exercises the originalEngineID = data.SafeOutputs.ThreatDetection.EngineConfig.ID branch):

{
    name: "pi threat-detection engine config normalizes provider-scoped model for copilot fallback",
    data: &WorkflowData{
        AI: "copilot", // main engine is copilot; TD override is pi
        SafeOutputs: &SafeOutputsConfig{
            ThreatDetection: &ThreatDetectionConfig{
                EngineConfig: &EngineConfig{
                    ID:    "pi",
                    Model: "copilot/gpt-5.4",
                },
            },
        },
    },
    shouldContainModel: true,
    expectedModel:      "gpt-5.4",
},

This ensures the third branch of originalEngineID (line 46) and the normalization block both fire when pi is specified as a TD-specific engine override rather than the main engine.
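
The path this test case targets can be approximated with the sketch below. It is a simplified model, not the code in threat_detection_inline_engine.go: the types are reduced stand-ins, detectionModel is a hypothetical function, the top-level EngineConfig.ID branch is left out, and the prefix split on the first "/" is an assumption about what extractPiModelID does.

```go
package main

import "strings"

// Reduced hypothetical stand-ins for the repository's types.
type EngineConfig struct{ ID, Model string }
type ThreatDetectionConfig struct{ EngineConfig *EngineConfig }
type SafeOutputsConfig struct{ ThreatDetection *ThreatDetectionConfig }

type WorkflowData struct {
	AI           string
	EngineConfig *EngineConfig
	SafeOutputs  *SafeOutputsConfig
}

// detectionModel sketches the TD-override branch: an explicit threat-detection
// engine id takes over as the original engine, pi is normalized to copilot,
// and the provider prefix is stripped only in that pi-to-copilot fallback.
func detectionModel(d *WorkflowData) string {
	originalEngineID := d.AI
	cfg := d.EngineConfig
	if d.SafeOutputs != nil && d.SafeOutputs.ThreatDetection != nil {
		if td := d.SafeOutputs.ThreatDetection.EngineConfig; td != nil && td.ID != "" {
			originalEngineID = td.ID
			cfg = td
		}
	}
	if cfg == nil {
		return ""
	}
	engineSetting := originalEngineID
	if engineSetting == "pi" {
		engineSetting = "copilot" // pi detection runs on the copilot detector
	}
	model := cfg.Model
	if engineSetting == "copilot" && originalEngineID == "pi" {
		if _, id, found := strings.Cut(model, "/"); found {
			model = id
		}
	}
	return model
}
```

Fed the workflow from the suggested test case (main engine copilot, threat-detection engine pi with copilot/gpt-5.4), this sketch returns gpt-5.4. A native Copilot configuration with the same model string is returned unchanged, since the guard requires the original engine to be pi.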

Copilot AI and others added 3 commits June 25, 2026 21:18

Normalize Pi detection model for Copilot

81daf32

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Fix Pi threat detection model fallback

042f501

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Document Pi detection fallback precedence

79256e6

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>

Copilot AI assigned Copilot and pelikhan Jun 25, 2026

Copilot AI changed the title ~~Fix Pi threat detection model fallback for Copilot~~ Normalize Pi threat-detection models before Copilot fallback Jun 25, 2026

Copilot finished work on behalf of pelikhan June 25, 2026 21:28

Copilot AI requested a review from pelikhan June 25, 2026 21:28

Copilot created this pull request from a session on behalf of pelikhan June 25, 2026 21:28 View session

pelikhan marked this pull request as ready for review June 25, 2026 22:37

Copilot AI review requested due to automatic review settings June 25, 2026 22:37

Copilot started reviewing on behalf of pelikhan June 25, 2026 22:37 View session

Copilot AI reviewed Jun 25, 2026

github-actions Bot mentioned this pull request Jun 25, 2026

[aw] No-Op Runs #39849

Open

github-actions Bot approved these changes Jun 25, 2026

github-actions Bot reviewed Jun 25, 2026

pelikhan merged commit 6236de8 into main Jun 25, 2026
84 of 96 checks passed

pelikhan deleted the copilot/investigate-copilot-cli-detection-job-failure branch June 25, 2026 23:32

This was referenced Jun 26, 2026

Smoke Test: Pi - 28209735681 #41562

Closed

Smoke Test: Claude - 28209735611 #41565

Closed
