
Normalize Pi threat-detection models before Copilot fallback #41545

Merged
pelikhan merged 3 commits into main from copilot/investigate-copilot-cli-detection-job-failure Jun 25, 2026

Copilot AI commented Jun 25, 2026

Contributor

Issue Monster’s detection job was passing the Pi-style model identifier copilot/gpt-5.4 into the Copilot CLI fallback path. That string is valid for Pi engine config, but Copilot detection expects the bare model id so the API proxy can resolve it correctly.

  • Root cause

    • Threat detection normalizes pi to the copilot detector, but it passed the original Pi provider/model string through unchanged as COPILOT_MODEL.
    • This caused Copilot detection to receive copilot/gpt-5.4 instead of gpt-5.4, which surfaced as an unsupported/retired model error in the failing job.
  • Change

    • Preserve the original engine identity during threat-detection engine resolution.
    • When detection falls back from pi to copilot, strip the provider prefix from the configured model before passing it to the Copilot detector.
    • Leave existing behavior unchanged for native Copilot detection and already-bare model ids.
  • Regression coverage

    • Add coverage for Pi workflows using copilot/gpt-5.4 so the detection path now emits a Copilot-compatible model id.
    • Document the engine-precedence path to make the Pi→Copilot normalization explicit.

Example of the normalization applied in the fallback path:

if engineSetting == "copilot" && originalEngineID == "pi" {
	detectionEngineConfig.Model = extractPiModelID(detectionEngineConfig.Model)
}

This turns:

engine:
  id: pi
  model: copilot/gpt-5.4

into an effective Copilot detection model of:

COPILOT_MODEL: gpt-5.4
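To make the fallback behavior concrete, here is a minimal, runnable sketch of the guard above. It is hypothetical: splitProviderModel stands in for extractPiModelID, whose body is not part of this PR, and it assumes the provider prefix ends at the first "/". normalizeDetectionModel is likewise an illustrative name, not a function from the codebase.

```go
package main

import (
	"fmt"
	"strings"
)

// splitProviderModel is a hypothetical stand-in for extractPiModelID. It assumes
// a Pi model string looks like "provider/model" and returns everything after the
// first "/"; ids without a "/" are returned unchanged.
func splitProviderModel(model string) string {
	if _, after, found := strings.Cut(model, "/"); found {
		return after
	}
	return model
}

// normalizeDetectionModel applies the same guard as the PR snippet: the prefix is
// stripped only when the detector is copilot and the configured engine was pi.
func normalizeDetectionModel(engineSetting, originalEngineID, model string) string {
	if engineSetting == "copilot" && originalEngineID == "pi" {
		return splitProviderModel(model)
	}
	return model
}

func main() {
	fmt.Println(normalizeDetectionModel("copilot", "pi", "copilot/gpt-5.4"))      // gpt-5.4
	fmt.Println(normalizeDetectionModel("copilot", "pi", "gpt-5.4"))              // gpt-5.4 (already bare)
	fmt.Println(normalizeDetectionModel("copilot", "copilot", "copilot/gpt-5.4")) // copilot/gpt-5.4 (native Copilot path untouched)
}
```

The three calls correspond to the three behaviors listed under Change: the Pi fallback is normalized, already-bare ids pass through, and native Copilot detection is left alone.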

Copilot AI and others added 3 commits June 25, 2026 21:18
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI changed the title from "Fix Pi threat detection model fallback for Copilot" to "Normalize Pi threat-detection models before Copilot fallback" Jun 25, 2026
Copilot AI requested a review from pelikhan June 25, 2026 21:28
pelikhan marked this pull request as ready for review June 25, 2026 22:37
Copilot AI review requested due to automatic review settings June 25, 2026 22:37
github-actions Bot commented Jun 25, 2026

Contributor

Design Decision Gate 🏗️ completed the design decision gate check.

No ADR enforcement needed: PR #41545 does not have the 'implementation' label and has ≤100 new lines of code in business logic directories (33 additions across 2 files).
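The gate's decision can be read as a simple predicate. This sketch is an inference from the wording of the message above, not the gate's actual code: adrRequired is a hypothetical name, and it assumes enforcement triggers on either the "implementation" label or more than 100 new business-logic lines.

```go
package main

import "fmt"

// adrRequired is a hypothetical reconstruction of the gate rule as worded in the
// bot comment: an ADR is enforced when the PR carries the "implementation" label
// or adds more than 100 new lines in business-logic directories.
func adrRequired(labels []string, newBusinessLogicLines int) bool {
	for _, l := range labels {
		if l == "implementation" {
			return true
		}
	}
	return newBusinessLogicLines > 100
}

func main() {
	// PR #41545: no labels, 33 additions across 2 files.
	fmt.Println(adrRequired(nil, 33)) // false
}
```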

github-actions Bot commented Jun 25, 2026

Contributor

Test Quality Sentinel failed during test quality analysis.

github-actions Bot commented Jun 25, 2026

Contributor

🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅

github-actions Bot commented Jun 25, 2026

Contributor

PR Code Quality Reviewer completed the code quality review.

Copilot AI left a comment

Contributor


Pull request overview

This pull request fixes a model-identifier mismatch in threat detection when workflows configure the Pi engine but threat detection normalizes Pi → Copilot. In that fallback case, Pi-style provider/model strings (e.g. copilot/gpt-5.4) are now converted to the bare model ID (gpt-5.4) before being passed to the Copilot CLI, preventing “unsupported/retired model” errors.

Changes:

  • Preserve the original configured engine identity (before Pi normalization) so the compiler can detect when the Copilot detector is being used as a Pi fallback.
  • When the effective detection engine is Copilot but the original engine was Pi, strip the provider prefix from the configured model via extractPiModelID.
  • Add a regression test ensuring copilot/gpt-5.4 becomes gpt-5.4 in the detection job’s COPILOT_MODEL.
Summary per file:

| File | Description |
| --- | --- |
| pkg/workflow/threat_detection_inline_engine.go | Tracks the original engine ID and normalizes Pi-style provider/model to a Copilot-compatible model ID in the Pi→Copilot detection fallback path. |
| pkg/workflow/threat_detection_test.go | Adds regression coverage asserting the detection steps emit COPILOT_MODEL: gpt-5.4 for a Pi workflow configured with copilot/gpt-5.4. |

Copilot's findings


  • Files reviewed: 2/2 changed files
  • Comments generated: 0

github-actions Bot mentioned this pull request Jun 25, 2026

github-actions Bot

Contributor

🧪 Test Quality Sentinel Report

⚠️ Test Quality Score: 70/100 — Acceptable

Analyzed 1 test (new table row in a table-driven function): 1 design, 0 implementation, 0 guideline violations.

📊 Metrics & Test Classification (1 test analyzed)
| Metric | Value |
| --- | --- |
| New/modified tests analyzed | 1 |
| ✅ Design tests (behavioral contracts) | 1 (100%) |
| ⚠️ Implementation tests (low value) | 0 (0%) |
| Tests with error/edge cases | 0 (0%) |
| Duplicate test clusters | 0 |
| Test inflation detected | No (ratio 0.83: 15 test lines / 18 production lines) |
| 🚨 Coding-guideline violations | 0 |

| Test | File | Classification | Issues Detected |
| --- | --- | --- | --- |
| TestCopilotDetectionDefaultModel (row: "pi engine threat detection normalizes provider-scoped model for copilot fallback") | pkg/workflow/threat_detection_test.go:1269 | ✅ Design | No error/edge-case row |

Go: 1 (*_test.go); JavaScript: 0. Other languages detected but not scored.

⚠️ Flagged Tests — Requires Review (1 observation)

TestCopilotDetectionDefaultModel — new row (pkg/workflow/threat_detection_test.go:1272) — ⚠️ Happy-path only: the new table row confirms that copilot/gpt-5.4 is normalised to gpt-5.4, but does not cover the complementary cases exercised by the production code:

  • A bare model name (e.g. gpt-4) passed from a Pi engine — extractPiModelID is expected to leave it unchanged; a row asserting expectedModel: "gpt-4" would guard against accidental stripping.
  • An empty model string — the production path falls back to the default env-var expression; a row with Model: "" would confirm the fallback still fires.

Suggested additions (2 new table rows in the same test function):

{
    name: "pi engine with bare model name leaves model unchanged",
    data: &WorkflowData{
        AI:           "pi",
        EngineConfig: &EngineConfig{ID: "pi", Model: "gpt-4"},
        SafeOutputs:  &SafeOutputsConfig{ThreatDetection: &ThreatDetectionConfig{}},
    },
    shouldContainModel: true,
    expectedModel:      "gpt-4",
},
{
    name: "pi engine with empty model falls back to default env var",
    data: &WorkflowData{
        AI:           "pi",
        EngineConfig: &EngineConfig{ID: "pi"},
        SafeOutputs:  &SafeOutputsConfig{ThreatDetection: &ThreatDetectionConfig{}},
    },
    shouldContainModel: true,
    expectedModel:      "${{ vars.GH_AW_DEFAULT_MODEL_DETECTION_COPILOT || vars.GH_AW_DEFAULT_MODEL_COPILOT || 'claude-sonnet-4.6' }}",
},

These are observations, not blockers — the existing row already guards the primary regression path.
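The three expectations (provider-scoped, bare, and empty model) can be checked together in a table-driven style. This is a self-contained sketch, not the real TestCopilotDetectionDefaultModel: resolvePiDetectionModel is a hypothetical simplification that resolves the default first and then strips a "provider/" prefix at the first "/", standing in for the compiler's default-model resolution and extractPiModelID.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultCopilotModelExpr is the fallback expression quoted in the suggested row above.
const defaultCopilotModelExpr = "${{ vars.GH_AW_DEFAULT_MODEL_DETECTION_COPILOT || vars.GH_AW_DEFAULT_MODEL_COPILOT || 'claude-sonnet-4.6' }}"

// resolvePiDetectionModel is a hypothetical, simplified model of the Pi->Copilot
// detection path: default-model resolution runs first, then any "provider/"
// prefix is stripped (first "/" assumed, as a stand-in for extractPiModelID).
func resolvePiDetectionModel(model string) string {
	if model == "" {
		model = defaultCopilotModelExpr
	}
	if _, after, found := strings.Cut(model, "/"); found {
		return after
	}
	return model
}

func main() {
	cases := []struct {
		name, model, want string
	}{
		{"provider-scoped model is normalized", "copilot/gpt-5.4", "gpt-5.4"},
		{"bare model name is unchanged", "gpt-4", "gpt-4"},
		{"empty model falls back to the default", "", defaultCopilotModelExpr},
	}
	for _, c := range cases {
		got := resolvePiDetectionModel(c.model)
		fmt.Printf("%s: got %q, pass=%v\n", c.name, got, got == c.want)
	}
}
```

Because the default expression contains no "/", normalizing after default resolution leaves it intact in this sketch, which is consistent with the ordering the skills review calls sound.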

Verdict

Check passed. 0% implementation tests (threshold: 30%). The single new table row tests an observable behavioral contract (generated step contains the normalised model ID). Score is 70/100 because no error or boundary paths are covered by the new row.
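The pass/fail arithmetic in the verdict can be re-derived from the figures in the report. This is a hypothetical sketch: summarize and its fields are invented names, treating "at or below 30%" as passing is an assumption, and the 70/100 score is not modeled because its formula is not given.

```go
package main

import "fmt"

// sentinelSummary holds the recomputed figures; the field names are hypothetical.
type sentinelSummary struct {
	implementationPct float64
	inflationRatio    float64
	passed            bool
}

// summarize recomputes the implementation-test share and the test/production
// line ratio, and applies an assumed "at or below threshold" pass rule.
func summarize(designTests, implTests, testLines, prodLines int, thresholdPct float64) sentinelSummary {
	pct := 0.0
	if total := designTests + implTests; total > 0 {
		pct = 100 * float64(implTests) / float64(total)
	}
	ratio := 0.0
	if prodLines > 0 {
		ratio = float64(testLines) / float64(prodLines)
	}
	return sentinelSummary{implementationPct: pct, inflationRatio: ratio, passed: pct <= thresholdPct}
}

func main() {
	s := summarize(1, 0, 15, 18, 30)
	fmt.Printf("implementation tests: %.0f%%, inflation ratio: %.2f, passed: %v\n",
		s.implementationPct, s.inflationRatio, s.passed)
	// implementation tests: 0%, inflation ratio: 0.83, passed: true
}
```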

🧪 Test quality analysis by Test Quality Sentinel · 49.7 AIC · ⌖ 23.8 AIC · ⊞ 8.4K ·

github-actions Bot left a comment

Contributor


✅ Test Quality Sentinel: 70/100. Test quality is acceptable — 0% of new tests are implementation tests (threshold: 30%).

github-actions Bot left a comment

Contributor


Skills-Based Review 🧠

Applied /diagnose and /tdd — two targeted observations, no blocking issues.

📋 Key Themes & Highlights

Key Themes

  • Precedence divergence: originalEngineID resolution doesn't quite mirror getThreatDetectionEngineID. In the new code data.EngineConfig.ID always overrides data.AI, while the existing helper uses it only as a fallback when data.AI is empty. This is harmless for the current bug (both fields are "pi"), but the comment claiming the two match is inaccurate and could mislead a future reader.
  • Coverage gap on ThreatDetection.EngineConfig path: The third originalEngineID branch (explicit threat-detection.engine.id: pi) has no regression test. The explicit-override path should be as hardened as the top-level path.

Positive Highlights

  • ✅ Clean reuse of extractPiModelID — no new logic for stripping the prefix
  • ✅ Backward-compatible: bare/empty model IDs pass through unchanged
  • ✅ Good PR description with root-cause analysis and a concrete before/after example
  • ✅ Regression test added for the exact failing scenario
  • ✅ Normalization is correctly placed after APITarget inheritance and after default-model resolution, so ordering is sound

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · 53.6 AIC · ⌖ 7.07 AIC · ⊞ 6.5K

// the legacy top-level AI field.
originalEngineID := data.AI
if data.EngineConfig != nil && data.EngineConfig.ID != "" {
	originalEngineID = data.EngineConfig.ID
}

Contributor


[/diagnose] The data.EngineConfig.ID override here lacks the originalEngineID == "" guard present in getThreatDetectionEngineID, so the two precedence rules diverge.

In getThreatDetectionEngineID (threat_detection_external.go:124), data.AI wins if non-empty, with data.EngineConfig.ID used only as a fallback. Here data.EngineConfig.ID always overrides data.AI — the inverse. The comment above claims precedence matches runtime resolution, but it doesn't.

For the reported bug both fields are "pi" so the inconsistency is invisible, but it becomes a latent correctness hazard if the two fields ever diverge.

💡 Suggested fix

Add the same empty-check guard used in getThreatDetectionEngineID:

originalEngineID := data.AI
if originalEngineID == "" && data.EngineConfig != nil && data.EngineConfig.ID != "" {
	originalEngineID = data.EngineConfig.ID
}

This brings the precedence in line with the comment and with the existing engine-resolution logic.
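The divergence is easiest to see side by side. In this hypothetical sketch, WorkflowData and EngineConfig are pared-down stand-ins with only the fields the review mentions, and the two helper names are illustrative; the bodies follow the snippet under review and the suggested fix.

```go
package main

import "fmt"

// Minimal hypothetical stand-ins for the workflow types named in the review.
type EngineConfig struct {
	ID    string
	Model string
}

type WorkflowData struct {
	AI           string
	EngineConfig *EngineConfig
}

// originalEngineIDUnguarded mirrors the snippet under review:
// EngineConfig.ID unconditionally overrides data.AI.
func originalEngineIDUnguarded(data *WorkflowData) string {
	id := data.AI
	if data.EngineConfig != nil && data.EngineConfig.ID != "" {
		id = data.EngineConfig.ID
	}
	return id
}

// originalEngineIDGuarded applies the suggested guard, so EngineConfig.ID is only
// a fallback when data.AI is empty (the order attributed to getThreatDetectionEngineID).
func originalEngineIDGuarded(data *WorkflowData) string {
	id := data.AI
	if id == "" && data.EngineConfig != nil && data.EngineConfig.ID != "" {
		id = data.EngineConfig.ID
	}
	return id
}

func main() {
	diverging := &WorkflowData{AI: "pi", EngineConfig: &EngineConfig{ID: "claude"}}
	fmt.Println(originalEngineIDUnguarded(diverging)) // claude
	fmt.Println(originalEngineIDGuarded(diverging))   // pi
}
```

When both fields are "pi", as in the reported bug, the two rules agree, which is why the divergence did not surface in the regression test.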

shouldContainModel: true,
expectedModel: "gpt-5.4",
},
{

Contributor


[/tdd] The third branch of originalEngineID resolution — where ThreatDetection.EngineConfig.ID is set explicitly to "pi" — has no test coverage.

A user who writes threat-detection: { engine: { id: pi, model: "copilot/gpt-5.4" } } in frontmatter would exercise this path (hasThreatDetectionEngineConfig && data.SafeOutputs.ThreatDetection.EngineConfig.ID != ""), but no test currently validates it.

💡 Suggested test case to add after this one
{
    name: "pi engine via explicit threat-detection engine config normalizes provider-scoped model",
    data: &WorkflowData{
        AI: "claude",
        SafeOutputs: &SafeOutputsConfig{
            ThreatDetection: &ThreatDetectionConfig{
                EngineConfig: &EngineConfig{
                    ID:    "pi",
                    Model: "copilot/gpt-5.4",
                },
            },
        },
    },
    shouldContainModel: true,
    expectedModel:      "gpt-5.4",
},

This ensures the explicit threat-detection.engine.id: pi path is as robust as the top-level engine path.
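The suggested case exercises a different branch of engine resolution. The sketch below models all three branches end to end under stated assumptions: the types are minimal stand-ins, an explicit threat-detection engine id is assumed to win when set (per the hasThreatDetectionEngineConfig condition quoted above), the model is assumed to come from the same config that supplied the engine, and the prefix is stripped at the first "/" in place of extractPiModelID.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical minimal versions of the types used in the suggested test case.
type EngineConfig struct{ ID, Model string }

type ThreatDetectionConfig struct{ EngineConfig *EngineConfig }

type SafeOutputsConfig struct{ ThreatDetection *ThreatDetectionConfig }

type WorkflowData struct {
	AI           string
	EngineConfig *EngineConfig
	SafeOutputs  *SafeOutputsConfig
}

// tdEngineConfig returns the threat-detection-specific engine config, or nil.
func tdEngineConfig(data *WorkflowData) *EngineConfig {
	if data.SafeOutputs == nil || data.SafeOutputs.ThreatDetection == nil {
		return nil
	}
	return data.SafeOutputs.ThreatDetection.EngineConfig
}

// detectionEngineAndModel sketches the three originalEngineID branches: data.AI,
// then EngineConfig.ID as a fallback, then an explicit threat-detection engine id.
func detectionEngineAndModel(data *WorkflowData) (engineSetting, model string) {
	originalEngineID := data.AI
	if data.EngineConfig != nil {
		if originalEngineID == "" {
			originalEngineID = data.EngineConfig.ID
		}
		model = data.EngineConfig.Model
	}
	if td := tdEngineConfig(data); td != nil && td.ID != "" {
		originalEngineID = td.ID // explicit threat-detection.engine.id (third branch)
		model = td.Model
	}

	// pi is served by the copilot detector.
	engineSetting = originalEngineID
	if engineSetting == "pi" {
		engineSetting = "copilot"
	}

	// The PR's guard: strip the provider prefix only on the pi -> copilot fallback.
	if engineSetting == "copilot" && originalEngineID == "pi" {
		if _, after, found := strings.Cut(model, "/"); found {
			model = after
		}
	}
	return engineSetting, model
}

func main() {
	data := &WorkflowData{
		AI: "claude",
		SafeOutputs: &SafeOutputsConfig{
			ThreatDetection: &ThreatDetectionConfig{
				EngineConfig: &EngineConfig{ID: "pi", Model: "copilot/gpt-5.4"},
			},
		},
	}
	engine, model := detectionEngineAndModel(data)
	fmt.Println(engine, model) // copilot gpt-5.4
}
```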

github-actions Bot left a comment

Contributor


Two non-blocking observations on the new originalEngineID logic. The primary fix (stripping the provider/ prefix before handing the model to Copilot CLI) is correct for all realistic Pi workflows.

### Findings

Precedence inversion vs getThreatDetectionEngineID (line 42–43)

The new code unconditionally overrides data.AI with data.EngineConfig.ID. getThreatDetectionEngineID does the opposite: data.AI wins and data.EngineConfig.ID is a fallback only when data.AI is empty. In the edge case where the two fields disagree (e.g. a workflow authored with both ai: pi and a conflicting engine.id), originalEngineID resolves to the wrong value and the normalization guard never fires. The inline comment also documents the wrong precedence, which risks cargo-culting it deeper.

Test gap: TD-specific engine config path not covered (test line 1285)

The new test sets both data.AI and data.EngineConfig.ID to "pi". The third branch of originalEngineID — where SafeOutputs.ThreatDetection.EngineConfig.ID = "pi" triggers the normalization — has no dedicated test case.

🔎 Code quality review by PR Code Quality Reviewer · 88.4 AIC · ⌖ 9.37 AIC · ⊞ 5.2K

// the legacy top-level AI field.
originalEngineID := data.AI
if data.EngineConfig != nil && data.EngineConfig.ID != "" {
	originalEngineID = data.EngineConfig.ID
}

Contributor


Precedence mismatch vs getThreatDetectionEngineID: this assignment makes data.EngineConfig.ID override data.AI, but getThreatDetectionEngineID (in threat_detection_external.go) uses data.AI as primary and data.EngineConfig.ID only as a fallback when data.AI is empty — the inverse priority. If the two fields diverge, originalEngineID resolves incorrectly and the pi→copilot model normalization silently skips.

💡 Details and suggested fix

getThreatDetectionEngineID (non-TD-config path):

engineID = data.AI                          // primary
if engineID == "" && data.EngineConfig != nil && data.EngineConfig.ID != "" {
    engineID = data.EngineConfig.ID         // fallback only when data.AI is empty
}

But the new code at line 42–43 does:

originalEngineID := data.AI
if data.EngineConfig != nil && data.EngineConfig.ID != "" {
    originalEngineID = data.EngineConfig.ID // unconditional override — opposite priority
}

The comment on lines 37–40 claims this matches runtime resolution (EngineConfig.ID > data.AI), but that is not what getThreatDetectionEngineID implements. One of these is wrong, and the comment risks cargo-culting the incorrect precedence into future code.

Impact: if data.AI = "pi" and data.EngineConfig.ID is any non-"pi" value, getThreatDetectionEngineID returns "copilot" via pi normalization, but originalEngineID resolves to the non-"pi" EngineConfig value. The guard engineSetting == "copilot" && originalEngineID == "pi" never fires, so copilot/gpt-5.4 leaks to the Copilot CLI unchanged.
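The impact scenario can be reproduced in a few lines. This hypothetical sketch models only the non-TD-config path: detectorEngine follows the getThreatDetectionEngineID precedence quoted above plus the pi to copilot mapping, copilotModel applies the PR's guard with a first-"/" split standing in for extractPiModelID, and "claude" is an arbitrary non-"pi" value for EngineConfig.ID.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical minimal stand-ins for the workflow types.
type EngineConfig struct{ ID, Model string }

type WorkflowData struct {
	AI           string
	EngineConfig *EngineConfig
}

// detectorEngine: data.AI is primary, EngineConfig.ID a fallback, and pi maps to copilot.
func detectorEngine(data *WorkflowData) string {
	id := data.AI
	if id == "" && data.EngineConfig != nil && data.EngineConfig.ID != "" {
		id = data.EngineConfig.ID
	}
	if id == "pi" {
		return "copilot"
	}
	return id
}

// copilotModel strips the provider prefix only on the pi -> copilot fallback.
func copilotModel(engineSetting, originalEngineID, model string) string {
	if engineSetting == "copilot" && originalEngineID == "pi" {
		if _, after, found := strings.Cut(model, "/"); found {
			return after
		}
	}
	return model
}

func main() {
	data := &WorkflowData{AI: "pi", EngineConfig: &EngineConfig{ID: "claude", Model: "copilot/gpt-5.4"}}
	engine := detectorEngine(data) // "copilot": data.AI wins and pi maps to copilot

	// originalEngineID as written in the snippet under review (unconditional override).
	unguarded := data.AI
	if data.EngineConfig != nil && data.EngineConfig.ID != "" {
		unguarded = data.EngineConfig.ID
	}
	// originalEngineID with the suggested fallback-only guard.
	guarded := data.AI
	if guarded == "" && data.EngineConfig != nil && data.EngineConfig.ID != "" {
		guarded = data.EngineConfig.ID
	}

	fmt.Println(copilotModel(engine, unguarded, data.EngineConfig.Model)) // copilot/gpt-5.4 leaks through
	fmt.Println(copilotModel(engine, guarded, data.EngineConfig.Model))   // gpt-5.4
}
```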

Fix: mirror getThreatDetectionEngineID's fallback-only semantics:

originalEngineID := data.AI
if originalEngineID == "" && data.EngineConfig != nil && data.EngineConfig.ID != "" {
    originalEngineID = data.EngineConfig.ID // fallback only when data.AI is empty
}

Also update the comment on line 37 to match actual behaviour.

},
},
shouldContainModel: true,
expectedModel: "gpt-5.4",

Contributor


Test only covers the aligned AI/EngineConfig.ID = "pi" case: the new case sets both data.AI and data.EngineConfig.ID to "pi", so neither the second nor third branch of originalEngineID (lines 45–46) is meaningfully exercised in isolation. A regression in the threat-detection-specific engine config path would be invisible.

💡 Suggested additional test case

A case where normalization is driven by a threat-detection-specific engine config (exercises the originalEngineID = data.SafeOutputs.ThreatDetection.EngineConfig.ID branch):

{
    name: "pi threat-detection engine config normalizes provider-scoped model for copilot fallback",
    data: &WorkflowData{
        AI: "copilot", // main engine is copilot; TD override is pi
        SafeOutputs: &SafeOutputsConfig{
            ThreatDetection: &ThreatDetectionConfig{
                EngineConfig: &EngineConfig{
                    ID:    "pi",
                    Model: "copilot/gpt-5.4",
                },
            },
        },
    },
    shouldContainModel: true,
    expectedModel:      "gpt-5.4",
},

This ensures the third branch of originalEngineID (line 46) and the normalization block both fire when pi is specified as a TD-specific engine override rather than the main engine.

pelikhan merged commit 6236de8 into main Jun 25, 2026
84 of 96 checks passed
pelikhan deleted the copilot/investigate-copilot-cli-detection-job-failure branch June 25, 2026 23:32
3 participants