Copilot CLI false-red — runs marked failure (exit 1) after safe-outputs already succeeded, via "numerous permission denied" terminal classification
Investigated window: 6h ending 2026-06-26 08:04 UTC. Dominant recurring, untracked failure signature.
Problem statement
Copilot-CLI agent jobs that successfully emit their required safe-outputs still conclude with failure (process exit code 1).
The copilot-harness classifies the attempt as failureClass=permission_denied with hasNumerousPermissionDenied=true (permissionDeniedCount>=5), treats it as a terminal "missing tool/permission" issue, does not retry, and exits 1.
The permission-denial counter increments on every disallowed Bash command form the agent attempts, including optional or exploratory variants, even after the real work is done. The classifier never checks whether the expected safe-outputs were produced, so a fully successful task is reported as a red run (false-red). This pollutes the CI signal and masks true success.
Affected workflows and run IDs
Confirmed (exact signature verified from run artifacts):
PR Description Updater — run 28222667341 (07:00 UTC). Log: permissionDeniedCount=5, hasNumerousPermissionDenied=true, agent output "PR #41608 description updated successfully", missing_tool emitted, not retrying (classified as missing tool/permission issue), Process completed with exit code 1.
Test Quality Sentinel — run 28221623302 (06:35 UTC). safeoutputs.jsonl contains a completed add_comment (the full Test Quality Report, posted to PR #41617 "Update gh aw update --org to support workflow-targeted updates and repo prefiltering") and a submit_pull_request_review (APPROVE). Both succeeded; the run then emitted missing_tool with reason: "missing tool/permission issue: numerous permission denied errors detected" and exited 1. The run did real work: 794k tokens, 21 turns, 12m0s.
Recurrence (same workflow + engine, same window, conclusion=failure; signature strongly suspected):
Test Quality Sentinel — runs 28218968236 (05:22 UTC) and 28215990234 (03:52 UTC). TQS failed 3× in the window, while interleaved successful runs (06:03, 04:54, 04:38) show that the workflow itself is healthy.
Same engine, root cause unconfirmed (may differ): Changeset Generator 28215703557, Go Logger Enhancement 28217478494, Daily AstroStyleLite Markdown Spellcheck 28217439569.
Evidence
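From the run 28221623302 artifacts (safeoutputs.jsonl), the missing_tool record's alternatives field shows the agent looping on denied Bash command forms before the terminal classification fired:
python3 -c "print(json.dumps(...))" pipelines
safeoutputs add_comment . < /tmp/gh-aw/agent/payload.json stdin redirection
with open('/tmp/gh-aw/agent/payload.json','w') as f: ...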
These are exploratory plumbing attempts; the agent had already emitted valid add_comment + submit_pull_request_review items. The denials were on extra/optional invocations, yet they tripped the >=5 terminal threshold.
Probable root cause
Classification ignores the success signal. copilot-harness derives a terminal permission_denied verdict purely from permissionDeniedCount, without checking whether outputs.jsonl already contains valid safe-output items. A run that produced its required output is therefore reported as failed.
safeoutputs CLI ergonomics. The agent reaches for python3 -c JSON-encode pipelines and stdin-redirection forms that the Bash allowlist blocks, inflating permissionDeniedCount even though a simpler allowed safeoutputs <tool> --param value invocation exists.
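The first root cause can be modeled in a few lines. This Python sketch is illustrative only. The names classify_attempt and AttemptVerdict, and the constant name, are hypothetical and are not the copilot-harness API. Only the failureClass, hasNumerousPermissionDenied and permissionDeniedCount fields and the >=5 threshold come from the run logs.

```python
from dataclasses import dataclass
from typing import Optional

# Threshold seen in the logs (permissionDeniedCount>=5); the constant name is hypothetical.
NUMEROUS_PERMISSION_DENIED_THRESHOLD = 5


@dataclass(frozen=True)
class AttemptVerdict:
    failure_class: Optional[str]          # mirrors failureClass in the logs
    has_numerous_permission_denied: bool  # mirrors hasNumerousPermissionDenied
    terminal: bool                        # terminal => no retry
    exit_code: int


def classify_attempt(permission_denied_count: int) -> AttemptVerdict:
    """Hypothetical model of the reported classification.

    The denial count is the only input: outputs.jsonl is never consulted, so a run
    that already emitted its safe-outputs is treated like one that produced nothing.
    Below the threshold this sketch simply returns a non-terminal verdict.
    """
    if permission_denied_count >= NUMEROUS_PERMISSION_DENIED_THRESHOLD:
        # Terminal "missing tool/permission" issue: no retry, process exits 1.
        return AttemptVerdict("permission_denied", True, terminal=True, exit_code=1)
    return AttemptVerdict(None, False, terminal=False, exit_code=0)


# Five denials on optional, exploratory commands are enough for a red run.
verdict = classify_attempt(5)
print(verdict.failure_class, verdict.exit_code)  # permission_denied 1
```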
Proposed remediation
In copilot-harness failure classification, suppress the hasNumerousPermissionDenied terminal verdict and exit 0 when the run produced ≥1 expected safe-output item in outputs.jsonl. Permission denials should then yield at most a warning plus a missing_tool record, not a red run.
Optionally: only count denials toward the threshold for commands the agent was required to run (not optional/exploratory variants), or make the threshold configurable.
Reduce denial generation: prominently document the exact allowed safeoutputs <tool> --param value form, and/or widen the Bash allowlist to cover the common safeoutputs <tool> . < file.json and small python3 -c JSON-encode forms the agent reaches for.
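The first two remediation options could look roughly like the following Python sketch. It assumes that outputs.jsonl holds one JSON object per line with a "type" field naming the safe-output tool (e.g. add_comment). The names classify, Denial, Verdict and count_safe_outputs are hypothetical. The per-denial required flag assumes the harness can tell required commands from exploratory ones, which the logs do not show. This is not the copilot-harness implementation.

```python
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Optional, Sequence, Set

# Hypothetical default matching the logged permissionDeniedCount>=5; the proposal
# suggests making it configurable, so classify() takes it as a parameter.
DEFAULT_THRESHOLD = 5


@dataclass(frozen=True)
class Denial:
    command: str
    # Hypothetical: whether the agent was required to run this command (vs. an
    # optional/exploratory variant). The logs do not show such a flag.
    required: bool


@dataclass(frozen=True)
class Verdict:
    exit_code: int
    terminal: bool
    warning: Optional[str] = None


def count_safe_outputs(outputs_path: Path, expected_types: Set[str]) -> int:
    """Count records in a JSONL file whose "type" is an expected safe-output tool.

    Assumes one JSON object per line with a "type" field; adapt to the real schema.
    """
    if not outputs_path.exists():
        return 0
    count = 0
    for line in outputs_path.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines instead of crashing the classifier
        if isinstance(record, dict) and record.get("type") in expected_types:
            count += 1
    return count


def classify(
    denials: Sequence[Denial],
    outputs_path: Path,
    expected_types: Set[str],
    threshold: int = DEFAULT_THRESHOLD,
    count_only_required: bool = False,
) -> Verdict:
    counted = [d for d in denials if d.required] if count_only_required else list(denials)
    if len(counted) < threshold:
        return Verdict(exit_code=0, terminal=False)
    produced = count_safe_outputs(outputs_path, expected_types)
    if produced >= 1:
        # Work was delivered: downgrade to a warning (the caller may still record
        # missing_tool) instead of a terminal, red run.
        return Verdict(
            exit_code=0,
            terminal=False,
            warning=f"{len(counted)} permission denials, {produced} safe-output item(s) produced",
        )
    # Nothing was produced: keep the terminal permission_denied verdict and exit 1.
    return Verdict(exit_code=1, terminal=True)


# Illustrative input loosely modeled on run 28221623302; record contents and the
# denial count are made up.
with tempfile.TemporaryDirectory() as tmp:
    outputs = Path(tmp) / "outputs.jsonl"
    outputs.write_text(
        '{"type": "add_comment"}\n{"type": "submit_pull_request_review"}\n',
        encoding="utf-8",
    )
    denials = [Denial(f"exploratory-{i}", required=False) for i in range(5)]
    verdict = classify(denials, outputs, {"add_comment", "submit_pull_request_review"})
    print(verdict.exit_code, verdict.terminal)  # 0 False
```

The configurable threshold and the required-only counting are both optional. The core change is that a non-zero safe-output count downgrades the verdict, while a run that produced nothing still fails with exit 1.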
Success criteria / verification
A re-run of PR Description Updater / Test Quality Sentinel that emits its safe-outputs concludes success (green), even when permission denials occurred.
permissionDeniedCount no longer forces exit 1 when outputs.jsonl is non-empty.
Scheduled Test Quality Sentinel false-red rate for this signature drops to ~0.
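The third criterion can be tracked by flagging false-red runs from run metadata. This minimal Python sketch assumes that each run's conclusion, its count of expected safe-output items in outputs.jsonl, and its permissionDeniedCount have already been collected. RunRecord, is_false_red and false_red_rate are hypothetical helpers. Cancelled runs are excluded because they are not failures.

```python
from dataclasses import dataclass
from typing import Sequence

# Matches the permissionDeniedCount>=5 threshold from the logs.
THRESHOLD = 5


@dataclass(frozen=True)
class RunRecord:
    run_id: int
    conclusion: str               # "success", "failure", or "cancelled"
    safe_output_items: int        # expected safe-output items found in outputs.jsonl
    permission_denied_count: int


def is_false_red(run: RunRecord, threshold: int = THRESHOLD) -> bool:
    """True if the run failed despite producing output and hit the denial threshold."""
    return (
        run.conclusion == "failure"
        and run.safe_output_items >= 1
        and run.permission_denied_count >= threshold
    )


def false_red_rate(runs: Sequence[RunRecord]) -> float:
    """Fraction of non-cancelled runs that are false-red for this signature."""
    considered = [r for r in runs if r.conclusion != "cancelled"]
    if not considered:
        return 0.0
    return sum(is_false_red(r) for r in considered) / len(considered)


# Illustrative data only: run IDs and counts are made up.
runs = [
    RunRecord(1, "failure", safe_output_items=2, permission_denied_count=5),  # false-red
    RunRecord(2, "success", safe_output_items=1, permission_denied_count=0),
    RunRecord(3, "failure", safe_output_items=0, permission_denied_count=6),  # genuine failure
    RunRecord(4, "cancelled", safe_output_items=0, permission_denied_count=0),
]
print(f"{false_red_rate(runs):.2f}")  # 0.33
```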
Existing-issue correlation
Distinct from all open agentic-workflows issues: #41195 (BYOK 403 — genuine provider-auth failure, no work produced), #41455 (firewall/DNS startup), #41456 (patch-parser under-detection), #41355 (workflow_call permissions), #41293 (Copilot Python SDK ModuleNotFound). None covers the "succeeds-then-marked-failed via permission-denied count" signature.
Other observations this window (not filed — insufficient/ambiguous evidence)
28221753707 failed at step Start MCP Gateway with 0 token usage (agent never invoked); raw gateway error not captured. Possibly related to #41455 ([aw-failures] AWF firewall startup fails — getaddrinfo EAI_AGAIN awmg-cli-proxy, managed-gateway DNS not resolvable, agent never …), but at a different step; insufficient evidence to file.
28221799739 — 0 token usage, no clear signature in available logs.
28217567247 — Copilot engine; consistent with existing #41195 ([aw-failures] Copilot BYOK provider HTTP 403 (authentication_failed) discards full Code Simplifier run).
… cancelled (not failures) and were excluded.