Skip to content

Fix rootless firewall artifact permissions to prevent EACCES on upload#5546

Merged
lpcox merged 3 commits into
mainfrom
copilot/fix-firewall-logs-eaccess
Jun 25, 2026
Merged

Fix rootless firewall artifact permissions to prevent EACCES on upload#5546
lpcox merged 3 commits into
mainfrom
copilot/fix-firewall-logs-eaccess

Conversation

Copilot AI commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

In --network-isolation (sudo: false) mode, firewall artifacts could be written by container UIDs the runner cannot read, and AWF’s host-side chmod repair silently failed. This caused artifact upload failures despite successful agent execution.

  • Rootless artifact repair backstop (cleanup-time)

    • Added a rootless-only permission repair path in artifact preservation that runs a short-lived Docker helper container to repair ownership/permissions for artifact directories.
    • Uses daemon-side bind path translation via dockerHostPathPrefix for ARC/DinD compatibility.
    • Reuses AWF image selection inputs (imageRegistry/imageTag) and disables pulls (--pull never) to avoid introducing network dependency at cleanup.
  • Eliminate sidecar UID mismatch at source

    • api-proxy and cli-proxy services now run as "<host_uid>:<host_gid>" in compose generation.
    • Sidecar-created files are now runner-owned in the common rootless case, reducing reliance on post-hoc repairs.
  • Harden log file readability at write-time

    • Updated sidecar log stream creation to explicitly use mode 0644 for generated log files.
    • Applied in:
      • containers/cli-proxy/server.js
      • containers/api-proxy/{token-persistence.js,blocked-request-diagnostics.js,otel-exporters.js}
  • Restore observability for permission regressions

    • Promoted swallowed permission-fix failures from debug to warn where host-side chmod can fail under rootless execution.
    • Warn text now indicates rootless repair attempt context.
  • Cleanup plumbing

    • Extended cleanup/preservation call chain to pass through dockerHostPathPrefix, imageRegistry, and imageTag so permission repair behavior matches runtime environment/image config.
// rootless cleanup backstop (conceptual)
docker run --rm --pull never --network none \
  --cap-drop ALL --cap-add CHOWN --cap-add DAC_OVERRIDE --cap-add FOWNER \
  -e TUID=<runner_uid> -e TGID=<runner_gid> \
  -v <translated_artifact_dir>:/fix:rw <awf-agent-image> \
  sh -c 'chown -R "$TUID:$TGID" /fix && chmod -R a+rX /fix'

Copilot AI changed the title [WIP] Fix EACCES on artifact upload for rootless firewall logs Fix rootless firewall artifact permissions to prevent EACCES on upload Jun 25, 2026
Copilot finished work on behalf of lpcox June 25, 2026 22:03
Copilot AI requested a review from lpcox June 25, 2026 22:03
@lpcox lpcox marked this pull request as ready for review June 25, 2026 22:09
Copilot AI review requested due to automatic review settings June 25, 2026 22:09

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses artifact upload failures in rootless (--network-isolation / sudo: false) runs by ensuring firewall sidecar outputs (logs/artifacts) remain readable by the GitHub Actions runner user, and by adding a cleanup-time “backstop” repair path for permissions.

Changes:

  • Run api-proxy and cli-proxy services under the host UID/GID in generated Docker Compose.
  • Add a rootless-only cleanup-time permission repair that uses a short-lived Docker helper container with daemon-side path translation support (dockerHostPathPrefix) and image selection inputs (imageRegistry/imageTag).
  • Harden sidecar log file creation to use 0644, and plumb additional config through cleanup/preservation calls; update/add tests accordingly.
Show a summary per file
File Description
src/services/cli-proxy-service.ts Sets user: <host_uid>:<host_gid> for cli-proxy in compose generation.
src/services/cli-proxy-service.test.ts Asserts cli-proxy service includes the user field.
src/services/api-proxy-service-config.ts Sets user: <host_uid>:<host_gid> for api-proxy in compose generation.
src/services/api-proxy-service-config.test.ts Asserts api-proxy service includes the user field.
src/container-cleanup.ts Extends cleanup plumbing to pass docker path prefix + image selection into artifact preservation.
src/commands/main-action.ts Passes dockerHostPathPrefix, imageRegistry, imageTag into cleanup().
src/commands/main-action.test.ts Updates cleanup() invocation expectations for new parameters.
src/artifact-preservation.ts Adds rootless-only Docker helper permission repair; promotes some chmod-failure logs from debug → warn.
src/artifact-preservation-errors.test.ts Adds tests for rootless repair path and mount path translation.
containers/cli-proxy/server.js Creates log stream with mode: 0644 for new log files.
containers/api-proxy/token-persistence.js Creates diagnostic and token log streams with mode: 0644.
containers/api-proxy/otel-exporters.js Creates OTel span file stream with mode: 0644.
containers/api-proxy/blocked-request-diagnostics.js Creates blocked-request diagnostics stream with mode: 0644.

Copilot's findings

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

  • Files reviewed: 13/13 changed files
  • Comments generated: 3

Comment thread src/artifact-preservation.ts Outdated
Comment on lines +96 to +103
function resolvePermFixerImageRef(imageRegistry?: string, imageTag?: string): string {
try {
const registry = imageRegistry || 'ghcr.io/github/gh-aw-firewall';
const parsedImageTag = parseImageTag(imageTag || 'latest');
return buildRuntimeImageRef(registry, 'agent', parsedImageTag);
} catch {
return 'ghcr.io/github/gh-aw-firewall/agent:latest';
}
Comment thread src/artifact-preservation.ts Outdated
Comment on lines +117 to +129
const existingDirs = dirs.filter(
(dir): dir is string => typeof dir === 'string' && dir.length > 0 && fs.existsSync(dir),
);
if (existingDirs.length === 0) {
return;
}

const uid = getSafeHostUid();
const gid = getSafeHostGid();
const imageRef = resolvePermFixerImageRef(imageRegistry, imageTag);

for (const dir of existingDirs) {
const mount = applyHostPathPrefixToVolumes([`${dir}:/fix:rw`], dockerHostPathPrefix)[0];
Comment thread src/services/cli-proxy-service.ts
@lpcox

lpcox commented Jun 25, 2026

Copy link
Copy Markdown
Collaborator

@copilot address the review feedback

- resolvePermFixerImageRef: use agent-act image when agentImage is 'act'
  so --pull never succeeds when the act preset was used at runtime
- fixArtifactPermissionsForRootless: resolve dirs to absolute paths via
  path.resolve() before constructing docker -v mount strings (relative
  paths would be treated as named volumes by Docker)
- cli-proxy service: remove user override (host UID:GID) which breaks
  the entrypoint's write to /tmp/proxy-tls/combined-ca.crt owned by
  the cliproxy image user
- Plumb agentImage through cleanup() -> preserveCleanupArtifacts() chain
- Update affected tests

Copilot AI commented Jun 25, 2026

Copy link
Copy Markdown
Contributor Author

@copilot address the review feedback

Addressed all three review items in commit fix: address reviewer feedback on rootless permission repair:

  1. resolvePermFixerImageRef image mismatch — Added agentImage to PreserveCleanupArtifactsOptions and plumbed it through cleanup()preserveCleanupArtifacts()fixArtifactPermissionsForRootless(). resolvePermFixerImageRef now selects agent-act when agentImage === 'act', so --pull never resolves to the same image already pulled at runtime.

  2. Relative path → named volume hazard — Added path.resolve(dir) before constructing the -v mount string in fixArtifactPermissionsForRootless, ensuring Docker always receives an absolute bind-mount path.

  3. cli-proxy user override breaking /tmp/proxy-tls — Removed the user: <host_uid>:<host_gid> override from the cli-proxy service definition. The entrypoint's write to /tmp/proxy-tls/combined-ca.crt now runs as the image's cliproxy user as intended. Updated the corresponding test assertion.

@github-actions

github-actions Bot commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

📡 Smoke OTel Tracing completed. All tracing scenarios validated. ✅

@github-actions

github-actions Bot commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

🔑 Smoke Copilot PAT PAT auth validated. All systems operational. ✅

@github-actions

github-actions Bot commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

📰 VERDICT: Smoke Copilot has concluded. All systems operational. This is a developing story. 🎤

@github-actions

github-actions Bot commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Security Guard failed. Please review the logs for details.

@github-actions

github-actions Bot commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Smoke Copilot BYOK AOAI (Entra) completed. Copilot AOAI BYOK (Entra) mode operational. 🔓

@github-actions

github-actions Bot commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Smoke Claude passed

@github-actions

github-actions Bot commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Smoke Copilot BYOK AOAI (api-key) completed. Copilot AOAI BYOK (api-key) mode operational. 🔓

@github-actions

github-actions Bot commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Build Test Suite completed successfully!

@github-actions

github-actions Bot commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

🔌 Smoke Services — All services reachable! ✅

@github-actions

github-actions Bot commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Contribution Check completed successfully!

@github-actions

github-actions Bot commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Smoke Copilot BYOK completed. Copilot BYOK mode operational. 🔓

@github-actions

github-actions Bot commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Smoke Gemini completed. All facets verified. 💎

Gemini Smoke Test: PASS. Titles found: 1. fix(network-isolation): break topology-attach ordering deadlock... 2. refactor(api-proxy): extract sliding-window... File created: /tmp/gh-aw/agent/smoke-test-gemini-28206620319.txt. Connectivity: 200 via proxy.

@github-actions

Copy link
Copy Markdown
Contributor

Smoke Test: Claude Engine Validation

  • API status: ✅ PASS
  • gh check: ✅ PASS
  • File status: ✅ PASS

Overall result: PASS

Generated by Smoke Claude for issue #5546 · 61.4 AIC · ⊞ 3.3K ·

@github-actions

Copy link
Copy Markdown
Contributor

Thanks for the clear write-up and for adding tests around the rootless artifact permission repair path.

One contribution-guideline item needs attention before this fully matches CONTRIBUTING.md's pull request requirements for a clear description: the PR description says both api-proxy and cli-proxy services now run as "<host_uid>:<host_gid>", but the diff only sets user on api-proxy, and the cli-proxy test explicitly expects proxy.user to be undefined.

Could you either update the PR description to remove the cli-proxy claim, or adjust the implementation/tests if cli-proxy is intended to run with the host UID/GID too?

Generated by Contribution Check for issue #5546 · 65.1 AIC · ⊞ 22.1K ·

@github-actions

Copy link
Copy Markdown
Contributor

Smoke Test: Copilot BYOK Mode — PASS ✅

  • ✅ GitHub MCP connectivity (PR data fetched)
  • ✅ GitHub.com HTTP 200 connectivity
  • ✅ File write/read operations
  • ✅ BYOK inference path (api-proxy → api.githubcopilot.com)

Running in direct BYOK mode via COPILOT_PROVIDER_API_KEY with api-proxy sidecar injection.

@lpcox @Copilot

🔑 BYOK report filed by Smoke Copilot BYOK

@github-actions

Copy link
Copy Markdown
Contributor

🔥 Smoke Test Results

PR: Fix rootless firewall artifact permissions to prevent EACCES on upload
Author: @Copilot | Assignees: @lpcox @Copilot

Test Result
GitHub MCP connectivity
GitHub.com HTTP ✅ (200)
File write/read ❌ (pre-step template vars not resolved)

Overall: FAIL — pre-step data was not injected (workflow template variables unresolved).

📰 BREAKING: Report filed by Smoke Copilot

@github-actions

Copy link
Copy Markdown
Contributor

🔥 Smoke Test: Copilot PAT Auth — PASS

Test Result
GitHub MCP connectivity
GitHub.com connectivity
File write/read

Overall: PASS · Auth mode: PAT (COPILOT_GITHUB_TOKEN)

CC @lpcox @Copilot

🔑 PAT report filed by Smoke Copilot PAT

@github-actions

Copy link
Copy Markdown
Contributor

@Copilot

Smoke Test Results:

  • GitHub MCP: ✅
  • GitHub.com HTTP: ✅
  • File I/O: ❌
  • BYOK Inference: ✅

Running in direct BYOK mode (AWF_AUTH_TYPE=github-oidc + AWF_AUTH_AZURE_* + COPILOT_PROVIDER_BASE_URL) via api-proxy → Azure OpenAI (Foundry, o4-mini-aw).

Overall: FAIL

🪪 BYOK (AOAI Entra) report filed by Smoke Copilot BYOK AOAI (Entra)

@github-actions

Copy link
Copy Markdown
Contributor

5544: fix(network-isolation): break topology-attach ordering deadlock starving cli-proxy health gate ✅
5523: refactor(api-proxy): extract sliding-window data structure into rate-limiter-window.js ✅
Overall: PASS

Warning

Firewall blocked 1 domain

The following domain was blocked by the firewall during workflow execution:

  • registry.npmjs.org

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "registry.npmjs.org"

See Network Configuration for more information.

🔮 The oracle has spoken through Smoke Codex

@github-actions

Copy link
Copy Markdown
Contributor

🔬 Smoke Test: API Proxy OpenTelemetry Tracing

Scenario Result Detail
✅ Module Loading Pass otel.js loads cleanly; exports 13 symbols (startRequestSpan, setTokenAttributes, setBudgetAttributes, endSpan, endSpanError, shutdown, isEnabled, + 6 private)
✅ Test Suite Pass 59/59 tests passed across otel.test.js + otel-fanout.test.js (0 failures)
✅ Env Var Forwarding Pass api-proxy-env-config.ts forwards GH_AW_OTLP_ENDPOINTS, OTEL_EXPORTER_OTLP_ENDPOINT, OTEL_EXPORTER_OTLP_HEADERS, GITHUB_AW_OTEL_TRACE_ID, GITHUB_AW_OTEL_PARENT_SPAN_ID; observability-environment.ts passes through all OTEL_* vars
✅ Token Tracker Integration Pass onUsage callback present in token-tracker-http.js (lines 283/324) as the OTEL hook point for gen_ai.usage.* attributes
✅ Graceful Degradation Pass Without OTLP endpoint, otel.js falls back to FileSpanExporter/var/log/api-proxy/otel.jsonl; no errors on load

All 5 scenarios pass. GenAI semantic conventions (gen_ai.usage.input_tokens, gen_ai.usage.output_tokens, gen_ai.response.model) and W3C parent context propagation via GITHUB_AW_OTEL_TRACE_ID/GITHUB_AW_OTEL_PARENT_SPAN_ID are fully implemented.

📡 OTel tracing validated by Smoke OTel Tracing

@github-actions

Copy link
Copy Markdown
Contributor

Chroot Version Comparison Results

Runtime Host Version Chroot Version Match?
Python Python 3.12.13 Python 3.12.3
Node.js v24.17.0 v22.23.0
Go go1.22.12 go1.22.12

Overall: ❌ Not all tests passed — Python and Node.js versions differ between host and chroot environments.

Tested by Smoke Chroot

@github-actions

Copy link
Copy Markdown
Contributor

@Copilot @lpcox

  • PR: Fix rootless firewall artifact permissions to prevent EACCES on upload
  1. MCP Connectivity: ✅
  2. GitHub.com HTTP: ✅
  3. File I/O: ✅
  4. Direct BYOK mode: ✅

Running in direct BYOK mode (COPILOT_PROVIDER_API_KEY + COPILOT_PROVIDER_BASE_URL) via api-proxy → Azure OpenAI (Foundry, o4-mini-aw)

Overall: PASS

🔑 BYOK (AOAI api-key) report filed by Smoke Copilot BYOK AOAI (api-key)

@github-actions

Copy link
Copy Markdown
Contributor

🏗️ Build Test Suite Results

Ecosystem Project Build/Install Tests Status
Bun elysia 1/1 passed ✅ PASS
Bun hono 1/1 passed ✅ PASS
C++ fmt N/A ✅ PASS
C++ json N/A ✅ PASS
Deno oak N/A 1/1 passed ✅ PASS
Deno std N/A 1/1 passed ✅ PASS
.NET hello-world N/A ✅ PASS
.NET json-parse N/A ✅ PASS
Go color 1/1 passed ✅ PASS
Go env 1/1 passed ✅ PASS
Go uuid 1/1 passed ✅ PASS
Java gson 1/1 passed ✅ PASS
Java caffeine 1/1 passed ✅ PASS
Node.js clsx All passed ✅ PASS
Node.js execa All passed ✅ PASS
Node.js p-limit All passed ✅ PASS
Rust fd 1/1 passed ✅ PASS
Rust zoxide 1/1 passed ✅ PASS

Overall: 8/8 ecosystems passed — ✅ PASS

Generated by Build Test Suite for issue #5546 · 41.4 AIC · ⊞ 7.8K ·

@github-actions

Copy link
Copy Markdown
Contributor

Smoke Test Results — Services Connectivity

Check Result
Redis PING ❌ Timeout (no response)
PostgreSQL pg_isready ❌ No response
PostgreSQL SELECT 1 ❌ Timeout

Overall: ❌ FAIL

host.docker.internal is not reachable from this environment — the runner is not running inside a Docker container, so service containers are not accessible via that hostname.

🔌 Service connectivity validated by Smoke Services

@github-actions

Copy link
Copy Markdown
Contributor

Overall status: PASS

Warning

Firewall blocked 1 domain

The following domain was blocked by the firewall during workflow execution:

  • localhost

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "localhost"

See Network Configuration for more information.

💎 Faceted by Smoke Gemini

@lpcox lpcox merged commit 463dfad into main Jun 25, 2026
85 of 88 checks passed
@lpcox lpcox deleted the copilot/fix-firewall-logs-eaccess branch June 25, 2026 23:38
lpcox added a commit that referenced this pull request Jun 25, 2026
Two unrelated stale-test failures had landed on main:

- main-action.test.ts: cleanup() gained a 9th agentImage parameter
  (rootless perm-fixer, #5546) but the call assertion still expected
  8 args. Add the trailing agentImage arg.
- 4 workflow lock tests pinned setup@3c7f3b6f (v0.81.0); the lock
  files were upgraded to setup@b5cde6c (v0.81.2). Update the pins.

All 3211 tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants