Summary
In --network-isolation (sandbox.agent.sudo: false) mode, the firewall's log and audit artifacts are written into host-mounted directories by container processes running under UIDs other than the runner's, so the unprivileged runner cannot read them. AWF's existing permission-repair step silently no-ops when AWF is rootless, so the runner hits EACCES when upload-artifact zips the firewall logs. The agent run itself may succeed, but the artifact upload fails.
This is fully addressable within gh-aw-firewall, and the fix works identically on standard runners and on ARC/DinD. It is the rootless-permission sibling of the topology-attach deadlock (#5543); the broader sudo: false rollout regression set is #5542, and the ARC/chroot track is #5541.
Evidence
gh-aw PR github/gh-aw#41426 reverted glossary-maintainer from sudo: false → true. Part of the motivation: under sudo: false the previous sudo chmod -R a+rX /tmp/gh-aw/sandbox/firewall workaround step is gone, but the firewall containers still write files the runner can't read, so upload-artifact fails with EACCES.
Root cause — the rootless permission asymmetry
AWF already attempts a repair: preserveCleanupArtifacts() runs chmod -R a+rX on every log/audit dir at cleanup (src/artifact-preservation.ts:59,74,160,172,189,201). But that chmod is host-side, executed by the AWF process:
When AWF runs under sudo (sudo: true), the process is root → the chmod succeeds (and gh-aw additionally ran its own sudo chmod).
When AWF runs rootless (sudo: false), the process is the runner UID and gets EPERM on files it doesn't own. The failure is caught and logged at debug only (src/artifact-preservation.ts:61-63), so it is silent.
The files were written by container processes with UIDs that don't match the runner:
cli-proxy → USER cliproxy (non-root, no in-container root at all — containers/cli-proxy/Dockerfile:44)
api-proxy → USER apiproxy (non-root — containers/api-proxy/Dockerfile:43)
squid → USER proxy / uid 13 (containers/squid/Dockerfile:79)
agent / iptables-init → start as root inside the container; on a stock daemon (no userns-remap) those files land root-owned on the host
The log dirs are pre-created 0777 so files can be created (src/workdir-setup.ts:151-198), but the runner can neither chmod nor reliably read what another UID wrote. Result: EACCES at artifact-upload time.
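To make the asymmetry concrete, the sketch below models the classic POSIX owner/group/other read check. The `canRead` helper, the runner identity (uid/gid 1001), the squid gid and the 0o640 mode are illustrative assumptions, not AWF code or values taken from the repository; only squid's uid 13 comes from the list above. ACLs and capabilities are ignored.

```typescript
// Simplified model of the POSIX read-permission check (ignores ACLs and capabilities).
interface FileMeta { uid: number; gid: number; mode: number }
interface Caller { uid: number; gids: number[] }

function canRead(file: FileMeta, who: Caller): boolean {
  if (who.uid === 0) return true;                                    // root bypasses the check
  if (who.uid === file.uid) return (file.mode & 0o400) !== 0;        // owner read bit
  if (who.gids.includes(file.gid)) return (file.mode & 0o040) !== 0; // group read bit
  return (file.mode & 0o004) !== 0;                                  // "other" read bit
}

const runner: Caller = { uid: 1001, gids: [1001] };            // assumed runner identity
const squidLog: FileMeta = { uid: 13, gid: 13, mode: 0o640 };  // assumed gid and mode

console.log(canRead(squidLog, runner));                      // false: no "other" read bit
console.log(canRead({ ...squidLog, mode: 0o644 }, runner));  // true: after chmod a+rX
console.log(canRead({ ...squidLog, uid: 1001 }, runner));    // true: after chown to the runner
```

The last two calls correspond to the two repair routes discussed in this issue: relaxing the mode, or handing ownership to the runner.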
Invariant for any fix: the artifacts must become runner-readable without host root (there is no host sudo in rootless mode, and none at all in an ARC pod) — either created readable, or relaxed by a privileged actor that is available, namely a container under the Docker daemon.
Proposed solution (four parts)
A layered fix: 1a removes the mismatch at the source for the common case, 2b is the universal backstop, 1b is cheap hardening, and the swallowed-chmod change restores observability.
1a — Run the Node sidecars as the runner's UID:GID (primary)
Add compose user: "${uid}:${gid}" to the cli-proxy and api-proxy services, using the host uid/gid AWF already resolves for the agent (getSafeHostUid/getSafeHostGid). Every file these sidecars write is then runner-owned → trivially readable, no chmod needed.
Low risk: both are simple Node servers writing to a 0777 bind mount; neither needs in-container root.
Do not blanket-apply this to squid — it expects uid 13 to own /var/spool/squid; running it as an arbitrary uid can break its cache/db. squid is covered by 2b instead.
Files: src/services/cli-proxy-service.ts, src/services/api-proxy-service-config.ts (add user:), reuse the existing host-uid resolution.
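A minimal sketch of what 1a amounts to, assuming the sidecar services are built as plain objects before being serialized to compose. `ComposeService` and `withRunnerUser` are hypothetical names for illustration, and the image and volume strings are placeholders; in AWF the uid/gid would come from the existing getSafeHostUid/getSafeHostGid helpers.

```typescript
// Hypothetical subset of a compose service definition (not AWF's real type).
interface ComposeService {
  image: string;
  user?: string;
  volumes?: string[];
}

// Return a copy of the service that runs as the given numeric uid:gid.
function withRunnerUser(service: ComposeService, uid: number, gid: number): ComposeService {
  if (!Number.isInteger(uid) || !Number.isInteger(gid) || uid < 0 || gid < 0) {
    throw new Error(`invalid uid/gid: ${uid}:${gid}`);
  }
  return { ...service, user: `${uid}:${gid}` };
}

const cliProxy = withRunnerUser(
  { image: "cli-proxy:example", volumes: ["/tmp/logs:/var/log/cli-proxy"] }, // placeholder values
  1001, // stands in for the resolved host uid
  1001, // stands in for the resolved host gid
);
console.log(cliProxy.user); // prints 1001:1001
```

Using numeric ids rather than a user name keeps the setting independent of whatever /etc/passwd the sidecar image ships.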
2b — Root "perm-fixer" container at cleanup (universal backstop)
When AWF is rootless (process.getuid() !== 0), run a short-lived root container under the daemon that chowns/chmods AWF's log+audit dirs to the runner. This covers everything 1a doesn't: squid uid-13 files and any root-owned agent/iptables-init output.
Why it works where the host-side chmod fails: on a stock (non-userns-remapped) daemon, container uid 0 == host uid 0 over the bind-mounted files, so the container holds CHOWN/DAC_OVERRIDE/FOWNER regardless of which service UID wrote each file.
Placement: in the cleanup sequence (src/commands/main-action.ts:154-165), step 3 cleanup() → preserveCleanupArtifacts(). Containers are already stopped by stopContainers() (step 2), so there are no write races.
Sketch:
```typescript
// src/services/perm-fixer.ts: called from preserveCleanupArtifacts when getuid() !== 0
async function fixArtifactOwnership(
  dirs: string[],
  uid: number,
  gid: number,
  dockerHostPathPrefix?: string,
) {
  for (const dir of dirs) { // proxyLogsDir, auditDir, sessionStateDir
    const [translated] = applyHostPathPrefixToVolumes( // ARC translation; no-op when no prefix
      [`${dir}:/fix`],
      dockerHostPathPrefix,
    ); // src/services/host-path-prefix.ts:75
    await execa('docker', [
      'run', '--rm', '--network', 'none',
      '--cap-drop', 'ALL',
      '--cap-add', 'CHOWN', '--cap-add', 'DAC_OVERRIDE', '--cap-add', 'FOWNER',
      '-e', `TUID=${uid}`, '-e', `TGID=${gid}`,
      '-v', translated,
      AWF_AGENT_IMAGE, // reuse an already-pulled image: NO network pull
      // ';' rather than '&&' so the chmod floor still runs if chown fails
      // (e.g. on a userns-remapped daemon)
      'sh', '-c', 'chown -R "$TUID:$TGID" /fix; chmod -R a+rX /fix',
    ], { env: getLocalDockerEnv(), reject: false });
  }
}
```
Minimum guarantee: chmod -R a+rX → world-readable + dir-traversable (all upload-artifact needs to read). Stronger: chown to the runner so it can also delete/move the dir afterward.
ARC + non-ARC, same code: the -v source is routed through applyHostPathPrefixToVolumes, which is a no-op when there's no prefix (host-path-prefix.ts:76) and applies the daemon-side prefix in ARC. The chown targets the runner's numeric uid/gid, which is identical on both sides of the shared volume.
Image: reuse the agent/squid image (already pulled); never use busybox/alpine, which would need a network pull through the firewall.
Skip when --keep-containers is set (debugging; leave perms as-is).
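The gating described above (rootless only, skipped under --keep-containers) can be expressed as a small predicate. This is only a sketch: `CleanupContext` and `shouldRunPermFixer` are hypothetical names, and how AWF actually threads these values into cleanup is an assumption.

```typescript
// Hypothetical inputs to the cleanup decision (names are illustrative).
interface CleanupContext {
  processUid: number;      // e.g. the value of process.getuid() on the host
  keepContainers: boolean; // mirrors the --keep-containers flag
}

// Decide whether the root perm-fixer container should run at cleanup.
function shouldRunPermFixer(ctx: CleanupContext): boolean {
  if (ctx.keepContainers) return false; // debugging run: leave perms as-is
  return ctx.processUid !== 0;          // rootful AWF can already chmod host-side
}

console.log(shouldRunPermFixer({ processUid: 1001, keepContainers: false })); // true
console.log(shouldRunPermFixer({ processUid: 0, keepContainers: false }));    // false
console.log(shouldRunPermFixer({ processUid: 1001, keepContainers: true }));  // false
```

Keeping the decision in one pure function makes the "sudo: true runs are unchanged" guarantee easy to unit-test.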
Why AWF can still launch the perm-fixer (the chroot is the agent container's, not AWF's)
A natural objection: "doesn't AWF chroot/jail itself, so it can't launch more containers at cleanup?" No — these are two separate processes in separate namespaces:
The AWF orchestrator (src/cli.ts → src/commands/main-action.ts) is an ordinary host/runner process. It never chroots itself; every chroot reference in src/ is just configuration it passes into the agent container (paths/identity/caps — src/awf-config-schema.json:613, src/types/runtime-options.ts:132). It talks to Docker the whole time via execa('docker', …, { env: getLocalDockerEnv() }).
The agent container is what chroots: containers/agent/entrypoint.sh (PID 1 inside that one container) does chroot /host and drops CAP_SYS_CHROOT/CAP_SYS_ADMIN (entrypoint.sh:399-402). That jail lives entirely in the agent container's mount namespace and has zero effect on the AWF process or any other container.
By cleanup time the agent container has already exited (runAgentCommand returned), so its chroot is gone regardless. Launching the perm-fixer is just one more docker run from the same un-jailed orchestrator that already launched squid/agent/api-proxy/cli-proxy — architecturally identical.
The only real precondition is the one that is already true whenever AWF runs at all: the AWF process can reach the Docker daemon (socket access via the docker group — not host root). In rootless mode the orchestrator is the unprivileged runner user, which is exactly why it can't chmod other-UID files directly — but it can still docker run, and the perm-fixer container runs as root in its own namespace (container-uid-0 == host-uid-0 over the bind mount on a stock daemon) to do the chown the host process couldn't. If AWF couldn't talk to the daemon, it could never have started the firewall in the first place.
1b — Permissive file modes at the source (hardening)
Ensure the sidecar log writers create files world-readable (e.g. fs.createWriteStream(LOG_FILE, { flags: 'a', mode: 0o644 }) in containers/cli-proxy/server.js:59 and the api-proxy equivalent, and/or umask 0 in entrypoints). Cheap, FS/daemon-agnostic, and reduces reliance on 2b for the node sidecars. Does not fix ownership of root-/uid-13-owned trees, so it complements rather than replaces 2b.
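One caveat with passing `mode: 0o644` at creation is that the process umask still masks it. The sketch below shows one possible way to guarantee the final mode: open the file, then apply an explicit fchmod, which the umask does not affect. It swaps in `fs.openSync` plus `fs.fchmodSync` for the `createWriteStream` call mentioned above; `openWorldReadableLog` and the temp path are illustrative, not code from the sidecars.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Open a log file for appending and make sure it ends up 0644, even when a
// restrictive umask would strip bits from the mode passed at creation.
function openWorldReadableLog(file: string): number {
  const fd = fs.openSync(file, "a", 0o644);
  try {
    fs.fchmodSync(fd, 0o644); // explicit chmod is not subject to the umask
  } catch {
    // We do not own a pre-existing file: carry on and rely on the backstop.
  }
  return fd;
}

const logFile = path.join(fs.mkdtempSync(path.join(os.tmpdir(), "awf-log-")), "access.log");
const fd = openWorldReadableLog(logFile);
fs.writeSync(fd, "hello\n");
fs.closeSync(fd);

const mode = fs.statSync(logFile).mode & 0o777;
console.log(mode.toString(8)); // 644 on a POSIX filesystem
```

If a stream is preferred, the descriptor can be handed to `fs.createWriteStream` through its `fd` option. As noted above, this only controls the mode and does nothing about ownership.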
Restore observability for the swallowed chmod
Today the rootless chmod -R a+rX failure is logged at debug and lost (src/artifact-preservation.ts:61-63). Promote it to a warn (e.g. "could not relax artifact permissions as a non-root user; rootless perm-fixer will repair ownership"), so this class of failure can't silently regress and is diagnosable from default logs.
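A possible shape for the promoted log call, as a sketch under stated assumptions: the `Logger` interface and `relaxPermissions` are hypothetical, and the chmod is injected as a callback so the example runs without touching the filesystem. The real code in src/artifact-preservation.ts may be structured differently.

```typescript
// Hypothetical logger shape; AWF's real logger may differ.
interface Logger {
  warn(msg: string): void;
  debug(msg: string): void;
}

// Try to relax permissions; on failure, warn instead of logging at debug.
function relaxPermissions(dir: string, chmod: (dir: string) => void, logger: Logger): boolean {
  try {
    chmod(dir);
    return true;
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    logger.warn(
      `could not relax artifact permissions on ${dir} as a non-root user (${reason}); ` +
        "rootless perm-fixer will repair ownership",
    );
    return false;
  }
}

const warnings: string[] = [];
const logger: Logger = { warn: (m) => warnings.push(m), debug: () => {} };
const ok = relaxPermissions(
  "/tmp/example/logs", // placeholder path
  () => { throw new Error("EPERM: operation not permitted"); }, // simulated rootless failure
  logger,
);
console.log(ok, warnings.length); // false 1
```

Returning a boolean lets the caller decide whether the container-based repair is still needed.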
ARC/DinD compatibility
The fix is ARC-correct by construction because ARC forces exactly two rules, both already satisfied above:
Never assume host root — the runner pod is unprivileged and there is no host sudo; only the DinD daemon can run a root container. → 2b runs under the daemon.
Every helper bind mount must use the existing path translation — runner and daemon have separate filesystems bridged by a shared volume + --docker-host-path-prefix. → 2b routes its -v through applyHostPathPrefixToVolumes, and chowns to the runner's numeric uid (identical on both sides).
Per-solution: 1a ✅ (numeric runner uid on the shared volume), 1b ✅ (mode-based, FS-agnostic), 2b ✅ (daemon-run root + translated path), restore-observability ✅. Known limitation: a userns-remapped daemon maps container root to a subordinate uid, so 2b's chown may fail there; chmod a+rX still applies as the floor.
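For readers unfamiliar with the path-translation rule, the sketch below illustrates the general idea of rewriting a bind-mount source for a daemon that sees the shared volume under a different root. It is a hypothetical illustration only: `prefixVolumeSources` is not the implementation of applyHostPathPrefixToVolumes in src/services/host-path-prefix.ts, and both the "prepend the prefix" behavior and the /mnt/shared path are assumptions.

```typescript
// Hypothetical illustration only; NOT the code in src/services/host-path-prefix.ts.
function prefixVolumeSources(volumes: string[], prefix?: string): string[] {
  if (!prefix) return volumes;             // non-ARC: nothing to translate
  const base = prefix.replace(/\/+$/, ""); // drop trailing slashes
  return volumes.map((v) =>
    v.startsWith("/") ? `${base}${v}` : v, // leave named volumes untouched
  );
}

const plain = prefixVolumeSources(["/tmp/awf/logs:/fix"]);
const arc = prefixVolumeSources(["/tmp/awf/logs:/fix", "named-vol:/data"], "/mnt/shared/");

console.log(plain[0]); // /tmp/awf/logs:/fix
console.log(arc[0]);   // /mnt/shared/tmp/awf/logs:/fix
console.log(arc[1]);   // named-vol:/data
```

Only the source side of the `-v` spec changes; the in-container target (/fix) and the numeric uid/gid used by the chown stay the same in both environments.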
Boundary / coordination note
AWF can only guarantee the directories it is told about (--proxy-logs-dir, the audit dir, the session-state dir). The failing path /tmp/gh-aw/sandbox/firewall/{logs,audit} may also contain output from gh-aw-managed containers (awmg-mcpg, awmg-cli-proxy) written outside AWF's known dirs. A 2b perm-fixer bind-mounting the passed-in tree covers what AWF owns; anything gh-aw writes elsewhere remains gh-aw's responsibility. The issue should make this split explicit so neither side assumes the other handles it.
Acceptance criteria
Under --network-isolation on a standard hosted runner, the firewall logs/ and audit/ artifacts are readable by the unprivileged runner and upload-artifact succeeds with no external sudo/chmod step.
The same holds on an ARC/DinD runner (shared-volume artifacts owned by / readable to the runner uid).
sudo: true (rootful) runs are unchanged (2b is gated on getuid() !== 0).
The rootless host-side chmod failure is visible at warn level.
Regression coverage: a CI job exercising sudo: false end-to-end (sidecar logs + audit + artifact upload) so the EACCES regression can't silently return.
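One way the regression job could assert the first two criteria is to walk the artifact tree as the runner user and fail if anything is unreadable. The helper below is a sketch, not an existing AWF test; `findUnreadable` and the throwaway directory are illustrative, and a real job would point it at the firewall logs and audit directories.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Recursively collect paths under `root` that the current user cannot read.
function findUnreadable(root: string): string[] {
  const bad: string[] = [];
  const visit = (p: string): void => {
    try {
      const st = fs.lstatSync(p);
      if (st.isSymbolicLink()) return; // do not follow links out of the tree
      // Directories also need the search (execute) bit to be traversed.
      const need = st.isDirectory()
        ? fs.constants.R_OK | fs.constants.X_OK
        : fs.constants.R_OK;
      fs.accessSync(p, need);
      if (st.isDirectory()) {
        for (const name of fs.readdirSync(p)) visit(path.join(p, name));
      }
    } catch {
      bad.push(p); // stat, access, or readdir failed for this path
    }
  };
  visit(root);
  return bad;
}

// Self-check against a throwaway tree that the current user owns.
const root = fs.mkdtempSync(path.join(os.tmpdir(), "awf-artifacts-"));
fs.mkdirSync(path.join(root, "logs"));
fs.writeFileSync(path.join(root, "logs", "access.log"), "ok\n");
console.log(findUnreadable(root).length); // 0
```

A CI step could exit non-zero when the returned list is non-empty, printing the offending paths so the owning container is easy to identify.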
References
gh-aw PR github/gh-aw#41426 (reverted sudo: false → true for glossary-maintainer)
src/artifact-preservation.ts:52-208, src/workdir-setup.ts:117-200, src/commands/main-action.ts:148-174, src/services/host-path-prefix.ts:75-80, src/services/cli-proxy-service.ts, src/services/api-proxy-service-config.ts, containers/cli-proxy/Dockerfile:44, containers/api-proxy/Dockerfile:43, containers/squid/Dockerfile:79, containers/cli-proxy/server.js:45-59