fix(docker): salvage operational hardening from #1686 (.env readonly-var parser + xz-utils/git apt deps + root re-exec) #1969
… apt deps)

Three independent operational hardening fixes salvaged from PR #1686 (@binhpt310) after the parent PR was deferred over a separate sibling-repo build-context concern unrelated to these fixes:

1. start.sh's .env loader now filters readonly bash vars (UID, GID, EUID, EGID, PPID) before `source`-ing. docker-compose.yml's macOS instructions document `echo "UID=$(id -u)" >> .env` to set host UID/GID for bind-mount permission fixing; that .env was crashing start.sh with `UID: readonly variable` when `set -a; source ...; set +a` tried to assign to those names. Replaced with `source <(grep -vE '^[[:space:]]*(export[[:space:]]+)?(UID|GID|EUID|EGID|PPID)=' "${REPO_ROOT}/.env")`. The bootstrap regression guard at tests/test_bootstrap_dotenv.py:181 still passes: both `source` and `.env` are still on the modified line.

2. start.sh now defensively re-execs as the unprivileged hermeswebui user when invoked as root. Fires only when EUID==0 AND a hermeswebui user actually exists AND sudo is on PATH, so it's a no-op on host machines without the container user setup. The production image's entrypoint (docker_init.bash) already drops to hermeswebui before invoking start.sh, so this is a no-op on the canonical container path; it only matters for `sudo ./start.sh` or accidental root shells inside the container during interactive debugging.

3. Dockerfile installs xz-utils + git apt packages. xz-utils is required to decompress .tar.xz archives (e.g. Node.js distribution tarballs); git is needed for `git describe` (powers WEBUI_VERSION resolution at api/updates.py:_detect_webui_version) and any clone-based agent install path. Both are tiny apt packages on top of python:3.12-slim with no measurable image-size impact.

What's NOT in this commit (deferred from #1686):

- Pre-baking hermes-agent source into the image via `COPY hermes-agent-desktop/hermes-agent /opt/hermes/` plus a build-context flip to `..`. Requires a sibling-repo layout that breaks the canonical `git clone hermes-webui && cd hermes-webui && docker compose build` flow. The right shape is a build arg gating the COPY behind --build-arg WITH_AGENT_SOURCE=1; left to a separate PR.
- Pre-installing Node.js 22 LTS system-wide. Real motivation but worth evaluating the fix shape (full Node bake vs. opt-in vs. layer cache) separately from these three operational fixes.

Tests: tests/test_docker_env_readonly_vars.py, 11 tests (4 source-grep on the start.sh filter pattern + 5 behavioral that actually run bash against synthetic .env files containing readonly vars + 2 Dockerfile package-presence tests). All 11 pass. Behavioral tests skip if bash is not on PATH.

Full suite: 5028 → 5036 passing (+8 net new after pytest collection counted some behavioral tests under skip), 0 regressions, 147.84s.

Closes the operational-hardening portion of #1686.

Co-authored-by: binhpt310 <binhpt310@users.noreply.github.com>
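The `.env` filter from item 1 of the commit message can be tried in isolation. The following is a minimal sketch, not the project's test code: it assumes bash 4 or newer, uses a throwaway temp directory in place of the real `REPO_ROOT`, and the `DEMO_PORT` and `DEMO_HOST_UID` keys are made up for illustration. Only the `grep -vE` pattern is taken from the commit message.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical scratch directory standing in for the repo root.
REPO_ROOT="$(mktemp -d)"
trap 'rm -rf "${REPO_ROOT}"' EXIT

# Synthetic .env: three lines the filter should drop, two it should keep.
# DEMO_PORT and DEMO_HOST_UID are illustrative names, not real settings.
cat > "${REPO_ROOT}/.env" <<'EOF'
UID=501
export GID=20
  EUID=0
DEMO_PORT=8787
DEMO_HOST_UID=501
EOF

# Filter pattern quoted from the commit message.
FILTER='^[[:space:]]*(export[[:space:]]+)?(UID|GID|EUID|EGID|PPID)='

# Unfiltered load, run in a child bash so a failure cannot abort this script.
if bash -c 'set -e; set -a; source "$1"; set +a' _ "${REPO_ROOT}/.env" 2>/dev/null; then
  echo "unfiltered load: succeeded"
else
  echo "unfiltered load: failed"
fi

# Filtered load in the current shell: matching lines never reach `source`.
set -a
source <(grep -vE "${FILTER}" "${REPO_ROOT}/.env")
set +a

echo "DEMO_PORT=${DEMO_PORT}"
echo "DEMO_HOST_UID=${DEMO_HOST_UID}"
```

Because the pattern is anchored at the start of the line, a key such as `DEMO_HOST_UID` that merely contains `UID` is kept, while indented and `export`-prefixed assignments to the filtered names are dropped.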
Per Opus advisor on PR #1969: the original three-guard root re-exec (EUID==0, hermeswebui exists, sudo on PATH) would exit non-zero with `sudo: a password is required` on host machines where the developer's hermeswebui user doesn't have NOPASSWD configured. Better failure mode: silent fall-through to running as root (back to pre-PR behavior).

Adds a fourth guard `sudo -n -u hermeswebui true 2>/dev/null` that pre-flights the sudo capability without producing visible output.

Also expands the comment to clarify which guard is load-bearing on the canonical container path (the production image doesn't ship sudo at all, so `command -v sudo` is the silent-no-op gate there; the entrypoint docker_init.bash never invokes start.sh in any case).

No new tests needed: existing behavioral tests already cover the non-root + non-sudo paths, which is what runs in CI and on host.
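The four guards can be read as one short-circuiting decision. The sketch below is an assumption about the shape of that check, not the code in `start.sh`: the function name `should_reexec_as` is hypothetical, and the actual re-exec is left as a comment so the snippet is safe to run anywhere.

```shell
#!/usr/bin/env bash

# Return 0 only if every guard passes; any failing guard means "stay as-is".
# should_reexec_as is an illustrative helper name, not taken from start.sh.
should_reexec_as() {
  local target_user="$1"
  # Guard 1: only relevant when running as root.
  [ "${EUID:-$(id -u)}" -eq 0 ] || return 1
  # Guard 2: the unprivileged user must exist on this machine.
  id -u "${target_user}" >/dev/null 2>&1 || return 1
  # Guard 3: sudo must be on PATH.
  command -v sudo >/dev/null 2>&1 || return 1
  # Guard 4: pre-flight that sudo works without prompting (-n = non-interactive).
  sudo -n -u "${target_user}" true 2>/dev/null || return 1
  return 0
}

if should_reexec_as hermeswebui; then
  # A real script would hand over here, for example:
  #   exec sudo -n -u hermeswebui -- "$0" "$@"
  echo "would re-exec as hermeswebui"
else
  echo "guards not met, continuing as uid $(id -u)"
fi
```

Ordering the guards from cheapest to most invasive means `sudo` is only ever invoked once the first three checks have passed, and the `-n` pre-flight turns a would-be password prompt into a silent fall-through.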
Opus advisor, VERDICT: SHIP-AS-IS

Ran Opus on the PR diff + brief covering the 5 numbered asks. Verbatim verdict:
Final state
Merging.
Summary
Operational hardening for Docker / `start.sh`, salvaged from PR #1686 (@binhpt310) after the parent PR was deferred over a separate sibling-repo build-context concern unrelated to these fixes.

Three small, independent fixes, each addressing a real, reproducible bug on master:
1. `start.sh` no longer crashes on `.env` files that carry `UID`/`GID` lines

`docker-compose.yml`'s macOS instructions explicitly document `echo "UID=$(id -u)" >> .env` to set host UID/GID for the bind-mount permission fixer. That `.env` file is then read by both `docker-compose` (which substitutes `${UID}`/`${GID}` references) and by `start.sh` via `set -a; source "${REPO_ROOT}/.env"; set +a`.

bash treats `UID`, `GID`, `EUID`, `EGID`, and `PPID` as read-only variables. `source`-ing a `.env` containing `UID=501` fails with `UID: readonly variable`, and `set -e` aborts `start.sh` immediately. This reproduces cleanly on master.

The fix replaces the bare `source "${REPO_ROOT}/.env"` with `source <(grep -vE '^[[:space:]]*(export[[:space:]]+)?(UID|GID|EUID|EGID|PPID)=' "${REPO_ROOT}/.env")`, which strips the readonly-named lines from the stream before `source` reads them. The `.env` file itself is untouched, so docker-compose's `${UID}`/`${GID}` substitutions still resolve correctly.

The regression guard at `tests/test_bootstrap_dotenv.py:181` (which requires both `source` and `.env` to appear in `start.sh`) still passes: both keywords are still on that single line.

2. `start.sh` re-execs as `hermeswebui` when invoked as root

A defensive guard for the `sudo ./start.sh` case (or an accidental root shell inside the container, which can happen during interactive debugging). Without it, the WebUI process writes root-owned files to bind-mounted state, which then fights the host UID alignment the bind-mount fixer is trying to set up.

The production image's entrypoint (`docker_init.bash`) already drops to `hermeswebui` before invoking `start.sh`, so this is a no-op on the canonical container path. It only fires when:

- `EUID == 0`
- a `hermeswebui` user actually exists (so this is a no-op on host machines without the container's user setup)
- `sudo` is available (production image already has it)

If any of those don't hold, the re-exec is skipped and the script runs as-is.
3. `Dockerfile` installs `xz-utils` and `git`

- `xz-utils`: required to decompress `.tar.xz` archives (e.g. Node.js distribution tarballs, some Python wheel deps). Without it, any code path that downloads a `.tar.xz` archive fails with `xz: Cannot exec: No such file or directory`.
- `git`: needed for any agent-install path that clones a repo, plus for `git describe` (which powers the `WEBUI_VERSION` resolution at `api/updates.py:_detect_webui_version`).

Both are tiny apt packages (a few hundred KB total) on top of `python:3.12-slim`, with no measurable image-size impact.

Tests

New file `tests/test_docker_env_readonly_vars.py` (11 tests, all passing):

- 4 source-grep tests on the `start.sh` filter pattern
- 5 behavioral tests that run bash against synthetic `.env` files containing the readonly vars + verify the loader doesn't crash and non-readonly keys still load
- 2 Dockerfile package-presence tests (`xz-utils`, `git`)

The behavioral tests are skipped if `bash` is not on `PATH` (CI is fine). The existing bootstrap regression guard at `test_bootstrap_dotenv.py:181` still passes.

Full suite: pending verification in stage.
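The Dockerfile tests above check the package list in the source. As a complementary manual check, one could confirm at runtime that the binaries those packages provide (`xz` and `git`) are on `PATH` inside a built image. This is a hedged sketch: the helper name `missing_tools` is hypothetical and is not part of the test file.

```shell
#!/usr/bin/env bash

# Print the name of every requested tool that is not on PATH, one per line.
# missing_tools is an illustrative helper, not part of the project's tests.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "${tool}" >/dev/null 2>&1 || printf '%s\n' "${tool}"
  done
}

missing="$(missing_tools xz git)"
if [ -n "${missing}" ]; then
  echo "missing tools:"
  echo "${missing}"
else
  echo "xz and git are available"
fi
```

Run inside the container, an empty result suggests both packages were installed; on a host it simply reports whatever that machine has.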
What's NOT in this PR (deferred from #1686)
The original PR also bundled two larger changes that don't ship here:
- Pre-baking hermes-agent source into the image via `COPY hermes-agent-desktop/hermes-agent /opt/hermes/`, plus a build-context flip to `..` and a Dockerfile path of `hermes-webui/Dockerfile`. That requires a sibling-repo layout: `git clone hermes-webui && cd hermes-webui && docker compose build` would fail with `COPY failed: file not found in build context: hermes-agent-desktop/hermes-agent`, breaking Docker for every standalone clone. The right shape for that feature is a build arg (`ARG WITH_AGENT_SOURCE=0`) so users with the sibling repo opt in via `--build-arg WITH_AGENT_SOURCE=1`, and the canonical `git clone + cd + docker compose build` flow keeps working unchanged. Detailed in my May 5 close comment on #1686 (Option A).
- Pre-installing Node.js 22 LTS system-wide. The motivation (`npm install` at agent install time) is real, but the fix shape (bake Node into every image, +50 MB) is one option among several and worth evaluating separately from these three operational fixes.

Both are open questions for @binhpt310's follow-up if they want to refile.
Attribution
Sourced from PR #1686 (@binhpt310). When #1686 was deferred over the sibling-repo build-context concern, these three fixes (`xz-utils` + `git` apt packages, `.env` readonly-var parser, root re-exec) were the cluster that was unambiguously net-new and shippable on top of master without any of the structural changes. Pulling them out as a focused follow-up keeps @binhpt310's work credited and shipping rather than orphaned.

Closes the operational-hardening portion of #1686.
🤖 Generated with Claude Code
Co-authored-by: binhpt310 <binhpt310@users.noreply.github.com>