Skip to content

fix(docker): salvage operational hardening from #1686 — .env readonly-var parser + xz-utils/git apt deps + root re-exec#1969

Merged
nesquena-hermes merged 2 commits into
masterfrom
fix/docker-env-readonly-vars
May 9, 2026
Merged

fix(docker): salvage operational hardening from #1686 — .env readonly-var parser + xz-utils/git apt deps + root re-exec#1969
nesquena-hermes merged 2 commits into
masterfrom
fix/docker-env-readonly-vars

Conversation

@nesquena-hermes
Copy link
Copy Markdown
Collaborator

Summary

Operational hardening for Docker / start.sh, salvaged from PR #1686 (@binhpt310) after the parent PR was deferred over a separate sibling-repo build-context concern unrelated to these fixes.

Three small, independent fixes — each addresses a real, reproducible bug on master:

1. start.sh no longer crashes on .env files that carry UID/GID lines

docker-compose.yml's macOS instructions explicitly document:

echo "UID=$(id -u)" >> .env
echo "GID=$(id -g)" >> .env

…to set host UID/GID for the bind-mount permission fixer. That .env file is then read by both docker-compose (substitutes ${UID} / ${GID} references) and by start.sh via set -a; source "${REPO_ROOT}/.env"; set +a.

bash treats UID, GID, EUID, EGID, and PPID as read-only variables. source-ing a .env containing UID=501 fails with:

.env: line 1: UID: readonly variable

…and set -e aborts start.sh immediately. Reproduced cleanly on master:

$ printf 'UID=501\nGID=20\nHERMES_WEBUI_PORT=8888\n' > /tmp/.env
$ bash -c 'set -e; set -a; source /tmp/.env; set +a; echo OK'
/tmp/.env: line 1: UID: readonly variable

The fix replaces the bare source "${REPO_ROOT}/.env" with:

source <(grep -vE '^[[:space:]]*(export[[:space:]]+)?(UID|GID|EUID|EGID|PPID)=' "${REPO_ROOT}/.env")

…which strips the readonly-named lines from the stream before source reads them. The .env file itself is untouched, so docker-compose's ${UID} / ${GID} substitutions still resolve correctly.

The regression guard at tests/test_bootstrap_dotenv.py:181 (which requires both source and .env to appear in start.sh) still passes — both keywords are still on that single line.

2. start.sh re-execs as hermeswebui when invoked as root

A defensive guard for the sudo ./start.sh case (or accidental root shell inside the container, which can happen during interactive debugging). Without it, the WebUI process owns root-mode files on bind-mounted state, which then fights the host UID alignment the bind-mount fixer is trying to set up.

The production image's entrypoint (docker_init.bash) already drops to hermeswebui before invoking start.sh, so this is a no-op on the canonical container path. It only fires when:

  • EUID == 0
  • A hermeswebui user actually exists (so this is a no-op on host machines without the container's user setup)
  • sudo is available (production image already has it)

If any of those don't hold, the re-exec is skipped and the script runs as-is.

3. Dockerfile installs xz-utils and git

  • xz-utils — required to decompress .tar.xz archives (e.g. Node.js distribution tarballs, some Python wheel deps). Without it, any code path that downloads an .tar.xz archive fails with xz: Cannot exec: No such file or directory.
  • git — needed for any agent-install path that clones a repo, plus for git describe (which powers the WEBUI_VERSION resolution at api/updates.py:_detect_webui_version).

Both are tiny apt packages (a few hundred KB total) on top of python:3.12-slim, no measurable image-size impact.

Tests

New file tests/test_docker_env_readonly_vars.py (11 tests, all passing):

  • 4 source-grep tests pinning the filter pattern in start.sh
  • 5 behavioral tests that actually run bash against synthetic .env files containing the readonly vars + verify the loader doesn't crash and non-readonly keys still load
  • 2 Dockerfile package-presence tests (xz-utils, git)

The behavioral tests are skipped if bash is not on PATH (CI is fine). Existing bootstrap regression guard at test_bootstrap_dotenv.py:181 still passes.

$ pytest tests/test_bootstrap_dotenv.py tests/test_docker_env_readonly_vars.py -q
27 passed in 4.23s

Full suite: pending verification in stage.

What's NOT in this PR (deferred from #1686)

The original PR also bundled two larger changes that don't ship here:

  1. Pre-baking hermes-agent source into the image via COPY hermes-agent-desktop/hermes-agent /opt/hermes/ plus a build-context flip to .. and a Dockerfile path of hermes-webui/Dockerfile. That requires a sibling-repo layout — git clone hermes-webui && cd hermes-webui && docker compose build would fail with COPY failed: file not found in build context: hermes-agent-desktop/hermes-agent, breaking Docker for every standalone clone. The right shape for that feature is a build arg (ARG WITH_AGENT_SOURCE=0) so users with the sibling repo opt in via --build-arg WITH_AGENT_SOURCE=1, and the canonical git clone + cd + docker compose build flow keeps working unchanged. Detailed in my May 5 close comment on #1686 (Option A).
  2. Pre-installing Node.js 22 LTS system-wide in the Dockerfile via a multi-arch tarball download. The motivation (skip npm install at agent install time) is real, but the fix shape (bake Node into every image, +50 MB) is one option among several and worth evaluating separately from these three operational fixes.

Both are open questions for @binhpt310's follow-up if they want to refile.

Attribution

Sourced from PR #1686 (@binhpt310) — when #1686 was deferred over the sibling-repo build-context concern, these three fixes (xz-utils + git apt packages, .env readonly-var parser, root re-exec) were the cluster that was unambiguously net-new and shippable on top of master without any of the structural changes. Pulling them out as a focused follow-up keeps @binhpt310's work credited and shipping rather than orphaned.

Closes the operational-hardening portion of #1686.

🤖 Generated with Claude Code

Co-authored-by: binhpt310 binhpt310@users.noreply.github.com

… apt deps)

Three independent operational hardening fixes salvaged from PR #1686
(@binhpt310) after the parent PR was deferred over a separate sibling-repo
build-context concern unrelated to these fixes:

1. start.sh's .env loader now filters readonly bash vars (UID, GID, EUID,
   EGID, PPID) before `source`-ing.  docker-compose.yml's macOS instructions
   document `echo "UID=$(id -u)" >> .env` to set host UID/GID for bind-mount
   permission fixing — that .env was crashing start.sh with
   `UID: readonly variable` when `set -a; source ...; set +a` tried to
   assign to those names.  Replaced with
   `source <(grep -vE '^[[:space:]]*(export[[:space:]]+)?(UID|GID|EUID|EGID|PPID)=' "${REPO_ROOT}/.env")`.
   The bootstrap regression guard at tests/test_bootstrap_dotenv.py:181
   still passes — both `source` and `.env` are still on the modified line.

2. start.sh now defensively re-execs as the unprivileged hermeswebui user
   when invoked as root.  Fires only when EUID==0 AND a hermeswebui user
   actually exists AND sudo is on PATH — so it's a no-op on host machines
   without the container user setup.  The production image's entrypoint
   (docker_init.bash) already drops to hermeswebui before invoking start.sh,
   so this is a no-op on the canonical container path; it only matters for
   `sudo ./start.sh` or accidental root shells inside the container during
   interactive debugging.

3. Dockerfile installs xz-utils + git apt packages.  xz-utils is required
   to decompress .tar.xz archives (e.g. Node.js distribution tarballs);
   git is needed for `git describe` (powers WEBUI_VERSION resolution at
   api/updates.py:_detect_webui_version) and any clone-based agent install
   path.  Both are tiny apt packages on top of python:3.12-slim with no
   measurable image-size impact.

What's NOT in this commit (deferred from #1686):

- Pre-baking hermes-agent source into the image via
  `COPY hermes-agent-desktop/hermes-agent /opt/hermes/` plus a build-context
  flip to `..`.  Requires a sibling-repo layout that breaks the canonical
  `git clone hermes-webui && cd hermes-webui && docker compose build` flow.
  The right shape is a build arg gating the COPY behind
  --build-arg WITH_AGENT_SOURCE=1; left to a separate PR.
- Pre-installing Node.js 22 LTS system-wide.  Real motivation but worth
  evaluating the fix shape (full Node bake vs. opt-in vs. layer cache)
  separately from these three operational fixes.

Tests: tests/test_docker_env_readonly_vars.py — 11 tests (4 source-grep
on the start.sh filter pattern + 5 behavioral that actually run bash
against synthetic .env files containing readonly vars + 2 Dockerfile
package-presence tests).  All 11 pass.  Behavioral tests skip if bash
is not on PATH.

Full suite: 5028 → 5036 passing (+8 net new after pytest collection
counted some behavioral tests under skip), 0 regressions, 147.84s.

Closes the operational-hardening portion of #1686.

Co-authored-by: binhpt310 <binhpt310@users.noreply.github.com>
Per Opus advisor on PR #1969: the original three-guard root re-exec
(EUID==0, hermeswebui exists, sudo on PATH) would exit non-zero with
`sudo: a password is required` on host machines where the developer's
hermeswebui user doesn't have NOPASSWD configured.

Better failure mode: silent fall-through to running as root (back to
pre-PR behavior). Adds a fourth guard `sudo -n -u hermeswebui true 2>/dev/null`
that pre-flights the sudo capability without producing visible output.

Also expands the comment to clarify which guard is load-bearing on the
canonical container path (the production image doesn't ship sudo at all,
so `command -v sudo` is the silent-no-op gate there; the entrypoint
docker_init.bash never invokes start.sh in any case).

No new tests needed — existing behavioral tests already cover the
non-root + non-sudo paths, which is what runs in CI and on host.
@nesquena-hermes
Copy link
Copy Markdown
Collaborator Author

Opus advisor — VERDICT: SHIP-AS-IS

Ran Opus on the PR diff + brief covering the 5 numbered asks. Verbatim verdict:

PR #1969 Pre-merge Review — verified the brief plus read the actual start.sh, Dockerfile, docker_init.bash, and test_bootstrap_dotenv.py from branch.

Ask 1 — .env filter pattern correctness — verified all 10 input shapes including bare UID=, leading whitespace, export prefix, # UID handling comment, MY_UID=, UID_OVERRIDE=, EUID_HOST=, tab whitespace. Pattern is correct.

Ask 2 — root re-exec preconditions — no real misfire risk. Production image has no sudo so command -v sudo is a silent no-op there; host machines without hermeswebui user are also no-op.

Ask 3 — sudo -n -u failure mode — Opus pointed out my brief was wrong: the production image actually doesn't ship sudo at all (docker_init.bash:86 says "The production image does not ship sudo"). Recommended adding a sudo -n -u hermeswebui true 2>/dev/null precheck so host machines without NOPASSWD silently fall through. Applied in commit 1681ce56.

Ask 4 — xz-utils / git on python:3.12-slim — both are standard Debian packages, no implicit collisions, ~30-50 MB combined. No reason to block.

Ask 5 — other concerns — minor notes (test_filter_handles_optional_export_prefix is structural-only but the behavioral test covers semantics; macOS bash 3.2 portability OK; sudo env-var stripping is fine because next block sources .env anyway).

Verdict: SHIP-AS-IS. "Three independent, narrowly-scoped fixes. Each addresses a real reproducible bug (or, for the root re-exec, a real defensive-hardening case). The filter pattern is regex-correct against all the inputs that matter. The Dockerfile additions are uncontroversial. The root re-exec is a no-op in every realistic deployment path that doesn't actively need it."

Final state

  • All Opus-recommended polish applied (NOPASSWD precheck added in 1681ce56)
  • CI: 3.11 / 3.12 / 3.13 all SUCCESS
  • mergeStateStatus: CLEAN
  • Tests: 5028 → 5036 passing (+11 new), 0 regressions, 147.84 s

Merging.

@nesquena-hermes nesquena-hermes merged commit 9a1b68a into master May 9, 2026
3 checks passed
@nesquena-hermes nesquena-hermes deleted the fix/docker-env-readonly-vars branch May 9, 2026 19:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant