fix(docker): combine pixi installs into single RUN to prevent disk-full by Abdelsalam-Abbas · Pull Request #206 · diff-use/sampleworks

Abdelsalam-Abbas · 2026-04-09T19:07:57Z

Summary

- Combines the three separate RUN pixi install commands back into a single RUN.
- Separate RUNs create overlay layers that duplicate shared conda packages (numpy, CUDA libs, etc.) across environments: measured ~37 GB (3 layers) vs ~14 GB (1 layer).
- ubuntu-latest non-deterministically provisions runners with 72 GB or 145 GB disks. On 72 GB runners, the split-RUN approach exceeds available space during the build.

Root cause investigation

The split was introduced in #204 for better layer caching. However, the three pixi environments share many conda packages, and overlay layers store full copies of files per layer. This ~23 GB overhead pushes the build past the disk limit on smaller runners.

Disk usage measured via df -h inside Docker build steps:

Metric	Split RUN (3 layers)	Single RUN (1 layer)
boltz	8 GB	8 GB
protenix	12 GB (new layer)	4 GB (shared pkgs deduped)
rf3	12 GB (new layer)	2 GB (shared pkgs deduped)
Total pixi disk	~37 GB	~14 GB
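
As a quick arithmetic check on the totals above, the following Python sketch computes the overhead of the split-RUN layout and the space left on each runner size. It uses only the totals reported in the table and the 72 GB / 145 GB disk sizes from the summary. The function names are hypothetical, and the headroom figure assumes, purely for illustration, that nothing else occupies the disk, which is not true on a real runner (the OS, base image, and checkpoints also take space).

```python
# Approximate totals reported in the table above, in GB.
SPLIT_RUN_TOTAL_GB = 37
SINGLE_RUN_TOTAL_GB = 14

# Runner disk sizes mentioned in the summary, in GB.
RUNNER_DISKS_GB = (72, 145)


def overhead_gb(split_total: int, single_total: int) -> int:
    """Extra disk consumed by splitting the installs across layers."""
    return split_total - single_total


def headroom_gb(disk_gb: int, pixi_total_gb: int) -> int:
    """Disk left for everything other than the pixi environments.

    Illustrative only: ignores the OS, base image, and checkpoints.
    """
    return disk_gb - pixi_total_gb


if __name__ == "__main__":
    print("overhead:", overhead_gb(SPLIT_RUN_TOTAL_GB, SINGLE_RUN_TOTAL_GB), "GB")
    for disk in RUNNER_DISKS_GB:
        print(
            f"{disk} GB runner:",
            f"split leaves {headroom_gb(disk, SPLIT_RUN_TOTAL_GB)} GB,",
            f"single leaves {headroom_gb(disk, SINGLE_RUN_TOTAL_GB)} GB",
        )
```

The computed overhead matches the ~23 GB figure quoted in the root cause investigation.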

Test plan

- Verified split-RUN fails on 72 GB runners (jobs 70663224672, 70670232201)
- Verified single-RUN disk usage via df -h debugging (job 70682099334)
- Confirmed checkpoint image unchanged since March 25 (same SHA across all builds)

Summary by CodeRabbit

Chores
- Consolidated the three pixi environment install commands in the Docker build into a single layer, reducing disk usage during image builds with no functional change.

The docker-container driver duplicates the checkpoint image in its nested content store (~20 GB for pull + extract), leaving insufficient space for pixi install. The docker driver runs BuildKit in-process, avoiding this duplication. Also adds pull_request trigger and disables push for testing.

coderabbitai · 2026-04-09T19:08:12Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 64b41697-72dd-476a-bb99-6adbb311c78d

📥 Commits

Reviewing files that changed from the base of the PR and between a0ef63c and c907a49.

📒 Files selected for processing (1)

Dockerfile

🚧 Files skipped from review as they are similar to previous changes (1)

Dockerfile

📝 Walkthrough

Walkthrough

The Dockerfile consolidates three separate pixi install commands for the boltz, protenix, and rf3 environments into a single chained RUN instruction. This reduces Docker layers from three independent installation steps to one combined layer, altering caching behavior and image layer composition during the build process.

Changes

Cohort / File(s)	Summary
Docker Build Optimization `Dockerfile`	Consolidated three separate `RUN pixi install -e <env> --frozen` commands into a single chained RUN using `&&` operators for sequential environment installations within one layer.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

Speed up Docker build: free disk space, drop broken cache #204: Introduced the split that this PR reverts, dividing the single chained RUN command into three separate RUN layers for the same Pixi environment installations.

Suggested reviewers

xraymemory
k-chrispens
marcuscollins

Poem

🐰 Layers once three, now melded as one,
Docker builds faster when chaining is done!
Pixi's environments dance in a line,
One cache, one image, an optimization divine! ✨

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'fix(docker): combine pixi installs into single RUN to prevent disk-full' accurately describes the main change—consolidating multiple pixi install commands into a single RUN layer to reduce disk usage during Docker builds.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

.github/workflows/docker.yml (1)

13-14: Consider guarding Docker Hub auth/publish for PR events (especially forks).

With Line 13 enabling pull_request, keep build validation but gate secret-dependent steps so fork PRs don’t fail on Docker Hub auth.

Suggested hardening

       - name: Login to Docker Hub
+        if: github.event_name != 'pull_request' || !github.event.pull_request.head.repo.fork
         uses: docker/login-action@v4
         with:
           username: ${{ secrets.DOCKERHUB_USERNAME }}
           password: ${{ secrets.DOCKERHUB_TOKEN }}
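
The suggested `if:` expression can be read as a small truth table. The Python sketch below mirrors its logic so each case can be checked by hand; `should_login` and its parameters are hypothetical names used for illustration and are not part of GitHub Actions.

```python
def should_login(event_name: str, head_repo_is_fork: bool) -> bool:
    """Mirror of the suggested guard:

    github.event_name != 'pull_request'
        || !github.event.pull_request.head.repo.fork

    The login step runs for every non-PR event, and for PRs only
    when the head repository is not a fork.
    """
    return event_name != "pull_request" or not head_repo_is_fork


if __name__ == "__main__":
    for event in ("push", "pull_request"):
        for is_fork in (False, True):
            print(event, is_fork, should_login(event, is_fork))
```

Only the combination of a `pull_request` event and a forked head repository skips the step, which is the case where Docker Hub secrets are unavailable.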

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In @.github/workflows/docker.yml around lines 13 - 14, The workflow enables
pull_request events but doesn’t guard secret-dependent Docker Hub auth/publish
steps, causing fork PRs to fail; update the Docker login/publish steps (those
that use DOCKERHUB_USERNAME/DOCKERHUB_TOKEN or perform docker build-and-push) to
run only when the PR is not from a fork by adding an if conditional such as if:
github.event_name != 'pull_request' || github.event.pull_request.head.repo.fork
== false (or equivalent check for same-repo PRs), and ensure repository-level
permissions/secrets are restricted so secret access is skipped for forked PRs.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/docker.yml:
- Line 71: The workflow currently disables image publishing by setting "push:
false" which prevents pushes on real events (e.g., push to main or release
tags); update the workflow to restore publishing for real runs by removing or
changing the test-only override so that "push" is enabled for production events
(e.g., set push: true or conditionally set push: false only for the specific
test job/event), ensuring the Docker login step can push images as intended.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: bb9cf2bb-d424-494e-b3c0-9c911c0f576e

📥 Commits

Reviewing files that changed from the base of the PR and between 41512b1 and ef8a4bd.

📒 Files selected for processing (1)

.github/workflows/docker.yml

The checkpoint COPY materializes ~10 GB into the build layer graph, leaving insufficient space for pixi to extract conda packages. Moving it after pixi installs reduces peak disk usage during the build. Added df -h before/after each pixi install and before checkpoint COPY to capture disk usage for debugging.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

Dockerfile (1)

113-115: Remove temporary df -h probes before merge

The df -h probes were useful for debugging, but keeping them permanently adds noisy logs and unnecessary build steps. Since this PR’s test plan treats these as temporary diagnostics, it’s better to drop them after validation.

Proposed cleanup

-RUN df -h / && pixi install -e boltz --frozen && df -h /
-RUN df -h / && pixi install -e protenix --frozen && df -h /
-RUN df -h / && pixi install -e rf3 --frozen && df -h /
+RUN pixi install -e boltz --frozen
+RUN pixi install -e protenix --frozen
+RUN pixi install -e rf3 --frozen
...
-RUN df -h /
 COPY --from=diffuseproject/sampleworks-checkpoints:latest /checkpoints/ /checkpoints/

Also applies to: 124-124

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@Dockerfile` around lines 113 - 115, Remove the temporary df -h probes from
the Docker build RUN steps that wrap pixi install commands; specifically edit
the RUN lines that call "pixi install -e boltz --frozen", "pixi install -e
protenix --frozen", and "pixi install -e rf3 --frozen" to drop the preceding and
trailing "df -h / &&" and "&& df -h /" so the steps only run the pixi install
commands, and do the same for the similar RUN at the other location mentioned.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@Dockerfile`:
- Line 125: The Dockerfile COPY step uses a mutable tag
"diffuseproject/sampleworks-checkpoints:latest" which makes builds
non-reproducible; replace that tag with an immutable image digest (use the
repository image digest from the registry) and update the COPY --from reference
accordingly (e.g., change COPY
--from=diffuseproject/sampleworks-checkpoints:latest /checkpoints/ /checkpoints/
to use diffuseproject/sampleworks-checkpoints@sha256:<actual-digest>), ensuring
the pinned digest is the one you verified from the registry so future builds use
the exact same source image.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 330f40f0-a2a6-4178-a902-b3f686624d46

📥 Commits

Reviewing files that changed from the base of the PR and between ef8a4bd and 78e7e44.

📒 Files selected for processing (1)

Dockerfile

coderabbitai

♻️ Duplicate comments (1)

Dockerfile (1)

109-111: ⚠️ Potential issue | 🟠 Major

Pin checkpoints image to an immutable digest.

Using :latest on Line 110 makes this build non-reproducible; upstream image changes can silently alter artifacts and disk behavior across CI runs.

Proposed hardening

-COPY --from=diffuseproject/sampleworks-checkpoints:latest /checkpoints/ /checkpoints/
+COPY --from=diffuseproject/sampleworks-checkpoints@sha256:<verified_digest> /checkpoints/ /checkpoints/

#!/bin/bash
set -euo pipefail

# Verify mutable tag usage in Dockerfile
rg -n 'COPY\s+--from=diffuseproject/sampleworks-checkpoints:latest' Dockerfile

# Resolve the current digest for :latest from Docker Hub (to pin in Dockerfile)
TOKEN="$(curl -fsSL 'https://auth.docker.io/token?service=registry.docker.io&scope=repository:diffuseproject/sampleworks-checkpoints:pull' | jq -r '.token')"
curl -fsSI \
  -H "Authorization: Bearer ${TOKEN}" \
  -H 'Accept: application/vnd.docker.distribution.manifest.list.v2+json' \
  'https://registry-1.docker.io/v2/diffuseproject/sampleworks-checkpoints/manifests/latest' \
  | tr -d '\r' | rg -i 'docker-content-digest'
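
To illustrate the difference between a mutable tag and a digest-pinned reference, here is a minimal Python sketch that classifies an image reference string. `is_digest_pinned` is a hypothetical helper. It only checks that the reference ends in `@sha256:` followed by 64 lowercase hex characters, and it does not contact a registry or confirm that the digest exists.

```python
import re

# A sha256 digest suffix: "@sha256:" followed by exactly 64 lowercase hex characters.
_DIGEST_SUFFIX = re.compile(r"@sha256:[0-9a-f]{64}$")


def is_digest_pinned(image_ref: str) -> bool:
    """Return True if the image reference ends with a sha256 digest."""
    return _DIGEST_SUFFIX.search(image_ref) is not None


if __name__ == "__main__":
    tagged = "diffuseproject/sampleworks-checkpoints:latest"
    # Placeholder digest for illustration only; not a real image digest.
    pinned = "diffuseproject/sampleworks-checkpoints@sha256:" + "0" * 64
    print(tagged, is_digest_pinned(tagged))
    print(pinned, is_digest_pinned(pinned))
```

A check like this could run in CI to flag `COPY --from` references that still use a tag, under the assumption that every external image is meant to be pinned.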

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@Dockerfile` around lines 109 - 111, The Dockerfile uses a mutable tag in the
COPY --from stage (COPY --from=diffuseproject/sampleworks-checkpoints:latest
/checkpoints/ /checkpoints/), making builds non-reproducible; resolve the
current immutable digest for diffuseproject/sampleworks-checkpoints:latest (via
Docker Hub manifest API or docker pull + docker inspect/manifest) and replace
the tag with the pinned digest form
(diffuseproject/sampleworks-checkpoints@sha256:<DIGEST>) in the COPY --from
instruction; optionally document/update CI to refresh the pinned digest when
intentional updates are needed.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: dad32474-d386-47c7-9134-276f3fce374c

📥 Commits

Reviewing files that changed from the base of the PR and between 9c856d9 and a0ef63c.

📒 Files selected for processing (1)

Dockerfile

Separate RUN commands for each pixi environment create overlay layers that duplicate shared conda packages (numpy, CUDA libs, etc.), consuming ~37 GB vs ~14 GB in a single layer. This causes disk-full failures on CI runners with 72 GB disks (ubuntu-latest provisions either 72 or 145 GB non-deterministically). Reverts the split introduced in #204 while keeping the other optimizations (free-disk-space, checkpoint layer ordering).

coderabbitai Bot reviewed Apr 9, 2026

Comment thread .github/workflows/docker.yml Outdated

coderabbitai Bot reviewed Apr 9, 2026

Comment thread Dockerfile Outdated

Abdelsalam-Abbas added 2 commits April 9, 2026 21:46

Test: revert to checkpoints-before-pixi ordering with df -h debugging

9c856d9

Reverting to the original PR #204 ordering (checkpoints COPY before pixi installs) to capture df -h readings and verify disk usage theory. This will likely fail on 75G runners but the df output will confirm why.

Test: single RUN for all pixi installs with checkpoints-first ordering

a0ef63c

Testing whether combining pixi installs into one RUN reduces peak disk via fewer overlay layers. Checkpoints still before pixi (the ordering that failed on 72G runners) to see if single-RUN is enough to fit.

coderabbitai Bot reviewed Apr 9, 2026

Abdelsalam-Abbas changed the title ~~fix(ci): switch buildx to docker driver to fix disk-full build~~ fix(docker): combine pixi installs into single RUN to prevent disk-full Apr 9, 2026

Abdelsalam-Abbas requested review from k-chrispens, marcuscollins, saada and xraymemory April 9, 2026 21:11

marcuscollins approved these changes Apr 9, 2026

marcuscollins merged commit bc2cb94 into main Apr 9, 2026
1 check passed

k-chrispens deleted the worktree-fix-docker-buildx-driver branch April 22, 2026 00:25

marcuscollins merged 5 commits into main from worktree-fix-docker-buildx-driver
