Commit 2da1e2d

fix(runs): make ACTL sampleworks image self-contained
1 parent ae77784 commit 2da1e2d

14 files changed

Lines changed: 517 additions & 205 deletions

.actlignore

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+# Keep ACTL sync focused on source. Large data/results should live under
+# /mnt/diffuse-shared or the pod home PVC, not in the synced checkout.
+.pixi/
+grid_search_results/
+outputs/
+data/
+initial_dataset_40*/
+checkpoints/
+release_data/
+*.ckpt
+*.pt
+*.tar.gz
+*.tgz

Dockerfile

Lines changed: 10 additions & 4 deletions
@@ -7,8 +7,8 @@
 # Build:
 # docker build -t sampleworks .
 #
-# CI builds pull checkpoints automatically from Docker Hub via:
-# COPY --from=diffuseproject/sampleworks-checkpoints:latest
+# CI builds pull checkpoints automatically from Harbor via:
+# COPY --from=harbor.astera.sh/library/sampleworks-checkpoints:latest
 # No checkpoint files are needed in the build context or on the CI runner.
 #
 # To rebuild the checkpoints base image (only needed when checkpoints change):
@@ -56,7 +56,7 @@
 # /checkpoints/protenix_base_default_v0.5.0.pt - Protenix model (~1.4GB)
 #
 # Checkpoints base image:
-# All checkpoints live in diffuseproject/sampleworks-checkpoints:latest on Docker Hub.
+# All checkpoints live in harbor.astera.sh/library/sampleworks-checkpoints:latest.
 # To rebuild that image, see /data/users/diffuse/checkpoint-build/ on the GPU server.

 # ============================================================================
@@ -108,7 +108,7 @@ RUN chmod +x /usr/local/bin/entrypoint.sh
 # ============================================================================
 # Checkpoints (~10 GB) rarely change, so this layer is placed before pixi
 # installs to stay cached even when dependencies update.
-COPY --from=diffuseproject/sampleworks-checkpoints:latest /checkpoints/ /checkpoints/
+COPY --from=harbor.astera.sh/library/sampleworks-checkpoints:latest /checkpoints/ /checkpoints/

 # ============================================================================
 # Install all three environments: boltz, protenix, rf3
@@ -129,6 +129,12 @@ RUN pixi run -e boltz python -c "\
 from sampleworks.core.forward_models.xray.real_space_density_deps.ops import dilate_atom_centric; \
 print('CUDA extensions compiled successfully')" || echo "CUDA extension pre-compilation skipped (no GPU during build)"

+COPY run_all_models.sh ./
+RUN chmod +x /app/run_all_models.sh \
+    && printf '#!/usr/bin/env bash\nexec /app/run_all_models.sh "$@"\n' > /usr/local/bin/run_all_models.sh \
+    && chmod +x /usr/local/bin/run_all_models.sh \
+    && printf '\n# ACTL scientist workflow: land in the baked Sampleworks app.\nif [[ $- == *i* ]] && [ -z "${SAMPLEWORKS_NO_AUTO_CD:-}" ] && [ -d /app ]; then\n cd /app\nfi\n' >> /root/.bashrc
+
 # Set default checkpoint paths via environment variables
 ENV BOLTZ1_CHECKPOINT=/checkpoints/boltz1_conf.ckpt \
     BOLTZ2_CHECKPOINT=/checkpoints/boltz2_conf.ckpt \

README.md

Lines changed: 31 additions & 16 deletions
@@ -152,35 +152,50 @@ Output layout: `grid_search_results/<protein>/<model>[_<method>]/<scaler>/ens<N>
 Instructions for running evaluation and metrics scripts are coming soon.


-## Preset experiments (`sampleworks-runs`)
+## ACTL preset experiments (`run_all_models.sh` / `sampleworks-runs`)

-For canonical multi-model/multi-GPU sweeps, the `sampleworks-runs` CLI orchestrates parallel `run_grid_search.py` jobs from a single TOML preset. Each preset declares its jobs (model, pixi env, GPU assignment, args); the runner launches them in parallel, tees per-job logs, and aggregates exit codes.
+For canonical multi-model/multi-GPU sweeps, `sampleworks-runs` orchestrates parallel `run_grid_search.py` jobs from a single TOML preset. Each preset declares the model, pixi env, GPU assignment, output subdir, and CLI args. The runner launches jobs in parallel, tees per-job logs, and aggregates exit codes.

-**Pod-side prerequisite.** Bundled presets reference the canonical `/data/inputs`, `/data/results`, and `/root/.sampleworks` paths set up by the ACTL pod-init script. On a fresh sampleworks pod, run once per session:
+On ACTL, start the pod with the prebuilt image and shared storage, then run one command inside the pod shell:

 ```bash
-bash /mnt/diffuse-shared/raw/sampleworks/actl_setup_sampleworks_paths.sh
+actl pod up sampleworks-pr236 --profile 8x --image sampleworks --storage shared --pvc-size 200Gi --mount diffuse-shared --yes
+
+# inside the ACTL pod shell
+# the sampleworks image drops interactive shells in /app
+run_all_models.sh --dry-run # inspect commands first
+run_all_models.sh # run /app/src/sampleworks/runs/presets/all_models.toml
 ```

-That creates symlinks pointing the canonical paths at the shared mount (and namespaces `/data/results` by hostname or `$SAMPLEWORKS_ACTL_RUN_NAME`). Overrides via env var (`DATA_DIR=...`) or CLI (`--set defaults.DATA_DIR=...`) work without the symlinks.
+The wrapper keeps the TOML preset as the source of truth. It only supplies ACTL-friendly defaults:
+
+- `DATA_DIR=/mnt/diffuse-shared/raw/sampleworks/initial_dataset_40_occ_sweeps`
+- `RESULTS_DIR=/mnt/diffuse-shared/results/sampleworks/<pod>/all_models`
+- `MSA_CACHE_DIR=/mnt/diffuse-shared/cache/sampleworks/msa`
+- `PYTHONPATH=/app/src`, using the copy baked into the sampleworks image
+- direct `/app/.pixi/envs/<env>/bin/python` execution, so it reuses the environments baked into the sampleworks image without refreshing pixi caches
+- `/tmp` pixi/uv caches for any missing environment preparation, avoiding shared-storage Git cache issues
+
+Common commands:

 ```bash
-pixi run -e rf3 sampleworks-runs --list # bundled presets
-pixi run -e rf3 sampleworks-runs rf3_partial # run a preset
-pixi run -e rf3 sampleworks-runs rf3_partial --show # inspect resolved values
-pixi run -e rf3 sampleworks-runs rf3_partial --dry-run # print pixi run commands, don't execute
-pixi run -e rf3 sampleworks-runs all_models --only rf3,protenix # subset jobs
-
-# Override any value without editing the TOML:
-pixi run -e rf3 sampleworks-runs rf3_partial \
-    --set jobs.rf3.gpus=7 \
+run_all_models.sh --list # bundled presets
+run_all_models.sh all_models --show # inspect resolved values
+run_all_models.sh all_models --only rf3,protenix # subset jobs
+run_all_models.sh rf3_partial # run a smaller preset
+
+# Override paths or parameters without editing TOML:
+DATA_DIR=/mnt/diffuse-shared/raw/sampleworks/my_dataset run_all_models.sh rf3_partial
+run_all_models.sh rf3_partial \
+    --set jobs.rf3.gpus=0 \
     --set jobs.rf3.args.gradient-weights="0.0 0.01 0.02"
 ```

-Bundled presets live in `src/sampleworks/runs/presets/*.toml`. Add a new preset by dropping a `.toml` file alongside them or pointing at any path:
+Bundled presets live in `src/sampleworks/runs/presets/*.toml`. You can also copy one, edit it, and run it by path:

 ```bash
-sampleworks-runs ./my_experiment.toml
+cp src/sampleworks/runs/presets/all_models.toml my_experiment.toml
+run_all_models.sh ./my_experiment.toml
 ```

 Env-var defaults (`DATA_DIR`, `RESULTS_DIR`, `MSA_CACHE_DIR`, `PROTEINS_CSV`) declared per preset are filled from the process environment when set, otherwise from the preset's `[defaults]` block.
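
Reviewer note: the last README line in this diff describes a fallback order, with the process environment first and the preset's `[defaults]` block second. A minimal Python sketch of that order follows. It is not the `sampleworks-runs` implementation. The function name `resolve_defaults` and the sample values are hypothetical, and treating an empty variable as unset is an assumption the README does not confirm.

```python
import os


def resolve_defaults(preset_defaults, environ=None):
    """Hypothetical helper: prefer environment values over preset defaults."""
    env = os.environ if environ is None else environ
    resolved = {}
    for key, fallback in preset_defaults.items():
        value = env.get(key)
        # Assumption: an empty string counts as "not set" and falls back.
        resolved[key] = value if value else fallback
    return resolved


# Illustrative values only; real presets declare their own [defaults].
preset_defaults = {"DATA_DIR": "/data/inputs", "RESULTS_DIR": "/data/results"}
fake_env = {"DATA_DIR": "/mnt/diffuse-shared/raw/sampleworks/my_dataset"}

resolved = resolve_defaults(preset_defaults, fake_env)
print(resolved)
```

Passing a plain dict as `environ` keeps the sketch testable without touching the real process environment.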
