Commit d86a299

feat(runs): auto-assign preset GPUs
1 parent e99e5f2 commit d86a299

19 files changed

Lines changed: 209 additions & 50 deletions

README.md

Lines changed: 8 additions & 2 deletions
@@ -195,7 +195,8 @@ run_experiments full_8gpu --jobs rf3,protenix
 Standalone presets are available for each model/model family: `boltz`,
 `boltz1`, `boltz2`, `boltz2_xrd`, `boltz2_md`, `rf3`, and `protenix`.
 Additional comparison presets include `protenix_dual`, `rf3_protenix`, and RF3
-variants.
+variants. Single-job presets default to `gpu_count = 8`, so on an 8-GPU pod
+they use the whole machine.
 
 Presets live in `experiments/*.toml` in your local checkout and on the pod at
 `/home/dev/workspace/experiments/*.toml`. To modify an experiment, edit or copy
@@ -210,10 +211,15 @@ run_experiments --preset my_rf3
 For one-off changes, use `--set` instead of editing TOML:
 
 ```bash
-run_experiments rf3 --set jobs.rf3.gpus=0,1
+run_experiments rf3 --set jobs.rf3.gpu_count=4
 run_experiments rf3 --set jobs.rf3.args.gradient-weights="0.0 0.01 0.02"
 ```
 
+Presets usually declare `gpu_count = N`, not fixed GPU IDs. The runner assigns
+visible GPUs automatically in job order, so the same preset works on different
+pod sizes. Use explicit `gpus = "0,1"` only when you need to pin a job to
+specific devices.
+
 Defaults: inputs come from `/mnt/diffuse-shared/raw/sampleworks/...`, checkpoints
 from `/mnt/diffuse-shared/raw/checkpoints`, results go to
 `/mnt/diffuse-shared/results/sampleworks/<pod>/<target>/`, and MSA caches go to
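
Reviewer note: the `--set` examples in this hunk address a job by name (`jobs.rf3.gpu_count`) even though presets store jobs as a `[[jobs]]` array. The runner's override code is not part of this diff, so the following is only a minimal sketch of one way such a dotted path could be resolved. The function names `parse_value` and `apply_override`, and the rule that integer-looking values become integers while everything else stays a string, are assumptions made for illustration.

```python
from typing import Any, Dict


def parse_value(raw: str) -> Any:
    """Return an int when the raw value looks like one, otherwise the string itself."""
    try:
        return int(raw)
    except ValueError:
        return raw


def apply_override(config: Dict[str, Any], assignment: str) -> None:
    """Apply one 'jobs.<name>.<key...>=<value>' override to a parsed preset, in place."""
    path, _, raw = assignment.partition("=")
    parts = path.split(".")
    if len(parts) < 3 or parts[0] != "jobs":
        raise ValueError(f"unsupported override path: {path!r}")
    job_name, keys = parts[1], parts[2:]
    # [[jobs]] is an array of tables, so the job is looked up by its name field.
    matches = [job for job in config["jobs"] if job["name"] == job_name]
    if not matches:
        raise KeyError(f"no job named {job_name!r}")
    target = matches[0]
    # Walk (and create, if needed) nested tables such as "args".
    for key in keys[:-1]:
        target = target.setdefault(key, {})
    target[keys[-1]] = parse_value(raw)


# Hypothetical in-memory form of experiments/rf3.toml after TOML parsing.
preset = {"jobs": [{"name": "rf3", "env": "rf3", "gpu_count": 8, "args": {}}]}
apply_override(preset, "jobs.rf3.gpu_count=4")
apply_override(preset, "jobs.rf3.args.gradient-weights=0.0 0.01 0.02")
print(preset["jobs"][0])
```

Under these assumptions, a pinned value such as `jobs.rf3.gpus=0,1` stays the string `"0,1"`, which matches how `gpus` was written in the presets before this commit.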

experiments/boltz.toml

Lines changed: 2 additions & 2 deletions
@@ -18,13 +18,13 @@ align-to-input = true
 [[jobs]]
 name = "boltz2_xrd"
 env = "boltz"
-gpus = "0,1"
+gpu_count = 4
 output_subdir = "boltz2_xrd"
 args = { model = "boltz2", method = "X-RAY DIFFRACTION", gradient-weights = "0.0 0.05 0.1 0.2 0.35 0.5" }
 
 [[jobs]]
 name = "boltz2_md"
 env = "boltz"
-gpus = "2,3"
+gpu_count = 4
 output_subdir = "boltz2_md"
 args = { model = "boltz2", method = "MD", gradient-weights = "0.0 0.05 0.1 0.2 0.35 0.5" }

experiments/boltz1.toml

Lines changed: 1 addition & 1 deletion
@@ -22,6 +22,6 @@ align-to-input = true
 [[jobs]]
 name = "boltz1"
 env = "boltz"
-gpus = "0,1"
+gpu_count = 8
 output_subdir = "boltz1"
 args = {}

experiments/boltz2.toml

Lines changed: 2 additions & 2 deletions
@@ -18,13 +18,13 @@ align-to-input = true
 [[jobs]]
 name = "boltz2_xrd"
 env = "boltz"
-gpus = "0,1"
+gpu_count = 4
 output_subdir = "boltz2_xrd"
 args = { model = "boltz2", method = "X-RAY DIFFRACTION", gradient-weights = "0.0 0.05 0.1 0.2 0.35 0.5" }
 
 [[jobs]]
 name = "boltz2_md"
 env = "boltz"
-gpus = "2,3"
+gpu_count = 4
 output_subdir = "boltz2_md"
 args = { model = "boltz2", method = "MD", gradient-weights = "0.0 0.05 0.1 0.2 0.35 0.5" }

experiments/boltz2_md.toml

Lines changed: 1 addition & 1 deletion
@@ -21,6 +21,6 @@ align-to-input = true
 [[jobs]]
 name = "boltz2_md"
 env = "boltz"
-gpus = "0,1"
+gpu_count = 8
 output_subdir = "boltz2_md"
 args = {}

experiments/boltz2_xrd.toml

Lines changed: 1 addition & 1 deletion
@@ -21,6 +21,6 @@ align-to-input = true
 [[jobs]]
 name = "boltz2_xrd"
 env = "boltz"
-gpus = "0,1"
+gpu_count = 8
 output_subdir = "boltz2_xrd"
 args = {}

experiments/full_8gpu.toml

Lines changed: 4 additions & 4 deletions
@@ -18,27 +18,27 @@ align-to-input = true
 [[jobs]]
 name = "boltz2_xrd"
 env = "boltz"
-gpus = "0,1"
+gpu_count = 2
 output_subdir = "boltz2_xrd"
 args = { model = "boltz2", method = "X-RAY DIFFRACTION", gradient-weights = "0.0 0.05 0.1 0.2 0.35 0.5" }
 
 [[jobs]]
 name = "boltz2_md"
 env = "boltz"
-gpus = "2,3"
+gpu_count = 2
 output_subdir = "boltz2_md"
 args = { model = "boltz2", method = "MD", gradient-weights = "0.0 0.05 0.1 0.2 0.35 0.5" }
 
 [[jobs]]
 name = "rf3"
 env = "rf3"
-gpus = "4,5"
+gpu_count = 2
 output_subdir = "rf3"
 args = { model = "rf3", gradient-weights = "0.0 0.005 0.01 0.02 0.035 0.05 0.1" }
 
 [[jobs]]
 name = "protenix"
 env = "protenix"
-gpus = "6,7"
+gpu_count = 2
 output_subdir = "protenix"
 args = { model = "protenix", gradient-weights = "0.0 0.05 0.1 0.2 0.35 0.5" }
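
Reviewer note: the README text added in this commit says the runner assigns visible GPUs "automatically in job order". The assignment code itself is not shown in this diff, so the sketch below is only an illustration of that rule applied to the four jobs in `experiments/full_8gpu.toml`. The function name `assign_gpus`, the choice to reserve pinned `gpus` before handing out the rest, and the `ValueError` on over-subscription are all assumptions. The real runner may clamp or queue instead of raising.

```python
from typing import Dict, List


def assign_gpus(jobs: List[dict], visible: List[int]) -> Dict[str, List[int]]:
    """Map each job name to GPU IDs: pinned 'gpus' are kept, the rest are taken in job order."""
    # Assumption: explicitly pinned devices are reserved before automatic assignment.
    pinned = {
        job["name"]: [int(g) for g in job["gpus"].split(",")]
        for job in jobs
        if "gpus" in job
    }
    reserved = {g for ids in pinned.values() for g in ids}
    free = [g for g in visible if g not in reserved]

    assignment: Dict[str, List[int]] = {}
    for job in jobs:
        name = job["name"]
        if name in pinned:
            assignment[name] = pinned[name]
            continue
        count = job["gpu_count"]
        if count > len(free):
            # Assumption: over-subscription is an error in this sketch.
            raise ValueError(f"job {name!r} wants {count} GPUs, only {len(free)} left")
        # Take the next `count` free devices, preserving job order.
        assignment[name], free = free[:count], free[count:]
    return assignment


# The four jobs from experiments/full_8gpu.toml after this commit.
full_8gpu = [
    {"name": "boltz2_xrd", "gpu_count": 2},
    {"name": "boltz2_md", "gpu_count": 2},
    {"name": "rf3", "gpu_count": 2},
    {"name": "protenix", "gpu_count": 2},
]
print(assign_gpus(full_8gpu, list(range(8))))
```

On an 8-GPU pod this sketch yields the same layout the preset previously hardcoded (`0,1`, `2,3`, `4,5`, `6,7`), which suggests the change is behavior-preserving there while no longer tying the preset to specific device IDs.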

experiments/protenix.toml

Lines changed: 1 addition & 1 deletion
@@ -20,6 +20,6 @@ align-to-input = true
 [[jobs]]
 name = "protenix"
 env = "protenix"
-gpus = "0,1"
+gpu_count = 8
 output_subdir = "protenix"
 args = {}

experiments/protenix_dual.toml

Lines changed: 2 additions & 2 deletions
@@ -22,13 +22,13 @@ align-to-input = true
 [[jobs]]
 name = "protenix_tiny"
 env = "protenix"
-gpus = "2,3"
+gpu_count = 4
 output_subdir = "protenix_tiny"
 args = { model-checkpoint = "${PROTENIX_TINY_CHECKPOINT}" }
 
 [[jobs]]
 name = "protenix_mini"
 env = "protenix"
-gpus = "6,7"
+gpu_count = 4
 output_subdir = "protenix_mini"
 args = { model-checkpoint = "${PROTENIX_MINI_CHECKPOINT}" }

experiments/rf3.toml

Lines changed: 1 addition & 1 deletion
@@ -22,6 +22,6 @@ align-to-input = true
 [[jobs]]
 name = "rf3"
 env = "rf3"
-gpus = "0,1"
+gpu_count = 8
 output_subdir = "rf3"
 args = {}
