feat(runs): add analysis preset harness by xraymemory · Pull Request #245 · diff-use/sampleworks

xraymemory · 2026-05-28T16:45:38Z

Summary

add analysis preset TOML configs and run_analysis entrypoints
extend sampleworks-runs loading/execution support for analysis presets
document evaluation harness usage and add runner coverage

Tests

Not run locally (PR creation only).

Summary by CodeRabbit

New Features
- Added analysis workflow system via run_analysis command for preset-driven evaluation jobs
- Added support for grid-search analysis, altloc detection/classification, and external evaluation tools
- New CLI options: --depth for controlling CIF file search depth; --skip-pre-jobs to re-run analyses without repeating patching steps
Documentation
- Updated evaluation guide with analysis workflow examples and configuration details
Tests
- Expanded test coverage for analysis orchestration and CLI validation

coderabbitai · 2026-05-28T16:45:53Z

📝 Walkthrough

Walkthrough

This PR adds a complete configurable preset-driven analysis job orchestration system by extending the Job schema to support per-job script selection and output-flag customization, parameterizing the CLI and preset loader to support arbitrary preset directories, refactoring the runner to orchestrate sequential pre-jobs followed by parallel main jobs with per-job script resolution, providing five analysis TOML preset configurations, introducing a run_analysis shell entry point with Pixi project detection and sync/mismatch policy enforcement, and integrating Docker container support with updated entrypoint routing.

Changes

Analysis Presets Framework

Layer / File(s)	Summary
Job and Preset schema with analysis environment support `src/sampleworks/runs/schema.py`	Job gains `script` (default `""`) and `output_arg` (default `"output-dir"`) fields with validation rejecting leading dashes. Preset gains `pre_jobs` list field. VALID_PIXI_ENVS expands to include analysis and dev environment variants. Preset validation checks job name uniqueness across both pre_jobs and main jobs.
CLI configuration framework and refactoring `src/sampleworks/runs/cli.py`	Introduces CliConfig frozen dataclass to parameterize preset directory discovery, default preset/aliases, and results-dir fallback logic. Refactors main() to delegate to run_cli(config) with centralized parser construction, --list support, config-driven target resolution, job filtering (preserving pre_jobs), expanded --show output (including _print_job helper), and results-dir selection using config defaults.
Analysis CLI entry point and wheel packaging `src/sampleworks/runs/analysis_cli.py`, `pyproject.toml`	New analysis_cli.py module with ANALYSIS_CLI_CONFIG for analysis-specific preset discovery (analyses/ directory) and main(argv) delegating to run_cli. Adds sampleworks-analysis console script entrypoint and forces analyses package inclusion in wheel build.
Parameterized preset discovery and loading `src/sampleworks/runs/loader.py`	Extends list_presets() and load_preset() to accept preset_dir_name and preset_dir_env_var parameters. Refactors directory discovery into _preset_dirs() (env override, SAMPLEWORKS_SOURCE_DIR, workspace default, upward search) and _find_upward_preset_dirs(). Parameterizes _read_toml and _find_preset_path. Recognizes pre_jobs as top-level override target. Introduces _build_jobs helper to populate script and output_arg fields during preset construction.
Runner invocation building and pre-jobs orchestration `src/sampleworks/runs/runner.py`	Factors invocation building into _build_invocations_for_jobs() supporting output injection via job.output_arg and per-job script resolution via _resolve_script_path(). Adds DISABLE_GPU_ASSIGNMENTS constant and _gpu_assignment_disables_gpus() for centralizing CPU-only job detection. Updates _build_argv() to resolve scripts and emit expanded list/tuple arguments. Refactors run() to separately build pre-invocations and main invocations, support skip_pre_jobs option, validate GPU assignments per phase, print phase-labeled dry-run/launch output, and execute pre-jobs sequentially via _run_sequential() before main jobs.
Analysis preset TOML configurations `analyses/analyze_grid_search.toml`, `analyses/external_tools.toml`, `analyses/all.toml`, `analyses/altloc_find.toml`, `analyses/altloc_classify.toml`	Five TOML preset files with defaults, shared_args parameterization, pre_jobs patching steps, and main job entries. analyze_grid_search: rscc (1 GPU), lddt/bond_geometry (no GPU). external_tools: tortoize/phenix_clashscore (no GPU) with shared patch pre-step. all: combines five evaluation jobs. altloc_find/altloc_classify: standalone CPU jobs independent of grid-search outputs.
run_analysis shell entry point with Pixi project detection `run_analysis`, `run_analysis.sh`	Bash script discovering Sampleworks repo root via is_sampleworks_root() and find_sampleworks_root_upwards(). Parses --preset, --results-dir, --jobs, positional target; enforces required GRID_SEARCH_RESULTS_DIR/INPUTS_DIR; derives analysis paths and environment exports. Detects Pixi project directory by comparing /app's pyproject.toml/pixi.lock against source. Implements pixi sync/mismatch policy: disallows runtime updates when prebuilt /app diverges from source; allows updates and sets rebuild flags when synced. Execs prebuilt Python runner directly (adjusting PATH/CONDA/CUDA) or pixi run for dynamic invocation. run_analysis.sh wrapper enforces strict shell mode and forwards to run_analysis binary.
Docker container updates and entrypoint routing `Dockerfile`, `docker-entrypoint.sh`	Dockerfile adds pixi install -e analysis, removes /app/analyses and run_* files from runtime image, creates /home/dev/workspace, copies run_analysis and run_analysis.sh to /usr/local/bin/, appends /root/.bashrc auto-cd hook. docker-entrypoint.sh extends help text, adds run_analysis to helper-command passthrough, validates analysis in environment case, updates error messages.
Documentation updates and evaluation utilities `README.md`, `scripts/eval/EVALUATION.md`, `src/sampleworks/eval/grid_search_eval_utils.py`, `src/sampleworks/runs/__init__.py`	README.md adds "Running preset analyses on ACTL (run_analysis)" section with environment exports, example commands, --set overrides, --skip-pre-jobs flag. EVALUATION.md updates CIF preparation and running guidance to position run_analysis as preferred entrypoint. grid_search_eval_utils.py adds --depth option for CIF recursion depth. Module docstring updated to reflect preset-orchestrator terminology.
Patch script exit code handling `scripts/patch_output_cif_files.py`	Refactors main() to return int exit codes (0 success, 1 no-match or errors). Updates docstring and main block to raise SystemExit(main(...)) so process status reflects return value.
Test coverage for analysis framework `tests/runs/test_runner.py`, `tests/runs/test_cli.py`	test_analysis_preset_builds_eval_script_invocations() validates pre-job patching and RSCC wiring. test_pre_jobs_run_before_main_jobs() confirms execution ordering. test_analysis_cli_lists_analysis_presets() checks preset enumeration. test_gpu_validation_ignores_cpu_jobs_but_checks_gpu_duplicates() ensures CPU jobs bypass validation while GPU duplicates error. test_output_arg_rejects_any_leading_dash() validates output_arg constraints. test_jobs_filters_positional_preset() verifies --jobs filtering with positional preset selection.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

diff-use/sampleworks#240: Extended the same sampleworks.runs orchestration infrastructure (preset loader, CLI, runner) with analysis support and new features like pre_jobs, skip_pre_jobs, per-job script/output_arg fields.
diff-use/sampleworks#186: Introduced evaluation scripts (run_and_process_tortoize.py, run_and_process_phenix_clashscore.py) that this PR now orchestrates via analysis preset TOML configurations.

Suggested reviewers

marcuscollins
k-chrispens

Poem

🐰 Presets bloom in TOML rows,
Pre-jobs patch before the flows,
Pixi roots are now our test,
Scripts called per-job—truly blessed!
Analysis runs, at long last free. ✨

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and concisely summarizes the main change: adding an analysis preset harness (TOML configurations and entrypoints) to the runs module.
Docstring Coverage	✅ Passed	Docstring coverage is 93.10% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/run-analysis-eval-harness

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Copilot

Pull request overview

This PR adds an “analysis preset” harness alongside the existing experiment preset runner, enabling evaluation scripts under scripts/eval/ to be launched via TOML presets (loaded from analyses/*.toml) using the same Python runner backend as sampleworks-runs.

Changes:

Introduces analysis presets (analyses/*.toml) plus a dedicated CLI entrypoint (sampleworks-analysis) and ACTL wrapper (run_analysis) to run evaluation jobs.
Extends the preset schema/runner to support per-job script and configurable output_arg injection (including disabling output-flag injection for eval scripts), and improves GPU validation to ignore CPU-only jobs (gpus="none").
Updates docs, Docker image setup, and tests to cover analysis preset loading/listing and eval invocation construction.

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
`tests/runs/test_runner.py`	Adds coverage for analysis preset invocation building, analysis preset listing, and GPU validation behavior with CPU-only jobs.
`src/sampleworks/runs/schema.py`	Extends job schema with `script` and `output_arg`; expands valid pixi envs to include analysis-related envs.
`src/sampleworks/runs/runner.py`	Supports custom job scripts + output-arg injection; adds list-arg formatting; improves GPU validation for `gpus="none"` jobs; resolves script paths against likely repo roots.
`src/sampleworks/runs/loader.py`	Generalizes preset discovery/loading to support both `experiments/` and `analyses/` directories via parameters/env vars.
`src/sampleworks/runs/cli.py`	Refactors CLI to be config-driven (`CliConfig`) so it can power both experiment and analysis entrypoints.
`src/sampleworks/runs/analysis_cli.py`	New `sampleworks-analysis` entrypoint configured to list/load/run analysis presets from `analyses/`.
`src/sampleworks/runs/__init__.py`	Updates package docstring to reflect multi-script orchestration and the new analysis CLI.
`scripts/eval/EVALUATION.md`	Documents `run_analysis` usage and shows equivalent direct script invocations.
`run_analysis.sh`	Adds backward-compatible `.sh` alias wrapper for `run_analysis`.
`run_analysis`	Adds ACTL-native wrapper that sets environment defaults and launches `sampleworks.runs.analysis_cli`.
`README.md`	Adds an ACTL section describing `run_analysis` and how to override settings with `--set`.
`pyproject.toml`	Registers `sampleworks-analysis` console script and includes `analyses/` in wheel packaging.
`Dockerfile`	Installs the `analysis` pixi env and ships `run_analysis` wrappers into the image.
`docker-entrypoint.sh`	Updates help/validation to recognize the `analysis` env and advertises `run_analysis`.
`analyses/grid_search.toml`	Adds core analysis preset for RSCC/LDDT/bond-geometry eval jobs.
`analyses/external_tools.toml`	Adds analysis preset for external-tool-based eval jobs (tortoize, phenix clashscore).
`analyses/altloc_find.toml`	Adds analysis preset for building altloc selections CSV.
`analyses/altloc_classify.toml`	Adds analysis preset for classifying altloc regions.
`analyses/all.toml`	Adds “run everything” analysis preset, including optional external tools.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+        -*)
+            ;;
+        *)
+            if [[ -z "$target" ]]; then
+                target="$arg"


+    for root in _source_root_candidates():
+        candidate = root / expanded
+        if candidate.exists():
+            return str(candidate)
+    return str(expanded)


coderabbitai

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@README.md`:
- Around line 269-270: The README example uses `rscc` as an analysis preset but
`rscc` is a job name; update the example to call a valid preset (e.g.
`grid_search`) and then set job-specific overrides — for example use
`run_analysis grid_search --set shared_args.target-filename=refined.cif` and
demonstrate `--set jobs.rscc.gpus=0` to show how to override the `rscc` job;
locate the example near the `run_analysis` invocation and replace the invalid
`rscc` preset with `grid_search` (references: run_analysis, rscc, grid_search,
shared_args.target-filename, jobs.rscc.gpus).

In `@scripts/eval/EVALUATION.md`:
- Line 71: The example command uses a non-existent preset "rscc" (run_analysis
rscc --set shared_args.target-filename=refined.cif) which will break copy-paste;
update the example to reference a real preset (for example use "grid_search
--jobs rscc" or another actual preset name) and show the adjusted command using
that preset while keeping the --set shared_args.target-filename=refined.cif flag
so readers can run it successfully.

In `@src/sampleworks/runs/schema.py`:
- Around line 96-99: The validation currently only rejects values starting with
"--"; update the check on self.output_arg (in the Job/schema initializer where
the block uses self.name and self.output_arg) to reject any leading dash by
testing startswith("-") instead of startswith("--"), and raise the same
ValueError (including the job name and offending output_arg) so values like
"-output-dir" fail fast rather than producing malformed argv entries.

In `@tests/runs/test_runner.py`:
- Around line 270-347: The three tests
(test_analysis_preset_builds_eval_script_invocations,
test_analysis_cli_lists_analysis_presets,
test_gpu_validation_ignores_cpu_jobs_but_checks_gpu_duplicates) use short inline
docstrings; change each to a NumPy-style docstring: add a one-line summary, a
blank line, and a "Parameters" section (none for these tests -> list nothing or
omit) and a "Returns" section stating "None" (or omit if preferred by project
style), plus any short notes under "Notes" if needed; ensure the docstring is a
triple-quoted block directly under the def and follows NumPy formatting
conventions so linter/style checks pass.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4a080a16-4b27-40a3-9fdf-6a4e54a0e3d4

📥 Commits

Reviewing files that changed from the base of the PR and between 6595672 and 82fe167.

📒 Files selected for processing (19)

Dockerfile
README.md
analyses/all.toml
analyses/altloc_classify.toml
analyses/altloc_find.toml
analyses/external_tools.toml
analyses/grid_search.toml
docker-entrypoint.sh
pyproject.toml
run_analysis
run_analysis.sh
scripts/eval/EVALUATION.md
src/sampleworks/runs/__init__.py
src/sampleworks/runs/analysis_cli.py
src/sampleworks/runs/cli.py
src/sampleworks/runs/loader.py
src/sampleworks/runs/runner.py
src/sampleworks/runs/schema.py
tests/runs/test_runner.py

coderabbitai · 2026-05-28T20:31:15Z

Actionable comments posted: 0

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/sampleworks/runs/cli.py (1)

248-257: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Allow --jobs with a positional preset target.

Right now sampleworks-runs rf3 --jobs foo fails with “pass jobs either as the positional target or with --jobs” because the if jobs: error runs before checking whether target is a real preset name. That contradicts the parser help, which says --jobs filters the selected preset.

Suggested fix

-    if jobs:
-        parser.error("pass jobs either as the positional target or with --jobs, not both")
-
     if target.endswith(".toml") or "/" in target:
         parser.error("pass custom preset paths with --preset path/to/preset.toml")
 
-    if "," not in target and target in loader.list_presets(
+    if "," not in target and target in loader.list_presets(
         preset_dir_name=config.preset_dir_name,
         preset_dir_env_var=config.preset_dir_env_var,
     ):
-        return target, ""
+        return target, jobs
+
+    if jobs:
+        parser.error("pass jobs either as the positional target or with --jobs, not both")
 
     return config.default_preset, target

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/sampleworks/runs/cli.py` around lines 248 - 257, The code currently
rejects any use of --jobs before checking whether the positional target is a
named preset; change the logic so the jobs conflict is only enforced when the
positional target is not a known preset: call loader.list_presets(...) (using
the same preset_dir_name and preset_dir_env_var from config) to determine if
target is a preset, and only run parser.error("pass jobs either as the
positional target or with --jobs, not both") when jobs is set AND target is not
in that preset list (or when target is a comma-separated job list); i.e., move
or gate the existing if jobs: check behind the loader.list_presets(...) check so
--jobs is allowed when target is a preset.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@Dockerfile`:
- Around line 108-115: The RUN shell block that conditionally pre-compiles CUDA
extensions is missing the closing "fi" and currently suppresses real compilation
failures with "|| echo ...", so fix the conditional by adding the matching "fi"
after the else branch and remove the "|| echo ..." that masks errors;
specifically locate the RUN that checks "[ ! -e /dev/nvidiactl ] && [ ! -e
/proc/driver/nvidia/version ]" and the pixi invocation ("pixi run -e boltz
python -c ... dilate_atom_centric") and ensure the else branch runs the pixi
command directly (or with "set -e") so compilation errors propagate and will
fail the build as intended.

---

Outside diff comments:
In `@src/sampleworks/runs/cli.py`:
- Around line 248-257: The code currently rejects any use of --jobs before
checking whether the positional target is a named preset; change the logic so
the jobs conflict is only enforced when the positional target is not a known
preset: call loader.list_presets(...) (using the same preset_dir_name and
preset_dir_env_var from config) to determine if target is a preset, and only run
parser.error("pass jobs either as the positional target or with --jobs, not
both") when jobs is set AND target is not in that preset list (or when target is
a comma-separated job list); i.e., move or gate the existing if jobs: check
behind the loader.list_presets(...) check so --jobs is allowed when target is a
preset.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6d6cf56d-ab0d-49d0-9092-01f74e0fa8f3

📥 Commits

Reviewing files that changed from the base of the PR and between 144c978 and e1f4a92.

📒 Files selected for processing (4)

Dockerfile
README.md
docker-entrypoint.sh
src/sampleworks/runs/cli.py

🚧 Files skipped from review as they are similar to previous changes (2)

docker-entrypoint.sh
README.md

marcuscollins

I will leave it to you to decide what to do with my comments. I don't see anything blocking.

marcuscollins · 2026-05-29T18:29:10Z

+description = "Classify altloc selections into side-chain, loop, and domain-shift categories."
+
+[defaults]
+GRID_SEARCH_RESULTS_DIR = "/data/results/grid_search_results"


this script shouldn't need the results of the grid search. We use it to come up with the parts of protein structures to analyze in more detail.

Does the script run without it?

Agreed. I removed the grid-search-specific defaults from the altloc presets in fc9e822: they now use ALTLOC_ANALYSIS_DIR and ALTLOC_INPUTS_DIR. classify_altloc_regions.py only needs the selections CSV, a CIF root, output path, and thresholds, so yes, it runs without grid-search results as long as those inputs are supplied.

marcuscollins · 2026-05-29T18:30:57Z

+DOMAIN_SHIFT_MIN_SPAN = "50"
+LOOP_LDDT_THRESHOLD = "0.75"
+
+[shared_args]


I find this name, "shared_args" a little confusing. These are just the arguments to the script, right?

Yes — in this schema shared_args means common CLI arguments merged into every job's script invocation, with per-job args overriding them. I left the name in place for this PR because it is already shared by experiment and analysis presets, but I agree script_args/common_args would be clearer if we want to add a compatibility alias later.

marcuscollins · 2026-05-29T18:31:35Z

+description = "Build an analysis protein-config CSV by finding altloc selections in input CIFs."
+
+[defaults]
+GRID_SEARCH_RESULTS_DIR = "/data/results/grid_search_results"


this one also shouldn't need this variable.

Agreed, fixed in fc9e822. altloc_find now uses ALTLOC_ANALYSIS_DIR and ALTLOC_INPUTS_DIR instead of depending on GRID_SEARCH_RESULTS_DIR/GRID_SEARCH_INPUTS_DIR.

marcuscollins · 2026-05-29T18:42:47Z

@@ -0,0 +1,53 @@
+description = "Core grid-search evaluations: RSCC, LDDT clustering, and bond geometry."


Rename this preset--it makes me think of running the grid search generation code. How about "analyze_grid_search.toml"?

Done in fc9e822: renamed the preset to analyses/analyze_grid_search.toml. I kept grid_search as an accepted CLI alias for compatibility, but the listed/canonical preset name is now analyze_grid_search.

marcuscollins · 2026-05-29T18:43:37Z


 from . import loader, runner
-from .schema import Preset
+from .schema import Job, Preset


Generally I favor absolute imports.

Done for the new runner CLI imports in fc9e822 — switched them to absolute sampleworks.runs... imports.

marcuscollins · 2026-05-29T19:28:06Z

    unique: list[Path] = []
    for candidate in candidates:
        resolved = candidate.expanduser().resolve(strict=False)
        if resolved not in seen:


Are the candidates not all unique?

Not necessarily. The raw candidate list can hit the same directory via env overrides, /home/dev/workspace, cwd parents, or the installed module path, and symlinks can converge after resolution. The de-dupe after resolve(strict=False) preserves precedence while avoiding duplicate preset discovery.

marcuscollins · 2026-05-29T20:15:41Z

+    candidates: list[Path] = []
+    for env_var in ("SAMPLEWORKS_SOURCE_DIR", "SAMPLEWORKS_SCRIPT_ROOT"):
+        override = os.environ.get(env_var)
+        if override:


There's a neat new-ish piece of Python that shortens these kinds of statements: https://peps.python.org/pep-0572/. You can now use:

if override := os.environ.get(env_var): candidates.append(Path(override))

In fact, you can even use:

candidates = [Path(override) for env_var in (...) if (override := os.environ.get(env_var) is not None)]

Good call — applied the simple walrus form in fc9e822. I kept the explicit loop rather than a compact list comprehension because it reads cleaner here and avoids the precedence trap around os.environ.get(env_var) is not None.

marcuscollins · 2026-05-29T20:21:59Z

+    return str(expanded)
+
+
+def _source_root_candidates() -> list[Path]:


It feels odd that we need this, like it is a sign of a deeper structural problem that we don't know where the scripts are and need to look for them. Thoughts? Is there a better way to do this?

I agree it's a bit of a smell. The reason is ACTL: source code is synced to /home/dev/workspace, while the baked image/package can live under /app, and we want script edits to take effect without rebuilding the image. The current search prefers explicit env vars and the synced checkout over baked paths. Longer term, I'd like the wrapper to always set SAMPLEWORKS_SOURCE_DIR and have the runner require that for relative scripts, with package-resource fallback only for packaged/local use.

marcuscollins · 2026-05-29T20:32:27Z

+    assert analysis_cli.main(["--list"]) == 0
+    listed = set(capsys.readouterr().out.splitlines())
+    assert {
+        "all",


I guess this will have to be updated if we add additional .toml files/presets?

For adding presets, no: this is an issubset check, so extra .toml files won't break it. It only needs an update when we intentionally rename/remove one of the core bundled presets, which happened here with grid_search -> analyze_grid_search and is reflected in fc9e822.

marcuscollins · 2026-05-29T20:34:24Z

+# This image carries pixi environments and checkpoints. Runtime source should
+# come from ACTL's synced checkout at /home/dev/workspace, not from stale code
+# baked into /app during image construction.
+RUN rm -rf /app/src /app/scripts /app/experiments /app/analyses \


Do we need to rebuilt the current images for us to be able to update code on the fly in actl?

For code-only changes, no rebuild should be needed: ACTL syncs the checkout to /home/dev/workspace, and the runner resolves scripts from that live checkout first. Rebuilds are still needed for dependency/pixi-lock changes or wrapper/image changes baked into /usr/local/bin; RUNTIME_PIXI=1 is the escape hatch when intentionally testing a dependency change inside an existing pod.

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/sampleworks/runs/runner.py (1)
668-669: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Critical: skip_pre_jobs logic is bypassed by unconditional rebuild.

Line 668 unconditionally rebuilds pre_invocations, overwriting the skip_pre_jobs conditional on line 651-653. When skip_pre_jobs=True, line 652 correctly sets pre_invocations = [], but line 668 then repopulates it by calling build_pre_invocations without checking the flag. This breaks the --skip-pre-jobs feature documented in README.md.
🐛 Proposed fix to respect skip_pre_jobs during rebuild
     for pixi_env in pixi_envs:
         _prepare_pixi_env(pixi_env)
-    pre_invocations = build_pre_invocations(preset, results_dir=results_dir)
+    pre_invocations = (
+        [] if skip_pre_jobs else build_pre_invocations(preset, results_dir=results_dir)
+    )
     invocations = build_invocations(preset, results_dir=results_dir)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/sampleworks/runs/runner.py` around lines 668 - 669, The code
unconditionally rebuilds pre_invocations by calling build_pre_invocations after
the skip_pre_jobs branch, which overrides the earlier pre_invocations = [] when
skip_pre_jobs is True; change the logic so build_pre_invocations(preset,
results_dir=results_dir) is only called when skip_pre_jobs is False (i.e.,
respect the skip_pre_jobs flag), leaving pre_invocations as the empty list
otherwise; update the section around the pre_invocations assignment that
currently calls build_pre_invocations and ensure build_invocations(preset,
results_dir=results_dir) remains unaffected.

🧹 Nitpick comments (2)

tests/runs/test_cli.py (1)
136-151: ⚡ Quick win

Test only verifies flag propagation, not actual pre-job skipping behavior.

This test mocks runner.run to check that skip_pre_jobs=True is passed, but doesn't verify that pre-jobs are actually skipped when the runner executes. Given the bug in runner.py line 668 (unconditional pre_invocations rebuild), this test passes while the feature is broken.

As per coding guidelines, "Avoid using mocks; test expected behavior directly instead." Consider enhancing this to be a behavioral test: build a minimal preset with pre_jobs, run with --skip-pre-jobs, and assert that the pre-job output/log is absent.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/runs/test_cli.py` around lines 136 - 151, Update the test
test_skip_pre_jobs_flag_is_passed_to_runner to be behavioral instead of mocking
runner.run: create a minimal preset containing a real pre_jobs (or
pre_invocations) task that produces identifiable output/logging, invoke
cli.main(["--preset", "<minimal_preset>", "--skip-pre-jobs"]) and assert the
process exit code is 0 and that the pre-job output/log entries are absent; do
not monkeypatch runner.run—call the real runner so the test validates that the
skip_pre_jobs flag actually prevents executing pre_jobs (references:
test_skip_pre_jobs_flag_is_passed_to_runner, cli.main, runner.run, and the
preset's pre_jobs/pre_invocations behavior).
README.md (1)
254-254: 💤 Low value

Minor: Mixed spelling variants of "analyze"/"analyse".

The preset name uses American spelling (analyze_grid_search) while the surrounding prose uses British spelling ("analyses"). Consider using consistent American spelling throughout: "Running preset analyses" → "Running preset analyzes" or use "analysis" as the noun form consistently.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@README.md` at line 254, The README has mixed spelling: the preset name
analyze_grid_search uses American spelling while the surrounding prose uses
British spelling "analyses"; update the surrounding text to use consistent
American spelling (e.g., change "analyses" or the phrase "Running preset
analyses" to "analyzes" or better yet replace with the noun "analysis") so the
preset identifier analyze_grid_search, other presets like all and
external_tools, and the prose use a consistent spelling; search for occurrences
of "analyses"/"analyse" near references to analyze_grid_search and adjust them
accordingly.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@src/sampleworks/runs/runner.py`:
- Around line 668-669: The code unconditionally rebuilds pre_invocations by
calling build_pre_invocations after the skip_pre_jobs branch, which overrides
the earlier pre_invocations = [] when skip_pre_jobs is True; change the logic so
build_pre_invocations(preset, results_dir=results_dir) is only called when
skip_pre_jobs is False (i.e., respect the skip_pre_jobs flag), leaving
pre_invocations as the empty list otherwise; update the section around the
pre_invocations assignment that currently calls build_pre_invocations and ensure
build_invocations(preset, results_dir=results_dir) remains unaffected.

---

Nitpick comments:
In `@README.md`:
- Line 254: The README has mixed spelling: the preset name analyze_grid_search
uses American spelling while the surrounding prose uses British spelling
"analyses"; update the surrounding text to use consistent American spelling
(e.g., change "analyses" or the phrase "Running preset analyses" to "analyzes"
or better yet replace with the noun "analysis") so the preset identifier
analyze_grid_search, other presets like all and external_tools, and the prose
use a consistent spelling; search for occurrences of "analyses"/"analyse" near
references to analyze_grid_search and adjust them accordingly.

In `@tests/runs/test_cli.py`:
- Around line 136-151: Update the test
test_skip_pre_jobs_flag_is_passed_to_runner to be behavioral instead of mocking
runner.run: create a minimal preset containing a real pre_jobs (or
pre_invocations) task that produces identifiable output/logging, invoke
cli.main(["--preset", "<minimal_preset>", "--skip-pre-jobs"]) and assert the
process exit code is 0 and that the pre-job output/log entries are absent; do
not monkeypatch runner.run—call the real runner so the test validates that the
skip_pre_jobs flag actually prevents executing pre_jobs (references:
test_skip_pre_jobs_flag_is_passed_to_runner, cli.main, runner.run, and the
preset's pre_jobs/pre_invocations behavior).

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: da306317-a6ef-4cbc-80b0-76ab76af3988

📥 Commits

Reviewing files that changed from the base of the PR and between e1f4a92 and fc9e822.

📒 Files selected for processing (16)

Dockerfile
README.md
analyses/all.toml
analyses/altloc_classify.toml
analyses/altloc_find.toml
analyses/analyze_grid_search.toml
analyses/external_tools.toml
run_analysis
scripts/eval/EVALUATION.md
src/sampleworks/eval/grid_search_eval_utils.py
src/sampleworks/runs/analysis_cli.py
src/sampleworks/runs/cli.py
src/sampleworks/runs/runner.py
src/sampleworks/runs/schema.py
tests/runs/test_cli.py
tests/runs/test_runner.py

✅ Files skipped from review due to trivial changes (1)

scripts/eval/EVALUATION.md

🚧 Files skipped from review as they are similar to previous changes (8)

src/sampleworks/runs/analysis_cli.py
analyses/altloc_classify.toml
src/sampleworks/runs/schema.py
Dockerfile
analyses/external_tools.toml
tests/runs/test_runner.py
src/sampleworks/runs/cli.py
run_analysis

feat(runs): add analysis preset harness

82fe167

Copilot AI review requested due to automatic review settings May 28, 2026 16:45

xraymemory had a problem deploying to gpu-testing May 28, 2026 16:45 — with GitHub Actions Error

Copilot started reviewing on behalf of xraymemory May 28, 2026 16:45 View session

Copilot AI reviewed May 28, 2026

View reviewed changes

coderabbitai Bot reviewed May 28, 2026

View reviewed changes

Comment thread README.md Outdated

Comment thread scripts/eval/EVALUATION.md Outdated

Comment thread src/sampleworks/runs/schema.py Outdated

Comment thread tests/runs/test_runner.py

feat(runs): automate analysis CIF patching

144c978

xraymemory had a problem deploying to gpu-testing May 28, 2026 20:24 — with GitHub Actions Error

docs(runs): clarify filtered pre-jobs

c62684a

xraymemory had a problem deploying to gpu-testing May 28, 2026 20:31 — with GitHub Actions Error

Merge branch 'main' into feat/run-analysis-eval-harness

e1f4a92

xraymemory had a problem deploying to gpu-testing May 29, 2026 18:26 — with GitHub Actions Error

coderabbitai Bot reviewed May 29, 2026

View reviewed changes

Comment thread Dockerfile Outdated

marcuscollins approved these changes May 29, 2026

View reviewed changes

fix(runs): address analysis harness review comments

fc9e822

xraymemory requested a deployment to gpu-testing June 1, 2026 02:45 — with GitHub Actions Waiting

coderabbitai Bot reviewed Jun 1, 2026

View reviewed changes

xraymemory merged commit ae01dde into main Jun 1, 2026
8 of 11 checks passed

		@@ -0,0 +1,53 @@
		description = "Core grid-search evaluations: RSCC, LDDT clustering, and bond geometry."

		return str(expanded)


		def _source_root_candidates() -> list[Path]:

Conversation

xraymemory commented May 28, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Tests

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot commented May 28, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

marcuscollins left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

xraymemory commented May 28, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 28, 2026 •

edited

Loading