Sampleworks

This repository is under active development. Please always use the latest version. If you encounter any problems, please create an issue on GitHub and include: the PDB ID, the CIF file you used, your density map(s), and log information.

We would welcome contributions from the community. We are most interested in:

- New ModelWrappers for additional structure prediction models (especially smaller models, which may be more steerable)
- Fast, differentiable modules that enable guidance from experimental data modalities other than X-ray electron density

Sampleworks is a Python framework for integrating generative biomolecular structure models with experimental data. Read our blog post for an introduction.

Why sampleworks?

Biomolecular structure prediction and design models are currently trained on single-state structures and fail to accurately predict the ensemble of conformations that each macromolecule occupies. But there is still hope! Current models show promise in capturing the underlying distribution of realistic macromolecular structures. We want to combine the prior represented in these models with experimental observations to improve sampling of the ensemble present in the experiment, and to use this information both to understand biomolecular function and to improve ensemble prediction.

Currently, each structure prediction model has a different implementation, so plugging a model into experimental guidance requires bespoke boilerplate code. Our goal is to remove this barrier and to expand the range of experimental methods that can provide guidance. This will open new opportunities for model evaluation directly against experimental data, and help unlock new sources of data for training the next generation of biomolecular structure predictors.

Installation

Requirements: Linux x86-64, CUDA 12, Python ≥ 3.11, < 3.14

1. Install Pixi

curl -fsSL https://pixi.sh/install.sh | sh

2. Clone and install

git clone git@github.com:diff-use/sampleworks.git
cd sampleworks
pixi install -a   # install all environments

Note: pixi install -a resolves all environments. This (currently) requires CUDA 12 and will fail on machines without it.

Each generative model has its own Pixi environment. To install only the environments you need instead of all of them:

pixi install -e boltz      # Boltz-1 / Boltz-2
pixi install -e protenix   # Protenix
pixi install -e rf3        # RosettaFold3

3. Download model checkpoints

Boltz-1 and Boltz-2 (stored in ~/.boltz/):

pixi run -e boltz python -c "
from boltz.main import download_boltz1, download_boltz2
import pathlib
cache = pathlib.Path('~/.boltz/').expanduser()
download_boltz1(cache)
download_boltz2(cache)
"

Protenix: checkpoint is downloaded automatically on first use.

RosettaFold3 (RF3): see the RC-Foundry repository for instructions. Default path: ~/.foundry/checkpoints/rf3_foundry_01_24_latest.ckpt
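
Before launching a run, it can help to confirm that the checkpoints are where the commands expect them. The sketch below is not part of sampleworks: `missing_checkpoints` is a hypothetical helper, and the two paths are simply the ones mentioned in this README (the Boltz-2 path is taken from the Quick Start command). Protenix is omitted because its checkpoint is downloaded on first use.

```python
from pathlib import Path

# Paths taken from this README; adjust them if your checkpoints live elsewhere.
DEFAULT_CHECKPOINTS = {
    "boltz2": "~/.boltz/boltz2_conf.ckpt",
    "rf3": "~/.foundry/checkpoints/rf3_foundry_01_24_latest.ckpt",
}


def missing_checkpoints(checkpoints: dict[str, str]) -> list[str]:
    """Return the model names whose checkpoint file does not exist."""
    return [
        name
        for name, path in checkpoints.items()
        if not Path(path).expanduser().is_file()
    ]


if __name__ == "__main__":
    for name in missing_checkpoints(DEFAULT_CHECKPOINTS):
        print(f"checkpoint for {name} not found at {DEFAULT_CHECKPOINTS[name]}")
```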

Quick Start

Run Boltz-2 pure guidance on the included 1VME example:

pixi run -e boltz sampleworks-guidance \
    --model boltz2 \
    --guidance-type pure_guidance \
    --protein 1VME \
    --model-checkpoint ~/.boltz/boltz2_conf.ckpt \
    --structure tests/resources/1vme/1vme_final_carved_edited_0.5occA_0.5occB.cif \
    --density tests/resources/1vme/1vme_final_carved_edited_0.5occA_0.5occB_1.80A.ccp4 \
    --resolution 1.8 \
    --output-dir output/boltz2_pure_guidance \
    --guidance-start 130 \
    --ensemble-size 4 \
    --augmentation \
    --align-to-input

Output files appear in output/boltz2_pure_guidance/: refined.cif (final ensemble), losses.txt, trajectory/, run.log.

CLI reference

sampleworks-guidance is the unified command-line interface for running guidance on a single structure.

Required arguments:

| Argument | Description |
|---|---|
| `--model` | `boltz1`, `boltz2`, `protenix`, or `rf3` |
| `--guidance-type` | `pure_guidance` or `fk_steering` |
| `--protein` | Protein identifier (should match naming used in grid search / evaluation) |
| `--structure` | Path to input structure file (CIF) |
| `--density` | Path to density map (CCP4/MRC/MAP) |
| `--resolution` | Map resolution in Angstroms |

Model-specific arguments (e.g. --method for boltz2, --msa-path for rf3) and guidance-type-specific arguments (e.g. --num-particles for fk_steering) are included automatically. Run sampleworks-guidance --model <model> --guidance-type <type> --help to see all available options.
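
When scripting several runs, the required arguments can be assembled in Python and handed to `subprocess.run`. This is only a sketch: `build_guidance_command` is a hypothetical helper, it uses just the required flags listed above, and the `env` default assumes the `boltz` Pixi environment. The README does not say whether a run also needs extra options such as `--model-checkpoint`, so check `--help` for your model.

```python
import shlex


def build_guidance_command(
    model: str,
    guidance_type: str,
    protein: str,
    structure: str,
    density: str,
    resolution: float,
    env: str = "boltz",  # assumption: Pixi environment matching the model
) -> list[str]:
    """Assemble a sampleworks-guidance call with only the required arguments."""
    return [
        "pixi", "run", "-e", env, "sampleworks-guidance",
        "--model", model,
        "--guidance-type", guidance_type,
        "--protein", protein,
        "--structure", structure,
        "--density", density,
        "--resolution", str(resolution),
    ]


if __name__ == "__main__":
    cmd = build_guidance_command(
        model="boltz2",
        guidance_type="pure_guidance",
        protein="1VME",
        structure="tests/resources/1vme/1vme_final_carved_edited_0.5occA_0.5occB.cif",
        density="tests/resources/1vme/1vme_final_carved_edited_0.5occA_0.5occB_1.80A.ccp4",
        resolution=1.8,
    )
    # Print a copy-pasteable command; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Building the command as a list rather than a string avoids shell-quoting problems with paths that contain spaces.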

Grid Search

run_grid_search.py sweeps a model across scalers, ensemble sizes, and gradient weights:

# --models: boltz1, boltz2, protenix, or rf3 (make sure the Pixi environment matches)
# --methods: only useful for Boltz-2, ignored otherwise
# --scalers: pure_guidance, fk_steering, or both as a space-separated list
# --gradient-normalization: normalize guidance update magnitude to diffusion update magnitude
# --augmentation: apply random rotations and translations at each step (the default for inference with AF3-like models)
# --align-to-input: align to the input structure at each step (required because density guidance is not rotation/translation invariant)
pixi run -e boltz python run_grid_search.py \
    --proteins proteins.csv \
    --models boltz2 \
    --methods "X-RAY DIFFRACTION" \
    --scalers pure_guidance \
    --ensemble-sizes "1 4" \
    --gradient-weights "0.1 0.2" \
    --output-dir grid_search_results \
    --gradient-normalization \
    --augmentation \
    --align-to-input
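
To estimate how many jobs a sweep will launch, the grid can be enumerated by hand. The sketch below assumes that run_grid_search.py runs the full Cartesian product of proteins, models, scalers, ensemble sizes, and gradient weights, which this README implies but does not state; `enumerate_jobs` is a hypothetical helper, not part of the script. Use `--dry-run` for the authoritative job list.

```python
from itertools import product


def enumerate_jobs(proteins, models, scalers, ensemble_sizes, gradient_weights):
    """Yield one (protein, model, scaler, ensemble_size, gradient_weight) per job."""
    # Sizes and weights are passed as space-separated strings, as on the CLI.
    sizes = [int(s) for s in ensemble_sizes.split()]
    weights = [float(w) for w in gradient_weights.split()]
    yield from product(proteins, models, scalers, sizes, weights)


jobs = list(
    enumerate_jobs(["1abc", "2xyz"], ["boltz2"], ["pure_guidance"], "1 4", "0.1 0.2")
)
print(len(jobs))  # 2 proteins x 1 model x 1 scaler x 2 sizes x 2 weights = 8
```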

proteins.csv format

The CSV must contain the columns name, structure, density, and resolution, as in the example below. Supported density map formats: .ccp4, .mrc, .map (MTZ and SF-CIF are not supported yet).

name,structure,density,resolution
1abc,/data/structures/1abc.cif,/data/maps/1abc.ccp4,2.0
2xyz,/data/structures/2xyz.cif,/data/maps/2xyz.mrc,1.8
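
A quick pre-flight check of proteins.csv can catch typos before a long sweep starts. The following is a minimal sketch using only the rules stated above (four required columns, three supported map suffixes); `validate_proteins_csv` is a hypothetical helper, and run_grid_search.py may apply different or stricter checks.

```python
import csv
import io
from pathlib import Path

REQUIRED_COLUMNS = ("name", "structure", "density", "resolution")
SUPPORTED_MAP_SUFFIXES = {".ccp4", ".mrc", ".map"}


def validate_proteins_csv(text: str) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        return [f"missing column(s): {', '.join(missing)}"]

    problems = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        suffix = Path(row["density"] or "").suffix
        if suffix not in SUPPORTED_MAP_SUFFIXES:
            problems.append(f"line {line_no}: unsupported density format '{suffix}'")
        try:
            if float(row["resolution"] or "") <= 0:
                problems.append(f"line {line_no}: resolution must be positive")
        except ValueError:
            problems.append(f"line {line_no}: resolution is not a number")
    return problems
```

The function takes the file contents as a string, so it can be called as `validate_proteins_csv(Path("proteins.csv").read_text())`.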

Key arguments:

| Argument | Description | Default |
|---|---|---|
| `--proteins` | CSV with structure/density/resolution columns | required |
| `--models` | Model to run. One of `boltz1`, `boltz2`, `protenix`, `rf3` | required |
| `--scalers` | Guidance method(s) to sweep | `pure_guidance fk_steering` |
| `--ensemble-sizes` | Space-separated values, e.g. `"1 4"` | `"1 2 4 8"` |
| `--gradient-weights` | Space-separated values, e.g. `"0.1 0.2"` | `"0.01 0.1 0.2"` |
| `--methods` | Boltz-2 sampling method (required for boltz2) | `X-RAY DIFFRACTION` |
| `--max-parallel` | Parallel workers (default: number of GPUs) | `auto` |
| `--dry-run` | Print jobs without running them | off |
| `--force-all` | Re-run including already-successful jobs | off |
| `--only-failed` | Re-run only failed jobs | off |
| `--only-missing` | Run only jobs not yet started | off |

Output layout: grid_search_results/<protein>/<model>[_<method>]/<scaler>/ens<N>_gw<W>/

Note: Jobs are skipped if a refined.cif file already exists in the output directory. Some flags (e.g., --use-tweedie, --gradient-normalization) are not reflected in the directory structure, so changing them alone won't trigger a re-run. Use --force-all to re-run all jobs regardless. This is under active development and will likely change soon.
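
Because the skip rule depends only on the presence of refined.cif, the state of a sweep can be inspected by walking the output layout. This is a sketch under that assumption; `job_status` is a hypothetical helper, and the glob pattern simply mirrors the layout shown above, so it will need updating if the layout changes.

```python
from pathlib import Path


def job_status(results_root: str) -> dict[str, bool]:
    """Map each job directory (relative to results_root) to whether refined.cif exists."""
    root = Path(results_root)
    if not root.is_dir():
        return {}
    # Layout: <protein>/<model>[_<method>]/<scaler>/ens<N>_gw<W>/
    job_dirs = sorted(p for p in root.glob("*/*/*/ens*_gw*") if p.is_dir())
    return {
        str(d.relative_to(root)): (d / "refined.cif").is_file()
        for d in job_dirs
    }


if __name__ == "__main__":
    for job, done in job_status("grid_search_results").items():
        print(("done    " if done else "pending ") + job)
```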

Instructions for running evaluation and metrics scripts are coming soon.

Docker

TODO: Docker container documentation

Development

We use Pixi to manage development environments and dependencies. Each model has its own environment, e.g. boltz-dev, protenix-dev, rf3-dev. To install dev dependencies and run tests:

pixi install -e [model]-dev    # add pytest, ruff, ty
pixi run -e [model]-dev all-tests  # run tests
pixi run test-all            # run all tests across all environments

Prek hooks (various formatting, ruff + ty type checking):

pixi run -e [model]-dev prek install
pixi run -e [model]-dev prek install --hook-type commit-msg
pixi run -e [model]-dev prek run --all-files

See tests/README.md for full testing instructions.

macOS (experimental)

To develop on macOS, make sure Homebrew is installed, then follow these steps:

1. Install hatch and uv:
```
brew install hatch uv
```
2. Move or copy pyproject-hatch.toml to pyproject.toml.
3. Use uvx hatch run <command> to run commands. Note the use of uvx instead of uv.
4. Use uvx hatch run <env>:<command> to run commands in a specific environment <env>.

There are separate (and as yet untested) environments for boltz. Protenix does not currently work on a Mac because it strictly requires triton, which in turn requires an NVIDIA GPU. You may run into similar issues with other environments; debug as needed.

Commit Messages

This project uses Conventional Commits to automate versioning and changelog generation. Format:

<type>(<scope>): <summary>

Common types: feat, fix, docs, refactor, chore, test, perf. A commitizen pre-commit hook validates messages at commit time. See AGENTS.md for full details.
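
To check a message before committing, the format above can be approximated with a regular expression. This is an illustrative sketch, not the project's commitizen configuration: it accepts only the common types listed here, treats the scope as optional (as the Conventional Commits specification does), and ignores extras such as the `!` breaking-change marker. The scope `boltz` in the usage example is made up.

```python
import re

# Types listed in this README; the project's commitizen config may allow others.
COMMON_TYPES = ("feat", "fix", "docs", "refactor", "chore", "test", "perf")

# <type>(<scope>): <summary>, with the scope treated as optional.
COMMIT_RE = re.compile(
    r"^(?P<type>" + "|".join(COMMON_TYPES) + r")"
    r"(\((?P<scope>[^()\s]+)\))?"
    r": (?P<summary>\S.*)$"
)


def is_conventional(message: str) -> bool:
    """Check only the first line of the commit message."""
    lines = message.splitlines()
    return bool(lines) and COMMIT_RE.match(lines[0]) is not None


if __name__ == "__main__":
    print(is_conventional("feat(boltz): add fk_steering option"))
    print(is_conventional("Added stuff"))
```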
