Merged
81 commits
df76c69
feat: add dockq modal app
y1zhou Apr 29, 2026
3440253
feat: add ppiflow modal workflow orchestrator
y1zhou Apr 29, 2026
8d39e0a
Use polars for DockQ tables
y1zhou May 7, 2026
c08bc53
chore: keep ppiflow workflow branch self-contained
y1zhou May 15, 2026
77dcb5e
Merge branch 'main' into codex/ppiflow-workflow-orchestrator
y1zhou May 15, 2026
b3a0619
Merge branch 'main' into codex/ppiflow-workflow-orchestrator
y1zhou May 15, 2026
5f8e15f
feat(cli): standardize app collection into dataclass
y1zhou May 15, 2026
808545c
fix(cli): clearer docs
y1zhou May 15, 2026
6367c11
Merge branch 'main' into codex/ppiflow-workflow-orchestrator
y1zhou May 18, 2026
f2f3fc5
build: add pytest dev dependency
y1zhou May 18, 2026
8700a11
feat: add workflow runtime foundation
y1zhou May 18, 2026
edcaf78
feat: add ppiflow v2 workflow definition
y1zhou May 18, 2026
a1220dd
fix: harden workflow runtime contracts
y1zhou May 18, 2026
49b2f25
docs: clarify ppiflow v2 workflow scope
y1zhou May 18, 2026
1d04ca4
refactor: organize workflow runtime core
y1zhou May 18, 2026
c3563d0
feat(cli): WIP of fetching workflow scripts, not just apps
y1zhou May 18, 2026
40087b8
feat: wire workflow catalog into cli
y1zhou May 18, 2026
52262f7
feat: expand workflow worker helpers
y1zhou May 18, 2026
94da320
fix: address workflow review blockers
y1zhou May 18, 2026
54f4fe8
feat: complete workflow runtime plan
y1zhou May 18, 2026
2ab97b0
fix: record workflow run status
y1zhou May 18, 2026
7d9a12f
fix: harden workflow runtime dispatch
y1zhou May 18, 2026
e6dca1f
fix: close workflow runtime durability gaps
y1zhou May 18, 2026
8172c20
fix: record workflow run metadata
y1zhou May 18, 2026
4f31073
docs: document workflow runtime decisions
y1zhou May 20, 2026
7c0bbd3
feat(schema): add pure app and storage contracts
y1zhou May 20, 2026
7d51410
feat(cli): add app and workflow namespaces
y1zhou May 20, 2026
6b3f932
feat(workflow): persist runtime state in sqlite
y1zhou May 20, 2026
8f5aaca
refactor(workflow): merge orchestrator modal boundary
y1zhou May 20, 2026
e8ffef2
fix(app): harden workflow output paths
y1zhou May 20, 2026
6c666f8
fix(helper): app runtime images will need to be patched with the new …
y1zhou May 20, 2026
bf9ae77
fix(alphafold3): look for MSA in correct subdirectory
y1zhou May 20, 2026
cdd6b23
fix: add subcommands to helper entry script
y1zhou May 20, 2026
bfd11d1
feat(helper): refactor app.catelog into helper module as its shared b…
y1zhou May 20, 2026
91836f2
build: explicitly add orjson as dependency
y1zhou May 20, 2026
edebc70
fix(workflow): preserve factory loading and remote call typing
y1zhou May 20, 2026
902fffc
refactor(workflow): simplify selectors and ppiflow step nodes
y1zhou May 20, 2026
c51d3bb
fix(cli): remove deprecated Typer flag arguments
y1zhou May 20, 2026
12afc91
docs(workflow): summarize runtime plan status
y1zhou May 20, 2026
3727d27
fix(workflow): stage steps inputs and preserve empty artifacts
y1zhou May 20, 2026
06442bf
fix(flowpacker): return workflow archive output kind
y1zhou May 20, 2026
7e02c30
docs(workflow): note archive adapter follow-up
y1zhou May 20, 2026
9b86da0
fix(cli): add namespaced app deploy command
y1zhou May 20, 2026
21830b9
fix(cli): allow modal to run without args when remote function is given
y1zhou May 22, 2026
e8984bd
ci(ruff): configure ruff in pyproject.toml rather than pre-commit config
y1zhou May 22, 2026
99f549f
perf(alphafold3): bump AF3 version to use cached featurization code
y1zhou May 22, 2026
472caac
feat(alphafold3): more robust inference in case of preemption or netw…
y1zhou May 22, 2026
df0e534
fix(alphafold3): avoid modifying dict in-place with del
y1zhou May 23, 2026
b36b1da
fix(workflow): address orchestration review feedback
y1zhou May 25, 2026
b6841a1
fix(gromacs): more robust handling of partially run trajectories
y1zhou May 26, 2026
b8a3822
feat(helper): stricter path sanitization
y1zhou May 26, 2026
ea3f6a0
refactor(helper): update dockq app and be clear that helper modules s…
y1zhou May 26, 2026
23688da
ci: maintenance
y1zhou May 26, 2026
2eda248
feat(workflow): support bounded scheduler submissions
y1zhou May 26, 2026
c203163
feat(workflow): add ShortMD GROMACS workflow
y1zhou May 26, 2026
a556471
fix(workflow): tolerate blank PPIFlow output parts
y1zhou May 26, 2026
24be4a5
test(app): cover DockQ helper reuse
y1zhou May 26, 2026
d6d5512
feat(workflow): simplify orchestrator submission
y1zhou May 26, 2026
a5d0f72
fix(workflow): close ledger before volume sync
y1zhou May 26, 2026
67888a9
fix(app): restore production run contracts
y1zhou May 27, 2026
f29981b
fix(workflow): isolate shortmd gromacs runs
y1zhou May 27, 2026
908e8f2
refactor(helper): move app.constant into helper.constant
y1zhou May 27, 2026
a7fdf0f
feat(workflow): simplify workflow orchestration by including dependen…
y1zhou May 27, 2026
8768e7a
refactor(workflow): share app volume path handling
y1zhou May 28, 2026
38d089a
refactor(workflow): clean up unused modules and methods
y1zhou May 28, 2026
7ba454d
refactor(ppiflow): remove the previous draft and related tests
y1zhou May 28, 2026
3044184
docs: update app development agent docs
y1zhou May 28, 2026
a0472f6
build: bump modal version to use the new APIs
y1zhou May 28, 2026
a99df97
fix: wrong binary names in examples/
y1zhou May 20, 2026
d43f1e2
refactor: use the new Image.pipe() API for patching
y1zhou May 28, 2026
e81a55c
fix: updated "app" subcommand for examples
y1zhou May 28, 2026
b83892b
fix(antifold): safeguards for pre-Python 3.11 imports
y1zhou May 28, 2026
015c863
docs(agent): add workflow development skill and update app dev skill
y1zhou May 29, 2026
aab09c2
feat: WIP of migrating to the new volume mounts API
y1zhou May 29, 2026
b7a31bd
fix(schema): normalize workflow storage paths
y1zhou May 31, 2026
9131cb8
fix(app): align volume mounts and path reporting
y1zhou May 31, 2026
5d9babd
feat(workflow): wire ppiflow through orchestrator api
y1zhou May 31, 2026
ece08c6
docs(agent): refresh app and workflow guidance
y1zhou May 31, 2026
27a458f
fix(workflow): stage only active ppiflow inputs
y1zhou May 31, 2026
c8febcf
fix(flowpacker): checkpoint volume mount
y1zhou Jun 1, 2026
49d3025
fix(af3score): follow new API guidelines
y1zhou Jun 1, 2026
22 changes: 16 additions & 6 deletions .agents/skills/biomodals-app-development/SKILL.md
@@ -22,16 +22,26 @@ For new apps, ask the user which data-flow class applies before choosing archite

## Implementation Rules

Keep app code compatible with `biomodals help` and app discovery:
Keep app code compatible with `biomodals app help` and app discovery:

- Name files `<toolname>_app.py` under `src/biomodals/app/<category>/`.
- Use a user-facing module docstring with upstream links, prerequisites, and output behavior.
- Add `# ruff: noqa: PLC0415` near the top.
- Use module-level `CONF = AppConfig(...)` for new apps; pin `repo_commit_hash` or `version`.
- Let `gpu` and `timeout` be overridden from `os.environ`.
- Build runtime images through `patch_image_for_helper(...)`.
- Prefer helpers from `biomodals.helper` and `biomodals.helper.shell` instead of open-coded shell, archive, copy, download, hashing, or warmup logic.
- Name local entrypoints `submit_<toolname>_task(...)` and use Google-style `Args:` docstrings so `biomodals help <app>` renders flags.
- Before adding app or workflow helpers, check `biomodals.helper` first.
Reuse existing helper APIs for local output paths, shell, archive, copy,
download, hashing, warmup, and serialization behavior; only define local
helpers when the behavior is app-specific and no shared helper fits.
- Prefer `CONF.mounts(...)` for model and output volumes. Import shared volumes
from `biomodals.helper.constant` only when a function needs a nonstandard
mountpoint, a shared database/cache volume, or an explicit `commit()`.
When using `Volume.with_mount_options(...)` directly, combine read-only and
subpath options in one call.
- Avoid extracting trivial two- or three-line helpers that are used only once or
twice. Inline them and add a short comment when the intent is not obvious.
- Name local entrypoints `submit_<toolname>_task(...)` and use Google-style `Args:` docstrings so `biomodals app help <app>` renders flags.
- Use `🧬` for local entrypoint status messages and `💊` for remote Modal-container status messages.
- Keep Modal function return values primitive when practical: `int`, `str`,
`float`, `bool`, `bytes`, `list`, `dict`, or `None`. Return complex objects
@@ -47,14 +57,14 @@ When reviewing or finishing an app change, check:
- Discovery: path, filename, app name, and local entrypoint name match CLI expectations.
- Reproducibility: upstream version or commit is pinned.
- Runtime boundaries: dependencies used only inside Modal images stay lazily imported.
- Volumes: model volumes are read-only for inference unless the tool writes caches there; writable volumes are committed after writes.
- Data flow: quick jobs return `.tar.zst` bytes via `package_outputs(...)`; persistent, resumable, or batch jobs use `CONF.get_out_volume()` or shared volumes.
- Volumes: model/cache mounts use app-specific subdirectories when practical; inference mounts are read-only unless the tool writes caches there; writable volumes are committed after writes; mounted volume paths are logged or returned as `VolumePath` when they cross app/workflow boundaries.
- Data flow: quick jobs return `.tar.zst` bytes via `package_outputs(...)`; persistent, resumable, or batch jobs use `CONF.output_volume`, `CONF.mounts(output_volume=True)`, or shared volumes.
- Modal return payloads: prefer primitive, `cloudpickle`-serializable values;
avoid returning `Path` objects directly or nested inside tuples, lists, dicts,
or dataclasses.
- Output safety: local output directories are created, existing tarballs are not overwritten accidentally, and final paths or Modal volume locations are printed.
- CLI docs: local entrypoint docstrings use exact Google-style `Args:` formatting with continuation indentation.
- Verification: run `prek run --files <changed files>` when practical, plus `uv run biomodals list` and `uv run biomodals help <app-name>` for CLI or discovery changes.
- Verification: run `prek run --files <changed files>` when practical, plus `uv run biomodals app list` and `uv run biomodals app help <app-name>` for CLI or discovery changes.
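
The "CLI docs" check above asks for exact Google-style `Args:` formatting with continuation indentation. The sketch below shows that docstring shape on a hypothetical `submit_example_task` function, together with a small stdlib-only parser. Both names are assumptions made for illustration; the parser is not the renderer that `biomodals app help` actually uses, it only demonstrates why consistent indentation makes the section machine-readable.

```python
import inspect
import re


def submit_example_task(input_pdb: str, out_dir: str = "./out", num_samples: int = 1) -> None:
    """Submit a hypothetical example task (illustrative only).

    Args:
        input_pdb: Path to the input structure file.
        out_dir: Local directory for the returned tarball. Continuation
            lines are indented one level deeper than the flag name.
        num_samples: Number of samples to generate.
    """


def parse_args_section(func) -> dict:
    """Collect `name: description` pairs from a Google-style Args section."""
    lines = (inspect.getdoc(func) or "").splitlines()
    if "Args:" not in lines:
        return {}
    parsed = {}
    current = None
    for line in lines[lines.index("Args:") + 1:]:
        # A blank line or a dedented line ends the Args section.
        if not line.strip() or not line.startswith(" "):
            break
        match = re.match(r" {4}(\w+): (.*)", line)
        if match:
            current = match.group(1)
            parsed[current] = match.group(2).strip()
        elif current is not None:
            # Deeper-indented lines continue the previous description.
            parsed[current] += " " + line.strip()
    return parsed


flags = parse_args_section(submit_example_task)
for name, description in flags.items():
    print(f"--{name.replace('_', '-')}: {description}")
```

If the continuation line were not indented past the flag name, a parser of this kind could not tell a wrapped description from a new argument, which is presumably why the checklist insists on exact formatting.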

## Reference

@@ -6,9 +6,9 @@ This reference is the maintained app-development standard for files under `src/b

Biomodals apps are self-contained Modal applications wrapping bioinformatics tools. They live under `src/biomodals/app/<category>/`.

- Name app files `<toolname>_app.py`; the `_app.py` suffix is how `cli.py` discovers apps with `APP_HOME.glob("*/*_app.py")`.
- Name app files `<toolname>_app.py`; the `_app.py` suffix is how `biomodals.helper.catalog` discovers apps by scanning `src/biomodals/app/<category>/*_app.py`.
- Place apps in an appropriate category such as `fold/`, `design/`, `score/`, or `bioinfo/`.
- The CLI app name is the filename stem with `_app` stripped, for example `protenix_app.py` becomes `protenix`.
- The catalog entry name is the filename stem with `_app` stripped, for example `protenix_app.py` becomes `protenix`.
- Use section banners to keep modules scan-friendly:
- module docstring
- imports
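
The naming rule above (files named `<toolname>_app.py` under a category directory, with the catalog name being the stem minus `_app`) can be sketched with the standard library alone. This is a minimal illustration under stated assumptions: `discover_app_names` is a hypothetical function, not the implementation in `biomodals.helper.catalog`, and the glob pattern is the `*/*_app.py` pattern quoted in the diff.

```python
import tempfile
from pathlib import Path


def discover_app_names(app_home: Path) -> dict:
    """Map catalog names to files matching <category>/<toolname>_app.py."""
    found = {}
    for path in sorted(app_home.glob("*/*_app.py")):
        # protenix_app.py -> protenix
        found[path.stem.removesuffix("_app")] = path
    return found


with tempfile.TemporaryDirectory() as tmp:
    app_home = Path(tmp)
    # Two files that follow the convention and one that does not.
    for rel in ("fold/protenix_app.py", "score/dockq_app.py", "fold/notes.py"):
        target = app_home / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()
    names = discover_app_names(app_home)

print(sorted(names))
```

Files without the `_app.py` suffix, such as `fold/notes.py` here, are simply never seen by the scan, so a misnamed app fails silently rather than raising an error.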
@@ -20,7 +20,7 @@ Biomodals apps are self-contained Modal applications wrapping bioinformatics too

## Module Docstring

The module docstring is rendered verbatim by `biomodals help <app>` as Markdown. Keep it user-facing and include the upstream source URL, important prerequisites, caveats, and output behavior.
The module docstring is rendered verbatim by `biomodals app help <app>` as Markdown. Keep it user-facing and include the upstream source URL, important prerequisites, caveats, and output behavior.

Typical shape:

@@ -48,6 +48,12 @@ Use optional configuration tables only when the local entrypoint docstring is in

New apps should define module-level `CONF = AppConfig(...)`.

The pure Pydantic schema lives in `biomodals.schema.app.AppConfig`. Import the
Modal-compatible wrapper from `biomodals.app.config` when you need volume or
image helpers; otherwise import the schema directly. The wrapper adds
`output_volume`, `output_volume_name`, and `mounts(...)` helpers while keeping
the same schema fields and validators.

```python
from biomodals.app.config import AppConfig

@@ -62,6 +68,7 @@ CONF = AppConfig(
    cuda_version="cu128",
    gpu=os.environ.get("GPU", "L40S"),
    timeout=int(os.environ.get("TIMEOUT", "3600")),
    depends_on_apps=("gromacs",),  # only for workflows that compose other apps
)
```

@@ -70,9 +77,11 @@ Rules:
- Pin either `repo_commit_hash` or `version`, or both.
- Let `gpu` and `timeout` be overridden by environment variables with sensible defaults.
- Use `CONF.default_env` when setting image environment variables. It provides the standard UV, Hugging Face, Torch, and torch-backend environment variables.
- Use `CONF.model_dir`, `CONF.git_clone_dir`, `CONF.model_volume_mountpoint`, and related fields instead of hardcoded paths.
- Use `CONF.git_clone_dir`, `CONF.model_volume_mountpoint`,
`CONF.model_volume_subdir`, and related fields instead of hardcoded paths.
- Set `depends_on_apps` only for workflow apps that compose other Biomodals apps; standalone apps should leave it empty.
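
The environment-override rule can be illustrated without Modal or Biomodals installed. In the sketch below, `ExampleConfig` and `load_config` are hypothetical stand-ins for `AppConfig`; only the `GPU` and `TIMEOUT` variable names and their defaults come from the snippet above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ExampleConfig:
    """Hypothetical stand-in for AppConfig with only the overridable fields."""

    gpu: str
    timeout: int


def load_config(env: Optional[Mapping[str, str]] = None) -> ExampleConfig:
    """Read GPU and TIMEOUT from the environment, falling back to defaults."""
    env = os.environ if env is None else env
    return ExampleConfig(
        gpu=env.get("GPU", "L40S"),
        # Environment values are strings, so the timeout is parsed explicitly.
        timeout=int(env.get("TIMEOUT", "3600")),
    )


default_conf = load_config({})
override_conf = load_config({"GPU": "A100", "TIMEOUT": "7200"})
print(default_conf)
print(override_conf)
```

Passing the mapping explicitly keeps the sketch testable; in an app module the same expressions read `os.environ` directly at import time, so the variables need to be set in the shell before the app is invoked.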

Use an `AppInfo` dataclass only when grouping several related app constants improves readability. For a few simple constants, module-level constants such as `OUT_VOLUME` or `OUTPUTS_DIR` are acceptable.
Use an `AppInfo` dataclass only when grouping several related app constants improves readability. Prefer `CONF.output_volume`, `CONF.output_volume_name`, and `CONF.mounts(...)` over module-level output-volume aliases in new code.

## Image Construction

@@ -97,11 +106,41 @@ app = modal.App(CONF.name, image=runtime_image, tags=CONF.tags)

## Volumes

- Import shared volumes from `biomodals.app.constant`, such as `MODEL_VOLUME` or `MSA_CACHE_VOLUME`.
- Mount model weights read-only for inference when the function only reads model artifacts.
- Use `CONF.model_volume_mountpoint` for model volume mount paths.
- Commit volume changes explicitly after writes with `VOLUME.commit()`.
- Use `CONF.get_out_volume()` for app-specific persistent outputs.
- Import shared volumes from `biomodals.helper.constant`, such as `MODEL_VOLUME` or `MSA_CACHE_VOLUME`.
- Prefer `CONF.mounts(...)` for standard app model and output mounts. Use raw
`Volume.with_mount_options(...)` only for nonstandard mountpoints, shared
database/cache volumes, or code that must explicitly access a volume object.
- Mount only the subdirectory a function needs when a shared volume contains
app-specific data. For model weights under `MODEL_VOLUME`, the usual pattern
is `CONF.mounts(model_volume=True)`, which mounts `CONF.model_volume_subdir`
at `CONF.model_volume_mountpoint`.
- Mount model weights read-only for inference when the function only reads
model artifacts. If both read-only and subpath behavior are needed, use
`CONF.mounts(model_volume=True)`, or
`MODEL_VOLUME.with_mount_options(read_only=True, sub_path=...)` for custom
mountpoints.
Do not chain `MODEL_VOLUME.read_only().with_mount_options(...)`; Modal rejects
adding mount options after a read-only wrapper has already been created.
- Use `CONF.model_volume_mountpoint` for app-specific model directories. Use
`CONF.mounts(model_volume=True, is_huggingface=True)` only when the tool
stores Hugging Face-managed artifacts under `CONF.default_env` paths such as
`HF_HOME`.
- When upstream code expects a hardcoded cache path, mount the app-specific
shared-volume subdirectory at that path rather than changing unrelated app
logic. PaddleOCR, AbNatiV, and AntiFold are examples of this pattern.
- Shared cache volumes such as `MSA_CACHE_VOLUME` should also use subpath mounts
when an app only needs its own namespace. Shared database volumes that expose a
complete database root can stay mounted whole.
- For download/setup functions that populate model volumes, use
`CONF.mounts(model_volume=True, model_ro=False)` and commit the backing shared
volume after writes.
- Commit output or cache volume changes explicitly after writes with
`VOLUME.commit()`.
- Use `CONF.output_volume` and `CONF.mounts(output_volume=True)` for
app-specific persistent outputs; output volumes are normally mounted whole.
- Use `volume_path_from_mount_path(...)` when printing or returning remote
volume paths so logs show a validated `VolumePath` instead of an ambiguous
absolute container path.

## Remote Functions

@@ -117,6 +156,9 @@ Always specify a timeout with `CONF.timeout` or `MAX_TIMEOUT`. Add resource hint
`float`, `bool`, `bytes`, `list`, `dict`, or `None`. Return complex objects
only when they provide much more benefit than a primitive representation, and
the returned type must be serializable by `cloudpickle`.
- Workflow-compatible app functions are the main exception to the primitive
preference: return `AppRunResult` from `biomodals.schema` so workflows can
materialize `AppOutput` artifacts consistently.
- Keep `Path` objects internal to the local process or Modal container. Return
file paths, volume paths, and relative output paths as `str(path)`, including
paths nested inside tuples, lists, dicts, or dataclasses. Convert back with
@@ -131,7 +173,7 @@ Resource pattern:
    cpu=(0.125, 16.125),
    memory=(1024, 65536),
    timeout=MAX_TIMEOUT,
    volumes={CONF.model_volume_mountpoint: MODEL_VOLUME.read_only()},
    volumes=CONF.mounts(model_volume=True),
)
```

@@ -152,6 +194,11 @@ Prefer existing helpers instead of reimplementing common behavior:
- `hash_string(s)` from `biomodals.helper` for cache keys.
- `patch_image_for_helper(image)` from `biomodals.helper` for Modal images.

Avoid extracting trivial two- or three-line helpers that are used once or twice.
Inline those operations with a short comment when that reads better. Add a
local helper only when the behavior is app-specific, repeated enough to clarify
the module, or absent from `biomodals.helper`.

## Local Entrypoint

The `@app.local_entrypoint()` function is the user-facing orchestration layer on the local machine.
@@ -164,7 +211,7 @@ The `@app.local_entrypoint()` function is the user-facing orchestration layer on
- Write returned tarball bytes locally.
- Print final local path or Modal volume location.

Docstring rules for `biomodals help`:
Docstring rules for `biomodals app help`:

- Use Google-style docstrings with an `Args:` section.
- Put `Args:` on its own line.
@@ -187,6 +234,9 @@ Choose architecture by job type:
- Short-lived inference usually sends local input bytes to remote functions and returns tarball bytes directly.
- Long-running apps should cache intermediate and final results in Modal volumes.
- Parallel or interruptible runs should use queues, locks, stable run IDs, and resumable runners where possible.
- Workflow-compatible app functions should reuse existing remote app behavior
where practical, preserve standalone local entrypoints unchanged, and return
`AppRunResult` with `VolumePath` storage for durable outputs.

Before choosing data flow for a new app, ask whether it is short-lived inference, long-running/cached, or parallel/resumable unless already clear from the request.

@@ -203,10 +253,10 @@ Older apps can use raw constants such as `GPU`, `TIMEOUT`, and `APP_NAME`. When

## Examples And Verification

- When app development changes invocation or adds a new app, add or update an example bash script under `examples/app/` using `biomodals run`.
- When app development changes invocation or adds a new app, add or update an example bash script under `examples/app/` using `biomodals app run`.
- Use small example inputs under `examples/data/` only when existing data is insufficient.
- For Modal functions, verify returned payloads are primitive or otherwise
intentionally complex and `cloudpickle`-serializable; convert returned paths to
strings.
- After edits, run `prek run --files <changed files>` when practical.
- For CLI or app discovery changes, smoke test `uv run biomodals list` and `uv run biomodals help <app-name>` when practical.
- For CLI or app discovery changes, smoke test `uv run biomodals app list` and `uv run biomodals app help <app-name>` when practical.
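
The payload check above (primitive return values, paths converted to strings) might look like the following before a remote function returns. This is a sketch under stated assumptions: `stringify_paths` and `is_primitive_payload` are hypothetical helpers, not part of `biomodals.helper`, and the example uses `PurePosixPath` so the string form does not depend on the host OS.

```python
from pathlib import PurePath, PurePosixPath
from typing import Any

PRIMITIVES = (int, str, float, bool, bytes, type(None))


def stringify_paths(value: Any) -> Any:
    """Recursively replace path objects with str; tuples become lists."""
    if isinstance(value, PurePath):
        return str(value)
    if isinstance(value, dict):
        return {key: stringify_paths(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [stringify_paths(item) for item in value]
    return value


def is_primitive_payload(value: Any) -> bool:
    """Return True when the value contains only primitives, lists, and dicts."""
    if isinstance(value, PRIMITIVES):
        return True
    if isinstance(value, list):
        return all(is_primitive_payload(item) for item in value)
    if isinstance(value, dict):
        return all(
            isinstance(key, str) and is_primitive_payload(item)
            for key, item in value.items()
        )
    return False


raw = {
    "out_dir": PurePosixPath("/outputs/run1"),
    "files": (PurePosixPath("a.pdb"), PurePosixPath("b.pdb")),
    "n_models": 2,
}
payload = stringify_paths(raw)
print(payload)
```

On the receiving side, the caller converts the strings back into path objects where it needs them; the guideline above keeps the `Path` type on whichever side created it.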
54 changes: 54 additions & 0 deletions .agents/skills/biomodals-workflow-development/SKILL.md
@@ -0,0 +1,54 @@
---
name: biomodals-workflow-development
description: Use when creating, editing, or reviewing Biomodals workflow code under src/biomodals/workflow/, shared workflow schemas under src/biomodals/schema/, workflow-compatible app functions, or workflow CLI/tests, including ShortMD-style DAG construction, orchestrator composition, app dependency inclusion, workflow artifacts, and Modal volume handling.
---

# Biomodals Workflow Development

Use this skill for Biomodals workflow scripts, the reusable workflow runtime,
workflow schemas, and workflow-compatible app integration points.

## Core Workflow

Before making non-trivial workflow changes, read
`references/workflow-development.md` for the maintained standards.

Use `src/biomodals/workflow/shortmd_workflow.py` as the primary end-to-end
example for app-composed workflows. Do not treat
`src/biomodals/workflow/ppiflow_workflow.py` as a reference pattern for now; it
is expected to be refactored.

## Working Rules

- Keep `biomodals.schema` pure Pydantic and free of Modal imports.
- Compose workflow apps with `from biomodals.workflow.core import orchestrator`
and `modal.App(...).include(orchestrator.app)`.
- Declare app dependencies on `AppConfig.depends_on_apps`, mirror them into
`CONF.tags["depends_on"]` for Modal UI metadata, and compose them with
`include_dependency_apps(app, CONF.depends_on_apps)`.
- Prefer included-app Modal handles over deployed-app lookup strings. Do not add
`modal.Function.from_name(...)` to new workflow code when the dependency app
can be included.
- Prefer `AppBackedNode` for nodes that primarily call app functions.
Add `WorkflowNativeNode` only for adapters, summaries, selectors, and
workflow-specific file-management glue.
- Store hydrated Modal functions/classes in a small `*ModalNamespace` dataclass
typed as `modal.Function` or `modal.Cls`, and exclude that namespace from DAG
hashing with `repr=False`, `compare=False`, and `metadata={"dag_hash": False}`.
- Define workflow-specific remote file-management functions as top-level
`@app.function`s in the workflow module and put their hydrated handles in the
workflow's `*ModalNamespace`. Do not make ordinary node methods Modal
functions.
- Import app-owned volume handles, volume names, and mountpoints from source app
modules. Avoid duplicating volume strings in workflow scripts.
- Use `volume_path_from_mount_path(...)` to convert mounted app paths into
`VolumePath` workflow storage references.
- Keep the core runtime slim. Add public orchestrator/runtime API only for clear
missing capabilities, not one-off workflow conveniences.
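
The rule about excluding a `*ModalNamespace` from DAG hashing relies on standard `dataclasses.field` options. The sketch below shows how `repr=False`, `compare=False`, and `metadata={"dag_hash": False}` could work together. Everything named here is an assumption for illustration: `ExampleModalNamespace`, `ExampleNode`, and `dag_hash` are hypothetical, the handles are plain objects rather than `modal.Function` instances, and the workflow runtime's real hashing code is not shown in this diff.

```python
import hashlib
import json
from dataclasses import dataclass, field, fields
from typing import Any


@dataclass(frozen=True)
class ExampleModalNamespace:
    """Hypothetical holder; real code would type this as modal.Function."""

    run_md: Any = None


@dataclass(frozen=True)
class ExampleNode:
    name: str
    steps: int
    # Runtime handles are excluded from repr, equality, and hashing.
    handles: ExampleModalNamespace = field(
        default_factory=ExampleModalNamespace,
        repr=False,
        compare=False,
        metadata={"dag_hash": False},
    )


def dag_hash(node: Any) -> str:
    """Hash only fields that did not opt out via metadata['dag_hash']."""
    payload = {
        f.name: getattr(node, f.name)
        for f in fields(node)
        if f.metadata.get("dag_hash", True)
    }
    encoded = json.dumps(payload, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()


# Two nodes with identical definitions but different hydrated handles.
node_a = ExampleNode("minimize", 100, ExampleModalNamespace(run_md=object()))
node_b = ExampleNode("minimize", 100, ExampleModalNamespace(run_md=object()))
print(node_a == node_b, dag_hash(node_a) == dag_hash(node_b))
```

The point of the three options is that a node's identity depends only on its declared configuration, so re-hydrating handles in a new process does not change the hash or invalidate cached results.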

## Verification

For workflow changes, run focused pytest coverage first, then `prek run --files
<changed files>` when practical. For CLI or discovery changes, also smoke test
`uv run biomodals workflow list` and the affected `biomodals workflow help/run`
path.
@@ -0,0 +1,4 @@
interface:
display_name: "Biomodals Workflow Development"
short_description: "Build ShortMD-style Biomodals workflows"
default_prompt: "Use $biomodals-workflow-development to update a Biomodals workflow while following the ShortMD reference pattern."