15 changes: 14 additions & 1 deletion CLAUDE.md
@@ -8,7 +8,7 @@ Each model type gets a typed request/response contract (Pydantic). Batching, cac

PyPI: `pip install sheaf-serve`

## Current state: v0.10.0 shipped (Docker base image at `ghcr.io/korbonits/sheaf-serve` + KubeRay `RayService` example + `sheaf.build_app(spec)` public API)
## Current state: v0.11.0 shipped (ESMC + ESMFold2 backends, new `PROTEIN_LANGUAGE` + `STRUCTURE` model categories, `[protein]` extra, Biohub 2026-05-27 release)

Per-version ship notes live in git history and release tags. This doc tracks what exists *now* and the non-obvious design choices behind it. For feature-level changelog, see `git log`.

@@ -194,6 +194,19 @@ tests/ # test_<backend>_backend.py (mocked) + test_smoke_*.py (g
- **Request-time adapter validation returns 422, not 500** — when a request specifies `adapters=["foo"]` against a deployment with no `lora` configured, or specifies an adapter name not in `spec.lora.adapters`, both `_SheafDeployment.predict` and the Modal predict handler raise `HTTPException(422, ...)`. This is a client-error contract: the request is well-formed JSON but references unknown server state, so 422 (Unprocessable Entity) is the right code. `resolve_active_adapters` raising `ValueError` deeper in the stack would otherwise become a 500.
- **Diffusers `load_lora_weights` + `set_adapters` API** — both FLUX and SDXL backends call `pipeline.load_lora_weights(path_or_repo, adapter_name=name, [weight_name=file])` per adapter at load time, and `pipeline.set_adapters(names, adapter_weights=weights)` per sub-batch. Diffusers ≥0.27 + PEFT ≥0.7 handles all of the actual LoRA composition (named adapter slots, weight scaling, adapter merging) — sheaf's job is just to thread the per-deployment registry and the per-request selection through to those two calls.

### Protein models (v0.11 — Biohub ESMC + ESMFold2)

- **`MOLECULAR` and `PROTEIN_LANGUAGE` are separate model categories** — `MOLECULAR` (ESM-3, `MolecularResponse.embeddings: list[list[float]]`) returns one pooled vector per sequence; `PROTEIN_LANGUAGE` (ESMC, `ProteinLanguageResponse.logits/embeddings: list[list[list[float]]]`) returns ragged per-token tensors sliced to `seq_lens[i]`. Unifying them would force every caller to branch on `model_name` to interpret the shape, which defeats the typed-contract premise. Documented in ADR-0001.
- **`STRUCTURE` is the first non-tensor output category** — `StructureResponse.structure: str` is a PDB or mmCIF text block, not numerical data. Caching works (in-process LRU, SHA-256 over the request), but each cached payload can be 40+ KB per fold. Future structure backends (Boltz-1, Chai-1) can reuse the contract.
- **ESMC `MaskedLMOutput` has no `last_hidden_state`** — `transformers.AutoModelForMaskedLM` returns `MaskedLMOutput`, which exposes `.logits` + `.hidden_states` (when requested) but **not** `.last_hidden_state` (that's on `BaseModelOutput`). `ESMCBackend._run` forces `output_hidden_states=True` whenever `return_embeddings` is True and reads uniformly from `hidden_states[-1]`. The original code raised `AttributeError` on real model output, which the mocked tests did not surface until the H100 smoke test caught it; the test mock now mirrors the real `MaskedLMOutput` shape (no `last_hidden_state` attr).
- **ESMFold2 pLDDT is on `[0, 1]`, not `[0, 100]`** — verified empirically on Modal H100 (2026-05-27). Sheaf passes through faithfully without scaling (consistent with "validate at the boundary, don't transform in backends"); callers who want the conventional AlphaFold / ESMFold-v1 scale multiply by 100 themselves. `StructureResponse.plddt` docstring + ADR-0001 both document this.
- **ESMFold2 is single-sample-per-call upstream** — `ESMFold2InputBuilder().fold()` runs one structure at a time; `batch_predict` runs requests sequentially with no true batched forward. Per-request compute varies hugely with sequence length × `num_loops` × `num_samples`. Operators size Ray Serve replica count to expected concurrency rather than relying on intra-replica batching.
- **`[protein]` and `[molecular]` extras are mutually exclusive** — both ship a package named `esm` (Biohub's 2026 release vs the pre-2026 EvolutionaryScale PyPI 3.x). Declared in `[tool.uv].conflicts`; sheaf cannot serve ESM-3 and ESMC in the same process. Users who need both run them in separate Sheaf deployments.
- **`modal_server.py` has its own `AnyRequest` union** — deliberately separate from `sheaf.api.union` so Modal containers don't pull Ray as a transitive dep. Adding a new model type means updating **both** unions, plus the registry imports inside `_build_asgi_app`. The v0.11 PR initially missed the modal-side update for `ProteinLanguageRequest` + `StructureRequest` — protein requests would have 422'd on Modal until the follow-up fix.
- **`esm` git pin tracks Modal's reference example** — pinned to commit `81b3646c9429ea8458918415ad6a46178cb59833` (long SHA), matching `modal-labs/modal-examples 06_gpu_and_ml/protein-folding/esmfold2.py`. This is the revision verified via `examples/quickstart_protein_modal.py`; bump in lockstep with Modal's example when their next pin lands.
- **ESMFold2 `_to_float_list` / `_maybe_2d_list` helpers** — coerce upstream tensor-or-list outputs to plain `list[float]` / `list[list[float]]` for JSON serialisation. The pattern checks for `.cpu()` (torch tensor) and `.item()` (torch scalar) before falling back to `float(value)`, so test stubs can pass raw lists/floats without importing torch.
- **Forge / Biohub-Platform variants raise `NotImplementedError` at `load()`** — `esmc-300m-2024-12`, `esmc-600m-2024-12`, `esmfold2-fast-2026-05` require Biohub's HTTP-client SDK (`esm.sdk.esmc_client`, `SequenceStructureForgeInferenceClient`) with an API token. Sheaf's v0.11 backends explicitly reject these IDs at load with a pointer to ADR-0001; wiring up the Forge HTTP client is a deliberate future PR (separate code path, no local weights).
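
The tensor-or-list coercion described for `_to_float_list` / `_maybe_2d_list` above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in the PR: the function names here (`to_float`, `to_float_list`, `maybe_2d_list`) and the `None` pass-through are hypothetical, and only the duck-typed `.cpu()` / `.item()` checks with a `float(value)` fallback come from the note above.

```python
from typing import Any, Optional


def to_float(value: Any) -> float:
    """Coerce a tensor scalar or a plain number to a Python float."""
    if hasattr(value, "item"):  # torch scalars expose .item()
        value = value.item()
    return float(value)  # fallback for raw ints/floats from test stubs


def to_float_list(values: Any) -> list[float]:
    """Coerce a 1-D tensor-or-list to list[float] without importing torch."""
    if hasattr(values, "cpu"):  # torch tensors expose .cpu(); move to host first
        values = values.cpu()
    return [to_float(v) for v in values]


def maybe_2d_list(values: Any) -> Optional[list[list[float]]]:
    """Coerce a 2-D tensor-or-list to list[list[float]]; pass None through.

    The None pass-through is an assumption for optional outputs such as PAE.
    """
    if values is None:
        return None
    if hasattr(values, "cpu"):
        values = values.cpu()
    return [to_float_list(row) for row in values]
```

Because the checks are attribute-based, a test can call `to_float_list([1, 2.5])` and get `[1.0, 2.5]` back without torch installed.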

## Adding a new backend

1. Add typed request/response to `src/sheaf/api/<model_type>.py`
30 changes: 29 additions & 1 deletion README.md
@@ -29,13 +29,16 @@ Each model type gets a typed request/response contract. Batching, caching, and s
> curl https://korbonits--sheaf-demo-modalserver---init----locals---serve.modal.run/chronos/health
> ```

> **Requires Python 3.11+.** macOS's system `python3` is usually 3.10 — bootstrap a 3.11 venv first via [`uv`](https://docs.astral.sh/uv/) (`uv venv --python 3.11 .venv && source .venv/bin/activate`) or `pyenv`. The `[molecular]` extra (ESM-3) additionally requires Python 3.12+.
> **Requires Python 3.11+.** macOS's system `python3` is usually 3.10 — bootstrap a 3.11 venv first via [`uv`](https://docs.astral.sh/uv/) (`uv venv --python 3.11 .venv && source .venv/bin/activate`) or `pyenv`. The `[molecular]` and `[protein]` extras (ESM-3 / ESMC / ESMFold2) additionally require Python 3.12+, and the two are mutually exclusive (they share the `esm` import name from different packages).

```bash
pip install sheaf-serve # core only
pip install "sheaf-serve[time-series]" # + Chronos2 / TimesFM / Moirai
pip install "sheaf-serve[tabular]" # + TabPFN
pip install "sheaf-serve[molecular]" # + ESM-3 (Python 3.12+)
pip install "sheaf-serve[protein]" # + ESMC / ESMFold2 deps (Python 3.12+)
# then also (no PyPI release yet — pinned commit per upstream README):
pip install "esm@git+https://github.com/Biohub/esm.git@81b3646c9429ea8458918415ad6a46178cb59833"
pip install "sheaf-serve[genomics]" # + Nucleotide Transformer
pip install "sheaf-serve[small-molecule]" # + MolFormer
pip install "sheaf-serve[materials]" # + MACE-MP
@@ -179,6 +182,18 @@ See [`examples/`](examples/) for time series comparison, tabular, audio, vision,

---

## Protein models

Sheaf serves three protein foundation models, each via its own typed contract:

- **ESM-3** (`api/molecular.py`, backend `esm3`) — per-sequence pooled embeddings (mean / cls). Use for sequence-level similarity, clustering, and downstream featurization. `[molecular]` extra (Python 3.12+).
- **ESMC** (`api/protein_language.py`, backend `esmc`) — per-token logits + optional per-token embeddings from Biohub's 2026-05-27 release. Use when you need masked-LM logits, per-residue representations, or all-layer hidden states. Default model: `Biohub/ESMC-6B`. `[protein]` extra (Python 3.12+); 300M / 600M variants are Forge API-only and currently raise `NotImplementedError`.
- **ESMFold2** (`api/structure.py`, backend `esmfold2`) — protein structure prediction with inference-time scaling. Exposes `num_loops`, `num_sampling_steps`, `num_samples`, `seed` as first-class request fields; returns PDB / mmCIF + pLDDT + pTM/ipTM + optional PAE. Default model: `biohub/ESMFold2`. `[protein]` extra (Python 3.12+).

`[molecular]` (ESM-3) and `[protein]` (ESMC + ESMFold2) share the `esm` import name from different upstream packages — install one **or** the other in a given environment. See [`docs/adr/0001-esmc-esmfold2-integration.md`](docs/adr/0001-esmc-esmfold2-integration.md) for the rationale.

Biohub release announcement: <https://github.com/Biohub/esm> · preprint: <https://biohub.ai/papers/esm_protein.pdf>.
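
The CLAUDE.md notes in this PR state that ESMFold2 reports pLDDT on `[0, 1]` and that Sheaf passes it through unscaled. A caller who wants the conventional 0 to 100 scale could rescale client-side along these lines. This is a sketch only: `plddt_to_percent` is a hypothetical helper, and it assumes the pLDDT values arrive as a flat list of floats.

```python
def plddt_to_percent(plddt: list[float]) -> list[float]:
    """Rescale pLDDT values from [0, 1] to the conventional [0, 100] range."""
    # Guard against double scaling: values already on [0, 100] would exceed 1.0.
    if any(not 0.0 <= p <= 1.0 for p in plddt):
        raise ValueError("expected pLDDT values on [0, 1]")
    return [100.0 * p for p in plddt]
```

For example, `plddt_to_percent([0.5, 0.25])` returns `[50.0, 25.0]`. The range check is there so that scores which are already on the 0 to 100 scale are rejected rather than silently multiplied again.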

## Supported model types

| Type | Status | Backends |
@@ -193,6 +208,8 @@ See [`examples/`](examples/) for time series comparison, tabular, audio, vision,
| Depth estimation | ✅ v0.3 | Depth Anything v2 |
| Object detection | ✅ v0.3 | DETR / RT-DETR |
| Protein / molecular | ✅ v0.3 | ESM-3 (Python 3.12+) |
| Protein language modeling | ✅ v0.11 | ESMC 6B (Biohub) |
| Protein structure prediction | ✅ v0.11 | ESMFold2 (Biohub) — inference-time scaling |
| Genomics | ✅ v0.3 | Nucleotide Transformer |
| Small molecule | ✅ v0.3 | MolFormer-XL |
| Materials science | ✅ v0.3 | MACE-MP-0 |
@@ -309,6 +326,17 @@ Today sheaf ships three deployment paths: `ModelServer` (a local Ray cluster you
- [ ] `examples/k8s/` with a `RayService` manifest — KubeRay's canonical Ray-on-K8s shape — and a short `README.md` covering prereqs (KubeRay operator installed), `kubectl apply`, and a port-forward smoke test.
- [ ] GitHub Actions workflow that builds + pushes the Dockerfile to `ghcr.io/korbonits/sheaf-serve:vX.Y.Z` on `v*` tag push, mirroring the PyPI publish flow.

**v0.11 — Biohub protein-biology release integration**

Biohub's "world model of protein biology" landed 2026-05-27 under MIT. Sheaf integrates the two model artifacts as first-class typed contracts; ESM Atlas (dataset) is out of scope. See [`docs/adr/0001-esmc-esmfold2-integration.md`](docs/adr/0001-esmc-esmfold2-integration.md).

- [x] `ESMCBackend` — per-token logits + per-token embeddings via `transformers.AutoModelForMaskedLM`, default `Biohub/ESMC-6B`.
- [x] `ESMFold2Backend` — protein structure prediction with `num_loops` / `num_sampling_steps` / `num_samples` / `seed` as first-class request fields, returning PDB / mmCIF + pLDDT + pTM/ipTM + optional PAE.
- [x] New `STRUCTURE` model category — first non-tensor output category (structure file as text).
- [x] `[protein]` install extra; `esm` from `git+https://github.com/Biohub/esm.git@81b3646c9429ea8458918415ad6a46178cb59833` documented (no PyPI release yet).
- [x] End-to-end GPU smoke — `examples/quickstart_protein_modal.py` runs `ESMFold2Backend` on H100 via Modal (~70s cold start to a persistent volume, sub-second per fold). 53-residue target → 43,088-char mmCIF, pTM=0.2465.
- [ ] Forge / Biohub-Platform HTTP-client variants for the ESMC 300M / 600M / ESMFold2-fast API-only models.
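
Until the Forge item lands, the API-only model IDs are rejected at load time. A guard of that kind might look like the sketch below. The three IDs are the ones named in the CLAUDE.md notes in this PR; the constant name, the function name, and the error wording are illustrative assumptions rather than the backend's actual code.

```python
# Model IDs that need Biohub's Forge HTTP client (no local weights).
FORGE_ONLY_MODEL_IDS = frozenset(
    {"esmc-300m-2024-12", "esmc-600m-2024-12", "esmfold2-fast-2026-05"}
)


def check_local_model_id(model_id: str) -> str:
    """Return model_id unchanged, or raise if it is a Forge API-only variant."""
    if model_id in FORGE_ONLY_MODEL_IDS:
        raise NotImplementedError(
            f"{model_id!r} is served only through the Forge API; "
            "local loading is not supported yet (see ADR-0001)."
        )
    return model_id
```

Failing in `load()` rather than at the first request means a misconfigured deployment surfaces at startup instead of on live traffic.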

---

## Architecture