korbonits · May 28, 2026 · May 27, 2026
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -8,7 +8,7 @@ Each model type gets a typed request/response contract (Pydantic). Batching, cac
 
 PyPI: `pip install sheaf-serve`
 
-## Current state: v0.10.0 shipped (Docker base image at `ghcr.io/korbonits/sheaf-serve` + KubeRay `RayService` example + `sheaf.build_app(spec)` public API)
+## Current state: v0.11.0 shipped (ESMC + ESMFold2 backends, new `PROTEIN_LANGUAGE` + `STRUCTURE` model categories, `[protein]` extra, Biohub 2026-05-27 release)
 
 Per-version ship notes live in git history and release tags. This doc tracks what exists *now* and the non-obvious design choices behind it. For feature-level changelog, see `git log`.
 
@@ -194,6 +194,19 @@ tests/                 # test_<backend>_backend.py (mocked) + test_smoke_*.py (g
 - **Request-time adapter validation returns 422, not 500** — when a request specifies `adapters=["foo"]` against a deployment with no `lora` configured, or specifies an adapter name not in `spec.lora.adapters`, both `_SheafDeployment.predict` and the Modal predict handler raise `HTTPException(422, ...)`. This is a client-error contract: the request is well-formed JSON but references unknown server state, so 422 (Unprocessable Entity) is the right code. `resolve_active_adapters` raising `ValueError` deeper in the stack would otherwise become a 500.
 - **Diffusers `load_lora_weights` + `set_adapters` API** — both FLUX and SDXL backends call `pipeline.load_lora_weights(path_or_repo, adapter_name=name, [weight_name=file])` per adapter at load time, and `pipeline.set_adapters(names, adapter_weights=weights)` per sub-batch. Diffusers ≥0.27 + PEFT ≥0.7 handles all of the actual LoRA composition (named adapter slots, weight scaling, adapter merging) — sheaf's job is just to thread the per-deployment registry and the per-request selection through to those two calls.
 
+### Protein models (v0.11 — Biohub ESMC + ESMFold2)
+
+- **`MOLECULAR` and `PROTEIN_LANGUAGE` are separate model categories** — `MOLECULAR` (ESM-3, `MolecularResponse.embeddings: list[list[float]]`) returns one pooled vector per sequence; `PROTEIN_LANGUAGE` (ESMC, `ProteinLanguageResponse.logits/embeddings: list[list[list[float]]]`) returns ragged per-token tensors sliced to `seq_lens[i]`. Unifying them would force every caller to branch on `model_name` to interpret the shape — defeats the typed-contract premise. Documented in ADR-0001.
+- **`STRUCTURE` is the first non-tensor output category** — `StructureResponse.structure: str` is a PDB or mmCIF text block, not numerical data. Caching is fine (in-process LRU, SHA-256 over the request) but the cached payload can be 40+ KB per fold; future structure backends (Boltz-1, Chai-1) can reuse the contract.
+- **ESMC `MaskedLMOutput` has no `last_hidden_state`** — `transformers.AutoModelForMaskedLM` returns `MaskedLMOutput`, which exposes `.logits` + `.hidden_states` (when requested) but **not** `.last_hidden_state` (that's on `BaseModelOutput`). `ESMCBackend._run` forces `output_hidden_states=True` whenever `return_embeddings` is `True` and reads uniformly from `hidden_states[-1]`. The original bug, an `AttributeError` on `.last_hidden_state`, went unnoticed by the mocked tests until the H100 smoke test caught it; the test mock now mirrors the real `MaskedLMOutput` shape (no `last_hidden_state` attribute).
+- **ESMFold2 pLDDT is on `[0, 1]`, not `[0, 100]`** — verified empirically on Modal H100 (2026-05-27). Sheaf passes through faithfully without scaling (consistent with "validate at the boundary, don't transform in backends"); callers who want the conventional AlphaFold / ESMFold-v1 scale multiply by 100 themselves. `StructureResponse.plddt` docstring + ADR-0001 both document this.
+- **ESMFold2 is single-sample-per-call upstream** — `ESMFold2InputBuilder().fold()` runs one structure at a time; `batch_predict` runs requests sequentially with no true batched forward. Per-request compute varies hugely with sequence length × `num_loops` × `num_samples`. Operators size Ray Serve replica count to expected concurrency rather than relying on intra-replica batching.
+- **`[protein]` and `[molecular]` extras are mutually exclusive** — both ship a package named `esm` (Biohub's 2026 release vs the pre-2026 EvolutionaryScale PyPI 3.x). Declared in `[tool.uv].conflicts`; sheaf cannot serve ESM-3 and ESMC in the same process. Users who need both run them in separate Sheaf deployments.
+- **`modal_server.py` has its own `AnyRequest` union** — deliberately separate from `sheaf.api.union` so Modal containers don't pull Ray as a transitive dep. Adding a new model type means updating **both** unions, plus the registry imports inside `_build_asgi_app`. The v0.11 PR initially missed the modal-side update for `ProteinLanguageRequest` + `StructureRequest` — protein requests would have 422'd on Modal until the follow-up fix.
+- **`esm` git pin tracks Modal's reference example** — pinned to commit `81b3646c9429ea8458918415ad6a46178cb59833` (full 40-character SHA), matching `06_gpu_and_ml/protein-folding/esmfold2.py` in `modal-labs/modal-examples`. This is the revision verified via `examples/quickstart_protein_modal.py`; bump it in lockstep with Modal's example when their next pin lands.
+- **ESMFold2 `_to_float_list` / `_maybe_2d_list` helpers** — coerce upstream tensor-or-list outputs to plain `list[float]` / `list[list[float]]` for JSON serialisation. The pattern checks for `.cpu()` (torch tensor) and `.item()` (torch scalar) before falling back to `float(value)`, so test stubs can pass raw lists/floats without importing torch.
+- **Forge / Biohub-Platform variants raise `NotImplementedError` at `load()`** — `esmc-300m-2024-12`, `esmc-600m-2024-12`, `esmfold2-fast-2026-05` require Biohub's HTTP-client SDK (`esm.sdk.esmc_client`, `SequenceStructureForgeInferenceClient`) with an API token. Sheaf's v0.11 backends explicitly reject these IDs at load with a pointer to ADR-0001; wiring up the Forge HTTP client is a deliberate future PR (separate code path, no local weights).
+
 ## Adding a new backend
 
 1. Add typed request/response to `src/sheaf/api/<model_type>.py`
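
Reviewer sketch (not part of the diff): the coercion pattern described for the ESMFold2 `_to_float_list` / `_maybe_2d_list` helpers could look roughly like the following. This is a minimal illustration under assumptions, not the code in the PR: the function names, the `None` pass-through, and the use of `.tolist()` after `.cpu()` are hypothetical choices made here.

```python
from typing import Any, Optional


def to_float(value: Any) -> float:
    """Coerce a scalar-like value to a plain float."""
    # Torch scalars (0-d tensors) expose .item(); plain numbers do not.
    if hasattr(value, "item"):
        return float(value.item())
    return float(value)


def to_float_list(value: Any) -> list[float]:
    """Coerce a 1-D tensor-like or plain sequence to list[float]."""
    # Torch tensors expose .cpu(); move to host memory, then to Python lists.
    if hasattr(value, "cpu"):
        value = value.cpu().tolist()
    return [to_float(v) for v in value]


def maybe_2d_list(value: Any) -> Optional[list[list[float]]]:
    """Coerce a 2-D tensor-like or nested sequence; None passes through (assumed)."""
    if value is None:
        return None
    if hasattr(value, "cpu"):
        value = value.cpu().tolist()
    return [to_float_list(row) for row in value]
```

Because the checks are duck-typed (`hasattr`) rather than `isinstance(value, torch.Tensor)`, test stubs can pass raw lists and floats without importing torch, which matches the intent stated in the bullet.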

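Reviewer sketch (not part of the diff): the `MaskedLMOutput` note and the "ragged per-token tensors sliced to `seq_lens[i]`" note can be illustrated together with a torch-free stub. Everything here is hypothetical scaffolding: `StubMaskedLMOutput` and `per_token_embeddings` are names invented for this example, nested Python lists stand in for tensors, and the slicing assumes padding sits at the end of each row.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StubMaskedLMOutput:
    """Test stub shaped like a masked-LM output: no `last_hidden_state` field."""

    logits: list
    hidden_states: Optional[tuple] = None  # one entry per layer when requested


def per_token_embeddings(output: StubMaskedLMOutput, seq_lens: list[int]) -> list:
    """Read the final layer from hidden_states and trim padding per sequence."""
    if output.hidden_states is None:
        raise ValueError("hidden states missing; request output_hidden_states=True")
    last_layer = output.hidden_states[-1]  # shape: [batch][padded_len][dim]
    # Slice each row to its true length so the result is ragged, not padded.
    return [row[:n] for row, n in zip(last_layer, seq_lens)]
```

A stub without a `last_hidden_state` attribute fails in the same way the real output object would if the backend regressed to reading that attribute, which is the property the updated test mock is described as having.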
diff --git a/README.md b/README.md
@@ -29,13 +29,16 @@ Each model type gets a typed request/response contract. Batching, caching, and s
 > curl https://korbonits--sheaf-demo-modalserver---init----locals---serve.modal.run/chronos/health
 > ```
 
-> **Requires Python 3.11+.** macOS's system `python3` is usually 3.10 — bootstrap a 3.11 venv first via [`uv`](https://docs.astral.sh/uv/) (`uv venv --python 3.11 .venv && source .venv/bin/activate`) or `pyenv`. The `[molecular]` extra (ESM-3) additionally requires Python 3.12+.
+> **Requires Python 3.11+.** macOS's system `python3` is usually 3.10 — bootstrap a 3.11 venv first via [`uv`](https://docs.astral.sh/uv/) (`uv venv --python 3.11 .venv && source .venv/bin/activate`) or `pyenv`. The `[molecular]` and `[protein]` extras (ESM-3 / ESMC / ESMFold2) additionally require Python 3.12+, and the two are mutually exclusive (they share the `esm` import name from different packages).
 
 ```bash
 pip install sheaf-serve                           # core only
 pip install "sheaf-serve[time-series]"            # + Chronos2 / TimesFM / Moirai
 pip install "sheaf-serve[tabular]"                # + TabPFN
 pip install "sheaf-serve[molecular]"              # + ESM-3  (Python 3.12+)
+pip install "sheaf-serve[protein]"                # + ESMC / ESMFold2 deps (Python 3.12+)
+# then also (no PyPI release yet — pinned commit per upstream README):
+pip install "esm@git+https://github.com/Biohub/esm.git@81b3646c9429ea8458918415ad6a46178cb59833"
 pip install "sheaf-serve[genomics]"               # + Nucleotide Transformer
 pip install "sheaf-serve[small-molecule]"         # + MolFormer
 pip install "sheaf-serve[materials]"              # + MACE-MP
@@ -179,6 +182,18 @@ See [`examples/`](examples/) for time series comparison, tabular, audio, vision,
 
 ---
 
+## Protein models
+
+Sheaf serves three protein foundation models, each via its own typed contract:
+
+- **ESM-3** (`api/molecular.py`, backend `esm3`) — per-sequence pooled embeddings (mean / cls). Use for sequence-level similarity, clustering, and downstream featurization. `[molecular]` extra (Python 3.12+).
+- **ESMC** (`api/protein_language.py`, backend `esmc`) — per-token logits + optional per-token embeddings from Biohub's 2026-05-27 release. Use when you need masked-LM logits, per-residue representations, or all-layer hidden states. Default model: `Biohub/ESMC-6B`. `[protein]` extra (Python 3.12+); 300M / 600M variants are Forge API-only and currently raise `NotImplementedError`.
+- **ESMFold2** (`api/structure.py`, backend `esmfold2`) — protein structure prediction with inference-time scaling. Exposes `num_loops`, `num_sampling_steps`, `num_samples`, `seed` as first-class request fields; returns PDB / mmCIF + pLDDT + pTM/ipTM + optional PAE. Default model: `biohub/ESMFold2`. `[protein]` extra (Python 3.12+).
+
+`[molecular]` (ESM-3) and `[protein]` (ESMC + ESMFold2) share the `esm` import name from different upstream packages — install one **or** the other in a given environment. See [`docs/adr/0001-esmc-esmfold2-integration.md`](docs/adr/0001-esmc-esmfold2-integration.md) for the rationale.
+
+Biohub release announcement: <https://github.com/Biohub/esm> · preprint: <https://biohub.ai/papers/esm_protein.pdf>.
+
 ## Supported model types
 
 | Type | Status | Backends |
@@ -193,6 +208,8 @@ See [`examples/`](examples/) for time series comparison, tabular, audio, vision,
 | Depth estimation | ✅ v0.3 | Depth Anything v2 |
 | Object detection | ✅ v0.3 | DETR / RT-DETR |
 | Protein / molecular | ✅ v0.3 | ESM-3 (Python 3.12+) |
+| Protein language modeling | ✅ v0.11 | ESMC 6B (Biohub) |
+| Protein structure prediction | ✅ v0.11 | ESMFold2 (Biohub) — inference-time scaling |
 | Genomics | ✅ v0.3 | Nucleotide Transformer |
 | Small molecule | ✅ v0.3 | MolFormer-XL |
 | Materials science | ✅ v0.3 | MACE-MP-0 |
@@ -309,6 +326,17 @@ Today sheaf ships three deployment paths: `ModelServer` (a local Ray cluster you
 - [ ] `examples/k8s/` with a `RayService` manifest — KubeRay's canonical Ray-on-K8s shape — and a short `README.md` covering prereqs (KubeRay operator installed), `kubectl apply`, and a port-forward smoke test.
 - [ ] GitHub Actions workflow that builds + pushes the Dockerfile to `ghcr.io/korbonits/sheaf-serve:vX.Y.Z` on `v*` tag push, mirroring the PyPI publish flow.
 
+**v0.11 — Biohub protein-biology release integration**
+
+Biohub's "world model of protein biology" landed 2026-05-27 under the MIT license. Sheaf integrates the two model artifacts as first-class typed contracts; ESM Atlas (dataset) is out of scope. See [`docs/adr/0001-esmc-esmfold2-integration.md`](docs/adr/0001-esmc-esmfold2-integration.md).
+
+- [x] `ESMCBackend` — per-token logits + per-token embeddings via `transformers.AutoModelForMaskedLM`, default `Biohub/ESMC-6B`.
+- [x] `ESMFold2Backend` — protein structure prediction with `num_loops` / `num_sampling_steps` / `num_samples` / `seed` as first-class request fields, returning PDB / mmCIF + pLDDT + pTM/ipTM + optional PAE.
+- [x] New `STRUCTURE` model category — first non-tensor output category (structure file as text).
+- [x] `[protein]` install extra; `esm` from `git+https://github.com/Biohub/esm.git@81b3646c9429ea8458918415ad6a46178cb59833` documented (no PyPI release yet).
+- [x] End-to-end GPU smoke — `examples/quickstart_protein_modal.py` runs `ESMFold2Backend` on H100 via Modal (~70s cold start to a persistent volume, sub-second per fold). 53-residue target → 43,088-char mmCIF, pTM=0.2465.
+- [ ] Forge / Biohub-Platform HTTP-client variants for the ESMC 300M / 600M / ESMFold2-fast API-only models.
+
 ---
 
 ## Architecture
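
Reviewer sketch (not part of the diff): since the CLAUDE.md notes say ESMFold2 pLDDT is passed through on `[0, 1]` and that callers who want the conventional `[0, 100]` scale multiply by 100 themselves, a client-side conversion might look like this. The helper name `plddt_to_percent` is hypothetical and is not part of sheaf; the range check is an added safeguard assumed here, so values that are already on the 0-100 scale are rejected rather than scaled twice.

```python
def plddt_to_percent(plddt: list[float]) -> list[float]:
    """Convert per-residue pLDDT from the [0, 1] scale to [0, 100]."""
    for p in plddt:
        # Guard against double scaling: inputs must still be on [0, 1].
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"expected pLDDT on [0, 1], got {p}")
    return [100.0 * p for p in plddt]
```

Keeping the conversion on the caller side leaves the served response identical to what the upstream model produced, consistent with the "validate at the boundary, don't transform in backends" rule quoted in the notes.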