4a06de0
Add ESMFold backend: inference + attention/activation trace export + …
jayvenn21 Feb 21, 2026
947bece
Switch ESMFold backend to HuggingFace implementation to avoid OpenFol…
jayvenn21 Feb 21, 2026
e2be72e
Fix HF ESMFold forward: do not pass output_attentions/output_hidden_s…
jayvenn21 Feb 21, 2026
0d4f5a3
Stabilize ESMFold backend: fix hook-based trace extraction, clean up …
jayvenn21 Mar 11, 2026
1b29afb
Address PR #44 review: pin transformers, .DS_Store/.gitignore, trace …
jayvenn21 Mar 23, 2026
aed3215
Extract s_s folding trunk activations and enforce safetensors (#2)
rohan5986 Mar 23, 2026
ca21eb5
Add VizFold-compatible text attention export to HuggingFace backend (#1)
JeevanandanRamasamy Mar 23, 2026
fb777c9
Gate s_s extraction under trace_mode and persist to trace archive
jayvenn21 Mar 24, 2026
4b86182
Add trace extraction smoke test and reproducibility docs
jayvenn21 Mar 24, 2026
a3a9ac5
Document s_s extraction and structure PDB fallback chain
jayvenn21 Mar 26, 2026
db6caba
Add structure module hooks for IPA attention and per-recycle backbone…
jayvenn21 Mar 27, 2026
eebb1af
Capture s_s and s_z at every recycling iteration via trunk hook (#3)
rohan5986 Apr 2, 2026
d86c4ad
Extract Evoformer Trunk Intermediates from ESMFold (#4)
JeevanandanRamasamy Apr 3, 2026
f3618ce
Add validation tests for ESMFold intermediate outputs (#5)
Mose-Kim02 Apr 9, 2026
bc3505b
ESMFold backend correctness and cleanup (#7)
JeevanandanRamasamy Apr 21, 2026
4342a19
VizFold Interactive Dashboard - 3Dmol Structure & Trace Explorer (#6)
rohan5986 Apr 24, 2026
65245e1
Removed duplicate extraction logic, fixed scaffold CSS, and deleted u…
rohan5986 Apr 24, 2026
f491ef7
Refactor frontend viz: responsive heatmap, camera persistence, bidire…
jayvenn21 Apr 24, 2026
11 changes: 11 additions & 0 deletions .gitignore
@@ -1,5 +1,6 @@
.vscode/
.idea/
.DS_Store
__pycache__/
*.egg-info
build
@@ -18,3 +19,13 @@ cutlass/
*.sto
*.a3m
*.hhr

# Backend Generated Files
test_output/
__pycache__/
*.pt
*.pdb

# Frontend Dependencies
frontend/node_modules/
frontend/dist/
10 changes: 10 additions & 0 deletions README.md
@@ -7,6 +7,16 @@ This repository has two main components:

---

## How do I test running OpenFold / VizFold?

| Where | How |
|-------|-----|
| **Locally** | Use **`viz_attention_demo_base.ipynb`**. Open it, set `BASE_DATA_DIR` (or the template MMCIF path in the inference command), run the setup once (e.g. `bash scripts/download_alphafold_params.sh openfold/resources`), then run the cells. Cell 2 runs OpenFold + attention extraction; cells 3–4 run the VizFold visualizations. |
| **HPC with CyberShuttle** | Use **`viz_attention_demo.ipynb`**. It uses Airavata magics to request a GPU runtime (`cybershuttle.yml`), then clones the [attention-viz-demo](https://github.com/vizfold/attention-viz-demo) repo and runs the same pipeline there. Set `BASE_DATA_DIR` to your cluster’s AlphaFold DB path (e.g. `/depot/itap/datasets/alphafold/db`). |
| **HPC without CyberShuttle** | Same workflow as local, but from the cluster: clone this repo, create the env (e.g. from `cybershuttle.yml` or OpenFold docs), download params, then run `run_pretrained_openfold.py` via your job scheduler (e.g. `sbatch`). Optionally run the `visualize_attention_*` scripts on the outputs, or run `viz_attention_demo_base.ipynb` in Jupyter on the cluster if available. |

---

Link to the OpenFold implementation - [README_vizfold_openfold.md](https://github.com/vizfold/vizfold-foundation/blob/main/README_vizfold_openfold.md)

---
139 changes: 139 additions & 0 deletions docs/esmfold.md
@@ -0,0 +1,139 @@
# ESMFold backend

The ESMFold backend runs [ESMFold](https://github.com/facebookresearch/esm) via **HuggingFace Transformers** (`EsmForProteinFolding`) and writes **VizFold-compatible** trace archives: structure + optional attention and activation tensors with metadata. Using Transformers avoids the OpenFold build dependency (no CUDA compilation on cluster).

## Install

**Option A – conda (recommended)**
Use `environment-mac.yml` (Mac) or `environment.yml` (Linux):

```bash
conda env create -f environment-mac.yml # or environment.yml on Linux
conda activate openfold-env
```

**Option B – pip (after PyTorch is installed)**
From repo root:

```bash
pip install -r requirements-esmfold.txt
# Optional: pip install -e . for vizfold package
```

`requirements-esmfold.txt` requires `transformers>=4.36.0` (a minimum version, not an exact pin). PyTorch must be installed separately.

## Run locally

**Structure only (fast):**

```bash
python run_pretrained_esmf.py \
--fasta examples/monomer/fasta_dir_6KWC/6KWC.fasta \
--out outputs/esmf_6KWC \
--trace_mode none
```

**Structure + attention + activations:**

```bash
python run_pretrained_esmf.py \
--fasta examples/monomer/fasta_dir_6KWC/6KWC.fasta \
--out outputs/esmf_6KWC \
--model facebook/esmfold_v1 \
--device cuda \
--trace_mode attention+activations \
--layers all \
--save_fp16
```

**Limit layers/heads (saves memory and disk):**

```bash
python run_pretrained_esmf.py \
--fasta examples/monomer/fasta_dir_6KWC/6KWC.fasta \
--out outputs/esmf_6KWC \
--trace_mode attention \
--layers 0,1,2,5 \
--heads 0,1,2
```
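The `--layers` and `--heads` values above are comma-separated indices, and `--layers all` selects every layer. A minimal sketch of how such a spec could be parsed is shown below. The helper name `parse_index_spec` is hypothetical and is not taken from `run_pretrained_esmf.py`; the script's own parsing may differ, and whether `--heads` also accepts `all` is an assumption.

```python
from typing import List


def parse_index_spec(spec: str, count: int) -> List[int]:
    """Turn 'all' or a comma-separated list such as '0,1,2,5' into sorted indices.

    Hypothetical helper for illustration only. `count` is the number of
    layers (or heads) the model exposes.
    """
    spec = spec.strip().lower()
    if spec == "all":
        return list(range(count))
    # A set removes duplicates; int() tolerates surrounding whitespace.
    indices = sorted({int(part) for part in spec.split(",") if part.strip()})
    for idx in indices:
        if not 0 <= idx < count:
            raise ValueError(f"index {idx} out of range for {count} entries")
    return indices
```

For example, `parse_index_spec("0,1,2,5", count=36)` returns `[0, 1, 2, 5]`, and an index outside `0..count-1` raises `ValueError` before any tensors are written.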

**Structure + IPA attention + per-recycle backbone (structure module traces):**

```bash
python run_pretrained_esmf.py \
--fasta examples/monomer/fasta_dir_6KWC/6KWC.fasta \
--out outputs/esmf_6KWC \
--trace_mode attention+activations \
--structure_traces \
--save_fp16
```

## Output layout

After a run, `--out` contains:

```
outputs/esmf_6KWC/
  meta.json                       # Run metadata (backend, model, shapes, seed, etc.)
  structure/
    predicted.pdb                 # Predicted structure (PDB)
    predicted.pt                  # Optional coordinate tensor
  trace/
    attention/
      layer_000.pt
      layer_001.pt
      ...
    activations/
      layer_000.pt
      ...
    trunk/                        # Evoformer intermediates (per-block + final)
      block_000_seq.pt            # [L, C_s] per-block sequence state (last recycle)
      block_000_pair.pt           # [L, L, C_z] per-block pair state (last recycle)
      ...
      s_s.pt                      # [L, C_s] final trunk single representations
      s_z.pt                      # [L, L, C_z] final trunk pair representations
    structure_module/             # Only with --structure_traces
      ipa_attention/
        recycle_00_block_00.pt    # IPA attention [H, N, N]
        ...
      backbone/
        recycle_00_positions.pt   # Per-recycle backbone coords
        recycle_00_states.pt      # Per-recycle single representations
        ...
    summary.json                  # Per-layer attention entropy, sparsity, norms
    index.json                    # Maps layer/head to path, dtype, shape
  attention_files/
    msa_row_attn_layer0.txt       # VizFold text format (top-k per head)
    ...
  logs.txt                        # Log lines from the run
```
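`summary.json` is described above as holding per-layer attention entropy and sparsity. The exact definitions used by the backend are not stated here, so the following is only a sketch under assumed definitions: Shannon entropy in nats averaged over query rows, and sparsity as the fraction of weights below a threshold. All function names are hypothetical.

```python
import math
from typing import Sequence


def row_entropy(row: Sequence[float]) -> float:
    """Shannon entropy (nats) of one attention row; zero weights contribute nothing."""
    return -sum(p * math.log(p) for p in row if p > 0.0)


def mean_attention_entropy(attn: Sequence[Sequence[float]]) -> float:
    """Average row entropy over one head's [N, N] attention map."""
    return sum(row_entropy(row) for row in attn) / len(attn)


def sparsity(attn: Sequence[Sequence[float]], threshold: float = 1e-3) -> float:
    """Fraction of weights below `threshold` (an assumed definition of sparsity)."""
    total = sum(len(row) for row in attn)
    small = sum(1 for row in attn for p in row if p < threshold)
    return small / total
```

Under these definitions a uniform row over N positions has entropy `log(N)`, and a one-hot row has entropy 0, so low values indicate sharply focused heads.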

## meta.json

Includes:

- `backend`, `model_name`, `date_time`, `device`, `dtype`
- `sequence_length`, `input_fasta_hash`, `input_fasta_path`
- `layer_count`, `head_count`, `trace_mode`, `tensor_format` (fp16/fp32)
- `trace_formats`: which output formats were produced (`pt`, `txt`)
- `shapes_recorded`: per-file shapes for attention, activations, trunk, and structure module
- `seed`, `deterministic` (if set)
- `repo_commit` (if run from a git repo)
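
A quick way to sanity-check a run is to confirm that `meta.json` carries the fields listed above and that `input_fasta_hash` matches the input file. The sketch below is illustrative only: the helper names are hypothetical, and the use of SHA-256 over the raw file bytes is an assumption, since the hash algorithm is not specified here.

```python
import hashlib
import json
from pathlib import Path
from typing import List

# A subset of the keys listed above; optional ones (seed, repo_commit, ...) are skipped.
REQUIRED_KEYS = ("backend", "model_name", "sequence_length", "trace_mode", "tensor_format")


def fasta_sha256(path: Path) -> str:
    """Hash the raw FASTA bytes. SHA-256 is an assumption, not the confirmed algorithm."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def missing_meta_keys(meta_path: Path) -> List[str]:
    """Return the required keys that are absent from meta.json."""
    meta = json.loads(meta_path.read_text())
    return [key for key in REQUIRED_KEYS if key not in meta]
```

An empty list from `missing_meta_keys(Path("outputs/esmf_6KWC/meta.json"))` means every checked field is present.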

## Reproducibility

- `--seed 42` fixes the PyTorch RNG.
- `--deterministic` sets CuDNN deterministic mode (can be slower).

Both are recorded in `meta.json`.
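
As a rough illustration of what these two flags typically correspond to in PyTorch, the sketch below seeds the RNG and switches cuDNN to deterministic mode. The function name `seed_everything` is hypothetical, seeding Python's `random` module is an addition beyond what is described above, and the script's actual implementation may differ.

```python
import random


def seed_everything(seed: int, deterministic: bool = False) -> None:
    """Seed Python and (if installed) PyTorch RNGs; optionally force deterministic cuDNN."""
    random.seed(seed)  # extra step, not described in this document
    try:
        import torch
    except ImportError:
        return  # PyTorch not installed: only the Python RNG is seeded
    torch.manual_seed(seed)
    if deterministic:
        # Deterministic kernels can be slower than the autotuned defaults.
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
```

Calling `seed_everything(42, deterministic=True)` before inference mirrors `--seed 42 --deterministic` under these assumptions.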

## Long sequences

Attention storage is O(N²). For long proteins the script warns and suggests:

- `--trace_mode activations` (no attention), or
- `--layers 0,1,2` to save only a few layers.
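
To see why the quadratic term matters, the storage for full attention maps can be estimated before a run. This is a back-of-the-envelope sketch: the function name is hypothetical, and the head count should be read from `head_count` in `meta.json` rather than assumed.

```python
def attention_bytes(seq_len: int, layers: int, heads: int, bytes_per_value: int = 2) -> int:
    """Bytes needed to store one [heads, N, N] attention map per layer.

    bytes_per_value is 2 for fp16 (--save_fp16) and 4 for fp32.
    """
    return layers * heads * seq_len * seq_len * bytes_per_value


if __name__ == "__main__":
    # 36 layers matches the tensor counts reported for this backend;
    # heads=40 is an assumed value used only for illustration.
    for n in (250, 500, 1000):
        gib = attention_bytes(n, layers=36, heads=40) / 2**30
        print(f"N={n}: {gib:.2f} GiB")
```

Doubling the sequence length quadruples the estimate, which is why restricting `--layers` or dropping attention entirely is the practical option for long proteins.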

## Running on ICE (SLURM)

See [hpc_ice.md](hpc_ice.md) for batch submission, environment setup, and a short smoke test.
181 changes: 181 additions & 0 deletions docs/esmfold_backend_repro.md
@@ -0,0 +1,181 @@
# ESMFold Backend Reproducibility Guide

This document describes how to run the HuggingFace-based ESMFold backend with
VizFold-compatible trace export using the shared `feature/esmfold-backend`
integration branch.

It provides instructions for verifying structure inference, attention extraction,
activation extraction, and expected archive outputs locally and on the ICE cluster.

## Environment Setup (Local)

Create a virtual environment:

```bash
python3 -m venv .venv
source .venv/bin/activate
```

Install dependencies:

```bash
pip install -r requirements-esmfold.txt
pip install torch
pip install -e . --no-build-isolation
```

## Structure-Only Inference Test

Run:

```bash
python run_pretrained_esmf.py \
--fasta examples/monomer/fasta_dir_6KWC/6KWC.fasta \
--out outputs/test_run \
--trace_mode none \
--device cpu
```

Expected outputs:

- `outputs/test_run/meta.json`
- `outputs/test_run/structure/predicted.pdb`

Successful execution confirms:

- model loads correctly
- inference pipeline runs end-to-end
- archive metadata generation works

## Trace Extraction Test (Attention + Activations)

Run:

```bash
python run_pretrained_esmf.py \
--fasta examples/monomer/fasta_dir_6KWC/6KWC.fasta \
--out outputs/test_trace \
--trace_mode attention+activations \
--device cpu
```

Expected outputs:

- `outputs/test_trace/meta.json`
- `outputs/test_trace/structure/predicted.pdb`
- `outputs/test_trace/trace/`

Trace directory should contain:

- `trace/attention/`
- `trace/activations/`

## Verified Tensor Outputs (Local Validation)

Successful execution produces:

- 36 attention tensors
- 36 activation tensors
- 72 total `.pt` trace tensors

Attention tensors follow expected shape:

`[B, H, N, N]`

where:

- B = batch size
- H = number of attention heads
- N = sequence length (after special-token slicing)

This confirms compatibility with VizFold's visualization pipeline.
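
The shape convention above can be checked mechanically. The sketch below validates a shape tuple against `[B, H, N, N]`; the helper name is hypothetical, and the hint about `N + 2` assumes two special tokens (start and end) that were not sliced off.

```python
from typing import Sequence


def check_attention_shape(shape: Sequence[int], seq_len: int) -> None:
    """Raise ValueError unless `shape` is [B, H, N, N] with N == seq_len."""
    if len(shape) != 4:
        raise ValueError(f"expected 4 dimensions [B, H, N, N], got {len(shape)}")
    _batch, _heads, rows, cols = shape
    if rows != cols:
        raise ValueError(f"attention map is not square: {rows} x {cols}")
    if rows != seq_len:
        hint = ""
        if rows == seq_len + 2:
            # Assumption: two special tokens were left in place.
            hint = " (special tokens may not have been sliced off)"
        raise ValueError(f"expected N={seq_len}, got {rows}{hint}")
```

With PyTorch available, a saved tensor can be checked with `check_attention_shape(tuple(torch.load(path).shape), seq_len)`, where `seq_len` comes from `sequence_length` in `meta.json`.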

## Archive Structure Validation

Expected archive layout:

```
outputs/test_trace/
├── meta.json
├── structure/
│   └── predicted.pdb
└── trace/
    ├── attention/
    └── activations/
```

This structure matches the OpenFold-compatible VizFold archive schema.
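
The layout check can be scripted rather than done by eye. A minimal sketch, assuming only the paths shown in the tree above; the helper name `missing_paths` is hypothetical.

```python
from pathlib import Path
from typing import List, Sequence

# Paths taken from the expected archive layout above, relative to --out.
REQUIRED_PATHS = (
    "meta.json",
    "structure/predicted.pdb",
    "trace/attention",
    "trace/activations",
)


def missing_paths(out_dir: str, required: Sequence[str] = REQUIRED_PATHS) -> List[str]:
    """Return the required files or directories that do not exist under out_dir."""
    root = Path(out_dir)
    return [rel for rel in required if not (root / rel).exists()]
```

`missing_paths("outputs/test_trace")` returning an empty list indicates the archive has the expected top-level structure; it does not inspect tensor contents.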

## Running on ICE Cluster (PACE)

Login:

```bash
ssh <gt_username>@login-ice.pace.gatech.edu
```

Navigate to the repository:

```bash
cd attention-viz-demo
```

Activate environment:

```bash
source .venv/bin/activate
```

Run structure inference:

```bash
python run_pretrained_esmf.py \
--fasta examples/monomer/fasta_dir_6KWC/6KWC.fasta \
--out outputs/test_run \
--trace_mode none \
--device cuda
```

Run trace extraction:

```bash
python run_pretrained_esmf.py \
--fasta examples/monomer/fasta_dir_6KWC/6KWC.fasta \
--out outputs/test_trace \
--trace_mode attention+activations \
--device cuda
```

Expected outputs:

- `structure/predicted.pdb`
- `meta.json`
- `trace/attention/`
- `trace/activations/`

A successful GPU run confirms that the backend works on the cluster for larger inference workloads.


## Additional Intermediate Output Validation

The latest shared `feature/esmfold-backend` branch now exports additional intermediate outputs beyond the original encoder attention and activation traces.

Verified local outputs include:

- 36 attention tensors in `trace/attention/`
- 36 activation tensors in `trace/activations/`
- ~98 Evoformer trunk intermediate tensors in `trace/trunk/`
- 36 VizFold attention text files in `attention_files/`
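
The exact counts above can be compared against a finished run with a short script. This is a sketch only: the helper name is hypothetical, and `trace/trunk/` is left out because its count is given as approximate.

```python
from pathlib import Path
from typing import Dict, Tuple

# Counts reported in the list above, keyed by directory relative to --out.
EXPECTED: Dict[str, Tuple[str, int]] = {
    "trace/attention": ("*.pt", 36),
    "trace/activations": ("*.pt", 36),
    "attention_files": ("*.txt", 36),
}


def count_mismatches(
    out_dir: str, expected: Dict[str, Tuple[str, int]] = EXPECTED
) -> Dict[str, Tuple[int, int]]:
    """Return {relative_dir: (found, wanted)} for each directory whose file count differs."""
    mismatches = {}
    for rel_dir, (pattern, wanted) in expected.items():
        directory = Path(out_dir) / rel_dir
        # A missing directory counts as zero files.
        found = len(list(directory.glob(pattern))) if directory.is_dir() else 0
        if found != wanted:
            mismatches[rel_dir] = (found, wanted)
    return mismatches
```

An empty dictionary from `count_mismatches("outputs/test_trace")` means the three exact counts match; runs that restrict `--layers` will legitimately report fewer files.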

Expected tensor shapes include:

- attention tensors: `[B, H, N, N]`
- activation tensors: `[B, N, D]`
- pair representations (`s_z`): `[N, N, D]`

If recycling outputs are enabled, they are expected to appear under `trace/trunk/` with keys such as:

- `recycle_*_s_s`
- `recycle_*_s_z`

If structure-module / IPA outputs are enabled, they should also be saved as `.pt` tensors in the trace archive and can be validated separately for expected attention dimensions.