Add Evoformer extraction hooks for intermediate MSA and pair representations by PranavNarala1 · Pull Request #71 · AI2Science/vizfold-foundation

PranavNarala1 · 2026-04-28T03:41:53Z

Summary

This PR adds Evoformer extraction support to the OpenFold inference workflow by introducing forward-hook-based instrumentation for selected Evoformer layers. During inference, the new extraction path captures intermediate MSA and pair representations, stores them in a structured dictionary, and saves them as a reusable artifact for downstream analysis and visualization.

What this PR adds

This PR adds the extraction layer for intermediate Evoformer representations. In particular, it introduces:

forward hooks for selected Evoformer blocks
capture of intermediate msa and pair tensors during inference
clean hook registration and removal
structured dictionary output keyed by layer and tensor type
a small inspection utility for validating saved extraction artifacts
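
The hook lifecycle listed above can be sketched with PyTorch's `register_forward_hook`. This is a minimal sketch, not the code in this PR: `ToyBlock` and `attach_hooks` are hypothetical stand-ins, and it assumes each instrumented block returns an `(msa, pair)` tuple.

```python
import torch
from torch import nn


class ToyBlock(nn.Module):
    """Hypothetical stand-in for an Evoformer block: returns (msa, pair)."""

    def forward(self, msa, pair):
        return msa + 1.0, pair * 2.0


def attach_hooks(blocks, layers, store):
    """Register a forward hook on each selected block and return the handles."""
    handles = []
    for idx in layers:
        # Bind idx as a default argument so each hook keeps its own layer index.
        def hook(module, inputs, output, idx=idx):
            msa, pair = output
            # Detach from autograd and move to CPU before storing.
            store[f"layer_{idx:02d}.msa"] = msa.detach().cpu()
            store[f"layer_{idx:02d}.pair"] = pair.detach().cpu()

        handles.append(blocks[idx].register_forward_hook(hook))
    return handles


blocks = nn.ModuleList([ToyBlock() for _ in range(4)])
captured = {}
handles = attach_hooks(blocks, [0, 3], captured)

msa, pair = torch.zeros(2, 5, 8), torch.ones(5, 5, 4)
try:
    with torch.no_grad():
        for block in blocks:
            msa, pair = block(msa, pair)
finally:
    # Always remove the hooks so later forward passes are not instrumented.
    for handle in handles:
        handle.remove()

print(sorted(captured))
```

Removing the handles in a `finally` block is one way to get the "clean hook registration and removal" behavior even when inference raises partway through.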

Files added / updated

run_evoformer_hook_pretrained_openfold.py
Adds Evoformer instrumentation flags and integrates hook-based extraction into the inference path.
openfold/utils/evoformer_instrumentation.py
Contains the extraction logic for attaching hooks, recording tensors, and saving captured outputs.
openfold/utils/evoformer_run_artifact.py
Provides utilities for working with saved extraction artifacts and downstream visualization workflows.
inspect_evoformer_reps.py
Helper script for validating saved .pt extraction files by printing keys, shapes, and summary statistics.
openfold/utils/import_weights.py
Includes the local compatibility fix needed for the current environment.
openfold/__init__.py
Small import cleanup needed for this setup.

Output format

Captured intermediate outputs are saved in a dictionary with keys of the form:

layer_00.msa
layer_00.pair
layer_12.msa
layer_12.pair
layer_24.msa
layer_24.pair
layer_47.msa
layer_47.pair

This makes the output easy to consume for downstream tensor processing, visualization, and interface work.
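
As an illustration of the key scheme above, the following sketch builds the expected keys from an `--instrument_layers` value and parses a key back into its layer index and tensor type. The helper names `expected_keys` and `parse_key` are hypothetical and not part of this PR; the two-digit zero padding is inferred from the keys shown.

```python
def expected_keys(layer_spec):
    """Turn a spec such as "0,12,24,47" into the keys the artifact should hold."""
    layers = [int(tok) for tok in layer_spec.split(",") if tok.strip()]
    return [f"layer_{i:02d}.{kind}" for i in layers for kind in ("msa", "pair")]


def parse_key(key):
    """Split "layer_12.pair" into (12, "pair")."""
    prefix, kind = key.rsplit(".", 1)
    return int(prefix.split("_", 1)[1]), kind


keys = expected_keys("0,12,24,47")
print(keys)
print(parse_key("layer_47.pair"))
```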

How to test

Run inference with Evoformer extraction enabled:

python3 run_evoformer_hook_pretrained_openfold.py \
    ./examples/monomer/fasta_dir_6KWC \
    /storage/ice1/shared/d-pace_community/alphafold/alphafold_2.3.2_data/pdb_mmcif/mmcif_files \
    --use_precomputed_alignments ./examples/monomer/alignments \
    --output_dir ./outputs/my_outputs_align_6KWC_demo_tri_18 \
    --config_preset model_1_ptm \
    --jax_param_path /storage/ice1/shared/d-pace_community/alphafold/alphafold_2.3.2_data/params/params_model_1_ptm.npz \
    --uniref90_database_path /storage/ice1/shared/d-pace_community/alphafold/alphafold_2.3.2_data/uniref90/uniref90.fasta \
    --mgnify_database_path /storage/ice1/shared/d-pace_community/alphafold/alphafold_2.3.2_data/mgnify/mgy_clusters_2022_05.fa \
    --pdb70_database_path /storage/ice1/shared/d-pace_community/alphafold/alphafold_2.3.2_data/pdb70/pdb70 \
    --uniclust30_database_path /storage/ice1/shared/d-pace_community/alphafold/alphafold_2.3.2_data/uniclust30/uniclust30_2018_08 \
    --bfd_database_path /storage/ice1/shared/d-pace_community/alphafold/alphafold_2.3.2_data/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
    --save_outputs \
    --skip_relaxation \
    --model_device cuda:0 \
    --attn_map_dir ./outputs/attention_files_6KWC_demo_tri_18 \
    --num_recycles_save 1 \
    --triangle_residue_idx 18 \
    --demo_attn \
    --instrument_evoformer \
    --instrument_layers 0,12,24,47 \
    --instrument_out_dir ./outputs/instrumentation

After the run, confirm that the extraction artifact exists:

./outputs/instrumentation/6KWC_1_model_1_ptm_evoformer_reps.pt

Then inspect it with:

python3 inspect_evoformer_reps.py ./outputs/instrumentation/6KWC_1_model_1_ptm_evoformer_reps.pt

Expected behavior:

inference completes successfully
a .pt file is saved
the output contains layer-specific msa and pair tensors
keys and shapes are printed correctly by the inspection script
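
For readers without the repository at hand, the kind of check the inspection step performs can be approximated as follows. This is a sketch only, not the contents of `inspect_evoformer_reps.py`: it assumes the artifact is a flat dictionary of tensors written with `torch.save`, and it uses a small synthetic file (with a made-up name) instead of a real extraction.

```python
import os
import tempfile

import torch


def summarize(path):
    """Load a saved dict of tensors and return (key, shape, mean, std) rows."""
    reps = torch.load(path, map_location="cpu")
    rows = []
    for key in sorted(reps):
        tensor = reps[key].float()
        rows.append((key, tuple(tensor.shape), tensor.mean().item(), tensor.std().item()))
    return rows


with tempfile.TemporaryDirectory() as tmp:
    # Synthetic stand-in for a real extraction artifact.
    path = os.path.join(tmp, "toy_evoformer_reps.pt")
    toy = {"layer_00.msa": torch.zeros(3, 4, 8), "layer_00.pair": torch.ones(4, 4, 2)}
    torch.save(toy, path)
    rows = summarize(path)

for key, shape, mean, std in rows:
    print(f"{key:16s} shape={shape} mean={mean:.4f} std={std:.4f}")
```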

Validation performed

Tested on a real inference run with selected Evoformer layers 0, 12, 24, 47.

The saved artifact contained:

layer_00.msa
layer_00.pair
layer_12.msa
layer_12.pair
layer_24.msa
layer_24.pair
layer_47.msa
layer_47.pair

Observed tensor shapes:

msa: (516, 191, 256)
pair: (191, 191, 128)

Additional validation:

early vs. late msa layers had matching shapes but were not identical
early vs. late pair layers had matching shapes but were not identical

This indicates that the hooks fired at every selected layer and captured nontrivial intermediate tensors that differ from layer to layer while keeping consistent shapes.
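
The early-vs-late comparison described above can be expressed as a small check. This is a sketch using synthetic tensors; `same_shape_but_different` is a hypothetical helper, not a function from this PR.

```python
import torch


def same_shape_but_different(a, b):
    """True when two tensors share a shape but are not elementwise identical."""
    return a.shape == b.shape and not torch.equal(a, b)


# Synthetic stand-ins for an early and a late captured layer.
reps = {
    "layer_00.msa": torch.zeros(3, 4, 8),
    "layer_47.msa": torch.ones(3, 4, 8),
}

early, late = reps["layer_00.msa"], reps["layer_47.msa"]
differs = same_shape_but_different(early, late)
max_abs_diff = (early - late).abs().max().item()
print(differs, max_abs_diff)
```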

Limitations

Current extraction is based on Evoformer block outputs, not finer-grained submodule attention hooks.
The saved tensors can be large, so extraction is currently best used on selected layers rather than all layers at once.
This PR focuses on extraction only; downstream tensor processing, visualization, and interface features are handled separately.
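
To give a rough sense of the size limitation, the sketch below estimates the in-memory footprint from the tensor shapes reported under "Validation performed". It assumes float32 storage (4 bytes per element) and, for the all-layers case, a 48-block Evoformer stack; both are assumptions rather than facts stated in this PR, and the on-disk size of the .pt file may differ.

```python
from math import prod

BYTES_PER_ELEMENT = 4  # assumption: tensors are stored as float32

# Shapes reported for the 6KWC run in this PR.
SHAPES = {"msa": (516, 191, 256), "pair": (191, 191, 128)}


def bytes_per_layer(shapes, itemsize=BYTES_PER_ELEMENT):
    """Total bytes for one layer's msa and pair tensors."""
    return sum(prod(shape) * itemsize for shape in shapes.values())


per_layer = bytes_per_layer(SHAPES)
for n_layers in (4, 48):  # 48 assumes a full 48-block Evoformer stack
    total = per_layer * n_layers
    print(f"{n_layers:2d} layers: {total / 2**20:.1f} MiB")
```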

Why this matters

Before this change, the workflow supported exported attention summaries, but not general intermediate Evoformer representation capture. This PR adds the extraction backbone needed for downstream visualization and analysis of internal model representations.

sherrylicodes · 2026-04-29T15:46:53Z

Really cool! This will be great for downstream visualization work. One suggestion: include a small metadata object alongside the saved .pt artifact, with fields such as selected layers, tensor shapes, residue count, model/config preset, and recycle index. That would make it easier for offline readers or visualization tools to validate the artifact before loading large tensors, and to map msa or pair outputs into future UI views.

Also, since the tensors can be large, it would be good to document whether the saved artifact supports partial loading or whether downstream tools should convert it into a chunked format like Zarr.
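
One possible shape for such a metadata sidecar is sketched below. Nothing here exists in the PR: `write_metadata`, the `.meta.json` naming, and the field names are hypothetical, the residue count is assumed to equal the first dimension of the pair tensor, and the recycle index is an illustrative value rather than one taken from the run.

```python
import json
import tempfile
from pathlib import Path


def write_metadata(artifact_path, shapes, layers, config_preset, recycle_index):
    """Write a small JSON sidecar next to the .pt artifact and return its path."""
    meta = {
        "artifact": Path(artifact_path).name,
        "layers": list(layers),
        "shapes": {name: list(shape) for name, shape in shapes.items()},
        # Assumption: pair tensors are (num_residues, num_residues, channels).
        "num_residues": shapes["pair"][0],
        "config_preset": config_preset,
        "recycle_index": recycle_index,
    }
    meta_path = Path(artifact_path).with_suffix(".meta.json")
    meta_path.write_text(json.dumps(meta, indent=2))
    return meta_path


with tempfile.TemporaryDirectory() as tmp:
    artifact = Path(tmp) / "6KWC_1_model_1_ptm_evoformer_reps.pt"
    meta_path = write_metadata(
        artifact,
        shapes={"msa": (516, 191, 256), "pair": (191, 191, 128)},
        layers=[0, 12, 24, 47],
        config_preset="model_1_ptm",
        recycle_index=0,  # illustrative value only
    )
    meta = json.loads(meta_path.read_text())

print(meta["layers"], meta["num_residues"])
```

A reader could then load only the JSON file to check layers and shapes before deciding whether to load the tensors themselves.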

Pranav Narala and others added 18 commits March 13, 2026 13:24

adding graphing code from openfold output

19df29c

adding in evoformer hook instrumentation and plotting utils

77a2d02

Create run_evoformer_hook_pretrained_openfold.py

afa4819

Add Flask web interface, visualization script, and updated demo noteb…

9617bff

…ook for Issue AI2Science#8

Merge pull request #1 from SruthiVangavolu7/feature/issue8sruthi

aaec509

Feature/Issue AI2Science#8 — Web Interface for Visualizing Intermediate Representations

Add representation tensor processing utilities

befec41

Add tests for representation tensor utilities

7d48c44

Merge pull request AI2Science#2 from priyavisingh/priyavi-tensor-proc…

b64a870

…essing Added representation tensor utilities for Issue AI2Science#8

Remove headers for consistency

4d2b998

Merge pull request AI2Science#3 from priyavisingh/priyavi-tensor-proc…

2d129e0

…essing Remove headers for consistency

Add Evoformer extraction hooks and inspection utilities

2cec8ab

adding in new cell for testing evoformer hook extraction

4f10378

Merge pull request #1 from PranavNarala1/final-checkin

a76a7af

Final checkin

Add Evoformer extraction hooks and inspection utilities

a246a85

adding in new cell for testing evoformer hook extraction

bc0f394

Merge branch 'PranavNarala1-main'

ac98a6e

Merge branch 'main' of https://github.com/PranavNarala1/attention-viz…

24cad22

…-demo

Resolve notebook conflict against priyavi main

16b4f67

PranavNarala1 wants to merge 18 commits into AI2Science:main from PranavNarala1:main

4 participants
