Add Evoformer extraction hooks for intermediate MSA and pair representations #71
Open
PranavNarala1 wants to merge 18 commits into
Feature/Issue AI2Science#8 — Web Interface for Visualizing Intermediate Representations
…essing Added representation tensor utilities for Issue AI2Science#8
…essing Remove headers for consistency
Final checkin
Really cool! This will be great for downstream visualization work. One suggestion is to include a small metadata object alongside the saved .pt artifact, with fields such as selected layers, tensor shapes, residue count, model/config preset, and recycle index. That could make it easier for offline readers or visualization tools to validate the artifact before loading large tensors and to map msa or pair outputs into future UI views. Also, since the tensors can be large, it would be good to document whether the saved artifact supports partial loading or whether downstream tools should convert it into a chunked format like Zarr.
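One possible shape for that metadata object, as a minimal sketch: a small JSON sidecar built from tensor shapes alone, so a reader can validate the artifact without loading it. The field names, the `build_metadata` helper, the recycle index value, and the 4-bytes-per-element (float32) assumption are all hypothetical and not part of this PR; the shapes are the ones reported in the PR description.

```python
import json
from math import prod


def build_metadata(shapes, layers, config_preset, recycle_index, bytes_per_element=4):
    """Describe an artifact from its tensor shapes alone (no tensors are loaded)."""
    pair_shapes = [shape for key, shape in shapes.items() if key.endswith(".pair")]
    return {
        "layers": list(layers),
        "config_preset": config_preset,
        "recycle_index": recycle_index,
        "shapes": {key: list(shape) for key, shape in shapes.items()},
        # Assumes pair tensors are laid out as (n_res, n_res, channels).
        "residue_count": pair_shapes[0][0] if pair_shapes else None,
        # Size estimate only; the real dtype may differ from the assumed float32.
        "total_bytes": sum(prod(shape) * bytes_per_element for shape in shapes.values()),
    }


shapes = {
    "layer_00.msa": (516, 191, 256),
    "layer_00.pair": (191, 191, 128),
    "layer_47.msa": (516, 191, 256),
    "layer_47.pair": (191, 191, 128),
}
meta = build_metadata(shapes, layers=[0, 47], config_preset="model_1_ptm", recycle_index=0)
print(json.dumps(meta, indent=2))
```

Under the float32 assumption, each msa tensor of this shape is about 101 MB and each pair tensor about 19 MB, which is the scale at which partial loading starts to matter.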
Summary
This PR adds Evoformer extraction support to the OpenFold inference workflow by introducing forward-hook-based instrumentation for selected Evoformer layers. During inference, the new extraction path captures intermediate MSA and pair representations, stores them in a structured dictionary, and saves them as a reusable artifact for downstream analysis and visualization.
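The general forward-hook pattern described here can be sketched as follows. This is an illustration under stated assumptions, not the code in this PR: `ToyBlock` and `attach_hooks` are hypothetical names, and the sketch assumes each block returns an `(msa, pair)` tuple, which may not match the actual OpenFold block outputs.

```python
import torch
from torch import nn


class ToyBlock(nn.Module):
    """Stand-in for an Evoformer block: takes and returns an (msa, pair) tuple."""

    def forward(self, msa, pair):
        return msa + 1.0, pair * 2.0


def attach_hooks(blocks, layer_indices, store):
    """Register forward hooks that copy each selected block's outputs into `store`."""
    handles = []
    for idx in layer_indices:
        # Bind idx as a default argument so each hook keeps its own layer index.
        def hook(module, inputs, output, idx=idx):
            msa, pair = output
            # Detach and move to CPU so captured tensors do not hold GPU memory.
            store[f"layer_{idx:02d}.msa"] = msa.detach().cpu()
            store[f"layer_{idx:02d}.pair"] = pair.detach().cpu()

        handles.append(blocks[idx].register_forward_hook(hook))
    return handles


blocks = nn.ModuleList([ToyBlock() for _ in range(4)])
captured = {}
handles = attach_hooks(blocks, [0, 3], captured)

msa, pair = torch.zeros(2, 5, 8), torch.ones(5, 5, 4)
with torch.no_grad():
    for block in blocks:
        msa, pair = block(msa, pair)

for handle in handles:
    handle.remove()  # remove hooks once the run is finished

print(sorted(captured))
```

Returning the hook handles lets the caller remove the hooks after inference, so later runs on the same model are not instrumented by accident.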
What this PR adds
This PR adds the extraction layer for intermediate Evoformer representations. In particular, it introduces:
- capture of msa and pair tensors during inference
Files added / updated
- run_evoformer_hook_pretrained_openfold.py: adds Evoformer instrumentation flags and integrates hook-based extraction into the inference path.
- openfold/utils/evoformer_instrumentation.py: contains the extraction logic for attaching hooks, recording tensors, and saving captured outputs.
- openfold/utils/evoformer_run_artifact.py: provides utilities for working with saved extraction artifacts and downstream visualization workflows.
- inspect_evoformer_reps.py: helper script for validating saved .pt extraction files by printing keys, shapes, and summary statistics.
- openfold/utils/import_weights.py: includes the local compatibility fix needed for the current environment.
- openfold/__init__.py: small import cleanup needed for this setup.
Output format
Captured intermediate outputs are saved in a dictionary with keys of the form:
- layer_00.msa
- layer_00.pair
- layer_12.msa
- layer_12.pair
- layer_24.msa
- layer_24.pair
- layer_47.msa
- layer_47.pair
This makes the output easy to consume for downstream tensor processing, visualization, and interface work.
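A downstream consumer could build and parse these keys with a few lines of standard-library Python. The helper names below (`parse_layers`, `expected_keys`, `parse_key`) are hypothetical and only illustrate the key scheme; they are not functions from this PR.

```python
import re

KEY_PATTERN = re.compile(r"^layer_(\d+)\.(msa|pair)$")


def parse_layers(spec):
    """Turn a comma-separated spec such as "0,12,24,47" into sorted unique ints."""
    return sorted({int(part) for part in spec.split(",") if part.strip()})


def expected_keys(layers):
    """List the keys an artifact should contain for the given layer indices."""
    return [f"layer_{i:02d}.{kind}" for i in layers for kind in ("msa", "pair")]


def parse_key(key):
    """Split a key such as "layer_12.pair" into (12, "pair")."""
    match = KEY_PATTERN.match(key)
    if match is None:
        raise ValueError(f"unexpected key: {key!r}")
    return int(match.group(1)), match.group(2)


print(expected_keys(parse_layers("0,12,24,47")))
print(parse_key("layer_12.pair"))
```

Comparing `expected_keys(...)` against the keys actually present is a cheap way to detect a hook that never fired.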
How to test
Run inference with Evoformer extraction enabled:

    python3 run_evoformer_hook_pretrained_openfold.py \
        ./examples/monomer/fasta_dir_6KWC \
        /storage/ice1/shared/d-pace_community/alphafold/alphafold_2.3.2_data/pdb_mmcif/mmcif_files \
        --use_precomputed_alignments ./examples/monomer/alignments \
        --output_dir ./outputs/my_outputs_align_6KWC_demo_tri_18 \
        --config_preset model_1_ptm \
        --jax_param_path /storage/ice1/shared/d-pace_community/alphafold/alphafold_2.3.2_data/params/params_model_1_ptm.npz \
        --uniref90_database_path /storage/ice1/shared/d-pace_community/alphafold/alphafold_2.3.2_data/uniref90/uniref90.fasta \
        --mgnify_database_path /storage/ice1/shared/d-pace_community/alphafold/alphafold_2.3.2_data/mgnify/mgy_clusters_2022_05.fa \
        --pdb70_database_path /storage/ice1/shared/d-pace_community/alphafold/alphafold_2.3.2_data/pdb70/pdb70 \
        --uniclust30_database_path /storage/ice1/shared/d-pace_community/alphafold/alphafold_2.3.2_data/uniclust30/uniclust30_2018_08 \
        --bfd_database_path /storage/ice1/shared/d-pace_community/alphafold/alphafold_2.3.2_data/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
        --save_outputs \
        --skip_relaxation \
        --model_device cuda:0 \
        --attn_map_dir ./outputs/attention_files_6KWC_demo_tri_18 \
        --num_recycles_save 1 \
        --triangle_residue_idx 18 \
        --demo_attn \
        --instrument_evoformer \
        --instrument_layers 0,12,24,47 \
        --instrument_out_dir ./outputs/instrumentation

After the run, confirm that the extraction artifact exists:

    ./outputs/instrumentation/6KWC_1_model_1_ptm_evoformer_reps.pt

Then inspect it with:
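For a quick check from Python, a minimal sketch along these lines could print each key with its shape and basic statistics. It assumes the artifact is a flat dictionary of tensors readable by `torch.load`; the `summarize` and `inspect_file` helpers are hypothetical and are not the contents of inspect_evoformer_reps.py.

```python
import torch


def summarize(reps):
    """Return one row per tensor entry: (key, shape, dtype, mean, std)."""
    rows = []
    for key in sorted(reps):
        value = reps[key]
        if not torch.is_tensor(value):
            continue  # skip any non-tensor entries
        t = value.float()  # compute statistics in float32 regardless of stored dtype
        rows.append((key, tuple(t.shape), str(value.dtype), t.mean().item(), t.std().item()))
    return rows


def inspect_file(path):
    """Load an artifact onto the CPU and print a summary line per tensor."""
    reps = torch.load(path, map_location="cpu")
    for key, shape, dtype, mean, std in summarize(reps):
        print(f"{key:16s} {str(shape):20s} {dtype:14s} mean={mean:+.4f} std={std:.4f}")


# Example: inspect_file("./outputs/instrumentation/6KWC_1_model_1_ptm_evoformer_reps.pt")
```

Loading with `map_location="cpu"` keeps the check usable on machines without a GPU.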
Expected behavior:
- a .pt file is saved
- the saved file contains msa and pair tensors
Validation performed
Tested on a real inference run with selected Evoformer layers 0, 12, 24, 47.
The saved artifact contained:
- layer_00.msa
- layer_00.pair
- layer_12.msa
- layer_12.pair
- layer_24.msa
- layer_24.pair
- layer_47.msa
- layer_47.pair
Observed tensor shapes:
- msa: (516, 191, 256)
- pair: (191, 191, 128)
Additional validation:
- msa layers had matching shapes but were not identical
- pair layers had matching shapes but were not identical
This confirmed that the hooks fired correctly, captured nontrivial intermediate tensors, and produced stable layer-specific outputs.
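The "matching shapes but not identical" check can be reproduced with a short script. This is a sketch of one way to do it, assuming the artifact is a dictionary of tensors keyed as described in the output format section; `compare_layers` is a hypothetical helper, and the tiny tensors below are stand-ins for real data.

```python
import torch


def compare_layers(reps, kind):
    """Check that all `kind` tensors share one shape and list any identical pairs."""
    keys = sorted(k for k in reps if k.endswith(f".{kind}"))
    shapes = {tuple(reps[k].shape) for k in keys}
    identical = [
        (a, b)
        for i, a in enumerate(keys)
        for b in keys[i + 1:]
        if torch.equal(reps[a], reps[b])  # True only if every element matches
    ]
    return {"same_shape": len(shapes) == 1, "identical_pairs": identical}


reps = {
    "layer_00.msa": torch.zeros(2, 3),
    "layer_12.msa": torch.ones(2, 3),
    "layer_24.msa": torch.ones(2, 3),  # deliberately a duplicate of layer 12
}
print(compare_layers(reps, "msa"))
```

For a healthy extraction, `same_shape` should be true and `identical_pairs` empty; a non-empty list suggests a hook captured the same tensor twice.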
Limitations
Why this matters
Before this change, the workflow supported exported attention summaries, but not general intermediate Evoformer representation capture. This PR adds the extraction backbone needed for downstream visualization and analysis of internal model representations.