ipSAE ranking, Dockerfile upgrade, nipah Glycoprotein G design support #86

Open
lhallee wants to merge 20 commits into HannesStark:main from lhallee:main

lhallee commented Nov 26, 2025

Hello @HannesStark,

The ongoing Nipah Binder Competition uses ipSAE as its main metric for ranking results. I wanted to add some simple functionality so that boltzgen returns PAE in an easy-to-read format, along with a script, calc_ipsae.py, that automatically scores and ranks all of the model's final outputs.

I also added an enhanced Dockerfile, which fixes a Python version bug and makes sure the user is set up with an up-to-date PyTorch, etc.

I also added to the examples a simple workflow for designing Nipah Glycoprotein G binders that uses the solved PDB structure.

To design competition binders with Docker:

docker build -t boltzgen .
docker run --rm --gpus all --ipc=host -v "$PWD:/workdir" -v "$PWD/cache:/cache" -v "$PWD/example:/example" boltzgen run /example/nipah_glycoprotein_g/nipah.yaml --output /workdir/nipah_out --protocol protein-anything --num_designs 2

Then, rank by ipSAE:

python calc_ipsae.py --output_dir nipah_out

This writes a summary to:
<output_dir>/designs_ranked_by_ipsae_summary.csv
and a FASTA file to:
<output_dir>/designs_ranked_by_ipsae.fasta
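
For anyone who wants to pull the top sequences out of the ranked FASTA, here is a minimal sketch using only the standard library. It assumes the file is plain FASTA and that records appear in ranked order, as the file name suggests. The header contents are not specified here, so the helper (`parse_fasta`, a name made up for illustration) returns each header unparsed.

```python
from typing import Iterable, Iterator, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines may be wrapped, so collect and join them.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Example usage (path taken from the command above; adjust to your output_dir):
# with open("nipah_out/designs_ranked_by_ipsae.fasta") as handle:
#     top_ten = list(parse_fasta(handle))[:10]
```
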

I ran the workflow above with --num_designs 128 and got some decent results. It looks like high-ipSAE outputs from boltzgen are fairly sparse but clearly possible!

image

Obviously feel free to reorganize and use the contributed code however you'd like.
Best,
Logan

@sniarchos

Hi @lhallee, thank you very much for that! I was wondering, where exactly do you find the full PAE matrices? I tried to trace them from your edits on some test runs of mine, but I see no full PAE matrices in these npz files, only aggregated ones, e.g. interaction_pae.

@lhallee
Author

lhallee commented Nov 27, 2025

Hey @zehanort,

If I understand the boltzgen code correctly, a PAE tensor of shape (num_diffusion_samples, num_residue, num_residue) is now returned in the .npz files located at <output_dir>/final_ranked_designs/final_30_designs/pae/*.npz. By default, calc_ipsae.py scores each of the num_diffusion_samples (5) PAE matrices and takes the max of the 5 scores.

Notably, the ipSAE is the "min" ipSAE version, which was found to have the highest discriminative power in the meta-analysis linked above.
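
To make the "max over samples, min over directions" logic concrete, here is a rough NumPy sketch. It is not the code in calc_ipsae.py. It assumes the per-residue d0 variant of ipSAE with the TM-score style d0 formula, a PAE cutoff of 10 Å, and a per-residue list of chain IDs supplied by the caller. All function names are made up for illustration, and the key under which the tensor is stored in the .npz file is not shown because it is not stated here.

```python
import numpy as np


def calc_d0(n: float) -> float:
    """TM-score style d0, with n clipped to at least 27 and d0 to at least 1.0."""
    n = max(float(n), 27.0)
    return max(1.0, 1.24 * (n - 15.0) ** (1.0 / 3.0) - 1.8)


def directional_ipsae(pae, src_mask, dst_mask, cutoff=10.0):
    """ipSAE from the src chain to the dst chain for one (N, N) PAE matrix."""
    # Rows are src residues, columns are dst residues.
    block = pae[np.ix_(src_mask, dst_mask)]
    best = 0.0
    for row in block:
        valid = row < cutoff  # only confident residue pairs contribute
        n_valid = int(valid.sum())
        if n_valid == 0:
            continue
        d0 = calc_d0(n_valid)  # d0 depends on the number of confident pairs
        score = float(np.mean(1.0 / (1.0 + (row[valid] / d0) ** 2)))
        best = max(best, score)  # ipSAE is the best per-residue score
    return best


def min_ipsae(pae_samples, chain_ids, binder, target, cutoff=10.0):
    """'min' ipSAE per sample, then the max over all diffusion samples."""
    chain_ids = np.asarray(chain_ids)
    b, t = chain_ids == binder, chain_ids == target
    per_sample = [
        min(directional_ipsae(p, b, t, cutoff), directional_ipsae(p, t, b, cutoff))
        for p in np.asarray(pae_samples, dtype=float)  # shape (S, N, N)
    ]
    return max(per_sample)
```

The "min" variant only rewards a design when both directions (binder to target and target to binder) are confident, which is why a single good direction is not enough to score well.
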

@HannesStark
Owner

Hi all, the easiest place to make these edits should be confidence_utils.py.

If you add it there (e.g. similar to how the iiptm score is computed there), I would add it to the repo.

@y1zhou
Contributor

y1zhou commented Dec 8, 2025

@lhallee Thanks for the great work! I made a fork of the original ipSAE repo to make interfacing with the scores easier from other Python code. Feel free to use this instead of your subprocess-based method. The results in scores.by_res_scores and scores.chain_pair_scores are lists of named tuples, so you should be able to convert them directly into pandas DataFrames if needed with pd.DataFrame(scores.chain_pair_scores).

If designing a binder against a multi-chain target, I also think it makes sense to group all target chains into one during the ipSAE calculations. But for now it's just a hypothesis that I haven't had the chance to test yet.
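
One simple way to try this would be to relabel all target chains with a single ID before the per-chain-pair calculation. The helper below is only an illustrative sketch: `merge_target_chains` and the default label "T" are made-up names, not part of ipSAE or boltzgen, and whether merging actually improves the ranking is still the untested hypothesis above.

```python
from typing import Iterable, List


def merge_target_chains(
    chain_ids: Iterable[str],
    target_chains: Iterable[str],
    merged_label: str = "T",
) -> List[str]:
    """Relabel every residue on a target chain with one shared chain ID."""
    chain_ids = list(chain_ids)
    targets = set(target_chains)
    # Guard against the merged label colliding with a non-target (binder) chain.
    if any(c == merged_label for c in chain_ids if c not in targets):
        raise ValueError(f"merged_label {merged_label!r} is already a non-target chain")
    return [merged_label if c in targets else c for c in chain_ids]


# Example: binder chain "A", two-chain target "B" and "C".
# merge_target_chains(["A", "A", "B", "C", "C"], ["B", "C"])
```
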

4 participants