GPDL

Generative Protein Design by Language model

GPDL is a deep learning method that designs novel, high-quality scaffold backbones given the desired motif residue topology and sequence. This repository includes two distinct methods, each striking a different balance between generation speed and output quality.

Note

You can also try our notebook in Colab without installing any dependencies locally.

💻 Environment set-up

Conda environment

We use the same environment as https://github.com/facebookresearch/esm.

# install esmfold and openfold 
conda create -n protein_design python=3.8
conda activate protein_design
conda install pip
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
conda install -c conda-forge biotite
pip install "fair-esm[esmfold]"
pip install 'dllogger @ git+https://github.com/NVIDIA/dllogger.git'
pip install 'openfold @ git+https://github.com/aqlaboratory/openfold.git@4b41059694619831a7db195b7e0988fc4ff3a307'

# install esm_if
conda create -n esm_if python=3.9
conda activate esm_if
conda install pytorch cudatoolkit=11.3 -c pytorch
conda install pyg -c pyg -c conda-forge
conda install pip
pip install biotite==0.36.1
pip install git+https://github.com/facebookresearch/esm.git
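
After creating either environment, a quick check that the packages resolve can save a failed run later. The sketch below is not part of GPDL; it only looks up import names with the standard library. The import names are assumptions based on how these packages are normally imported (for example, `fair-esm` as `esm` and `pyg` as `torch_geometric`), and `missing_modules` is a hypothetical helper.

```python
import importlib.util

# Import names assumed for the packages installed above
# (e.g. "fair-esm" is imported as "esm", "pyg" as "torch_geometric").
ENV_MODULES = {
    "protein_design": ["torch", "esm", "openfold", "dllogger", "biotite"],
    "esm_if": ["torch", "torch_geometric", "esm", "biotite"],
}


def missing_modules(module_names):
    """Return the names in module_names that cannot be found in this environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Run this inside the activated conda environment you want to check.
    for env_name, modules in ENV_MODULES.items():
        missing = missing_modules(modules)
        status = "ok" if not missing else "missing: " + ", ".join(missing)
        print(f"{env_name}: {status}")
```

Only the line for the currently activated environment is meaningful; the other environment is expected to report missing packages.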

Third party source code

Our repo keeps a fork of ProteinMPNN in ./ProteinMPNN. Our conda environment is sufficient for running the ProteinMPNN code to generate sequences compatible with our backbones.

git clone https://github.com/dauparas/ProteinMPNN.git

🔮 GPDL tutorial

GPDL runs as a three-step pipeline in example.sh: seeding, fixed-backbone design, and optimization. Generating 100 backbones usually takes about 30 minutes. The bash parameters are:

protein_name - Output path prefix
dir_name - Usually same as protein_name
inpaint_seq - Defines the motif information in the format "m,Ax-y,n", where m and n are the lengths of the generated scaffold segments and the motif spans chain A from residue x to residue y
mask_len - The length of each scaffold segment, which must be consistent with inpaint_seq
motif_id - The motif position, which must be consistent with inpaint_seq
max_mut - Maximum number of residues that can be mutated at the beginning.
step - Total MCMC iteration steps.
temp_dir - Temp folders for MCMC and seeding module results
final_des_dir - Final output PDBs
reference - Native PDB from which the motif is extracted
inp_num - Total number of designs in each bash file
bias_AA_jsonl - Path to a dictionary that specifies an amino acid composition bias if needed, e.g. {"A": -1.1, "F": 0.7} would make A less likely and F more likely (the same as ProteinMPNN).
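
Because mask_len and motif_id must agree with inpaint_seq, it can help to derive the pieces from a single string. The following is a minimal sketch, assuming only the "m,Ax-y,n" format described above with inclusive residue ranges. The function names are hypothetical and not part of GPDL, and the exact string format expected by mask_len and motif_id is not specified here, so check example.sh before passing values on.

```python
import re
from typing import List, Tuple, Union

# A motif token such as "A5-20": one chain letter, then an inclusive residue range.
MOTIF_TOKEN = re.compile(r"^([A-Za-z])(\d+)-(\d+)$")

Motif = Tuple[str, int, int]
Segment = Union[int, Motif]


def parse_inpaint_seq(inpaint_seq: str) -> List[Segment]:
    """Split an inpaint_seq string into scaffold lengths (int) and motif ranges (tuple)."""
    segments: List[Segment] = []
    for raw in inpaint_seq.split(","):
        token = raw.strip()
        match = MOTIF_TOKEN.match(token)
        if match:
            chain, start, end = match.group(1), int(match.group(2)), int(match.group(3))
            if end < start:
                raise ValueError(f"motif range runs backwards: {token!r}")
            segments.append((chain, start, end))
        elif token.isdigit():
            segments.append(int(token))
        else:
            raise ValueError(f"unrecognised inpaint_seq token: {token!r}")
    return segments


def scaffold_lengths(segments: List[Segment]) -> List[int]:
    """Lengths of the generated scaffold segments, in order."""
    return [seg for seg in segments if isinstance(seg, int)]


def motif_ranges(segments: List[Segment]) -> List[Motif]:
    """Motif (chain, start, end) tuples, in order."""
    return [seg for seg in segments if not isinstance(seg, int)]


def total_length(segments: List[Segment]) -> int:
    """Total designed length: scaffold residues plus motif residues (ranges are inclusive)."""
    total = 0
    for seg in segments:
        total += seg if isinstance(seg, int) else seg[2] - seg[1] + 1
    return total
```

For example, `parse_inpaint_seq("10,A5-20,15")` yields two scaffold segments of 10 and 15 residues around a 16-residue motif, for a total length of 41.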

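The bias file follows the ProteinMPNN convention of one JSON object per line, so the amino acid keys must be quoted strings. Below is a small sketch for writing such a file, assuming GPDL hands the file to ProteinMPNN unchanged. The helper names, the output filename, and the restriction to the 20 standard one-letter codes are assumptions made for illustration.

```python
import json

# Assumption: only the 20 standard one-letter amino acid codes are accepted as keys.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def bias_to_jsonl_line(bias):
    """Validate a {amino_acid: bias} mapping and serialise it as one JSON line."""
    for aa, value in bias.items():
        if aa not in STANDARD_AA:
            raise ValueError(f"unknown amino acid code: {aa!r}")
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"bias for {aa} must be a number, got {value!r}")
    return json.dumps(bias) + "\n"


def write_bias_jsonl(bias, path):
    """Write the bias mapping to path as a single-line JSONL file."""
    with open(path, "w") as handle:
        handle.write(bias_to_jsonl_line(bias))


if __name__ == "__main__":
    # Negative values make a residue less likely, positive values more likely.
    write_bias_jsonl({"A": -1.1, "F": 0.7}, "bias_AA.jsonl")  # hypothetical filename
```

The resulting path is what you would pass as bias_AA_jsonl.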
✏️ Citation

If you use the framework in your research, please cite the following paper.

@article{GPDL,
    author={Bo Zhang and Kexin Liu and Zhuoqi Zheng and Junjie Zhu and Zhengxin Li and Yunfeiyang Liu and Junxi Mu and Ting Wei and Hai-Feng Chen},
    title={Protein language model supervised motif-scaffolding design with GPDL},
    year={2025},
    url={https://doi.org/10.1016/j.ijbiomac.2025.148441},
    journal={International Journal of Biological Macromolecules}
}

GitHub codebase authors: Bo Zhang, Kexin Liu, Zhuoqi Zheng

E-mail: {zhangbo777,lkxlkx,h2knight}@sjtu.edu.cn
