Integrating Flagship Pioneering protein inverse folding model (RLDIF) into bionemo #212
In the "sub-packages" directory I have added a directory called "bionemo-rldif". It contains the code to run a protein inverse folding model developed by Flagship Pioneering. I added src and test sub-directories following the structure of the other projects in "sub-packages", tested the code locally, and confirmed that it works. Ideally, users can import rldif from this package and run our inverse folding model on their own protein structures.
Usage
The intended usage of the code is shown in the README of the bionemo-rldif directory. From what I saw in other packages, pip-installing the model and then importing it is the preferred usage, so I adapted the code to accommodate that.
Testing
Since I don't have the AWS credentials needed to build the Docker container, and the required torch-scatter package version conflicts with the PyTorch version installed by bionemo-core, I wasn't able to test the code in Docker. However, I successfully ran the test by moving the test file from the test directory to the src directory and executing it there. This test confirms that the model can generate reasonable sequences from a given pdb file, validating the core functionality.
I did review this document. As stated earlier, because of the AWS credential issue I cannot run most of what it describes. The exception is the code etiquette portion, which I also left alone for now because I was not sure whether it applies to this project, since the package is meant to be used in a straightforward manner for model inference. If it does apply, feel free to point this out and I'll comment all relevant code and functions.
Yes, please see my response to the earlier question.
SKIP_CI label to your PR? Since I was not able to test the CI, I cannot add this label.
PYTEST_NOT_REQUIRED label to your PR? Since I was not able to run pytest beyond the way I described, I cannot add this label.
JET_NOT_REQUIRED label to your PR? I also cannot add this label.