Question about Training result #3
Comments
Before this, I tried the Human dataset and ran the training code with the same hyperparameter settings, and the results looked quite good.
So is this a problem with the dataset itself?
Thank you for your interest in our work. The inputs to our proposed model are SELFIES of drugs and Structure-Aware Sequences of proteins. The original dataset must be preprocessed according to the instructions provided in README.md. The following is a sample for guidance:

[Image: preprocessed dataset sample]

Additionally, if you wish to use SMILES for drugs and amino acid sequences for proteins as inputs, you can use the original dataset directly. In this case, you only need to replace the current drug encoder (SELFormer) and protein encoder (Saprot) accordingly.
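A quick way to tell whether a file has already been preprocessed is to check what the drug and protein strings look like. The sketch below is only a heuristic and not part of the repository: it assumes that a SELFIES string is made up entirely of bracketed tokens such as `[C][=O]`, and that a structure-aware sequence alternates an uppercase residue letter with a lowercase structure letter (or `#` where the structure is unknown). The function names are hypothetical.

```python
import re

# Assumption: SELFIES strings consist only of bracketed tokens, optionally
# separated by "." between fragments. A few fully bracketed SMILES strings
# (e.g. "[Na+].[Cl-]") would also pass, so treat this as a rough filter.
SELFIES_RE = re.compile(r"^(?:\[[^\[\]]+\]|\.)+$")

# Assumption: a structure-aware (SA) sequence alternates an uppercase residue
# letter with a lowercase structure letter, with "#" marking an unknown token.
SA_RE = re.compile(r"^(?:[A-Z#][a-z#])+$")


def looks_like_selfies(drug: str) -> bool:
    """True if the string is built only from bracketed tokens."""
    return bool(SELFIES_RE.match(drug.strip()))


def looks_like_sa_sequence(protein: str) -> bool:
    """True if the string alternates residue and structure characters."""
    return bool(SA_RE.match(protein.strip()))


def count_unprocessed(rows):
    """rows: list of (drug, protein) pairs.

    Returns (number of drugs that do not look like SELFIES,
             number of proteins that do not look like SA sequences).
    """
    rows = list(rows)
    bad_drugs = sum(not looks_like_selfies(drug) for drug, _ in rows)
    bad_proteins = sum(not looks_like_sa_sequence(prot) for _, prot in rows)
    return bad_drugs, bad_proteins
```

If either count is non-zero for a training file, that file most likely still contains raw SMILES or plain amino acid sequences and has not gone through the preprocessing described in README.md.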
Congratulations on obtaining the SELFIES dataset and the Structure-Aware (SA) sequences of proteins! The splitting strategy of the dataset affects the prediction results. To reproduce the performance in the paper, the next step is to integrate the SELFIES and SA datasets you acquired with established benchmarks such as BindingDB, BioSNAP, or Human. Once preprocessing is done, I also recommend checking your dataset against the data in the figure below. This will help bring your experimental results in line with those reported in our paper.

[Image: reference data for comparison]

In addition, if the UniProt ID or the corresponding 3D structure file (.cif) is not available, the 3D structure should be predicted with the AlphaFold model directly from the amino acid sequence.
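One possible way to do the integration step is to keep the benchmark split exactly as it is and only swap each raw input for its processed counterpart. The sketch below is a minimal illustration, not the repository's own script. It assumes the benchmark rows carry `SMILES`, `Protein`, and `Y` columns, and that you have already built two lookup tables (raw SMILES to SELFIES, raw sequence to SA sequence). The function name, the output column names, and the sample values are hypothetical.

```python
import csv
import io


def attach_processed_inputs(split_rows, smiles_to_selfies, seq_to_sa):
    """Swap raw inputs in a benchmark split for SELFIES / SA sequences.

    split_rows: iterable of dicts with "SMILES", "Protein", "Y" keys (assumed).
    Returns (merged_rows, missing_rows); row order and labels are preserved,
    so the original train/val/test split is left untouched.
    """
    merged, missing = [], []
    for row in split_rows:
        selfies = smiles_to_selfies.get(row["SMILES"])
        sa_seq = seq_to_sa.get(row["Protein"])
        if selfies is None or sa_seq is None:
            # No processed counterpart yet, e.g. a structure still to predict.
            missing.append(row)
            continue
        merged.append({"SELFIES": selfies, "SA": sa_seq, "Y": row["Y"]})
    return merged, missing


if __name__ == "__main__":
    # Tiny in-memory stand-in for one benchmark split file (placeholder data).
    raw_split = io.StringIO("SMILES,Protein,Y\nCCO,MEV,1\nCCN,MEV,0\n")
    rows = list(csv.DictReader(raw_split))

    merged, missing = attach_processed_inputs(
        rows,
        smiles_to_selfies={"CCO": "[C][C][O]"},
        seq_to_sa={"MEV": "MdEvVp"},  # placeholder SA string
    )
    print(len(merged), "rows merged;", len(missing), "rows missing inputs")
```

Rows that end up in `missing` are worth inspecting before training: dropping them silently changes the split, which is one reason results can drift from those in the paper.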
Dear Zhao,
I am a graduate student currently attempting to reproduce the training results from your paper. Following the guidance in the README.md, I have set up the environment and downloaded the BindingDB dataset. However, when I run the command you provided:
I find that the training process is not proceeding well. Here is the training log recorded in wandb:
I am currently using the BindingDB data from /bindingdb/random in the DrugBAN repository.
I'm not sure whether this is correct or whether I missed some important steps.
I would greatly appreciate it if you could provide some guidance.