Question about Uniprot IDs #1
Thank you for your interest in our work. Here is how to obtain the UniProt IDs for the three publicly available datasets:

- BindingDB dataset: Download the files "BindingDB_UniProt.txt" and "BindingDBTargetSequences.fasta" from the BindingDB website. Matching these two files gives you the UniProt ID for every amino acid sequence.
- BioSNAP dataset: The current BioSNAP dataset already includes the corresponding UniProt IDs, so no additional processing is required.
- Human dataset: First, retrieve the human amino acid sequences from the UniProt database, keeping those that also have a corresponding 3D structure in AlphaFoldDB (roughly 8,500 sequences). Then match the Human dataset against this downloaded UniProt set to obtain the corresponding UniProt IDs.

If a UniProt ID or the corresponding 3D structure file (.cif) is not available, the 3D structure can instead be predicted directly from the amino acid sequence with the AlphaFold model.
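As a rough illustration of the "matching" step, here is a minimal sketch, assuming you have downloaded a UniProtKB FASTA file (for example, the human proteome) whose headers follow the standard `>sp|ACCESSION|ENTRY_NAME ...` layout, and that an exact, case-insensitive sequence match is acceptable. The function names are hypothetical, and the sketch does not parse "BindingDB_UniProt.txt", whose exact layout is not assumed here; for BindingDB the same sequence-lookup idea would apply once the sequences and IDs are loaded.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def uniprot_accession(header: str) -> str:
    """Extract the accession from a UniProtKB header such as 'sp|P69905|HBA_HUMAN ...'."""
    parts = header.split("|")
    # Fall back to the first whitespace-separated token for non-UniProt headers.
    return parts[1] if len(parts) >= 3 else header.split()[0]


def build_sequence_index(fasta_lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each upper-cased sequence to every accession that carries it."""
    index: Dict[str, List[str]] = {}
    for header, seq in read_fasta(fasta_lines):
        index.setdefault(seq.upper(), []).append(uniprot_accession(header))
    return index


def match_sequences(sequences: Iterable[str], index: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return {sequence: accessions}; unmatched sequences map to an empty list."""
    return {s: index.get(s.strip().upper(), []) for s in sequences}


# Toy records invented for illustration only (not real UniProt entries).
uniprot_fasta = """>sp|P00001|TOY1_HUMAN Toy protein 1 OS=Homo sapiens
MKTAYIAKQR
QISFVKSHFS
>sp|P00002|TOY2_HUMAN Toy protein 2 OS=Homo sapiens
MALWMRLLPL
""".splitlines()

index = build_sequence_index(uniprot_fasta)
print(match_sequences(["MKTAYIAKQRQISFVKSHFS", "GGGG"], index))
# {'MKTAYIAKQRQISFVKSHFS': ['P00001'], 'GGGG': []}
```

Exact matching is the simplest choice; sequences that differ by even one residue (isoforms, truncated constructs) will come back unmatched and would need a similarity search or manual lookup instead.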
Thanks for your quick response! Now I understand that UniProt IDs allow us to retrieve 3D structures from AlphaFoldDB. I just wanted to confirm whether this is primarily to save time.
Thank you for your question! Yes, precisely: the UniProt IDs serve as a bridge for quickly retrieving the 3D structures from AlphaFoldDB, which makes the process more efficient.
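To make the "bridge" concrete, here is a minimal sketch of going from a UniProt ID to a structure file. It assumes AlphaFoldDB's public file-naming pattern `AF-<ACCESSION>-F1-model_v<N>.cif` and the single-fragment (F1) model; the model version number changes between database releases, so the default `version=4` is an assumption to check against the site. The helper names are hypothetical.

```python
import urllib.request
from pathlib import Path

# Assumed AlphaFoldDB file URL pattern; verify the current model version before use.
AFDB_URL = "https://alphafold.ebi.ac.uk/files/AF-{acc}-F1-model_v{version}.cif"


def afdb_cif_url(accession: str, version: int = 4) -> str:
    """Build the (assumed) AlphaFoldDB mmCIF URL for a UniProt accession."""
    return AFDB_URL.format(acc=accession.strip().upper(), version=version)


def download_cif(accession: str, out_dir: str = "structures", version: int = 4) -> Path:
    """Download the .cif file once and return its local path (skips existing files)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    dest = out / f"AF-{accession.strip().upper()}-F1-model_v{version}.cif"
    if not dest.exists():
        # Raises urllib.error.HTTPError if the entry or version does not exist.
        urllib.request.urlretrieve(afdb_cif_url(accession, version), str(dest))
    return dest


print(afdb_cif_url("p69905"))
# https://alphafold.ebi.ac.uk/files/AF-P69905-F1-model_v4.cif
```

A failed download (HTTP 404) is a reasonable signal that no AlphaFoldDB entry exists for that ID, in which case the fallback described above (predicting the structure from the sequence with AlphaFold) applies.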
Thanks, Zhaohan!
Hi Zhaohan,
FusionDTI is a fantastic work. I am currently trying it out for drug-protein interaction prediction, but I am unsure how to obtain the UniProt IDs for all the protein sequences in the bindingdb.csv file. Could you please give more guidance on this step?
"The first step, if you do not have Uniprot IDs, you will need to obtain them from the UniProt website based on existing amino acid sequences, protein names, etc. Then save them as a comma-delimited text file."
Thanks for your help in advance!
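For the quoted step ("save them as a comma-delimited text file"), a minimal sketch might look like the following. It assumes you already have a sequence-to-UniProt-ID mapping (for example, from matching against a UniProt FASTA download) with upper-cased sequences as keys. The column name `Protein` for the sequence column in bindingdb.csv, the output columns `sequence,uniprot_id`, and both function names are hypothetical; the exact file layout the repository expects is not specified in this thread.

```python
import csv
from typing import Dict, Iterable, List, Tuple


def collect_ids(rows: Iterable[dict], seq_to_id: Dict[str, str],
                seq_column: str = "Protein") -> List[Tuple[str, str]]:
    """Return one (sequence, UniProt ID) pair per unique sequence, in first-seen order.

    Sequences with no known ID get an empty string so they are easy to spot later.
    """
    pairs: Dict[str, str] = {}
    for row in rows:
        seq = row[seq_column].strip().upper()
        if seq not in pairs:
            pairs[seq] = seq_to_id.get(seq, "")
    return list(pairs.items())


def write_uniprot_csv(dataset_csv: str, out_csv: str, seq_to_id: Dict[str, str],
                      seq_column: str = "Protein") -> None:
    """Read the dataset CSV and write a comma-delimited sequence,uniprot_id file."""
    with open(dataset_csv, newline="") as src:
        pairs = collect_ids(csv.DictReader(src), seq_to_id, seq_column)
    with open(out_csv, "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(["sequence", "uniprot_id"])
        writer.writerows(pairs)


# Hypothetical usage:
# write_uniprot_csv("bindingdb.csv", "bindingdb_uniprot.csv", seq_to_id)
```

Deduplicating by sequence keeps the output small, since the same target typically appears in many drug-target pairs.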