Question about Uniprot IDs #1

Open
GainGod-Xu opened this issue Nov 5, 2024 · 4 comments

@GainGod-Xu

GainGod-Xu commented Nov 5, 2024

Hi Zhaohan,

FusionDTI is fantastic work. I am currently trying to test it for drug-protein interaction prediction, but I am unsure how to obtain the UniProt IDs for all the protein sequences in the bindingdb.csv file. Could you please give more guidance on this step?

"The first step, if you do not have Uniprot IDs, you will need to obtain them from the UniProt website based on existing amino acid sequences, protein names, etc. Then save them as a comma-delimited text file."

Thanks for your help in advance!

@ZhaohanM
Owner

ZhaohanM commented Nov 5, 2024

Thank you for your interest in our work. Below, I outline how to obtain the UniProt IDs for the three publicly available datasets:

BindingDB Dataset: On the BindingDB website, you can download the files "BindingDB_UniProt.txt" and "BindingDBTargetSequences.fasta." By matching the entries in these two files, you can extract the UniProt IDs for all amino acid sequences.

BioSNAP Dataset: The current BioSNAP dataset already includes the corresponding UniProt IDs, so no additional processing is required.

Human Dataset: First, download the human amino acid sequences from the UniProt database that also have corresponding 3D structures in AlphaFoldDB (roughly 8,500 sequences). You can then match the sequences in the Human dataset against this downloaded UniProt set to obtain the respective UniProt IDs.
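For readers who want to script the matching step, here is a minimal sketch, not the FusionDTI authors' code. It assumes the downloaded UniProt file is a FASTA with UniProt-style headers (`>sp|ACCESSION|ENTRY_NAME ...`) and that an exact, case-insensitive sequence match is good enough. The function names and the accessions `ACC001`/`ACC002` are made up for illustration.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def accession_from_header(header):
    """Return the accession from a 'db|ACCESSION|ENTRY_NAME ...' header, else None."""
    parts = header.split("|")
    return parts[1] if len(parts) >= 3 else None


def build_sequence_index(fasta_handle):
    """Map each upper-cased sequence to the first accession that carries it."""
    index = {}
    for header, seq in read_fasta(fasta_handle):
        accession = accession_from_header(header)
        if accession is not None:
            index.setdefault(seq.upper(), accession)
    return index


def annotate(sequences, index):
    """Pair every dataset sequence with its accession (None when unmatched)."""
    return [(seq, index.get(seq.strip().upper())) for seq in sequences]


# Tiny in-memory stand-in for the downloaded UniProt FASTA (placeholder IDs).
fasta_text = """>sp|ACC001|DEMO1_HUMAN Hypothetical protein one
MKTAYIAK
QRQISFVK
>sp|ACC002|DEMO2_HUMAN Hypothetical protein two
MSDNELLK
"""
index = build_sequence_index(io.StringIO(fasta_text))
result = annotate(["MKTAYIAKQRQISFVK", "AAAA"], index)
print(result)
```

Sequences that come back with `None` have no exact match in the downloaded set; those are the candidates for direct structure prediction from the amino acid sequence.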

In addition, if the UniProt ID or corresponding 3D structure file (.cif) is not available, the 3D structure can also be predicted with the AlphaFold model directly using the amino acid sequence.

@GainGod-Xu
Author

Thanks for your quick response!

Now I understand that UniProt IDs allow us to retrieve 3D structures from AlphaFoldDB. I just wanted to confirm whether this is primarily to save time.

@ZhaohanM
Owner

ZhaohanM commented Nov 5, 2024

Thank you for your question! Yes, precisely—the UniProt IDs serve as a bridge to quickly retrieve the 3D structures from AlphaFoldDB, making the process more efficient.
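As a rough illustration of that bridge, the sketch below turns a UniProt ID into an AlphaFoldDB download URL for the `.cif` file. The URL pattern and the model version number are assumptions based on how AlphaFoldDB has published per-entry files; check them against the current AlphaFoldDB download page before relying on them. The helper names are hypothetical and not part of FusionDTI.

```python
import urllib.request
from pathlib import Path

# Assumed AlphaFoldDB per-entry file pattern; the version suffix may change.
AFDB_URL_TEMPLATE = (
    "https://alphafold.ebi.ac.uk/files/AF-{uniprot_id}-F1-model_v{version}.cif"
)


def alphafold_cif_url(uniprot_id, version=4):
    """Build the (assumed) AlphaFoldDB URL of the mmCIF file for a UniProt ID."""
    return AFDB_URL_TEMPLATE.format(
        uniprot_id=uniprot_id.strip().upper(), version=version
    )


def download_cif(uniprot_id, out_dir="structures", version=4):
    """Download the .cif file into out_dir and return its local path.

    Raises urllib.error.HTTPError if AlphaFoldDB has no entry for this ID.
    """
    out_path = Path(out_dir) / f"{uniprot_id.strip().upper()}.cif"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(alphafold_cif_url(uniprot_id, version), out_path)
    return out_path


# Only the URL is built here; calling download_cif(...) needs network access.
print(alphafold_cif_url("P69905"))
```

An ID whose download fails is a natural candidate for the fallback mentioned earlier in the thread: predicting the structure with AlphaFold directly from the amino acid sequence.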

@GainGod-Xu
Author

Thanks, Zhaohan!

@ZhaohanM ZhaohanM pinned this issue Nov 18, 2024
@ZhaohanM ZhaohanM unpinned this issue Nov 18, 2024
@ZhaohanM ZhaohanM reopened this Nov 18, 2024