genome DB is unavailable #196
Comments
Sorry, can you clarify: did you run esearch on the command line with the same inputs, and did it work? A quick debug shows that the error is returned by the Entrez utilities server. For me it returned:
I accessed the genome database through the CLI following these instructions: https://www.ncbi.nlm.nih.gov/datasets/docs/v2/how-tos/genomes/get-genome-metadata/ with the following command:
datasets summary genome taxon 'bats' [... the rest of the command is just to reformat the output ...]
That command did give me output:
Assembly Accession Assembly Name Annotation Name Annotation Release Date Organism Name
So I assumed that the genome database is accessible. Or is the database called "genome" in the CLI "datasets" command different from the database called "genome" accessed through R?
That looks like a different API.
I did not know that, thanks for the clarification. So the problem that entrez_db_summary(db = "genome") returns nothing is on the NCBI end, and there is nothing you can do about it?
That is correct. The issues you are having all come directly from the data provided by NCBI's servers and not from rentrez. NCBI has replaced the webpages corresponding to these databases with the datasets API you've mentioned. I can't tell you for sure if access to this database via [...]
I tested these two links, and they worked as expected.
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=snp&id=6060535&retmote=rsr
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=rentrez
Here is my problem.
I received an error message for this search:
R> entrez_search(db = "genome", term = "sphingidae")
Error in ans[[1]] : subscript out of bounds
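For anyone scripting around this error, here is a minimal sketch of a wrapper that returns NULL with a message instead of stopping the script. `try_search` and its `search_fn` argument are hypothetical names introduced for illustration and are not part of rentrez; by default the call is simply forwarded to `entrez_search`.

```r
# Hypothetical wrapper (not part of rentrez): run a search and return NULL
# with a message if the call fails, e.g. with "subscript out of bounds".
# search_fn defaults to rentrez::entrez_search; it is an argument only so
# that the wrapper can be exercised without contacting NCBI.
try_search <- function(db, term, search_fn = rentrez::entrez_search) {
  tryCatch(
    search_fn(db = db, term = term),
    error = function(e) {
      message("Search in '", db, "' failed: ", conditionMessage(e))
      NULL
    }
  )
}

# Usage (needs rentrez and network access):
# res <- try_search("genome", "sphingidae")
# if (is.null(res)) message("No result; the database may be unavailable.")
```

This only hides the symptom: if the server does not serve the database, the search still yields nothing useful.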
I checked if the genome database was listed:
R> entrez_dbs()
[1] "pubmed" "protein" "nuccore" "ipg" "nucleotide" "structure"
[7] "genome" "annotinfo" "assembly" "bioproject" "biosample" "blastdbinfo"
[13] "books" "cdd" "clinvar" "gap" "gapplus" "grasp"
[19] "dbvar" "gene" "gds" "geoprofiles" "medgen" "mesh"
[25] "nlmcatalog" "omim" "orgtrack" "pmc" "popset" "proteinclusters"
[31] "pcassay" "protfam" "pccompound" "pcsubstance" "seqannot" "snp"
[37] "sra" "taxonomy" "biocollections" "gtr"
But then I found that the DB is unavailable:
R> entrez_db_summary(db = "genome")
DbName: genome
MenuName: Genome
Description: Genomic sequences, contigs, and maps
DbBuild:
Warning: pback220: DB is unavailable
Other databases, however, report a current build, e.g.:
R> entrez_db_summary(db = "taxonomy")
DbName: taxonomy
MenuName: Taxonomy
Description: Taxonomy db
DbBuild: Build240912-1410.1
Count: 2744579
LastUpdate: 2024/09/12 16:00
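The difference between the two summaries above can be checked in code. This is a sketch that assumes the object returned by `entrez_db_summary` is a named character vector with a "Warning" element when the server reports a problem, which is what the printed output suggests but is not confirmed here. `summary_has_warning` and `db_is_available` are hypothetical helper names.

```r
# Hypothetical helper: TRUE if an einfo summary carries a "Warning" field.
# Assumption: the summary is a named character vector (DbName, MenuName, ...).
summary_has_warning <- function(info) {
  "Warning" %in% names(info)
}

# Hypothetical helper: TRUE if the summary can be fetched and has no warning.
# summary_fn defaults to rentrez::entrez_db_summary; it is an argument only
# so that the check can be tried without contacting NCBI.
db_is_available <- function(db, summary_fn = rentrez::entrez_db_summary) {
  info <- tryCatch(summary_fn(db = db), error = function(e) NULL)
  !is.null(info) && !summary_has_warning(info)
}

# Usage (needs rentrez and network access):
# db_is_available("genome")
# db_is_available("taxonomy")
```

A check like this could be run before `entrez_search` to skip databases that the server lists in `entrez_dbs()` but no longer serves.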
I tested whether the genome database was available through the CLI, and that was the case:
$ datasets summary genome taxon 'bats' --assembly-source refseq --as-json-lines | dataformat tsv genome --fields accession,assminfo-name,annotinfo-name,annotinfo-release-date,organism-name
Assembly Accession  Assembly Name        Annotation Name                                    Annotation Release Date  Organism Name
GCF_004115265.2     mRhiFer1_v1.p        GCF_004115265.2-RS_2023_02                         2023-02-27               Rhinolophus ferrumequinum
GCF_022682495.1     HLdesRot8A           GCF_022682495.1-RS_2023_02                         2023-02-27               Desmodus rotundus
GCF_027574615.1     DD_ASM_mEF_20220401  GCF_027574615.1-RS_2023_03                         2023-03-28               Eptesicus fuscus
GCF_004126475.2     mPhyDis1.pri.v3      NCBI Phyllostomus discolor Annotation Release 101  2020-08-31               Phyllostomus discolor
[...]
In conclusion, it seems to me that the genome database is accessible, but not through the R entrez interface?