Using NCBI taxid as parameter in download_refseq #34

Open
pedres opened this issue Oct 21, 2024 · 4 comments

Comments

@pedres

pedres commented Oct 21, 2024

Hi,
I am just starting to use your package. I have a list of bacteria that I would like to look for in my samples. However, when I ran a small test following the tutorial, the sapply call did not work. I found that the failure was caused by "Salmonella enterica subsp. enterica serovar Typhi". I believe I have "curated" the list, keeping only bacteria that have an NCBI taxid and correcting their taxonomy. Curiously, this bacterium does have an NCBI taxid (90370). Are you considering accepting NCBI taxids as a parameter of the download_refseq function? In fact, https://www.ncbi.nlm.nih.gov/datasets/taxonomy/90370/ lists many genomes for this bacterium.
Thank you very much
Manuel
tax_reportOK.zip

@seanlu96
Contributor

Hi Manuel, could you share the errors you're getting in your tests? I looked at the sapply(all_species, download_refseq) section of the vignette, which I think is the one you are referring to with that taxon (Salmonella enterica subsp. enterica serovar Typhi), and I didn't run into any issues.
Thanks!
Sean

@pedres
Author

pedres commented Oct 21, 2024

Hi Sean,
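If I run:

library(MetaScope)
library(dplyr)

# output directory (a temporary directory here; any writable path works)
target_ref_temp <- tempfile()
dir.create(target_ref_temp)

# curated NCBI taxonomy report (from the attached tax_reportOK.zip)
bacteria <- readRDS("tax_reportOK.rds")
# NCBI taxids of the bacteria of interest (90370 = S. enterica serovar Typhi)
somePATHOs <- c(562, 1280, 573, 1313, 470, 287, 1773, 1352, 547, 1319, 36470,
                28901, 90370, 1351, 583, 613, 544, 727, 620, 590, 54388, 57045, 57046, 581)

# keep the matching rows and extract the taxon names
somePATHOs <- bacteria %>% filter(taxid %in% somePATHOs) %>% pull(tax)

sapply(somePATHOs, download_refseq,
       reference = FALSE, representative = TRUE, compress = TRUE,
       out_dir = target_ref_temp, caching = TRUE)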

I get: Error in FUN(X[[i]], ...) : No rank detected
If I remove "Salmonella enterica subsp. enterica serovar Typhi" by using somePATHOs[2:12], the function works well:
sapply(somePATHOs[2:12], download_refseq,
       reference = FALSE, representative = TRUE, compress = TRUE,
       out_dir = target_ref_temp, caching = TRUE)
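Rather than dropping the failing taxon by index, a batch run can be made to continue past taxa that raise errors. The following is a minimal sketch in R. It assumes that download_refseq reports the problem as an ordinary R error, which the "Error in FUN(X[[i]], ...) : No rank detected" message suggests. The helper safe_download is hypothetical and not part of MetaScope. It does not fix the underlying "No rank detected" problem; it only collects the failing taxa so they can be retried or reported.

```r
# Hypothetical helper, NOT part of MetaScope: call `fun` once per taxon and
# record taxa that raise an error instead of letting one failure abort the loop.
safe_download <- function(taxa, fun, ...) {
  args <- list(...)          # extra arguments forwarded to `fun` on every call
  failed <- character(0)     # taxa whose call raised an error
  results <- lapply(taxa, function(taxon) {
    tryCatch(
      do.call(fun, c(list(taxon), args)),
      error = function(e) {
        message("Skipping '", taxon, "': ", conditionMessage(e))
        failed <<- c(failed, taxon)  # updates `failed` in safe_download's frame
        NULL                         # placeholder result for the failed taxon
      }
    )
  })
  names(results) <- taxa
  list(results = results, failed = failed)
}

# Usage with MetaScope (needs network access; arguments as in the report above):
# out <- safe_download(somePATHOs, download_refseq,
#                      reference = FALSE, representative = TRUE,
#                      compress = TRUE, out_dir = target_ref_temp, caching = TRUE)
# out$failed   # taxa that raised an error, e.g. the serovar Typhi entry
```

Forwarding the extra arguments with do.call keeps the call shape identical to the original sapply(..., download_refseq, ...) usage. The failing taxa are returned explicitly instead of only being printed.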

@pedres
Author

pedres commented Oct 21, 2024

Hi, I also get the same error when running either of these:

sapply("Salmonella enterica subsp. enterica serovar Typhi", download_refseq,
       reference = FALSE, representative = TRUE, compress = TRUE,
       out_dir = target_ref_temp, caching = TRUE)

sapply("Salmonella enterica subsp. enterica serovar Typhi", download_refseq,
       reference = FALSE, representative = FALSE, compress = TRUE,
       out_dir = target_ref_temp, caching = TRUE)

Another question: in the tutorial, download_refseq is run with reference = FALSE and representative = FALSE, but the defaults are reference = TRUE and representative = FALSE. Are the default values the best choice for these parameters?
Thank you very much for your help
Manuel

@aubreyodom
Collaborator

Hi @pedres, we don't have a time estimate yet, but we're actively looking into the issue.


3 participants