Unexpected EOF Errors During Actinopterygii Genomes Download in RefSeq #294
Comments
Hi @mkrg01, Thanks for opening this issue. We were able to reproduce this bug and we are looking into a fix. In the meantime, try adding the The downloaded data files will be gzip compressed on your computer. I'll post a comment on this thread when the bug has been fixed. Best, Eric Cox, PhD [Contractor] (he/him/his)
Hello @ericcox1, Thank you for working on this issue. I tried running it again with the In any case, I am looking forward to the bug fix. Thank you.
Hi @mkrg01, Although I was able to reproduce the bug the first day that you reported it, I have tried several times and have not been able to reproduce it since. Would you mind checking if you still see the bug on your end? Best,
Hi @ericcox1, I tried downloading the datasets yesterday, but I still saw the same bug... The scripts are as follows:
datasets download genome taxon 7898 --dehydrated --reference --annotated --include gbff --assembly-source RefSeq --filename data/raw_data/Actinopterygii_dataset.zip
unzip data/raw_data/Actinopterygii_dataset.zip -d data/raw_data/Actinopterygii_dataset
datasets rehydrate --directory data/raw_data/Actinopterygii_dataset/ --gzip --no-progressbar
I used a docker image (docker://aurelia01/deep_adapt_ncbi:v2) when running the command this time (and also the first time I posted on this issue). The
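For reference, the three steps quoted above can be wrapped in one shell function that keeps the rehydrate output for later inspection. This is only a sketch: `fetch_taxon` and `rehydrate.log` are hypothetical names, and the flags are copied from the commands in the comment, not checked against any particular `datasets` release.

```shell
# Sketch: download a dehydrated package, unzip it, then rehydrate it.
# fetch_taxon and rehydrate.log are hypothetical names chosen for illustration.
fetch_taxon() {
  local taxon="$1" outdir="$2"
  local zip="${outdir}.zip"

  # Same flags as in the comment above; only the taxon and paths are parameters.
  datasets download genome taxon "$taxon" --dehydrated --reference --annotated \
    --include gbff --assembly-source RefSeq --filename "$zip" || return 1

  unzip "$zip" -d "$outdir" || return 1

  # Keep stdout and stderr of the rehydrate step so failures can be reviewed later.
  # The function returns the exit status of this last command.
  datasets rehydrate --directory "$outdir/" --gzip --no-progressbar \
    > "$outdir/rehydrate.log" 2>&1
}
```

Calling `fetch_taxon 7898 data/raw_data/Actinopterygii_dataset` would mirror the commands above. Assuming the errors end up in that log, `grep -c 'unexpected EOF'` on it would count them.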
Thanks for the update @mkrg01, we will continue looking into it. I'll comment on this thread with updates.
Hello developers, I just wanted to add that this is not an isolated issue. I have been experiencing the same bug when downloading large datasets.
I am running datasets version 15.33.0. In the meantime, while you look for a fix for this issue, is there another way that I can download this dataset? Our research project depends on it. This issue does not occur when downloading smaller datasets, for example, all of "Erwiniaceae"... only with large datasets.
Hi @alpole23, Thanks for your comment on this issue. We are aiming to release a fix for this issue later today. Best,
Wonderful, thank you! I tested the fix with the Yersiniaceae family and received another
It does appear that all of the files rehydrated except for the three "EOF" error files. So, I assume it has something to do with a flaw in the GenBank entry? What exactly does "unexpected EOF" mean? P.S. the files do exist when I check for them via the FTP site... so I was able to manually download the ones with the "unexpected EOF" error.
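On the question of what the message means: in general, "unexpected EOF" is reported when a reader reaches the end of a file or stream before the data format says it should, which often points to a truncated or interrupted transfer. Whether that is the cause here is not confirmed in this thread. Assuming the files were rehydrated with `--gzip`, a rough way to find incomplete ones is to run gzip's built-in integrity test on each file; `find_truncated_gz` is a hypothetical helper name.

```shell
# Sketch: print every .gz file under a directory that fails gzip's integrity test.
# find_truncated_gz is a hypothetical helper name.
find_truncated_gz() {
  # "-exec gzip -t {} \;" is true when the file passes the test;
  # "!" inverts it, so -print only fires for files that fail.
  # stderr is discarded to hide gzip's per-file messages (this also hides find's own errors).
  find "$1" -type f -name '*.gz' ! -exec gzip -t {} \; -print 2>/dev/null
}
```

Files listed this way could then be fetched again by hand, as described in the comment above.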
Thanks @alpole23, I appreciate the detailed report. Your help debugging this is much appreciated and if you don't mind I have some more questions for you.
Thanks!
Thank you for the release of the new version, @ericcox1. I attempted to download genomes using In fact, my use case was a bit different. I was trying to download genomes specifically for Teleostei (taxid: 32443), which is a subclade of Actinopterygii. The command I used is below:
The error messages I received are as follows (only a part is displayed): I could see that these files were created. However, the file sizes seem too small (e.g., the size of
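A quick way to list suspiciously small files after rehydration is a size filter with `find`. This is a sketch only: `list_small_files` is a hypothetical helper, and the byte threshold is an assumption that would need to be chosen for the data at hand, since the thread does not say how small the affected files were.

```shell
# Sketch: print files under a directory that are smaller than a byte threshold.
# list_small_files is a hypothetical helper; the threshold is up to the caller.
list_small_files() {
  local dir="$1" min_bytes="$2"
  # With the "c" suffix, -size counts bytes; the leading "-" means "less than".
  find "$dir" -type f -size "-${min_bytes}c" -print
}
```

For example, `list_small_files <directory> 1024` would list every file under the given directory that is smaller than 1024 bytes.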
Thanks @mkrg01, I'll share this with the development team.
Hi @ericcox1, I have a log file of all of the rehydrate errors that I encountered when downloading and rehydrating GenBank sequences from taxon Enterobacterales. I have attached that list here if you and your team would like to use it for troubleshooting.
Hi @alpole23, Thanks for sharing this! Nuala
We are continuing to work on this issue. With the latest production release, we have implemented a feature to delete invalid files during rehydration. -Eric
Background:
Encountered multiple unexpected EOF errors while attempting to download the RefSeq genomes of Actinopterygii (taxon id: 7898) using datasets version 15.30.0.
Steps to Reproduce:
Initial Download Command:
Unzipping the Package:
Rehydration Process (Error Occurs Here):
Observed Error Messages:
During the rehydration step, the process repeatedly fails with unexpected EOF errors. The error log is as follows:
I would greatly appreciate your assistance in addressing this matter.