Data download for Stable Diffusion fails #731

Closed
coppock opened this issue Apr 23, 2024 · 4 comments · Fixed by #752

Comments

@coppock

coppock commented Apr 23, 2024

After building the Docker image provided in stable_diffusion, the first data download command fails as follows:

root@0d839dc3dd25:/workspace# scripts/datasets/laion400m-filtered-download-moments.sh --output-dir /datasets/laion-400m/webdataset-moments-filtered
scripts/datasets/laion400m-filtered-download-moments.sh: line 18: rclone: command not found
scripts/datasets/laion400m-filtered-download-moments.sh: line 20: rclone: command not found
scripts/datasets/laion400m-filtered-download-moments.sh: line 22: rclone: command not found
sha512sum: sha512sums.txt: No such file or directory
root@0d839dc3dd25:/workspace# 
@amasin2111

I'm observing the same issue.

@ahmadki
Contributor

ahmadki commented Jul 2, 2024

The issue originated after #712 was merged.

Before that PR, the dataset was downloaded directly from the MLC S3 bucket using wget; the PR changed the download method to rclone + Cloudflare.
rclone is not installed in the Docker image, so I added it in #752.
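
The failure mode described above (a script calling a tool that is missing from the image) can be caught early with a pre-flight check. The following is only a minimal sketch, not part of the repository's scripts; the helper name `require_cmd` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the repository): fail with a clear message
# when a required command is not on PATH, instead of hitting
# "command not found" partway through a download script.
require_cmd() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "error: required command '$1' not found; please install it first" >&2
        return 1
    fi
}

# Example use at the top of a download script:
#   require_cmd rclone || exit 1
```

Checking up front means the script stops before running later steps such as the sha512sum verification, which otherwise fails with a misleading "No such file or directory" message.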

@amasin2111

Even after installing rclone separately and then running the script laion400m-filtered-download-images.sh, we get an error that the source directory doesn't exist. Specifically, the command below gives this error:
rclone copy mlc-training:mlcommons-training-wg-public/stable_diffusion/datasets/laion-400m/moments-webdataset-filtered/ ${OUTPUT_DIR} --include="*.tar" -P
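
One way to narrow down a "source directory doesn't exist" error is to list the parent of the source path and see what the remote actually contains. This is a rough diagnostic sketch, assuming the `mlc-training` remote is already configured; `remote_parent` is a hypothetical helper, and the sketch does not claim to identify the cause of this particular error.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the repository): print the parent directory
# of an rclone remote path, so it can be listed with `rclone lsd`.
remote_parent() {
    rp_path="${1%/}"                 # drop one trailing slash, if present
    printf '%s\n' "${rp_path%/*}/"   # strip the last path component
}

# Example use (requires rclone and a configured remote):
#   SRC="mlc-training:mlcommons-training-wg-public/stable_diffusion/datasets/laion-400m/moments-webdataset-filtered/"
#   rclone lsd "$(remote_parent "$SRC")"
```

If the expected directory name is missing from the listing, the path in the script and the layout of the bucket have likely diverged.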

@ahmadki
Contributor

ahmadki commented Jul 2, 2024

I just saw #751. I'll look into it and resolve the issue ASAP.
