Improve download and file management #72

Open · 1 of 4 tasks

LTDakin opened this issue Mar 25, 2025 · 1 comment
LTDakin (Contributor) commented Mar 25, 2025

We use a naive catch-all get-fits utility function that downloads the entire FITS file from the archive. Some functions don't need the whole file, for example getting the source catalog. Instead, we could download just the CAT header from the archive, reducing memory usage and endpoint latency.

Also, we removed temp files to stabilize datalab for beta users, but lost the ability to cache them. It would be nice to reimplement caching with a smarter system that keeps track of the tmp space still available, e.g. an LFU queue for the FITS files.

  • Set up a cleaning routine to remove files after operations
  • Write a util function that downloads specific header data using fsspec in astropy (see the sketch below)
  • Rewrite /source-catalog/ to only download the CAT header
  • Retain downloaded files in tmp space, managed by an LFU queue that removes the least-used file when we near disk capacity
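
A minimal sketch of that util function, assuming astropy >= 5.2 (the first release where fits.open accepts use_fsspec) with fsspec and aiohttp installed; the name get_fits_extension and the default extension name are illustrative, taken from this issue rather than from existing datalab code:

```python
# Sketch only: assumes astropy >= 5.2 plus fsspec/aiohttp for HTTP reads.
from astropy.io import fits

def get_fits_extension(archive_url: str, ext_name: str = "CAT"):
    """Fetch one FITS extension via ranged HTTP reads instead of
    downloading the whole file from the archive."""
    # use_fsspec=True makes astropy issue byte-range requests, so only
    # the headers and the requested HDU's data get pulled over the wire.
    with fits.open(archive_url, use_fsspec=True) as hdul:
        hdu = hdul[ext_name]
        return hdu.header, hdu.data
```

/source-catalog/ could then call this and serialize the returned table data without ever touching the image HDUs.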
LTDakin self-assigned this Mar 31, 2025

jnation3406 (Contributor) commented
I think we should use fsspec for /source-catalog/ since it only wants the smaller CAT and header, but for the data operations that actually use the image data, fsspec won't save us much, since downloading the whole file is similar in size to downloading just the data from it. Also, the last task I added, about retaining files for a short time to help operations like raw_image, wouldn't work if we fsspec everything.

So I think what we want is something that checks whether the file is already local; if not, it either downloads the file locally or goes through fsspec, depending on whether the image data is needed. And when it downloads locally, we leave the file there but add a service that deletes temp files older than 1 hour (easy way), or we keep track of downloaded files and their sizes in Redis and delete files as we approach the disk limit, basically implementing an LFU queue (hard way).
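
A rough sketch of the hard way, purely illustrative: open_fits, TMP_DIR, DISK_LIMIT_BYTES, needs_image_data, and the Redis key layout are all assumptions, not existing datalab code, and the LFU behaviour is approximated with a Redis sorted set of per-file access counts:

```python
# Illustrative sketch: assumes redis-py and the get_fits_extension idea
# sketched above; every name, path, and limit here is hypothetical.
import os
import urllib.request

import redis
from astropy.io import fits

TMP_DIR = "/tmp/datalab"
DISK_LIMIT_BYTES = 10 * 1024**3  # start evicting before the real limit
r = redis.Redis()

def open_fits(archive_url: str, needs_image_data: bool):
    """Check for a local copy first; otherwise download the file or fall
    back to fsspec, depending on whether the image data is needed."""
    local_path = os.path.join(TMP_DIR, os.path.basename(archive_url))
    if os.path.exists(local_path):
        _record_use(local_path)
        return fits.open(local_path)
    if not needs_image_data:
        # Catalog/header-only callers: ranged reads, nothing kept on disk.
        return fits.open(archive_url, use_fsspec=True)
    _evict_until_room()
    os.makedirs(TMP_DIR, exist_ok=True)
    urllib.request.urlretrieve(archive_url, local_path)
    _record_use(local_path)
    return fits.open(local_path)

def _record_use(path: str) -> None:
    # ZINCRBY keeps a per-file access frequency; HSET remembers sizes.
    r.zincrby("fits:freq", 1, path)
    r.hset("fits:size", path, os.path.getsize(path))

def _evict_until_room() -> None:
    # Drop the least-frequently-used files while we are near the limit.
    while sum(int(v) for v in r.hvals("fits:size")) >= DISK_LIMIT_BYTES:
        victims = r.zrange("fits:freq", 0, 0)  # lowest access count first
        if not victims:
            break
        path = victims[0].decode()
        if os.path.exists(path):
            os.remove(path)
        r.zrem("fits:freq", path)
        r.hdel("fits:size", path)
```

The easy way is simpler still: a periodic task that unlinks anything in TMP_DIR with an mtime older than an hour, at the cost of evicting files that are still hot.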
