Parallelize reconstructions with submitit
#477
Codecov Report: All modified and coverable lines are covered by tests.
Coverage diff, main vs. #477: coverage 9.54% (unchanged), files 30, lines 4591, hits 438, misses 4153. |
|
@talonchandler Here are some comments trying to use this:
Minor
|
Thanks for testing @edyoshikun!
Hmmm...I couldn't get your script to fail after ~10 trials. We might need to pair on your failures. You got a SIGTERM...what terminal are you running from? Did you always get the same failure? (I saw one set of failures in
You helped me find a bug that accidentally set the number of requested CPUs to the number of timepoints. This branch now requests the same number of CPUs as the
Fixed! Thanks for flagging.
Hmmm...on a first pass I'm not seeing an easy way to do this (without a submitit PR or duplication of a large submitit class), and I'm trying to understand your debugging case where folders would help? Naively, the only difference between a folder and the current behavior is a
I think I've fixed this by always printing white when the monitor exits. If this persists we might need to pair to debug this. Thanks for your helpful review! |
It ran now without any issues. I want to say the culprit is
I was using the linux shell on the HPC launching the jobs from |
@edyoshikun I just finished an end-to-end test, and I'm happy with this branch. I also committed a change that resets terminal colors when this is complete. In my opinion this is ready to merge. |
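For readers following the terminal-color discussion, here is a minimal sketch of how a monitor loop might always restore the terminal's default colors on exit. It is not the code committed in this branch: the `monitor` function and its arguments are hypothetical, and only the standard ANSI escape sequences are assumed.

```python
import io
import sys

RESET = "\033[0m"   # ANSI "reset all attributes" sequence
GREEN = "\033[32m"  # ANSI green foreground


def monitor(statuses, stream=sys.stdout):
    """Print one colored line per (name, state) pair, then reset colors.

    Hypothetical helper: the reset lives in `finally`, so it runs even if
    the loop is interrupted (for example by Ctrl-C).
    """
    try:
        for name, state in statuses:
            stream.write(f"{GREEN}{name}: {state}\n")
    finally:
        stream.write(RESET)
        stream.flush()


# Usage: capture the output in memory and check that it ends with the reset.
buffer = io.StringIO()
monitor([("job0", "DONE"), ("job1", "RUNNING")], stream=buffer)
ends_with_reset = buffer.getvalue().endswith(RESET)
```

Putting the reset in a `finally` block is one way to avoid leaving the shell in a colored state after a failure or interrupt.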
LGTM!
This PR uses submitit to parallelize reconstructions with the following features:
- if slurm is available, submits batched jobs to nodes (one job per ome-zarr position)
- if slurm is not available, submits jobs to individual processes (one PID per ome-zarr position)
- a --ram-multiplier option
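The slurm-or-local behavior described above can be sketched roughly as follows. This is an illustration under stated assumptions, not this PR's implementation: `pick_cluster`, `reconstruct_position`, and `submit_positions` are hypothetical names, the resource values are placeholders, and `submitit` is a third-party package that must be installed before `submit_positions` is called. To my understanding, `submitit.AutoExecutor` performs a similar slurm check on its own when `cluster` is left unset; the explicit helper is only here to make the fallback visible.

```python
import shutil


def pick_cluster() -> str:
    """Return "slurm" if the sbatch command is on PATH, otherwise "local"."""
    return "slurm" if shutil.which("sbatch") else "local"


def reconstruct_position(position_path: str) -> str:
    """Placeholder for the real per-position reconstruction (hypothetical)."""
    return f"reconstructed {position_path}"


def submit_positions(position_paths, log_dir="submitit_logs", cpus_per_task=1):
    """Submit one job per position and return the submitit job objects."""
    import submitit  # third-party; imported lazily so the helpers above work without it

    executor = submitit.AutoExecutor(folder=log_dir, cluster=pick_cluster())
    # Placeholder resources; real values depend on the reconstruction.
    executor.update_parameters(timeout_min=60, cpus_per_task=cpus_per_task)
    # map_array submits one job per element of the iterable.
    return executor.map_array(reconstruct_position, [str(p) for p in position_paths])
```

A caller might then run `jobs = submit_positions(["plate.zarr/A/1/0", "plate.zarr/B/1/0"])` and collect results with `[job.result() for job in jobs]`.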
This PR also includes several nice-to-have features:
- accepts -i plate.zarr, which it will expand into a list of positions
- a monitor_jobs utility that makes well-job-node mappings very clear, which is handy when things fail
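As a rough illustration of expanding a plate into positions, the sketch below assumes the plate follows the OME-NGFF HCS layout, where each position sits at `row/column/field` under the plate root. The `expand_plate` name is hypothetical and this is not the expansion logic used in the PR.

```python
import tempfile
from pathlib import Path


def expand_plate(plate_path):
    """Return the sorted position directories found at row/column/field depth."""
    plate = Path(plate_path)
    # Keep directories only, so metadata files at that depth are skipped.
    return sorted(p for p in plate.glob("*/*/*") if p.is_dir())


# Usage: build a tiny fake plate on disk and list its positions.
with tempfile.TemporaryDirectory() as tmp:
    plate = Path(tmp) / "plate.zarr"
    for position in ("A/1/0", "A/2/0", "B/1/0"):
        (plate / position).mkdir(parents=True)
    positions = [p.relative_to(plate).as_posix() for p in expand_plate(plate)]
```

Each returned path could then be handed to one job, matching the one-job-per-position scheme described in the PR.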