Issue launching job within an existing job array job #1782
Comments
Can you check if this context manager solves your issue? Lines 296 to 330 in 189bcc3
If not, can you identify which env variable(s) are still present and interfere with launching new jobs? If needed, we can update the list of variables that are hidden by the context manager.
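One way to see which variables are still present is to list the scheduler-related ones from inside the running job. This is only a sketch: the helper name `scheduler_vars` is hypothetical, and the `SLURM_` and `SUBMITIT_` prefixes are an assumption, since the list actually hidden by the context manager may differ.

```python
import os


def scheduler_vars(environ=None, prefixes=("SLURM_", "SUBMITIT_")):
    """Return the variables whose names start with one of `prefixes`.

    The prefixes are an assumption made for illustration; adjust them to
    whatever the scheduler and launcher set on your cluster.
    """
    environ = os.environ if environ is None else environ
    # str.startswith accepts a tuple, so one call checks every prefix.
    return {k: v for k, v in sorted(environ.items()) if k.startswith(prefixes)}


if __name__ == "__main__":
    # Print the matching variables of the current process, one per line.
    for name, value in scheduler_vars().items():
        print(f"{name}={value}")
```

Running this inside the training job, right before submitting the evaluation job, should show what the child submission would inherit.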
@baldassarreFe thank you for getting back to us. This would take care of the variable, but we're worried this solution would alter the state of the existing job's environment variables and thus break requeuing. Is there a way to reinstate the existing job's environment variables after this cleanup step?
I believe the context manager does exactly what you need. The env is temporarily altered and it's then restored as it was before. You can check by printing the env before and after just to be sure.
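To illustrate the alter-then-restore behaviour in general terms, here is a minimal sketch of such a context manager. It is not submitit's implementation: `hidden_env` is a hypothetical name and the prefixes are an assumption. It also shows the before/after check suggested above.

```python
import os
from contextlib import contextmanager


@contextmanager
def hidden_env(prefixes=("SLURM_", "SUBMITIT_")):
    """Temporarily remove matching variables, then put them back.

    Hypothetical helper for illustration; the prefixes are an assumption.
    """
    # Remember every matching variable before removing it.
    saved = {k: v for k, v in os.environ.items() if k.startswith(prefixes)}
    for name in saved:
        del os.environ[name]
    try:
        yield
    finally:
        # Restore the saved values even if the body raised an exception.
        os.environ.update(saved)


if __name__ == "__main__":
    before = dict(os.environ)
    with hidden_env():
        # Submit the nested job here; matching variables are hidden.
        pass
    # The environment should be identical once the block exits.
    print(dict(os.environ) == before)
```

Because the restore happens in `finally`, the parent job's variables come back even when the nested submission fails, which is the property that matters for requeuing.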
Great, thank you!
In our workflow we launch a job array with submitit on SLURM for model training. Within each job, we then launch separate evaluation jobs from our Python code. We confirmed the workflow works when the main training job is not a job array. When the main training job is a job array, the subsequent evaluation jobs look for the .pkl with the wrong parent job IDs and exit with an error.
The hacky workaround we have for now is to manually remove the environment variable
Is there a recommended workflow to make sure launching a job within an existing submitit job array works as expected?