-
Notifications
You must be signed in to change notification settings - Fork 212
Remove unnecessary test sleep #4037
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This removes about 15 seconds of test runtime, for --config local tests.
This delay was introduced as part of new tests for resource records, in:
Test that monitoring resource rows are recorded for a long task (#1932)
but then later on, the monitoring code was modified to always send one
final record, even for very fast tasks, which meant the delay was no longer
needed, in:
Separate accumulation and sending of resource info (#2380)
Collaborator
Author
|
looks like this reveals a different test race condition - converting to draft |
benclifford
added a commit
that referenced
this pull request
Dec 2, 2025
This was uncovered by removal of an unrelated sleep in PR #4037, but I think it will present itself (as missing monitoring data, rather than as an error/exception) in user-facing code. Prior to this PR: A monitoring message could be written to the new_dir and then shortly after the exit event could be set, sufficiently close in time that the monitoring radio receiver loop exited without seeing those new message files. This PR modifies the exit behaviour of that loop to have one final iteration with the following ordering of events: i) monitoring messages are written ii) task completes iii) parsl begins to shut down iv) monitoring radio exit event is set by DFK v) monitoring radio loop observes exit event vi) monitoring radio loop performs one final processing of directory The new behaviour here is step vi, that a final directory processing will always happen strictly after the exit event is set, which is strictly after the monitoring messages are written in step i, assuming directories are consistently observable from different places in the filesystem. The misbehaviour can be observed fairly easily by increasing the delay time of the loop before this PR for example to 10 seconds and running the test suite. With this race condition addressed, the loop poll period can be made longer and this PR arbitrarily increases it from 1 second to 10 seconds - although it could also be made configurable.
Collaborator
Author
|
see #4041 attempting to address the race condition uncovered here |
github-merge-queue bot
pushed a commit
that referenced
this pull request
Dec 2, 2025
This was uncovered by removal of an unrelated sleep in PR #4037, but I think it will present itself (as missing monitoring data, rather than as an error/exception) in user-facing code. ### Prior to this PR A monitoring message could be written to the new_dir and then shortly after the exit event could be set, sufficiently close in time that the monitoring radio receiver loop exited without seeing those new message files. This PR modifies the exit behaviour of that loop to have one final iteration with the following ordering of events: 1. monitoring messages are written 2. task completes 3. parsl begins to shut down 4. monitoring radio exit event is set by DFK 5. monitoring radio loop observes exit event ### As of this PR 6. monitoring radio loop performs one final processing of directory The new behaviour here is step 6, that a final directory processing will always happen strictly after the exit event is set, which is strictly after the monitoring messages are written in step 1, assuming directories are consistently observable from different places in the filesystem. The misbehaviour can be observed by increasing the delay time of the loop before this PR (for example to 10 seconds) and running the test suite. With this race condition addressed, the loop poll period can be made longer and this PR arbitrarily increases it from 1 second to 10 seconds - although it could also be made configurable. # Changed Behaviour I expect some situations where end of task monitoring data may have been missing to now not be missing that data. ## Type of change - Bug fix
khk-globus
pushed a commit
that referenced
this pull request
Dec 2, 2025
This was uncovered by removal of an unrelated sleep in PR #4037, but I think it will present itself (as missing monitoring data, rather than as an error/exception) in user-facing code. Prior to this PR: A monitoring message could be written to the new_dir and then shortly after the exit event could be set, sufficiently close in time that the monitoring radio receiver loop exited without seeing those new message files. This PR modifies the exit behaviour of that loop to have one final iteration with the following ordering of events: i) monitoring messages are written ii) task completes iii) parsl begins to shut down iv) monitoring radio exit event is set by DFK v) monitoring radio loop observes exit event vi) monitoring radio loop performs one final processing of directory The new behaviour here is step vi, that a final directory processing will always happen strictly after the exit event is set, which is strictly after the monitoring messages are written in step i, assuming directories are consistently observable from different places in the filesystem. The misbehaviour can be observed fairly easily by increasing the delay time of the loop before this PR for example to 10 seconds and running the test suite. With this race condition addressed, the loop poll period can be made longer and this PR arbitrarily increases it from 1 second to 10 seconds - although it could also be made configurable.
github-merge-queue bot
pushed a commit
that referenced
this pull request
Dec 2, 2025
This was uncovered by removal of an unrelated sleep in PR #4037, but I think it will present itself (as missing monitoring data, rather than as an error/exception) in user-facing code. ### Prior to this PR A monitoring message could be written to the new_dir and then shortly after the exit event could be set, sufficiently close in time that the monitoring radio receiver loop exited without seeing those new message files. This PR modifies the exit behaviour of that loop to have one final iteration with the following ordering of events: 1. monitoring messages are written 2. task completes 3. parsl begins to shut down 4. monitoring radio exit event is set by DFK 5. monitoring radio loop observes exit event ### As of this PR 6. monitoring radio loop performs one final processing of directory The new behaviour here is step 6, that a final directory processing will always happen strictly after the exit event is set, which is strictly after the monitoring messages are written in step 1, assuming directories are consistently observable from different places in the filesystem. The misbehaviour can be observed by increasing the delay time of the loop before this PR (for example to 10 seconds) and running the test suite. With this race condition addressed, the loop poll period can be made longer and this PR arbitrarily increases it from 1 second to 10 seconds - although it could also be made configurable. # Changed Behaviour I expect some situations where end of task monitoring data may have been missing to now not be missing that data. ## Type of change - Bug fix
khk-globus
approved these changes
Dec 4, 2025
Collaborator
khk-globus
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amusing how 3 seconds turns into 15 seconds.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This removes about 15 seconds of test runtime, for --config local tests.
This delay was introduced as part of new tests for resource records, in:
but then later on, the monitoring code was modified to always send one final record, even for very fast tasks, which meant the delay was no longer needed, in:
Changed Behaviour
none
Type of change