-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Communicator refactor #496
base: main
Are you sure you want to change the base?
Conversation
d512137
to
ad1f6b9
Compare
7a33788
to
d89676c
Compare
dbcd3f1
to
4d3b668
Compare
9bccfe6
to
f2ff331
Compare
2d250ac
to
a94c2dd
Compare
6c6fd66
to
1a0ceb8
Compare
64901fb
to
252166c
Compare
is_new (bool): Indicates whether the "gotten" values are new, | ||
based on the write_id. | ||
""" | ||
if not synchronize: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if not synchronize: | |
if synchronize: |
|
||
fullcomm.Barrier() | ||
global_toc("Finalize Complete") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
global_toc("Finalize Complete") | |
global_toc("Finalized") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
global_toc("Finalize Complete") | |
global_toc("All cylinders finalized") |
Maybe this instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or ..
global_toc("Finalize Complete") | |
global_toc("Cylinder Finalization Complete") |
This PR finishes the work started in #476 by utilizing the new API introduced for communication in #476 at a higher level. This refactoring has a few major outcomes:
Spoke.got_kill_signal
only updates theSHUTDOWN
field. This mirrors how things are currently done in the hub and removes from spooky action at a distance from the code.Hub
andSpoke
classes and is now inSPCommunicator
FieldLengths
, which centrally stores the length of each field. Validation is done automatically and generically inSPCommunicator.register_recv_field
SPCommunicator
classes now register as a class attribute what they send and receive. This isn't necessary, but makes it possible to write the methodSPCommunicator.register_receive_fields
in a generic way. But maybe we should just be doing calls tosuper()
with traditional inheritance instead?Spoke.got_kill_signal
only checks the shutdown field. Further, instead of all cylinder ranks agreeing on shutting down by checkingis_new
, we now terminate when the first one gets the shutdown indicator through anSPCommunicator.allreduce_or
call to synchronize the action. This results in (a) faster termination and (b) more flexibility:Spoke.got_kill_signal
can be called multiple times to jump out of expensive function executions.