Consolidating OpenACC device-host memory transfers #1315
Open
abishekg7 wants to merge 7 commits into MPAS-Dev:develop from abishekg7:atmosphere/acc_mem_move_per_timestep
ac98504
to
4845ce2
Compare
This PR consolidates much of the OpenACC host-device data transfer performed during dynamics into two subroutines, mpas_atm_pre_dynamics_h2d and mpas_atm_post_dynamics_d2h, which are called before and after the call to the atm_srk3 subroutine. Because atm_compute_solve_diagnostics is also called once before the start of the model run, a second pair of subroutines, mpas_atm_pre_computesolvediag_h2d and mpas_atm_post_computesolvediag_d2h, handles data movement around that first call to atm_compute_solve_diagnostics. Any fields copied onto the device in these subroutines are removed from the explicit data movement statements in the dynamical core.

The mesh/time-invariant fields are still copied onto the device in mpas_atm_dynamics_init and removed from the device in mpas_atm_dynamics_finalize, with the exception of select fields moved in mpas_atm_pre_computesolvediag_h2d and mpas_atm_post_computesolvediag_d2h. This is a special case because atm_compute_solve_diagnostics is called for the first time before the call to mpas_atm_dynamics_init.

This PR also includes explicit host-device data transfers in the mpas_atm_iau, mpas_atmphys_interface and mpas_atmphys_todynamics modules to ensure that the physics and IAU regions, which run on the CPU, use the latest values from the dynamical core running on GPUs, and vice versa. In addition, this PR includes explicit data transfers around halo exchanges in the atm_srk3 subroutine.

These data transfer subroutines and the acc update statements are an interim solution until we have a book-keeping method in place. This PR also introduces a couple of new timers to keep track of the cost of data transfers.
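For readers less familiar with this transfer pattern, here is a minimal sketch of it. It is written in C with OpenACC pragmas rather than MPAS's Fortran so that it compiles and runs on its own (without an OpenACC compiler the pragmas are simply ignored). The struct, the function names, the two fields and the arithmetic are all hypothetical stand-ins, not the routines or field lists in this PR.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container for the fields used by the dynamics. */
typedef struct {
    size_t n;        /* number of cells; the last cell plays the role of a halo cell */
    double *theta;   /* time-varying field, moved every time step */
    double *area;    /* time-invariant mesh field, moved once */
} dyn_state;

/* Analogue of mpas_atm_dynamics_init: invariant fields go to the device once. */
void dynamics_init(dyn_state *s) {
    size_t n = s->n;
    double *area = s->area;
    assert(n >= 2 && area != NULL);
    #pragma acc enter data copyin(area[0:n])
}

/* Analogue of mpas_atm_dynamics_finalize: invariant fields leave the device. */
void dynamics_finalize(dyn_state *s) {
    size_t n = s->n;
    double *area = s->area;
    assert(n >= 2 && area != NULL);
    #pragma acc exit data delete(area[0:n])
}

/* Analogue of mpas_atm_pre_dynamics_h2d: one consolidated host-to-device copy. */
void pre_dynamics_h2d(dyn_state *s) {
    size_t n = s->n;
    double *theta = s->theta;
    assert(n >= 2 && theta != NULL);
    #pragma acc enter data copyin(theta[0:n])
}

/* Analogue of mpas_atm_post_dynamics_d2h: one consolidated device-to-host copy. */
void post_dynamics_d2h(dyn_state *s) {
    size_t n = s->n;
    double *theta = s->theta;
    assert(n >= 2 && theta != NULL);
    #pragma acc exit data copyout(theta[0:n])
}

/* Host-side stand-in for a halo exchange: fill the halo cell from its neighbour. */
void halo_exchange_host(dyn_state *s) {
    s->theta[s->n - 1] = s->theta[s->n - 2];
}

/* Stand-in for the dynamics: kernels only assert presence, they move no data. */
void dynamics(dyn_state *s, double dt) {
    size_t n = s->n;
    double *theta = s->theta;
    const double *area = s->area;

    #pragma acc parallel loop present(theta[0:n], area[0:n])
    for (size_t i = 0; i < n; ++i) {
        theta[i] += dt * area[i];
    }

    /* The halo exchange runs on the host, so it is bracketed by updates. */
    #pragma acc update self(theta[0:n])
    halo_exchange_host(s);
    #pragma acc update device(theta[0:n])
}

/* Stand-in for host-only physics, which must see the latest host values. */
void physics_host(dyn_state *s) {
    for (size_t i = 0; i < s->n; ++i) {
        s->theta[i] *= 0.5;
    }
}

/* One time step: host physics, then dynamics wrapped by the two transfer calls. */
void time_step(dyn_state *s, double dt) {
    physics_host(s);
    pre_dynamics_h2d(s);
    dynamics(s, dt);
    post_dynamics_d2h(s);
}
```

A caller would invoke dynamics_init once, time_step once per step, and dynamics_finalize at the end. Because the compute loop only uses a present clause, a field missing from the pre-dynamics copy fails at run time under an OpenACC compiler instead of triggering a silent implicit transfer.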
e8c9c64
to
e4c2509
Compare
@mgduda I think it might be ready for a second look. I did try to move the |
…t_2d This commit introduces two OpenACC data transfer routines, mpas_reconstruct_2d_h2d and mpas_reconstruct_2d_d2h, in order to remove the data transfers from the mpas_reconstruct_2d routine itself. This also allows us to remove extraneous data movements within the atm_srk3 routine. mpas_reconstruct_2d_h2d and mpas_reconstruct_2d_d2h are called before and after the call to mpas_reconstruct in atm_mpas_init_block. The reconstructed vector fields are also copied to and from the device before and after every dynamics call, in mpas_atm_pre_dynamics_h2d and mpas_atm_post_dynamics_d2h.