-
Notifications
You must be signed in to change notification settings - Fork 11
Allow building with HIP devices #118
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: development
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -51,7 +51,7 @@ void ExaEpi::Utils::getTestParams (TestParams& params, /*!< Test parameters */ | |
| } else if (ic_type == "urbanpop") { | ||
| params.ic_type = ICType::UrbanPop; | ||
| pp.get("urbanpop_filename", params.urbanpop_filename); | ||
| #ifdef AMREX_USE_CUDA | ||
| #if defined(AMREX_USE_CUDA) || defined(AMREX_USE_HIP) | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @JaeseungYeom Can you try
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think @stevenhofmeyr set box size to 500 for nvidia gpu only, hence the use of AMREX_USE_CUDA. If we change this to AMREX_USE_GPU, this value will apply to all other GPUs including Intel and AMD ones.
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @tannguyen153 I think that's how it should be. When using GPUs, whether it's AMD/Intel/NVIDIA, the box size should be larger to minimize MPI communications between boxes on the same GPU, right?
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Right I think box size should be large enough for all GPUs activated by AMREX_USE_GPU. We can also tune the box size for specific GPUs and enumerate the initial values with AMREX_USE_CUDA, AMREX_USE_HIP and AMREX_USE_SYCL, etc.
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I am currently testing with that value. I am getting oom error with the value set to 100. I will experiment with it, and let you know.
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @JaeseungYeom : I just checked the code again - this is a runtime parameter
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @tannguyen153 : this is specifically for the UrbanPop code, and defaults to 100 for CPUs and 500 for GPUs. For the census code, it defaults to 16. It's so much more for UrbanPop because the underlying grid is lat/lng, and so we have many grid points that have no communities, unlike the packed allocation for the census where the underlying grid does not relate to physical lat/lng.
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @JaeseungYeom : you're likely getting oom for smaller box sizes because you'll have too many boxes (most of which will be empty).
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I confirm that I can avoid OOM error using 8 nodes instead of 4 nodes. It looks like the memory is limited on tuolumne as it is shared between GPU and CPU. So, what is the final suggestion? Just remove the HIP flag or separate it from CUDA? Do you want me to try UrbanPop data and play with agent.max_box_size?
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You could experiment with We should add extra info there for any new default cases. These parameters are also described in the docs. |
||
| params.max_box_size = 500; | ||
| #else | ||
| params.max_box_size = 100; | ||
|
|
||
Uh oh!
There was an error while loading. Please reload this page.