Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -58,6 +58,7 @@ if(NOT ${amrex}_POPULATED)
if(AMReX_GPU_BACKEND STREQUAL "CUDA")
enable_language(CUDA)
# include(AMReX_SetupCUDA)
elseif(AMReX_GPU_BACKEND STREQUAL "HIP")
Comment thread
debog marked this conversation as resolved.
endif()

# Bring the populated content into the build
Expand Down
4 changes: 3 additions & 1 deletion src/InteractionModel.H
Original file line number Diff line number Diff line change
Expand Up @@ -286,8 +286,10 @@ struct GetCommunityIndex {
int num_comms;
};

#ifdef AMREX_USE_CUDA
#if defined(AMREX_USE_CUDA)
#define devMemset cudaMemset
#elif defined(AMREX_USE_HIP)
#define devMemset (void)hipMemset
#else
#define devMemset memset
#endif
Expand Down
2 changes: 1 addition & 1 deletion src/Utils.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -51,7 +51,7 @@ void ExaEpi::Utils::getTestParams (TestParams& params, /*!< Test parameters */
} else if (ic_type == "urbanpop") {
params.ic_type = ICType::UrbanPop;
pp.get("urbanpop_filename", params.urbanpop_filename);
#ifdef AMREX_USE_CUDA
#if defined(AMREX_USE_CUDA) || defined(AMREX_USE_HIP)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JaeseungYeom Can you try #if defined(AMREX_USE_GPU) instead of #if defined(AMREX_USE_CUDA) || defined(AMREX_USE_HIP) here?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think @stevenhofmeyr set box size to 500 for nvidia gpu only, hence the use of AMREX_USE_CUDA. If we change this to AMREX_USE_GPU, this value will apply to all other GPUs including Intel and AMD ones.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tannguyen153 I think that's how it should be. When using GPUs, whether it's AMD/Intel/NVIDIA, the box size should be larger to minimize MPI communications between boxes on the same GPU, right?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right I think box size should be large enough for all GPUs activated by AMREX_USE_GPU. We can also tune the box size for specific GPUs and enumerate the initial values with AMREX_USE_CUDA, AMREX_USE_HIP and AMREX_USE_SYCL, etc.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am currently testing with that value. I am getting oom error with the value set to 100. I will experiment with it, and let you know.

[yeom2@tuolumne1038:bin]$ srun -N 4 -n 16 --exclusive ./agent inputs.ca 
Initializing AMReX (25.04-9-g30a9768150c4)...
MPI initialized with 16 MPI processes
MPI initialized with thread support level 0
Initializing HIP...
HIP initialized with 16 devices.
2130.442s: flux-shell[1]: ERROR: oom: Memory cgroup out of memory: killed 1 task on tuolumne1042.
2130.442s: flux-shell[1]: ERROR: oom: memory.peak = 240.32831G

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JaeseungYeom : I just checked the code again - this is a runtime parameter agent.max_box_size

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tannguyen153 : this is specifically for the UrbanPop code, and defaults to 100 for CPUs and 500 for GPUs. For the census code, it defaults to 16. It's so much more for UrbanPop because the underlying grid is lat/lng, and so we have many grid points that have no communities, unlike the packed allocation for the census where the underlying grid does not relate to physical lat/lng.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JaeseungYeom : you're likely getting oom for smaller box sizes because you'll have too many boxes (most of which will be empty).

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I confirm that I can avoid OOM error using 8 nodes instead of 4 nodes. It looks like the memory is limited on tuolumne as it is shared between GPU and CPU. So, what is the final suggestion? Just remove the HIP flag or separate it from CUDA? Do you want me to try UrbanPop data and play with agent.max_box_size?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could experiment with agent.max_box_size to find the best settings. Ideally the code in Utils.cpp will set usable defaults for every common situation.
The defaults should also be described in examples/inputs.defaults, e.g.:

# if ic_type is census
# agent.max_box_size = 16
# if ic_type is urbanpop and using GPUs
# agent.max_box_size = 500
# if ic_type is urbanpop and not using GPUs
# agent.max_box_size = 100

We should add extra info there for any new default cases. These parameters are also described in the docs.

params.max_box_size = 500;
#else
params.max_box_size = 100;
Expand Down