
Memory access issue with CUDA #3612


Open
jhuang2601 opened this issue Mar 27, 2025 · 0 comments · May be fixed by #3611
Labels
type: bug Something isn't working type: new A new issue has been created and requires attention

@jhuang2601 (Contributor)

Both the TPLs and GEOS compile on the Maple GPUs with clang 17.0.4.
However, only flow-only simulations (e.g., SPE 11) run successfully there.
Even in a serial run, every case involving mechanical deformation (I tested five tutorial examples) fails at the first time step with the following error:

terminate called after throwing an instance of 'umpire::runtime_error'
  what():  ! Umpire runtime_error [/shared/data1/Users/j0551570/Compilation/Develop_250324/thirdPartyLibs/build-maple-llvm-release/chai/src/chai/src/tpl/umpire/src/umpire/alloc/CudaMallocAllocator.hpp:62]: cudaFree( ptr = 0xfffa2e000000 ) failed with error: an illegal memory access was encountered

I used --trace-data-migration to trace host-device data migration while running the Kirsch problem, a simple mechanical example with linear elasticity:

Allocated   49.0 KB to the DEVICE: LvArray::Array<long long, 1, camp::int_seq<long, 0l>, int, LvArray::ChaiBuffer> Problem/domain/MeshBodies/mesh1/meshLevels/Level0/nodeManager/mechanicsSolver_totalDisplacement_dofIndex Free memory on device: 94.0 GB
Moved   49.0 KB to the DEVICE: LvArray::Array<long long, 1, camp::int_seq<long, 0l>, int, LvArray::ChaiBuffer> Problem/domain/MeshBodies/mesh1/meshLevels/Level0/nodeManager/mechanicsSolver_totalDisplacement_dofIndex
Allocated  147.0 KB to the DEVICE: LvArray::Array<double, 2, camp::int_seq<long, 1l, 0l>, int, LvArray::ChaiBuffer>  Free memory on device: 94.0 GB
Moved  147.0 KB to the DEVICE: LvArray::Array<double, 2, camp::int_seq<long, 1l, 0l>, int, LvArray::ChaiBuffer> 
Allocated  147.0 KB to the DEVICE: LvArray::Array<double, 2, camp::int_seq<long, 1l, 0l>, int, LvArray::ChaiBuffer> Problem/domain/MeshBodies/mesh1/meshLevels/Level0/nodeManager/totalDisplacement Free memory on device: 94.0 GB
Moved  147.0 KB to the DEVICE: LvArray::Array<double, 2, camp::int_seq<long, 1l, 0l>, int, LvArray::ChaiBuffer> Problem/domain/MeshBodies/mesh1/meshLevels/Level0/nodeManager/totalDisplacement
Allocated  250.0 KB to the DEVICE: LvArray::Array<double, 2, camp::int_seq<long, 0l, 1l>, int, LvArray::ChaiBuffer> Problem/domain/MeshBodies/mesh1/meshLevels/Level0/ElementRegions/elementRegionsGroup/Omega/elementSubRegions/cb1/rock_density Free memory on device: 94.0 GB
Moved  250.0 KB to the DEVICE: LvArray::Array<double, 2, camp::int_seq<long, 0l, 1l>, int, LvArray::ChaiBuffer> Problem/domain/MeshBodies/mesh1/meshLevels/Level0/ElementRegions/elementRegionsGroup/Omega/elementSubRegions/cb1/rock_density
Freed     8.0 B to the HOST  : LvArray::Array<int, 1, camp::int_seq<long, 0l>, int, LvArray::ChaiBuffer>  Free memory on device: 0.0 B
Freed   24.5 KB to the HOST  : LvArray::ArrayOfArrays<int, int, LvArray::ChaiBuffer> Problem/domain/MeshBodies/mesh1/meshLevels/Level0/nodeManager/elemList/m_sizes Free memory on device: 0.0 B
Freed   24.5 KB to the HOST  : LvArray::ArrayOfArrays<int, int, LvArray::ChaiBuffer> Problem/domain/MeshBodies/mesh1/meshLevels/Level0/nodeManager/elemList/m_offsets Free memory on device: 0.0 B
Freed  392.1 KB to the HOST  : LvArray::ArrayOfArrays<int, int, LvArray::ChaiBuffer> Problem/domain/MeshBodies/mesh1/meshLevels/Level0/nodeManager/elemList/m_values Free memory on device: 0.0 B
Freed   24.5 KB to the HOST  : LvArray::ArrayOfArrays<int, int, LvArray::ChaiBuffer> Problem/domain/MeshBodies/mesh1/meshLevels/Level0/nodeManager/elemSubRegionList/m_sizes Free memory on device: 0.0 B
Freed   24.5 KB to the HOST  : LvArray::ArrayOfArrays<int, int, LvArray::ChaiBuffer> Problem/domain/MeshBodies/mesh1/meshLevels/Level0/nodeManager/elemSubRegionList/m_offsets Free memory on device: 0.0 B
Freed  392.1 KB to the HOST  : LvArray::ArrayOfArrays<int, int, LvArray::ChaiBuffer> Problem/domain/MeshBodies/mesh1/meshLevels/Level0/nodeManager/elemSubRegionList/m_values Free memory on device: 0.0 B
Freed   24.5 KB to the HOST  : LvArray::ArrayOfArrays<int, int, LvArray::ChaiBuffer> Problem/domain/MeshBodies/mesh1/meshLevels/Level0/nodeManager/elemRegionList/m_sizes Free memory on device: 0.0 B
Freed   24.5 KB to the HOST  : LvArray::ArrayOfArrays<int, int, LvArray::ChaiBuffer> Problem/domain/MeshBodies/mesh1/meshLevels/Level0/nodeManager/elemRegionList/m_offsets Free memory on device: 0.0 B
Freed  392.1 KB to the HOST  : LvArray::ArrayOfArrays<int, int, LvArray::ChaiBuffer> Problem/domain/MeshBodies/mesh1/meshLevels/Level0/nodeManager/elemRegionList/m_values Free memory on device: 0.0 B
Freed   24.5 KB to the HOST  : LvArray::ArrayOfArrays<int, int, LvArray::ChaiBuffer> /m_sizes Free memory on device: 0.0 B
Freed   24.5 KB to the HOST  : LvArray::ArrayOfArrays<int, int, LvArray::ChaiBuffer> /m_offsets Free memory on device: 0.0 B
Freed  417.6 KB to the HOST  : LvArray::ArrayOfArrays<int, int, LvArray::ChaiBuffer> /m_values Free memory on device: 0.0 B
Freed   24.5 KB to the HOST  : LvArray::ArrayOfArrays<int, int, LvArray::ChaiBuffer> /m_sizes Free memory on device: 0.0 B
Freed   24.5 KB to the HOST  : LvArray::ArrayOfArrays<int, int, LvArray::ChaiBuffer> /m_offsets Free memory on device: 0.0 B
Freed  324.6 KB to the HOST  : LvArray::ArrayOfArrays<int, int, LvArray::ChaiBuffer> /m_values Free memory on device: 0.0 B
Freed  147.0 KB to the HOST  : LvArray::Array<double, 2, camp::int_seq<long, 1l, 0l>, int, LvArray::ChaiBuffer>  Free memory on device: 0.0 B
Freed  147.0 KB to the DEVICE: LvArray::Array<double, 2, camp::int_seq<long, 1l, 0l>, int, LvArray::ChaiBuffer>  Free memory on device: 0.0 B 

The error occurs right after this step:

Problem/domain/MeshBodies/mesh1/meshLevels/Level0/ElementRegions/elementRegionsGroup/Omega/elementSubRegions/cb1/rock_density
 Freed    8.0 B to the HOST : LvArray::Array<int, 1, camp::int_seq<long, 0l>, int, LvArray::ChaiBuffer> Free memory on device: 0.0 B
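The trace lines above share a fixed shape: an action (Allocated, Moved or Freed), a size, a memory space (HOST or DEVICE), the LvArray container type, an optional Group path, and sometimes the free device memory. To make long traces easier to scan, here is a minimal Python sketch, not part of GEOS, LvArray or CHAI, that parses lines of that shape and groups the freed buffers by memory space. The line format is inferred only from this excerpt, so the regular expression is an assumption and may miss other message variants; the names TraceEvent, parse_trace and freed_by_space are hypothetical.

```python
import re
from collections import defaultdict
from typing import Iterable, Iterator, NamedTuple

# Line shape inferred from the --trace-data-migration excerpt in this issue
# (an assumption, not a documented format):
#   <Action> <size> <unit> to the <SPACE>: <LvArray type> [<Group path>] [Free memory on device: ...]
TRACE_LINE = re.compile(
    r"^\s*(?P<action>Allocated|Moved|Freed)\s+"
    r"(?P<size>[\d.]+)\s*(?P<unit>[KMG]?B)\s+to the\s+"
    r"(?P<space>HOST|DEVICE)\s*:\s*"
    r"(?P<rest>.*?)(?:\s+Free memory on device:.*)?\s*$"
)


class TraceEvent(NamedTuple):
    action: str     # "Allocated", "Moved" or "Freed"
    size: float     # numeric size as printed
    unit: str       # "B", "KB", "MB" or "GB" as printed
    space: str      # "HOST" or "DEVICE"
    container: str  # LvArray container type, e.g. "LvArray::Array<...>"
    path: str       # Group path, or "<unnamed>" when the trace prints none


def parse_trace(lines: Iterable[str]) -> Iterator[TraceEvent]:
    """Yield one TraceEvent per recognised trace line; other lines are skipped."""
    for line in lines:
        m = TRACE_LINE.match(line)
        if m is None:
            continue
        rest = m.group("rest")
        # The container type ends at its last '>'; anything after it is the path.
        head, sep, tail = rest.rpartition(">")
        container = head + sep if sep else rest
        path = tail.strip() if sep else ""
        yield TraceEvent(m.group("action"), float(m.group("size")), m.group("unit"),
                         m.group("space"), container, path or "<unnamed>")


def freed_by_space(events: Iterable[TraceEvent]) -> dict[str, list[str]]:
    """Group the paths of freed buffers by memory space, preserving trace order."""
    freed: dict[str, list[str]] = defaultdict(list)
    for ev in events:
        if ev.action == "Freed":
            freed[ev.space].append(ev.path)
    return dict(freed)


# Three lines copied from the trace in this issue.
SAMPLE = [
    "Moved  250.0 KB to the DEVICE: LvArray::Array<double, 2, camp::int_seq<long, 0l, 1l>, int, LvArray::ChaiBuffer> Problem/domain/MeshBodies/mesh1/meshLevels/Level0/ElementRegions/elementRegionsGroup/Omega/elementSubRegions/cb1/rock_density",
    "Freed     8.0 B to the HOST  : LvArray::Array<int, 1, camp::int_seq<long, 0l>, int, LvArray::ChaiBuffer>  Free memory on device: 0.0 B",
    "Freed  147.0 KB to the DEVICE: LvArray::Array<double, 2, camp::int_seq<long, 1l, 0l>, int, LvArray::ChaiBuffer>  Free memory on device: 0.0 B",
]

if __name__ == "__main__":
    for event in parse_trace(SAMPLE):
        print(event.action, event.size, event.unit, event.space, event.path)
    print(freed_by_space(parse_trace(SAMPLE)))
```

Splitting on the last `>` avoids having to parse the nested template arguments of the container type, at the cost of assuming that Group paths never contain `>`.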

What could be causing this CUDA illegal memory access, and how can it be fixed?

jhuang2601 added the labels type: new and type: bug on Mar 27, 2025
jhuang2601 linked a pull request that will close this issue on Mar 27, 2025

3 participants