Assertion error with boundary planes #1525
Comments
Is
It is hard to say, since the same boundary plane dataset ran with one set of GPUs and failed with another. I wonder whether it could be a Kestrel-specific issue.
Potential improvement: replace the assert with an abort statement so that better diagnostics can be printed.
Had a repeat today. It happens more often on CPU. I had mostly been running on GPU before the current issues with the queue and did not observe it there.
I will close this for now and reopen it in the future if required.
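The suggestion above (turning the assert into an abort with diagnostics) could look roughly like the sketch below. It is not the AMR-Wind implementation: `check_time_in_range` and `loose_tol` are hypothetical names, and the tolerance value is an assumed stand-in for `constants::LOOSE_TOL`. The range condition itself is copied from the assertion message in the error output.

```cpp
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for constants::LOOSE_TOL; the real value lives in AMR-Wind.
constexpr double loose_tol = 1.0e-10;

// Returns an empty string when `time` lies inside the range covered by the
// boundary plane times (same condition as the failing assertion). Otherwise
// returns a message reporting the requested time and the available range.
std::string check_time_in_range(const std::vector<double>& in_times, double time)
{
    if (in_times.empty()) {
        return "no boundary plane times were loaded";
    }
    const bool after_start = in_times.front() <= time + loose_tol;
    const bool before_end = time < in_times.back() + loose_tol;
    if (after_start && before_end) {
        return "";
    }
    std::ostringstream msg;
    msg << std::setprecision(17) << "requested time " << time
        << " is outside the boundary plane range [" << in_times.front() << ", "
        << in_times.back() << "] (" << in_times.size()
        << " planes loaded, tolerance " << loose_tol << ")";
    return msg.str();
}
```

A caller could pass a non-empty message to an abort routine (for example `amrex::Abort`), so the log would show which time was requested and which range was actually read on the failing run.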
Bug description
I was running an inflow-outflow simulation with boundary plane data from a precursor and got an assertion error. Interestingly, one of the cases ran fine while the other crashed.
4 GPU Nodes on Kestrel - Crashed
5 GPU Nodes on Kestrel - Ran to completion
Steps to reproduce
I can share the case file on Kestrel.
Steps to reproduce the behavior:
Kestrel
module purge
module load binutils
module load PrgEnv-nvhpc
module load cray-libsci/22.12.1.1
module load cmake
module load cmake/3.27.9
module load cray-python
module load netcdf-fortran/4.6.1-oneapi
module load craype-x86-genoa
module load craype-accel-nvidia90
export MPICH_GPU_SUPPORT_ENABLED=0
export CUDAFLAGS="-L/nopt/nrel/apps/gpu_stack/libraries-gcc/06-24/linux-rhel8-zen4/gcc-12.3.0/hdf5-1.14.3-zoremvtiklvvkbtr43olrq3x546pflxe/lib -I/nopt/nrel/apps/gpu_stack/libraries-gcc/06-24/linux-rhel8-zen4/gcc-12.3.0/hdf5-1.14.3-zoremvtiklvvkbtr43olrq3x546pflxe/include -lhdf5 -lhdf5_hl -I${MPICH_DIR}/include -L${MPICH_DIR}/lib -lmpi ${PE_MPICH_GTL_DIR_nvidia90} ${PE_MPICH_GTL_LIBS_nvidia90}"
export CXXFLAGS="-L/nopt/nrel/apps/gpu_stack/libraries-gcc/06-24/linux-rhel8-zen4/gcc-12.3.0/hdf5-1.14.3-zoremvtiklvvkbtr43olrq3x546pflxe/lib -I/nopt/nrel/apps/gpu_stack/libraries-gcc/06-24/linux-rhel8-zen4/gcc-12.3.0/hdf5-1.14.3-zoremvtiklvvkbtr43olrq3x546pflxe/include -lhdf5 -lhdf5_hl -I${MPICH_DIR}/include -L${MPICH_DIR}/lib -lmpi ${PE_MPICH_GTL_DIR_nvidia90} ${PE_MPICH_GTL_LIBS_nvidia90}"
export HDF5_USE_FILE_LOCKING=FALSE
export MPICH_OFI_SKIP_NIC_SYMMETRY_TEST=1
Input file attachments
Error (paste or attach):
```
terminate called after throwing an instance of 'std::runtime_error'
what(): Assertion `(m_in_times[0] <= time + constants::LOOSE_TOL) && (time < m_in_times.back() + constants::LOOSE_TOL)
' failed, file "/projects/total/codes/main/amr-wind/amr-wind/wind_energy/ABLBoundaryPlane.cpp", line 1067
==============================================================================
AMR-Wind (https://github.com/exawind/amr-wind)
AMR-Wind version :: v3.4.0
AMR-Wind Git SHA :: 38d1b9f
AMReX version :: 25.01-16-g92d35c2c8163
Exec. time :: Wed Mar 5 11:57:16 2025
Build time :: Feb 12 2025 06:44:50
C++ compiler :: GNU 8.5.0
MPI :: ON (Num. ranks = 16)
GPU :: ON (Backend: CUDA)
OpenMP :: OFF
Enabled third-party libraries:
NetCDF 4.9.2
```