
FFT causes the run to stop #2122

Closed
wang1202 opened this issue Feb 14, 2025 · 4 comments

Comments

@wang1202

Hello,

I found that some recent updates may have broken FFT support. I have always enabled FFT when running FlowInABox. This week, after pulling the latest changes, rebuilding the code with cmake_with_fft.sh in Build/, and running it, I found that FFT causes the run to stop. The code still runs fine if I set erf.use_fft = 0. Below is the input file I used.

# ------------------  INPUTS TO MAIN PROGRAM  -------------------
stop_time = 300.

amrex.fpe_trap_invalid = 1

erf.anelastic = 1

erf.use_fft = 1

erf.init_type = "uniform"

fabarray.mfiter_tile_size = 1024 1024 1024

# PROBLEM SIZE & GEOMETRY
geometry.prob_lo = -1. -1.  0.
geometry.prob_hi =  1.  1.  1.

#coarse
amr.n_cell       = 64  64  32

#fine
#amr.n_cell       = 128 128 64

geometry.is_periodic = 0 0 0

xlo.type = "SlipWall"
xhi.type = "SlipWall"
ylo.type = "SlipWall"
yhi.type = "SlipWall"
zlo.type = "SlipWall"
zhi.type = "SlipWall"
#geometry.is_periodic = 1 1 0

erf.dycore_horiz_adv_type  = Blended_5th6th
erf.dycore_vert_adv_type   = Blended_5th6th
erf.dryscal_horiz_adv_type = WENOZ5
erf.dryscal_vert_adv_type  = WENOZ5

xlo.theta = 288.
xhi.theta = 288.
ylo.theta = 288.
yhi.theta = 288.
zlo.theta = 294.
zhi.theta = 282.

xlo.density = 1.2225686
xhi.density = 1.2225686
ylo.density = 1.2225686
yhi.density = 1.2225686
zlo.density = 1.2225686
zhi.density = 1.2225686

# TIME STEP CONTROL
erf.cfl            = 0.8
erf.substepping_cfl = 0.8

# DIAGNOSTICS & VERBOSITY
erf.sum_interval   = 5       # timesteps between computing mass
erf.v              = 5       # verbosity in ERF.cpp, default = 1
amr.v              = 5       # verbosity in Amr.cpp, default = 1

# REFINEMENT / REGRIDDING
amr.max_level       = 0       # maximum level number allowed

# CHECKPOINT FILES
erf.check_file      = chk        # root name of checkpoint file
erf.check_int       = 2000       # number of timesteps between checkpoints

# PLOTFILES
erf.plot_file_1     = plt        # prefix of plotfile name
erf.plot_int_1      = 1000        # number of timesteps between plotfiles
erf.plot_vars_1     = density x_velocity y_velocity z_velocity pressure temp theta diss

erf.use_gravity = true

# SOLVER CHOICE
erf.molec_diff_type  = "None"
erf.alpha_T = 0.0
erf.alpha_C = 0.0

erf.les_type  = "Smagorinsky"
erf.Cs        = 0.1
erf.Pr_t      = 0.33333333333333

# PROBLEM PARAMETERS
prob.rho_0 = 1.0
prob.T_0          = 288.
prob.T_0_Pert_Mag = 0.1
prob.U_0_Pert_Mag = 0.01
prob.V_0_Pert_Mag = 0.01
prob.W_0_Pert_Mag = 0.01

This is the output of the second time step:

[Level 0 step 2] ADVANCE from time = 1 to 2.1 with dt = 1.1
Making slow rhs at time 1 for fast variables advancing from 1 to 2.1
 No-substepping time integration at level 0 to 2.1 with dt = 1.1
Time integration of scalars at level 0 from 1 to 2.1 with dt = 1.1 using RHS created at 1
Making slow rhs at time 2.1 for fast variables advancing from 1 to 2.1
 No-substepping time integration at level 0 to 2.1 with dt = 1.1
SIGILL Invalid, privileged, or ill-formed instruction
See Backtrace.0 file for details
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI COMM 3 DUP FROM 0
  Proc: [[54447,0],0]
  Errorcode: 4

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------

And this is the backtrace file:

 0: amrex::BLBackTrace::print_backtrace_info(__sFILE*) (in erf_flow_in_a_box) + 64

 1: amrex::BLBackTrace::handler(int) (in erf_flow_in_a_box) + 644

 2: _sigtramp (in libsystem_platform.dylib) + 56

 3: void amrex::ParallelFor<void amrex::FFT::R2X<double>::post_forward_doit<2, amrex::FArrayBox, amrex::FFT::Poisson<amrex::MultiFab>::solve(amrex::MultiFab&, amrex::MultiFab const&)::'lambda'(int, int, int, auto&)>(amrex::FArrayBox*, amrex::FFT::Poisson<amrex::MultiFab>::solve(amrex::MultiFab&, amrex::MultiFab const&)::'lambda'(int, int, int, auto&) const&)::'lambda'(int, int, int), 3>(amrex::BoxND<3> const&, auto const&) (in erf_flow_in_a_box) + 808

 4: void amrex::FFT::R2X<double>::forwardThenBackward_doit_1<amrex::FFT::Poisson<amrex::MultiFab>::solve(amrex::MultiFab&, amrex::MultiFab const&)::'lambda'(int, int, int, auto&)>(amrex::MultiFab const&, amrex::MultiFab&, auto const&, amrex::IntVectND<3> const&, amrex::Periodicity const&) (in erf_flow_in_a_box) + 2224

 5: amrex::FFT::Poisson<amrex::MultiFab>::solve(amrex::MultiFab&, amrex::MultiFab const&) (in erf_flow_in_a_box) + 756

 6: ERF::solve_with_fft(int, amrex::MultiFab&, amrex::MultiFab&, std::__1::array<amrex::MultiFab, 3ul>&) (in erf_flow_in_a_box) + 584

 7: ERF::project_velocities(int, double, amrex::Vector<amrex::MultiFab, std::__1::allocator<amrex::MultiFab>>&, amrex::MultiFab&) (in erf_flow_in_a_box) + 10708

 8: std::__1::__function::__func<ERF::advance_dycore(int, amrex::Vector<amrex::MultiFab, std::__1::allocator<amrex::MultiFab>>&, amrex::Vector<amrex::MultiFab, std::__1::allocator<amrex::MultiFab>>&, amrex::MultiFab&, amrex::MultiFab&, amrex::MultiFab&, amrex::MultiFab&, amrex::MultiFab&, amrex::MultiFab&, amrex::MultiFab&, amrex::MultiFab&, amrex::MultiFab&, amrex::MultiFab&, amrex::Geometry, double, double)::$_0, std::__1::allocator<ERF::advance_dycore(int, amrex::Vector<amrex::MultiFab, std::__1::allocator<amrex::MultiFab>>&, amrex::Vector<amrex::MultiFab, std::__1::allocator<amrex::MultiFab>>&, amrex::MultiFab&, amrex::MultiFab&, amrex::MultiFab&, amrex::MultiFab&, amrex::MultiFab&, amrex::MultiFab&, amrex::MultiFab&, amrex::MultiFab&, amrex::MultiFab&, amrex::MultiFab&, amrex::Geometry, double, double)::$_0>, void (amrex::Vector<amrex::MultiFab, std::__1::allocator<amrex::MultiFab>>&, amrex::Vector<amrex::MultiFab, std::__1::allocator<amrex::MultiFab>>&, amrex::Vector<amrex::MultiFab, std::__1::allocator<amrex::MultiFab>>&, double, double, int)>::operator()(amrex::Vector<amrex::MultiFab, std::__1::allocator<amrex::MultiFab>>&, amrex::Vector<amrex::MultiFab, std::__1::allocator<amrex::MultiFab>>&, amrex::Vector<amrex::MultiFab, std::__1::allocator<amrex::MultiFab>>&, double&&, double&&, int&&) (in erf_flow_in_a_box) + 21668

 9: MRISplitIntegrator<amrex::Vector<amrex::MultiFab, std::__1::allocator<amrex::MultiFab>>>::advance(amrex::Vector<amrex::MultiFab, std::__1::allocator<amrex::MultiFab>>&, amrex::Vector<amrex::MultiFab, std::__1::allocator<amrex::MultiFab>>&, double, double) (in erf_flow_in_a_box) + 712

10: ERF::advance_dycore(int, amrex::Vector<amrex::MultiFab, std::__1::allocator<amrex::MultiFab>>&, amrex::Vector<amrex::MultiFab, std::__1::allocator<amrex::MultiFab>>&, amrex::MultiFab&, amrex::MultiFab&, amrex::MultiFab&, amrex::MultiFab&, amrex::MultiFab&, amrex::MultiFab&, amrex::MultiFab&, amrex::MultiFab&, amrex::MultiFab&, amrex::MultiFab&, amrex::Geometry, double, double) (in erf_flow_in_a_box) + 8192

11: ERF::Advance(int, double, double, int, int) (in erf_flow_in_a_box) + 3904

12: ERF::timeStep(int, double, int) (in erf_flow_in_a_box) + 988

13: ERF::Evolve() (in erf_flow_in_a_box) + 432

14: main (in erf_flow_in_a_box) + 1680

15: start (in dyld) + 2840
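
For reference, the call path in frames 3 through 6 can be exercised outside ERF with a small standalone AMReX program. The sketch below is only illustrative and rests on a few assumptions: the solve() signature is taken from the backtrace, but the header name AMReX_FFT_Poisson.H and the Geometry-only constructor are guesses, the box is made fully periodic to keep the sketch simple (the run above uses SlipWall boundaries), and AMReX must be built with FFT support against the same FFTW install.

// Hypothetical standalone driver for the FFT Poisson call seen in frames 3-6.
// Assumptions: header name and Geometry-only constructor; AMReX built with
// FFT support and linked against the FFTW installation under test.
#include <AMReX.H>
#include <AMReX_Print.H>
#include <AMReX_MultiFab.H>
#include <AMReX_Geometry.H>
#include <AMReX_FFT_Poisson.H>   // assumed header for amrex::FFT::Poisson

int main (int argc, char* argv[])
{
    amrex::Initialize(argc, argv);
    {
        // 64 x 64 x 32 cells on [-1,1] x [-1,1] x [0,1], as in the inputs above,
        // but fully periodic here to keep the sketch minimal.
        amrex::Box domain(amrex::IntVect(0,0,0), amrex::IntVect(63,63,31));
        amrex::BoxArray ba(domain);
        ba.maxSize(32);
        amrex::DistributionMapping dm(ba);

        amrex::RealBox rb({-1.,-1.,0.}, {1.,1.,1.});
        amrex::Geometry geom(domain, rb, amrex::CoordSys::cartesian, {1,1,1});

        amrex::MultiFab rhs (ba, dm, 1, 0);
        amrex::MultiFab soln(ba, dm, 1, 0);
        rhs.setVal(0.0);
        soln.setVal(0.0);

        // The call that appears in frames 3-5 of the backtrace.
        amrex::FFT::Poisson<amrex::MultiFab> poisson(geom);  // assumed constructor
        poisson.solve(soln, rhs);                            // signature from the backtrace

        amrex::Print() << "FFT Poisson solve finished, max |soln| = " << soln.norm0() << "\n";
    }
    amrex::Finalize();
    return 0;
}
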
@wang1202
Author

Update: I found that using a fixed dt keeps the simulation stable for more time steps, but it still crashes at some point. Without FFT, the simulation runs fine with erf.cfl = 0.9. In other words, FFT makes the simulation very unstable, though I remember it working well previously.

@asalmgren
Collaborator

@wang1202 -- I'm unable to reproduce the error you're seeing, and the error message itself makes me a little suspicious that it's not an ERF-specific error. I built both DEBUG and non-DEBUG, using cmake and gmake, and tried running with 1 rank and 4 ranks. All cases ran well past the second step with your inputs file. Can you make sure you're running the most recent ERF development branch, and run git submodule update as well?

@wang1202
Author

@asalmgren -- Thank you for replying over the weekend and for your detailed testing; it was very helpful. I found that this issue seems to be related to the compiler and FFTW packages rather than to ERF. The simulation works on the lab's HPC, where I built with GCC 10.3.0 and installed FFTW via Conda. On my laptop, the C++ compiler is Clang, and I tried installing FFTW from two sources: with FFTW from Homebrew, the run crashes at step 2; with FFTW from Conda (as on the lab's HPC), the run crashes at step 6. Since the issue is not specific to ERF, I will close it here. I can share more information if I find a combination that works on my laptop.
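
For what it's worth, a minimal standalone FFTW3 check, compiled separately against the Homebrew and Conda installs, can help confirm whether a given FFTW build is usable at all before involving ERF/AMReX. This is just a sketch assuming only the double-precision fftw3 library and headers are available:

// Minimal FFTW3 sanity check: plan and execute a 1D real-to-complex DFT.
// Link with -lfftw3, pointing -I/-L at the FFTW installation under test.
#include <cstdio>
#include <fftw3.h>

int main ()
{
    const int n = 64;
    double*       in  = fftw_alloc_real(n);
    fftw_complex* out = fftw_alloc_complex(n/2 + 1);

    for (int i = 0; i < n; ++i) { in[i] = static_cast<double>(i % 8); }

    fftw_plan plan = fftw_plan_dft_r2c_1d(n, in, out, FFTW_ESTIMATE);
    fftw_execute(plan);

    // The DC bin equals the sum of the input: 8 * (0+1+...+7) = 224.
    std::printf("DC component = %g (expected 224)\n", out[0][0]);

    fftw_destroy_plan(plan);
    fftw_free(in);
    fftw_free(out);
    return 0;
}

If this small program already misbehaves with one of the installs, the problem is below ERF.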

@asalmgren
Collaborator

asalmgren commented Feb 17, 2025 via email
