
Gradual slowdown of AMR-Wind solver performance #1442

Open
lawrenceccheung opened this issue Jan 14, 2025 · 14 comments
Labels: bug:amr-wind (Something isn't working)

@lawrenceccheung
Contributor

Bug description

After running the ExaWind driver or the AMR-Wind solver for tens or hundreds of thousands of iterations, there is sometimes a noticeable slowdown in solver performance. Solve times that were initially on the order of 3-4 sec/iter can grow to 8-9 sec/iter.

This example is a case from @ndevelder using the ExaWind hybrid solver, showing that the slowdown is coming from the AMR-Wind solver alone:
[image: solve time per iteration, ExaWind hybrid case]

It also appears in AMR-Wind-only solutions, in this case a 9-turbine wind farm run with OpenFAST coupling:
[image: solve time per iteration, 9-turbine wind farm case]
Here the typical solve time per iteration starts out around ~0.5 sec/iter and then grows to ~1 sec/iter about 40,000 iterations later. What's interesting is that if you restart the case, the solve time goes back to ~0.5 sec/iter before slowly growing again.

Timing data from the log files can be extracted and plotted using

grep WallClockTime log1.txt |gnuplot -p -e "set yr [0:10]; plot '<cat' using 2:6;"

for AMR-Wind log files and

grep "AMR-Wind::Total" log | gnuplot -p -e "set yr [0:10]; plot '<cat' using 2:5;"

for ExaWind log files.
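
The raw per-step timings can be noisy over long runs; as a rough sketch (assuming the WallClockTime format shown below, the log1.txt file name, and an arbitrary 100-step window), a moving average can be added with an awk filter in the same pipeline:

grep WallClockTime log1.txt | awk '{q[NR]=$6; s+=$6; w=100; if (NR>w) s-=q[NR-w]; n=(NR<w)?NR:w; print $2, s/n}' | gnuplot -p -e "set yr [0:10]; plot '<cat' using 1:2 with lines;"

Column 2 of the WallClockTime line is the step number and column 6 is the Solve time, matching the plots above.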

Note that the number of solver iterations remains constant in AMR-Wind; here is a plot of the MAC and Nodal projection iterations required over the length of the run:
[image: MAC and Nodal projection iteration counts over the run]
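
The iteration counts can be pulled from the same logs in the same style; for example (assuming the Godunov table format shown below, where the second column is the iteration count and the x-axis is just the line index rather than the step number):

grep MAC_projection log1.txt | gnuplot -p -e "plot '<cat' using 0:2;"
grep Nodal_projection log1.txt | gnuplot -p -e "plot '<cat' using 0:2;"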

Note also that the solve process itself seems relatively unaffected; see the before-restart/after-restart snippets from the log file below.

Steps to reproduce

Steps to reproduce the behavior:

  1. Compiler used
    • GCC
    • LLVM
    • oneapi (Intel)
    • nvcc (NVIDIA)
    • rocm (AMD)
    • with MPI
    • Clang
  2. Operating system
    • Linux
    • OSX
    • Windows
    • other (do tell ;)):
  3. Hardware:
    • CPU
    • GPU
  4. Machine details:
    Observed this on runs with:
  • Frontier (GPU)
  • Sandia HPC (CPU)
  5. Input file attachments
  6. Error (paste or attach):
Step: 106124 dt: 0.02 Time: 28009.96 to 28009.98
CFL: 0.292407 (conv: 0.101057 diff: 0 src: 0.236542 )

Godunov:
  System                     Iters      Initial residual        Final residual
  ----------------------------------------------------------------------------
  MAC_projection                 4           1.706609085       1.653896045e-06
  temperature_solve              2       0.0001600051844       1.516013981e-10
  tke_solve                      1        0.001830700216       1.433928062e-06
  velocity_solve                 1        0.002116535213       2.247267522e-06
  Nodal_projection               4           2.918882696       5.527631235e-07

WallClockTime: 106124 Pre: 0.0393 Solve: 1.014 Post: 0.0176 Total: 1.071
Solve time per cell: 9.08e-06
Step: 106124 dt: 0.02 Time: 28009.96 to 28009.98
CFL: 0.292413 (conv: 0.101057 diff: 0 src: 0.236548 )

Godunov:
  System                     Iters      Initial residual        Final residual
  ----------------------------------------------------------------------------
  MAC_projection                 4           1.706301224       1.476950891e-06
  temperature_solve              2       0.0001600004604       1.516582415e-10
  tke_solve                      1        0.001830697993       1.433928058e-06
  velocity_solve                 1        0.002116534853       2.247266377e-06
  Nodal_projection               4           2.918889138       5.520496715e-07

WallClockTime: 106124 Pre: 0.0338 Solve: 0.5783 Post: 0.00379 Total: 0.616
Solve time per cell: 7.768e-06
  7. If this is a segfault, a stack trace from a debug build (paste or attach):

AMR-Wind information

Problem has existed since at least:

==============================================================================
                AMR-Wind (https://github.com/exawind/amr-wind)

  AMR-Wind version :: v2.0.0-4-gc70c279e
  AMR-Wind Git SHA :: c70c279eb6901edc4466d6f96f10e522ca6b62f9
  AMReX version    :: 24.03-36-g748f8dfea597

  Exec. time       :: Mon May 27 03:00:45 2024
  Build time       :: May 20 2024 00:00:24
  C++ compiler     :: Clang 15.0.0

  MPI              :: ON    (Num. ranks = 2400)
  GPU              :: ON    (Backend: HIP)
  OpenMP           :: OFF

  Enabled third-party libraries: 
    NetCDF    4.7.4
    HYPRE     2.31.0
    OpenFAST  

@marchdf
Contributor

marchdf commented Jan 14, 2025

Thanks @lawrenceccheung. Were you able to check just the precursor sims? I would like to rule that out.

@lawrenceccheung
Contributor Author

So this is interesting... in all of the precursor (ABL-only) cases I checked, the AMR-Wind solve times have been incredibly consistent, with no increasing trend in time/iter. Note that I only checked runs on Sandia hardware; Frontier is down today, but I can check when that machine comes back up.

This might point to something going on in the I/O for boundary inflow/outflow, or something else that happens when we do these turbine simulations.

Lawrence

@marchdf
Contributor

marchdf commented Jan 14, 2025

Thanks for checking that. Glad the precursor is fine.

Yeah... or maybe openfast somehow? I am trying to think of the best way to check this... maybe running out the uniform ALM regtest (which doesn't use boundary planes) could help eliminate candidates?

@lawrenceccheung
Contributor Author

Yes, openfast was one of my initial thoughts too, because of the amount of I/O and memory that it requires. But in the exawind hybrid solver runs, the openfast coupling runs through Nalu-Wind. The solve times/iteration look pretty good on the Nalu-Wind side, so I'm thinking it might be something else.

@ndevelder
Contributor

@lawrenceccheung maybe we should also try a precursor with increasing refinement levels turned on? I think we wanted to do one of those for the benchmark anyway? I could imagine that just adding levels is a possible culprit?

@lawrenceccheung
Contributor Author

That's a good thought; all of the turbine cases have refinement regions that could cause a slow memory leak. Let me see if there's any ABL case I've run before that has refinement regions, and if that has any impact on long-term performance.
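
A related check, sketched here as a suggestion rather than something already run: the log already prints a "Solve time per cell" line, so plotting it in the same pipeline style would help separate growth in cell count from refinement/regridding from a genuine per-cell slowdown:

grep "Solve time per cell" log1.txt | gnuplot -p -e "plot '<cat' using 0:5;"

Here column 5 is the per-cell time and the x-axis is just the line index.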

@moprak-nrel
Contributor

moprak-nrel commented Jan 15, 2025

Here is a plot from the HFM benchmark neutral case using AMR-Wind. It was run on Kestrel using 20 (CPU) nodes without refinement zones. I don't quite see as catastrophic a slowdown as in your case, @lawrenceccheung. However, there is a very small, gradual slowdown over ~75,000 time steps.

Solve times (red) and moving average (blue) of solve times:
[images: solve times and moving average]

Post restart:
[images: solve times and moving average, after restart]

@lawrenceccheung
Contributor Author

Thanks @moprak-nrel. I just checked the NREL5MW ALM case that we ran for the ExaWind benchmarks, and there we see that the solve times remain relatively steady across 50,000+ iterations:
[image: solve times per iteration, NREL5MW ALM benchmark case]

This is interesting because that ALM case includes boundary I/O planes and refinements. So I'm not sure yet what could be the problem, but will continue looking across all of the cases that we have.

@marchdf
Contributor

marchdf commented Jan 15, 2025

Ooof the plot thickens. Keep posting data @lawrenceccheung and hopefully we can make some sense of this.

@moprak-nrel
Contributor

moprak-nrel commented Feb 3, 2025

Looked at a more recent, smaller run, [nx ny nz] = [256 144 96], on Kestrel. The solve times appear very steady over 100,000 steps.

[image: solve times over 100,000 steps]


github-actions bot commented Mar 6, 2025

This issue is stale because it has been open 30 days with no activity.

@lawrenceccheung
Contributor Author

Adding some of the latest timing information from the NREL5MW benchmark case that @ndevelder is running. This is a blade-resolved hybrid ExaWind solver simulation with a tower and 3 blades, each in its own overset group. We're seeing an unexplained jump in the AMR-Wind solver time early on, and also this long-term increase in solve times. The case is run on Sandia Flight with CPUs.

[image: solve times]

[image: AMR-Wind and Nalu-Wind solve times]

@ndevelder
Contributor

And in case it wasn't clear, the x-axis on the Nalu-Wind portion of the second plot is not right... these need more work to incorporate the number of equation system iterations.

@ndevelder
Contributor

[image: corrected plot]

Here it is fixed
