
Gradual slowdown of AMR-Wind solver performance #1442

Open
lawrenceccheung opened this issue Jan 14, 2025 · 14 comments
Labels: bug:amr-wind (Something isn't working)

@lawrenceccheung
Contributor

Bug description

After running the ExaWind driver or the AMR-Wind solver for tens or hundreds of thousands of iterations, there is sometimes a noticeable slowdown in solver performance. Solve times that were initially on the order of 3-4 sec/iter can grow to 8-9 sec/iter.

This example is a case from @ndevelder using the ExaWind hybrid solver, showing that the slowdown is coming from the AMR-Wind solver alone:
[image: solve time per iteration, ExaWind hybrid case]

It also appears in AMR-Wind-only solutions, in this case a 9-turbine wind farm run with OpenFAST coupling:
[image: solve time per iteration, 9-turbine wind farm case]
Here the typical solve time per iteration starts out around ~0.5 sec/iter and then grows to ~1 sec/iter about 40,000 iterations later. What's interesting is that if you restart the case, the solve time goes back to ~0.5 sec/iter before slowly growing again.

Timing data from the log files can be extracted and plotted using

grep WallClockTime log1.txt |gnuplot -p -e "set yr [0:10]; plot '<cat' using 2:6;"

for AMR-Wind log files and

grep "AMR-Wind::Total" log | gnuplot -p -e "set yr [0:10]; plot '<cat' using 2:5;"

for ExaWind log files.
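
The raw per-step timings can be noisy over long runs; as a rough sketch (assuming the WallClockTime format shown below, the log1.txt file name, and an arbitrary 100-step window), a moving average can be added with an awk filter in the same pipeline:

grep WallClockTime log1.txt | awk '{q[NR]=$6; s+=$6; w=100; if (NR>w) s-=q[NR-w]; n=(NR<w)?NR:w; print $2, s/n}' | gnuplot -p -e "set yr [0:10]; plot '<cat' using 1:2 with lines;"

Column 2 of the WallClockTime line is the step number and column 6 is the Solve time, matching the plots above.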

Note that the number of solver iterations remains constant in AMR-Wind; here is a plot of the MAC and Nodal projection iterations required over the length of the run:
[image: MAC and Nodal projection iteration counts over the run]
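
The iteration counts can be pulled from the same logs in the same style; for example (assuming the Godunov table format shown below, where the second column is the iteration count and the x-axis is just the line index rather than the step number):

grep MAC_projection log1.txt | gnuplot -p -e "plot '<cat' using 0:2;"
grep Nodal_projection log1.txt | gnuplot -p -e "plot '<cat' using 0:2;"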

Note also that the solve process itself seems relatively unaffected; see the before-restart/after-restart snippets from the log file below.

Steps to reproduce

Steps to reproduce the behavior:

  1. Compiler used
    • GCC
    • LLVM
    • oneapi (Intel)
    • nvcc (NVIDIA)
    • rocm (AMD)
    • with MPI
    • Clang
  2. Operating system
    • Linux
    • OSX
    • Windows
    • other (do tell ;)):
  3. Hardware:
    • CPU
    • GPU
  4. Machine details:
    Observed this on runs with:
  • Frontier (GPU)
  • Sandia HPC (CPU)
  5. Input file attachments
  6. Error (paste or attach):
Step: 106124 dt: 0.02 Time: 28009.96 to 28009.98
CFL: 0.292407 (conv: 0.101057 diff: 0 src: 0.236542 )

Godunov:
  System                     Iters      Initial residual        Final residual
  ----------------------------------------------------------------------------
  MAC_projection                 4           1.706609085       1.653896045e-06
  temperature_solve              2       0.0001600051844       1.516013981e-10
  tke_solve                      1        0.001830700216       1.433928062e-06
  velocity_solve                 1        0.002116535213       2.247267522e-06
  Nodal_projection               4           2.918882696       5.527631235e-07

WallClockTime: 106124 Pre: 0.0393 Solve: 1.014 Post: 0.0176 Total: 1.071
Solve time per cell: 9.08e-06
Step: 106124 dt: 0.02 Time: 28009.96 to 28009.98
CFL: 0.292413 (conv: 0.101057 diff: 0 src: 0.236548 )

Godunov:
  System                     Iters      Initial residual        Final residual
  ----------------------------------------------------------------------------
  MAC_projection                 4           1.706301224       1.476950891e-06
  temperature_solve              2       0.0001600004604       1.516582415e-10
  tke_solve                      1        0.001830697993       1.433928058e-06
  velocity_solve                 1        0.002116534853       2.247266377e-06
  Nodal_projection               4           2.918889138       5.520496715e-07

WallClockTime: 106124 Pre: 0.0338 Solve: 0.5783 Post: 0.00379 Total: 0.616
Solve time per cell: 7.768e-06
  7. If this is a segfault, a stack trace from a debug build (paste or attach):

AMR-Wind information

Problem has existed since at least:

==============================================================================
                AMR-Wind (https://github.com/exawind/amr-wind)

  AMR-Wind version :: v2.0.0-4-gc70c279e
  AMR-Wind Git SHA :: c70c279eb6901edc4466d6f96f10e522ca6b62f9
  AMReX version    :: 24.03-36-g748f8dfea597

  Exec. time       :: Mon May 27 03:00:45 2024
  Build time       :: May 20 2024 00:00:24
  C++ compiler     :: Clang 15.0.0

  MPI              :: ON    (Num. ranks = 2400)
  GPU              :: ON    (Backend: HIP)
  OpenMP           :: OFF

  Enabled third-party libraries: 
    NetCDF    4.7.4
    HYPRE     2.31.0
    OpenFAST  

@marchdf
Contributor

marchdf commented Jan 14, 2025

Thanks @lawrenceccheung. Were you able to check just the precursor sims? I would like to rule that out.

@lawrenceccheung
Contributor Author

So this is interesting... in all of the precursor (ABL-only) cases I checked, the AMR-Wind solve times have been incredibly consistent, with no increasing trend in time/iter. Note that I only checked runs on Sandia hardware; Frontier is down today, but I can check when that machine comes back up.

This might point to something going on in the I/O for boundary inflow/outflow, or something else that happens when we do these turbine simulations.

Lawrence

@marchdf
Contributor

marchdf commented Jan 14, 2025

Thanks for checking that. Glad the precursor is fine.

Yeah... or maybe openfast somehow? I am trying to think of the best way to check this... maybe running out the uniform ALM regtest (which doesn't use boundary planes) could help eliminate candidates?

@lawrenceccheung
Contributor Author

Yes, openfast was one of my initial thoughts too, because of the amount of I/O and memory that it requires. But in the exawind hybrid solver runs, the openfast coupling runs through Nalu-Wind. The solve times/iteration look pretty good on the Nalu-Wind side, so I'm thinking it might be something else.

@ndevelder
Contributor

@lawrenceccheung maybe we should also try a precursor with increasing refinement levels turned on? I think we wanted to do one of those for the benchmark anyway? I could imagine that just adding levels is a possible culprit?

@lawrenceccheung
Contributor Author

That's a good thought; all of the turbine cases have refinement regions that could cause a slow memory leak. Let me see if there's any ABL case I've run before that has refinement regions, and if that has any impact on long-term performance.
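
A related check, sketched here as a suggestion rather than something already run: the log already prints a "Solve time per cell" line, so plotting it in the same pipeline style would help separate growth in cell count from refinement/regridding from a genuine per-cell slowdown:

grep "Solve time per cell" log1.txt | gnuplot -p -e "plot '<cat' using 0:5;"

Here column 5 is the per-cell time and the x-axis is just the line index.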

@moprak-nrel
Contributor

moprak-nrel commented Jan 15, 2025

Here is a plot from the HFM benchmark neutral case using AMR-Wind. It was run on Kestrel using 20 (CPU) nodes without refinement zones. I don't quite see as catastrophic a slowdown as in your case, @lawrenceccheung. However, there is a very small, gradual slowdown over ~75,000 time steps.

Solve times (red) and moving average (blue) of solve times:
[images: solve times and moving average]

Post restart:
[images: solve times and moving average, after restart]

@lawrenceccheung
Contributor Author

Thanks @moprak-nrel. I just checked the NREL5MW ALM case that we ran for the ExaWind benchmarks, and there we see that the solve times remain relatively steady across 50,000+ iterations:
[image: solve times per iteration, NREL5MW ALM benchmark case]

This is interesting because that ALM case includes boundary I/O planes and refinements. So I'm not sure yet what could be the problem, but will continue looking across all of the cases that we have.

@marchdf
Contributor

marchdf commented Jan 15, 2025

Ooof the plot thickens. Keep posting data @lawrenceccheung and hopefully we can make some sense of this.

@moprak-nrel
Contributor

moprak-nrel commented Feb 3, 2025

Looked at a more recent, smaller run, [nx ny nz] = [256 144 96], on Kestrel. The solve times appear very steady over 100,000 steps.

[image: solve times over 100,000 steps]


github-actions bot commented Mar 6, 2025

This issue is stale because it has been open 30 days with no activity.

@lawrenceccheung
Contributor Author

Adding some of the latest timing information from the NREL5MW benchmark case that @ndevelder is running. This is a blade-resolved hybrid ExaWind solver simulation with a tower and 3 blades, each in its own overset group. We're seeing an unexplained jump in the AMR-Wind solver time early on, and also this long-term increase in solve times. The case is run on Sandia Flight with CPUs.

[image: solve times]

[image: AMR-Wind and Nalu-Wind solve times]

@ndevelder
Contributor

And in case it wasn't clear, the x-axis on the Nalu-Wind portion of the second plot is not right... these need more work to incorporate the number of equation system iterations.

@ndevelder
Contributor

[image: corrected plot]

Here it is fixed
