MLMG failed #2088

Closed
wang1202 opened this issue Jan 27, 2025 · 15 comments

@wang1202

Hello,

Thank you for resolving the previous grid refinement issue. Now I'm testing the case DevTests/FlowInABox, and I find that the run with a refined grid is very unstable. I can set amr.grid_eff = 0.1 or amr.n_error_buf = 5 to get the simulation to run for hundreds of steps, but this causes the entire domain to be refined, which is not what I want. Do you have any suggestions? Below is the output of the last 5 iterations with erf.mg_v = 4:

AT LEVEL 0 0   UP: Norm after  bottom 1.199623168e-08
MLMG: Iteration  95 Fine resid/bnorm = 2.211093305e-09
MLMG: Subtracting -4.394183651e-16 from mf component c = 0 on level (0, 0)
AT LEVEL 0 0   DN: Norm before bottom 1.199623049e-08
MLMG: Subtracting 1.71931457e-22 from mf component c = 0 on level (0, 0)
MLMG: Bottom solve failed.
AT LEVEL 0 0   UP: Norm after  bottom 1.190413938e-08
MLMG: Iteration  96 Fine resid/bnorm = 2.194119532e-09
MLMG: Subtracting -4.372072655e-16 from mf component c = 0 on level (0, 0)
AT LEVEL 0 0   DN: Norm before bottom 1.190413971e-08
MLMG: Subtracting -1.033134579e-22 from mf component c = 0 on level (0, 0)
MLMG: Bottom solve failed.
AT LEVEL 0 0   UP: Norm after  bottom 1.181302101e-08
MLMG: Iteration  97 Fine resid/bnorm = 2.177324811e-09
MLMG: Subtracting -4.396645998e-16 from mf component c = 0 on level (0, 0)
AT LEVEL 0 0   DN: Norm before bottom 1.181302037e-08
MLMG: Subtracting -1.870463104e-22 from mf component c = 0 on level (0, 0)
MLMG: Bottom solve failed.
AT LEVEL 0 0   UP: Norm after  bottom 1.172286616e-08
MLMG: Iteration  98 Fine resid/bnorm = 2.160707833e-09
MLMG: Subtracting -4.391924888e-16 from mf component c = 0 on level (0, 0)
AT LEVEL 0 0   DN: Norm before bottom 1.172286538e-08
MLMG: Subtracting -3.025547068e-22 from mf component c = 0 on level (0, 0)
MLMG: Bottom solve failed.
AT LEVEL 0 0   UP: Norm after  bottom 1.163366464e-08
MLMG: Iteration  99 Fine resid/bnorm = 2.144266838e-09
MLMG: Subtracting -4.381772983e-16 from mf component c = 0 on level (0, 0)
AT LEVEL 0 0   DN: Norm before bottom 1.163366518e-08
MLMG: Subtracting 4.585983918e-23 from mf component c = 0 on level (0, 0)
MLMG: Bottom solve failed.
AT LEVEL 0 0   UP: Norm after  bottom 1.154540596e-08
MLMG: Iteration 100 Fine resid/bnorm = 2.127999289e-09
MLMG: Failed to converge after 100 iterations. resid, resid/bnorm = 1.154540644e-08, 2.127999289e-09
amrex::Abort::0::MLMG failed. !!!

And below is the input I adjusted:

amr.n_cell       = 64  64  32

xlo.theta = 288.
xhi.theta = 288.
ylo.theta = 288.
yhi.theta = 288.
zlo.theta = 294.
zhi.theta = 282.

erf.cfl            = 0.8
erf.substepping_cfl = 0.5
erf.dt_max_initial = 0.1
erf.dt_max         = 0.1

# REFINEMENT / REGRIDDING
amr.max_level       = 1       # maximum level number allowed
amr.max_grid_size   = 256

erf.regrid_int = 2
erf.coupling_type = "TwoWay"

erf.refinement_indicators = diff_theta

erf.diff_theta.max_level     = 1
erf.diff_theta.field_name    = theta
erf.diff_theta.adjacent_difference_greater    = 1.0 2.0
erf.diff_theta.start_time = 1. 2.

amr.n_error_buf  = 5 5
#amr.grid_eff     = 0.1

# PROBLEM PARAMETERS
prob.rho_0 = 1.0
prob.T_0          = 288.
prob.T_0_Pert_Mag = 0.1
prob.U_0_Pert_Mag = 0.01
prob.V_0_Pert_Mag = 0.01
prob.W_0_Pert_Mag = 0.01
@asalmgren
Collaborator

Yes -- one of the things I noticed was that the grids at level 1 weren't "ideal". Typically, when using multigrid, we want the level 1 grids to be sufficiently coarsenable for good multigrid performance, which basically means each box dimension should be m * 2^n, e.g. 32 = 2^5 or 48 = 3 * 2^4. You can control this by setting amr.blocking_factor -- but note that this will also make the individual grids larger than you might want if you start with a small domain.
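
For illustration only, the relevant inputs-file knobs look like this (the specific values are hypothetical, not a recommendation for this case):

amr.blocking_factor = 8    # box dimensions and edges align to multiples of 8, keeping the level coarsenable
amr.max_grid_size   = 32   # cap on box size; 32 = 2^5 is of the m * 2^n form mentioned above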

There are a lot of ways to control the size and shape of the grids, and it takes a while to figure out what the ideal configuration is. grid_eff, n_error_buf, and blocking_factor are some of the most useful knobs.

This page is a good reference for how the grids are created.

One way to experiment might be to restart at a later time, where there's a region you know you want refined, and vary the parameters from that point to see which grid coverage is most efficient and does what you want.

Happy to help more -- let me know if this is at all helpful.

@wang1202
Author

wang1202 commented Jan 27, 2025

Hi @asalmgren, thanks for the detailed information. I hadn't paid attention to blocking_factor before. Yes, I restarted the run after first using the coarse grid to reach a steady state. I have two quick questions:

  1. "the grids at level 1 were't ideal" -- do you mean amr.n_error_buf should be the multiple of 2?
  2. I've tested many combinations of grid_eff, n_error_buf, and blocking_factor, and I found that it only works when the entire domain is refined. I'm considering adjusting the solver tolerance and the iteration limit. What's the easiest way to modify them?

@asalmgren
Collaborator

No -- n_error_buf doesn't have anything to do with the eventual size/shape of the grids; it just says how much you "buffer" the features you care about when tagging cells for refinement. One way to think about it: if you know a feature is moving at one grid cell per time step but you don't want to re-grid every timestep, you would set n_error_buf to be roughly regrid_int. For example, if you set n_error_buf = 5, the feature could move for 5 timesteps before it reached the coarse/fine boundary, because you created the grids with that much extra space to start with. This is overly simplistic, but hopefully it gives the idea.
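
A minimal illustration of that rule of thumb (hypothetical values, not tuned for this case):

erf.regrid_int  = 5      # re-grid every 5 coarse steps
amr.n_error_buf = 5 5    # buffer tagged cells by roughly regrid_int cells on each level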

@asalmgren
Collaborator

If you start with a relatively small grid (e.g. 32x64) and require the fine grids to be "relatively large", it will typically fill a lot of the domain. I also noticed that when the problem starts there aren't any well-defined regions to refine. Can you share a picture of when the gridding is doing something you don't want it to do?

@wang1202
Author

> If you start with a relatively small grid (e.g. 32x64) and require the fine grids to be "relatively large", it will typically fill a lot of the domain. I also noticed that when the problem starts there aren't any well-defined regions to refine. Can you share a picture of when the gridding is doing something you don't want it to do?

Sure. Please see the two snapshots before and after the gridding. I just want the locations with a sharp temperature gradient to be refined, but MLMG only converges when the entire domain is refined.

[Image: snapshot before gridding]  [Image: snapshot after gridding]

@wang1202
Author

> No -- n_error_buf doesn't have anything to do with the eventual size/shape of the grids; it just says how much you "buffer" the features you care about when tagging cells for refinement. One way to think about it: if you know a feature is moving at one grid cell per time step but you don't want to re-grid every timestep, you would set n_error_buf to be roughly regrid_int. For example, if you set n_error_buf = 5, the feature could move for 5 timesteps before it reached the coarse/fine boundary, because you created the grids with that much extra space to start with. This is overly simplistic, but hopefully it gives the idea.

Thanks for the clarification, but then I still don’t understand why level 1 isn't 'ideal.' Could you point out where the settings show that the level 1 grid is not ideal?

@asalmgren
Collaborator

There are tradeoffs between grid size and multigrid performance. Multigrid -- to work well -- needs to be able to coarsen. Imagine at level 1 you have a 4x4x4 grid and a 64x64x64 grid. The level will only be able to coarsen once, which means the "bottom solver" has to deal with a 2x2x2 grid (fine) and a 32x32x32 grid (expensive). So ideally we want the blocking factor to be at least 8 ... but when you try to decompose the level into boxes that are coarsenable by 8 and cover all your tagged points, it's a hard problem.
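
As a back-of-the-envelope illustration (a standalone sketch, not ERF or AMReX code), this counts how many times a box extent can be halved; a level can only coarsen as far as its least coarsenable box:

#include <iostream>

// How many times can a box extent n be halved before it is no longer evenly
// divisible (or is down to 2 cells)? Illustrative only; AMReX's actual logic
// also accounts for blocking_factor, box alignment, and the bottom-solver size.
int coarsenings (int n)
{
    int count = 0;
    while (n > 2 && n % 2 == 0) {
        n /= 2;
        ++count;
    }
    return count;
}

int main ()
{
    int sizes[] = {4, 8, 32, 48, 64};
    for (int n : sizes) {
        std::cout << n << " -> coarsenable " << coarsenings(n) << " time(s)\n";
    }
    // Prints: 4 -> 1, 8 -> 2, 32 -> 4, 48 -> 4, 64 -> 5
}

So a level containing a 4-wide box bottoms out after a single coarsening, even though its 64-wide neighbor could have gone much further on its own.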

@asalmgren
Collaborator

My suggestion is to cut n_error_buf down to 2 -- that's our usual default -- and set max_grid_size = 8. How does that work?

@wang1202
Author

Thanks for the details and suggestion. I've tested the following settings:

amr.max_grid_size   = 8
amr.n_error_buf     = 2
amr.grid_eff        = 0.6
amr.blocking_factor = 4

Result: MLMG: Failed to converge after 100 iterations.

Increase blocking_factor:

amr.max_grid_size   = 8
amr.n_error_buf     = 2
amr.grid_eff        = 0.6
amr.blocking_factor = 8

Result: Entire domain is refined.

Reduce grid_eff:

amr.max_grid_size   = 8
amr.n_error_buf     = 2
amr.grid_eff        = 0.5
amr.blocking_factor = 4

Result: Entire domain is refined.

@asalmgren
Collaborator

I think the code is doing exactly what you're telling it to do, subject to the constraints. Can you modify your tagging criteria so that only a very small region is refined, and verify that you can get a single grid that doesn't cover the whole domain?

Keep in mind, btw, that blocking_factor = 8 means not just that the grids have to be at least 8 cells wide but, more specifically, that the left edge will have to be at i = 0, 8, 16, 24, 32, 40, 48, or 56.
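
A small sketch of that alignment constraint (a hypothetical helper, not AMReX's actual grid-generation code): snapping a tagged index range outward to blocking_factor boundaries shows why even a narrow tagged feature produces a box that is at least blocking_factor wide and aligned to it.

#include <iostream>
#include <utility>

// Snap a tagged cell-index range [lo, hi] outward to blocking_factor
// boundaries. Hypothetical illustration only (assumes non-negative indices).
std::pair<int,int> align_to_blocking_factor (int lo, int hi, int bf)
{
    int lo_aligned = (lo / bf) * bf;             // round lo down to a multiple of bf
    int hi_aligned = ((hi + bf) / bf) * bf - 1;  // round hi+1 up to a multiple of bf
    return {lo_aligned, hi_aligned};
}

int main ()
{
    // A feature tagged only at i = 10..12 with blocking_factor = 8
    auto [lo, hi] = align_to_blocking_factor(10, 12, 8);
    std::cout << "box covers i = " << lo << " to " << hi << "\n";   // prints 8 to 15
}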

@wang1202
Author

If I increase erf.diff_theta.adjacent_difference_greater from 1.0 to 1.5, the following setting fails (it works with adjacent_difference_greater = 1.0):

amr.max_grid_size   = 8
amr.n_error_buf     = 2
amr.grid_eff        = 0.6
amr.blocking_factor = 8

I need to reduce the grid_eff by 0.1:

amr.max_grid_size   = 4
amr.n_error_buf     = 2
amr.grid_eff        = 0.5
amr.blocking_factor = 8

Then I can get an unrefined layer in the middle before the entire domain eventually gets refined.

[Image: snapshot showing the unrefined middle layer]

I removed my previous comment about the advection scheme, because I realized I had forgotten to change the input file name when testing different advection schemes. After further testing, the advection scheme does not appear to be the cause.

@asalmgren
Collaborator

One thing we can think about, if the smaller grids are important: if we implement an algorithm without subcycling in time between levels, then the Poisson solve would be over the whole grid hierarchy, which means the level 1 grids wouldn't need to be as coarsenable. But it would take a little time to make those changes.

@wang1202
Author

wang1202 commented Jan 29, 2025

Thanks for the information, @asalmgren! I have another question (not sure whether it is related to the dynamic regridding issue above). I found that it is hard to split the fine grid horizontally. In some simple tests of static mesh refinement, I tried to refine half of the domain: splitting the grid vertically works well, but splitting it horizontally always fails. Below is how I cut the grid.

The domain:

# PROBLEM SIZE & GEOMETRY
geometry.prob_lo = -1. -1.  0.
geometry.prob_hi =  1.  1.  1.
amr.n_cell       = 64  64  32

This works well:

erf.refinement_indicators =  box1
erf.box1.max_level     = 1
erf.box1.in_box_lo = -1. -1.  0.
erf.box1.in_box_hi = 1.  1.  0.5

This fails (output: SIGILL Invalid, privileged, or ill-formed instruction):

erf.refinement_indicators =  box1
erf.box1.max_level     = 1
erf.box1.in_box_lo = -1. -1.  0.
erf.box1.in_box_hi = 0.  1.  1.

This is the output of the last step before the failure; some values are as large as 1e+21 or 1e+22:

[Level 0 step 3188] Advanced 131072 cells
[Level 1 step 3] ADVANCE from time = 300.0991465 to 300.1277102 with dt = 0.02856364786
Making slow rhs at time 300.0991465 for fast variables advancing from 300.0991465 to 300.1277102
 No-substepping time integration at level 1 to 300.1277102 with dt = 0.02856364786
Max/L2 norm of divergence before solve at level 1 : 1.823740494e+21 1.485771711e+22
MLMG: Initial rhs               = 1.823740494e+21
MLMG: Initial residual (resid0) = 1.823740494e+21
MLMG: Final Iter. 8 resid, resid/bnorm = 3.246153728e+10, 1.779942782e-11
MLMG: Timers: Solve = 0.029332875 Iter = 0.02683175 Bottom = 0.000173165
Time in solve 0.031279
Max/L2 norm of divergence  after solve at level 1 : 3.246154138e+10 2.151640924e+12

Below are some other input settings that may be relevant:

fabarray.mfiter_tile_size = 1024 1024 1024

# TIME STEP CONTROL
erf.cfl            = 0.8
erf.substepping_cfl = 0.5
erf.dt_max_initial = 0.1
erf.dt_max         = 0.1

# REFINEMENT / REGRIDDING
amr.max_level       = 1       # maximum level number allowed
amr.ref_ratio = 2
amr.max_grid_size   = 256
amr.n_error_buf     = 0
amr.grid_eff        = 0.5
amr.blocking_factor = 8

erf.regrid_int = 2
erf.coupling_type = "TwoWay"

@asalmgren
Collaborator

@wang1202 -- Sorry for the delay! I believe this issue is now fixed in PR 2095 -- can you give it a try?

@wang1202
Author

wang1202 commented Feb 4, 2025

Hi @asalmgren, I think it works now. Thank you for the assistance! Being able to split the refinement horizontally is important for my case. Although I'm still testing options to get a more stable run, I can now see the box split horizontally, as shown here.

[Image: refined box split horizontally]
