MOST test with terrain failing #1453

Closed
indra098124 opened this issue Feb 23, 2024 · 26 comments

@indra098124

Hi there,
I tried the MOST tests provided in terrain3d_Hemisphere and WitchOfAgnesi, and both of them fail for me. Are they expected to run from the initial condition defined in prob, or should we run them without MOST first? I am using the latest version of the code and get a "SIGILL Invalid, privileged, or ill-formed instruction" error with these tests.

Many thanks for developing the code and answering my question.

@indra098124 indra098124 changed the title most test with terrain failing MOST test with terrain failing Feb 23, 2024
@asalmgren
Collaborator

Hi @indra098124 -- could you try with the inputs files in those directories and see if that works for you? Our nightly regression test suite is here: https://ccse.lbl.gov/pub/RegressionTesting1/ERF/ -- all of these should "just work". Maybe also try some of these so we can rule out other issues, and then we can look at this particular problem.

@AMLattanzi
Collaborator

Are you running these tests locally on a mac?

@indra098124
Author

Thank you @asalmgren and @AMLattanzi for looking into this. @AMLattanzi yes, I am running these locally on Mac.
@asalmgren I can run ABL cases (that are also included in nightly tests) with no problem.

@baperry2
Collaborator

Set amrex.fpe_trap_invalid = 0 in the input files, which turns off some runtime error checking. The Apple Clang compilers sometimes perform optimizations that cause the AMReX checks for divide by zero and similar errors to fail spuriously (conditional branches that are never taken and involve a divide by zero may still be evaluated). These optimizations aren't performed in debug mode, so if needed you can still run with amrex.fpe_trap_invalid = 1 when you compile with DEBUG = TRUE.
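To make the "unused branch still gets evaluated" point concrete, here is a minimal C++ sketch. The function names are hypothetical and this is not ERF or AMReX code; it only uses the standard <cfenv> flag API. The second function writes out by hand what an aggressive optimizer may effectively do to the first: compute a quotient before checking the guard. When both inputs are zero, the discarded 0/0 sets FE_INVALID, which is the condition that amrex.fpe_trap_invalid = 1 turns into a runtime trap.

```cpp
#include <cassert>
#include <cfenv>

// Guarded form: the division only executes when b is nonzero.
// (An optimizer that ignores FP exception semantics may still hoist the
// division above the branch, which is the behavior described above.)
double safe_ratio_guarded(double a, double b) {
    if (b != 0.0) {
        return a / b;
    }
    return 0.0;
}

// "Speculated" form, written out by hand: the quotient is computed first and
// only discarded afterwards. For a == b == 0 the discarded value is 0/0 = NaN,
// which raises FE_INVALID even though the value returned is 0.
double safe_ratio_speculated(double a, double b) {
    volatile double q = a / b;  // volatile forces the division to actually run
    return (b != 0.0) ? q : 0.0;
}

// Returns true if calling f(a, b) left the FE_INVALID flag set.
template <class F>
bool raises_invalid(F f, double a, double b) {
    std::feclearexcept(FE_ALL_EXCEPT);
    volatile double r = f(a, b);
    (void)r;
    return std::fetestexcept(FE_INVALID) != 0;
}
```

In this sketch both functions return the same value; only the exception flag differs, which is why a trap can fire on a result that is never used.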

@asalmgren
Collaborator

@baperry2 -- that's really good to know -- could you add that to the docs somewhere?!

@indra098124
Author

indra098124 commented Feb 23, 2024

Thanks @baperry2, I was not aware of this.
I tried that, but it did not help. I also tried running this test on a Linux machine and got the error "erroneous arithmetic operation". The backtrace shows that the error originates in the MOST calculation at Source/BoundaryConditions/MOSTAverage.H:143:56.

Here is the code snippet where it fails:

    for (int n = 0; n < interp_comp; n++) {
        interp_vals[n] = sx_lo[0]*sx_lo[1]*sx_lo[2]*interp_array(i-1, j-1, k-1, n) +
                         sx_lo[0]*sx_lo[1]*sx_hi[2]*interp_array(i-1, j-1, k  , n) +
                         sx_lo[0]*sx_hi[1]*sx_lo[2]*interp_array(i-1, j  , k-1, n) +
                         sx_lo[0]*sx_hi[1]*sx_hi[2]*interp_array(i-1, j  , k  , n) +
                         sx_hi[0]*sx_lo[1]*sx_lo[2]*interp_array(i  , j-1, k-1, n) +
                         sx_hi[0]*sx_lo[1]*sx_hi[2]*interp_array(i  , j-1, k  , n) +
                         sx_hi[0]*sx_hi[1]*sx_lo[2]*interp_array(i  , j  , k-1, n) +
                         sx_hi[0]*sx_hi[1]*sx_hi[2]*interp_array(i  , j  , k  , n);
    }
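The stencil above reads the cells at i-1, j-1 and k-1, so it needs valid data one cell outside the box being processed. The sketch below reproduces the same eight-term weighting on a hypothetical bounds-checked field (Field3D and trilinear are illustrative names, not ERF's API). Unfilled ghost cells are modeled as NaN, and the field has ghost cells in i and j but none in k. That layout appears to match the bounds printed in the debug abort later in this thread (k range 0:63), so a k-1 read at k = 0 leaves the array.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <vector>

// Hypothetical cell-centered field with ngh ghost cells in i and j and ngv in k,
// loosely modeled on a bounds-checked amrex::Array4 (not the real class).
struct Field3D {
    int nx, ny, nz, ngh, ngv;
    std::vector<double> data;

    Field3D(int nx_, int ny_, int nz_, int ngh_, int ngv_)
        : nx(nx_), ny(ny_), nz(nz_), ngh(ngh_), ngv(ngv_),
          data(static_cast<std::size_t>(nx_ + 2 * ngh_) * (ny_ + 2 * ngh_) * (nz_ + 2 * ngv_),
               std::numeric_limits<double>::quiet_NaN()) {}  // unfilled cells hold NaN

    double& operator()(int i, int j, int k) {
        if (i < -ngh || i >= nx + ngh || j < -ngh || j >= ny + ngh ||
            k < -ngv || k >= nz + ngv) {
            throw std::out_of_range("index is out of bound");
        }
        const std::size_t ii = static_cast<std::size_t>(i + ngh);
        const std::size_t jj = static_cast<std::size_t>(j + ngh);
        const std::size_t kk = static_cast<std::size_t>(k + ngv);
        const std::size_t sx = static_cast<std::size_t>(nx + 2 * ngh);
        const std::size_t sy = static_cast<std::size_t>(ny + 2 * ngh);
        return data[(kk * sy + jj) * sx + ii];
    }
};

// Same eight-term stencil as the snippet above: cells i-1..i, j-1..j, k-1..k,
// with sx_lo = 1 - sx_hi weighting the lower neighbor in each direction.
double trilinear(Field3D& f, int i, int j, int k, const std::array<double, 3>& sx_hi) {
    std::array<double, 3> sx_lo{};
    for (int d = 0; d < 3; ++d) {
        assert(sx_hi[d] >= 0.0 && sx_hi[d] <= 1.0);
        sx_lo[d] = 1.0 - sx_hi[d];
    }
    double val = 0.0;
    for (int a = 0; a < 2; ++a)
        for (int b = 0; b < 2; ++b)
            for (int c = 0; c < 2; ++c) {
                const double w = (a ? sx_hi[0] : sx_lo[0]) *
                                 (b ? sx_hi[1] : sx_lo[1]) *
                                 (c ? sx_hi[2] : sx_lo[2]);
                val += w * f(i - 1 + a, j - 1 + b, k - 1 + c);
            }
    return val;
}
```

With a linear field, an interior point is reproduced exactly; a point next to an unfilled ghost layer returns NaN; and a point whose stencil leaves the allocated box throws, much like the out-of-bound abort in a debug build.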

@baperry2
Collaborator

@asalmgren will do, even though there appears to be more going on here, I definitely learned about the spurious FPEs on Macs the hard way and it would be good to have the information out there more.

@indra098124 - I tried again and see the same thing as you. For Witch of Agnesi, I see a spurious FPE that goes away with amrex.fpe_trap_invalid = 0 when running with inputs, but I get the same error as you when running with inputs_most_test, which appears to be a real error.

@AMLattanzi
Collaborator

@indra098124 Thank you for sharing the issue with inputs_most_test. The problem was that the Theta_prim variable did not yet have its ghost cells filled, while the interpolation routine (where your backtrace points) needed to access that data. PR 1455 ran successfully in debug mode on my local machine with single and multiple cores. Please let me know if you have further issues.

@indra098124
Author

Thank you @AMLattanzi. I modified my copy to use
IntVect ng = Theta_prim[lev]->nGrowVect();
in ERF.cpp and ERF_Advance.cpp, but it is still failing for me. I will try the version from the PR.

@AMLattanzi
Collaborator

@indra098124 Yes, it should still fail with that revision. The creation of the MOST class and the calls to the MOST averaging needed to be moved to after the ghost cells are populated by FillPatch. If you see this issue again, or a new one, with the current development branch (e9bcaa0), let me know.
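A simplified 1D sketch of why that ordering matters: if the averaging step runs before the ghost cells are filled, the first valid cell picks up whatever the ghost cell holds (NaN here). Field1D, fill_ghosts and face_average are hypothetical stand-ins for a field, FillPatch, and the MOST averaging, assuming periodic boundaries; they are not ERF functions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical 1D periodic field: n valid cells plus one ghost cell on each side.
// Storage index 0 is the low ghost, 1..n are valid, n+1 is the high ghost.
struct Field1D {
    std::size_t n;
    std::vector<double> v;
    explicit Field1D(std::size_t n_)
        : n(n_), v(n_ + 2, std::numeric_limits<double>::quiet_NaN()) {}
};

// Stand-in for a FillPatch-style step: copy periodic neighbors into the ghosts.
void fill_ghosts(Field1D& f) {
    f.v[0] = f.v[f.n];      // low ghost <- last valid cell
    f.v[f.n + 1] = f.v[1];  // high ghost <- first valid cell
}

// Stand-in for an averaging step whose stencil reaches one cell to the left,
// so the first valid cell reads the low ghost.
std::vector<double> face_average(const Field1D& f) {
    std::vector<double> out(f.n);
    for (std::size_t i = 1; i <= f.n; ++i) {
        out[i - 1] = 0.5 * (f.v[i - 1] + f.v[i]);
    }
    return out;
}
```

Calling face_average before fill_ghosts yields NaN in the first entry; calling it after yields finite values everywhere.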

@indra098124
Author

@AMLattanzi unfortunately, it is still failing for me with the latest version. I tried the debug build as well. With debug I get the following error (on Mac and on Linux):

amrex::Abort::1:: (127,-1,-1,0) is out of bound (125:258,-3:10,0:63,0:0) !!!
SIGABRT
amrex::Abort::0:: (117,1,-1,0) is out of bound (-3:130,-3:10,0:63,0:0) !!!
SIGABRT

I tried running realclean and also a fresh download.

@indra098124
Author

indra098124 commented Feb 25, 2024

@AMLattanzi and @asalmgren there are other cases as well that are failing for me. I am not sure if I am doing something wrong.

  1. ABL/inputs.write -> The input file needed prob.T_0 = 300.0; after that it worked.
  2. ABL/inputs.read -> This has been giving a segfault. The backtrace points to if (input_bndry_planes && m_r2d->ingested_velocity()) in ERF_init_bcs.cpp:86. Debug mode and assertions don't reveal anything more. I did generate the boundary files using inputs.write before trying this.
  3. ABL_input_sounding does not compile. I only needed input_sounding, and that put me on track to finding the compilation issue: it is related to USE_POISSON_SOLVE = TRUE. The first error is /TI_headers.H:270:30: error: 'Vector' does not name a type (270 | const Vector<amrex::Real*> d_rayleigh_ptrs_at_lev);. I think it should be amrex::Vector. Another error says use_rayleigh_damping is not declared, which might be a typo, since everywhere else I find it referenced as solverChoice.use_rayleigh_damping. At TI_no_substep_fun.H:133:13 the compiler complains that incompressible is not declared. Lastly, at TI_slow_rhs_fun.H:357:25 I get the error: cannot convert 'std::unique_ptr<amrex::MultiFab>' to 'const amrex::MultiFab*' in erf_slow_rhs_inc(level, nrk, slow_dt. I can use input_sounding when I disable poisson_solve.
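For readers hitting the same build errors, the two C++ issues in item 3 can be reproduced in isolation. In this sketch, hypo and Grid are hypothetical stand-ins for the amrex namespace and amrex::MultiFab; the fixes shown (qualifying the alias, and passing .get() where a raw const pointer is expected) are generic C++ remedies, not necessarily the change that was made in ERF.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-ins: `hypo` plays the role of the amrex namespace and
// `Grid` plays the role of amrex::MultiFab. Neither is the real AMReX type.
namespace hypo {
template <class T>
using Vector = std::vector<T>;
struct Grid { double value = 0.0; };
}  // namespace hypo

// Outside the namespace, an unqualified `Vector` does not name a type;
// writing hypo::Vector (like amrex::Vector) resolves it.
double sum_pointees(const hypo::Vector<double*>& ptrs) {
    double s = 0.0;
    for (const double* p : ptrs) {
        if (p) s += *p;
    }
    return s;
}

// Takes a raw, non-owning pointer, like a `const amrex::MultiFab*` parameter.
double read_value(const hypo::Grid* g) { return g ? g->value : 0.0; }

// A std::unique_ptr does not convert implicitly to a raw pointer, so
// read_value(owned) would not compile; .get() passes a non-owning pointer.
double read_owned(const std::unique_ptr<hypo::Grid>& owned) {
    return read_value(owned.get());
}
```

Passing .get() keeps ownership with the unique_ptr, so the callee must not store or delete the pointer.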

Thank you!

@asalmgren
Collaborator

asalmgren commented Feb 26, 2024 via email

@indra098124
Author

Thank you @asalmgren, and thank you to the ERF development team for making the software available as open source. Yes, after disabling the Poisson solver, I can build and run this.

The last thing I am figuring out is how to use boundary input.

@AMLattanzi
Collaborator

@indra098124 sounds like things are alright on this front? Are we good to close this particular issue?

@AMLattanzi
Collaborator

AMLattanzi commented Feb 28, 2024

I believe the inputs.write and inputs.read should work once PR 1461 goes through.

@indra098124
Author

indra098124 commented Feb 28, 2024

@AMLattanzi thanks for following up. I am not sure, but MOST with terrain still fails for me with the following error:

amrex::Abort::1:: (127,-1,-1,0) is out of bound (125:258,-3:10,0:63,0:0) !!!
SIGABRT
amrex::Abort::0:: (117,1,-1,0) is out of bound (-3:130,-3:10,0:63,0:0) !!!
SIGABRT

Could you confirm whether you were able to run terrain3d_Hemisphere successfully?

@AMLattanzi
Collaborator

Ah, I have not tested hemisphere with MOST! Let me give that a go and I can either follow up with the results or create a PR to alleviate the issue. Thanks for clarifying.

@AMLattanzi
Collaborator

@indra098124 I believe I have corrected the issue with MOST and the 3d hemisphere in PR 1465. Thank you again for bringing these issues to our attention, we greatly appreciate the feedback.

@indra098124
Author

Thank you @AMLattanzi for your help.

@indra098124
Author

@AMLattanzi after the new fix, inputs_most_test in ABL seems to be broken. I find that if I use erf.most.average_policy = 0, the code diverges at the first time step with the error "0::Assertion `cell_data(i,j,k,RhoTheta_comp) > 0.' failed, file "../../Source/TimeIntegration/ERF_slow_rhs_pre.cpp", line 566". erf.most.average_policy = 1 works fine. Would you mind having a look?

Many thanks

@indra098124
Author

Additionally, it looks like there is an issue with MOST when a surface temperature is used. It always gives "SIGILL Invalid, privileged, or ill-formed instruction". See, for example, the GABLS1 case.

@AMLattanzi
Collaborator

@indra098124 The issue with the hemisphere should be corrected in PR 1468. The salient problem was that the turbulent viscosity was 0 for the given initialization; this is inconsistent with the MOST BC, and the limiting we did with 1e-16 was not sufficient for stability. I also added an option for small perturbations in the IC to give finite strain, and thus non-zero turbulent viscosity with Smagorinsky (the fluctuations seem to dissipate quickly). This ran for 10 steps with both planar and local averaging.
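A small sketch of why the turbulent viscosity vanishes: with the Smagorinsky model, nu_t = (Cs Delta)^2 |S|, so a column with no shear (for example, a uniform initial wind) gives |S| = 0 and nu_t = 0 everywhere, while tiny perturbations make the strain, and hence nu_t, nonzero. This is a 1D illustration under assumed values (Cs = 0.1, a hypothetical grid spacing and perturbation amplitude); it is not ERF's Smagorinsky implementation or its perturbation option.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Smagorinsky eddy viscosity nu_t = (Cs * Delta)^2 * |S| for a 1D column u(z).
// With only du/dz nonzero, |S| = sqrt(2 S_ij S_ij) reduces to |du/dz|.
// End points are left at zero for simplicity.
std::vector<double> smagorinsky_nut(const std::vector<double>& u, double dz,
                                    double cs, double delta) {
    std::vector<double> nut(u.size(), 0.0);
    const double l2 = (cs * delta) * (cs * delta);
    for (std::size_t k = 1; k + 1 < u.size(); ++k) {
        const double dudz = (u[k + 1] - u[k - 1]) / (2.0 * dz);  // centered difference
        nut[k] = l2 * std::fabs(dudz);
    }
    return nut;
}

// Hypothetical IC helper: add uniform random perturbations in [-amp, amp].
void perturb(std::vector<double>& u, double amp, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> dist(-amp, amp);
    for (double& x : u) x += dist(gen);
}
```

With a uniform column every nu_t entry is exactly zero; after perturb, interior entries become small but positive.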

With respect to the GABLS case, I am unable to replicate that issue. The instruction error you mention sounds like the Mac issue Bruce explained. I have yet to see that error with ERF on a Linux machine. Perhaps try in DEBUG mode.

@indra098124
Author

Thanks @AMLattanzi. This PR seems to have fixed the other issues (GABLS and ABLMost). I can see that the ABLMost regression test ran successfully (https://ccse.lbl.gov/pub/RegressionTesting1/ERF/), while it was failing earlier today.
Also thank you for explaining what was wrong.

Many thanks

@asalmgren
Collaborator

@indra098124 -- are we good to close this issue?

@indra098124
Author

Thank you @AMLattanzi. Yes @asalmgren we can close this.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants