add example cfg for v3.LR.historical run on LCRC #694
@chengzhuzhang I added a second commit -- 36d29729e5b8f04c05deef6e11b99dedbcfdb, but this isn't ready to merge yet. In particular, my to-do list is:
After discussions with @thorntonpe, for this example |
@chengzhuzhang I'm currently running the cfg with the changes to display the Land Viewers. Once I get that working, I'll have you do a final review. Just a note: there is actually one user-facing code change here -- I updated the |
@chengzhuzhang There's a bit of a problem with trying to run all available variables ( I'm getting the following when running the
Now, in https://github.com/E3SM-Project/zppy/blob/main/zppy/templates/ts.bash#L46:
{% if mapping_file == 'glb' -%}
vars={{ vars }}
# https://unix.stackexchange.com/questions/237297/the-fastest-way-to-remove-a-string-in-a-variable
# https://stackoverflow.com/questions/26457052/remove-a-substring-from-a-bash-variable
# Remove U, since it is a 3D variable and thus will not work with rgn_avg
vars=${vars//,U}
{%- else %}
vars={{ vars }}
{%- endif %}
And at https://github.com/E3SM-Project/zppy/blob/main/zppy/templates/ts.bash#L67:
cat input.txt | ncclimo \
-c {{ case }} \
{%- if vars != '' %}
-v ${vars} \
{%- endif %}
So, I'm seeing a few issues:
One more note: I believe I set the variables in https://github.com/E3SM-Project/zppy-interfaces/blob/main/tests/integration/global_time_series/cases_global_time_series.py#L11 explicitly to be ones where the data was actually available:
plots_lnd_metric_average = "FSH,RH2M,LAISHA,LAISUN,QINTR,QOVER,QRUNOFF,QSOIL,QVEGE,QVEGT,SOILWATER_10CM,TSA,H2OSNO,"
plots_lnd_metric_total = (
    "TOTLITC,CWDC,SOIL1C,SOIL2C,SOIL3C,SOIL4C,WOOD_HARVESTC,TOTVEGC,NBP,GPP,AR,HR"
)
plots_lnd_all = plots_lnd_metric_average + plots_lnd_metric_total
That's why https://web.lcrc.anl.gov/public/e3sm/diagnostic_output/ac.forsyth2/zi-test-webdir/global_time_series_1985-1995_results_viewers/table_lnd/index.html lists far fewer than 354 variables. I remember going through test output, removing any variables that were raising not-found errors.
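As a quick check on the lists above, the sketch below splits the concatenated string and counts the entries. It relies only on the strings quoted in this comment; the helper name `split_vars` is hypothetical and is not part of zppy-interfaces.

```python
# Variable lists as quoted above from cases_global_time_series.py.
plots_lnd_metric_average = "FSH,RH2M,LAISHA,LAISUN,QINTR,QOVER,QRUNOFF,QSOIL,QVEGE,QVEGT,SOILWATER_10CM,TSA,H2OSNO,"
plots_lnd_metric_total = (
    "TOTLITC,CWDC,SOIL1C,SOIL2C,SOIL3C,SOIL4C,WOOD_HARVESTC,TOTVEGC,NBP,GPP,AR,HR"
)
plots_lnd_all = plots_lnd_metric_average + plots_lnd_metric_total


def split_vars(csv: str) -> list[str]:
    """Split a comma-separated variable string, dropping empty entries."""
    return [name for name in csv.split(",") if name]


lnd_vars = split_vars(plots_lnd_all)
print(len(lnd_vars))  # 25
```

The trailing comma on `plots_lnd_metric_average` is what lets the plain `+` concatenation produce a well-formed list: 13 averaged variables plus 12 totals gives 25, against the roughly 354 available.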
@forsyth2 This post from nco repo can be helpful in this case to have ncclimo tolerate missing variables. Specifically, @czender's comment. Though I'm not sure at this point if |
@forsyth2 and @chengzhuzhang |
@czender @chengzhuzhang Hmm, that didn't seem to make a difference. I still get:
despite a bash script that includes:
@forsyth2 Are you trying this with NCO 5.3.2 from the latest E3SMU RC? This
And when I add W_SCALAR to that exclusion list, it successfully generates the seven global timeseries for all the single-level fields:
@czender I'm running NCO after running And the second line of the output file has |
And what is the Bash version in the environment? Needs to be >= 4:
source /lcrc/soft/climate/e3sm-unified/test_e3sm_unified_1.11.0rc13_chrysalis.sh
bash --version gives:
I'm stumped. If it's possible, please send me the |
I used the code as of 5e888bd, which includes a bit that updates the variable exclusion:
{%- if vars != '' %}
-v ${vars} \
{%- else %}
--xcl_var -v PCT_LANDUNIT_tmp,TLAKE_tmp,LAKEICEFRAC_tmp,SOILLIQ_ICE_tmp,W_SCALAR_tmp,T_SCALAR_tmp,SOILICE_ICE_tmp,SOILPSI_tmp,O_SCALAR_tmp,H2OSOI_tmp \
{%- endif %}
I'm running Output can be seen in
The bash file used is
# Generate time series files
# If the user-defined parameter "vars" is "", then ${vars}, defined above, will be too.
cat input.txt | ncclimo \
-c v3.LR.historical_0051 \
--xcl_var -v PCT_LANDUNIT_tmp,TLAKE_tmp,LAKEICEFRAC_tmp,SOILLIQ_ICE_tmp,W_SCALAR_tmp,T_SCALAR_tmp,SOILICE_ICE_tmp,SOILPSI_tmp,O_SCALAR_tmp,H2OSOI_tmp \
--split \
--yr_srt=1985 \
--yr_end=2014 \
--ypf=30 \
-o output \
--rgn_avg \
--area=area \
--prc_typ=elm
That ls v3.LR.historical_0051.elm.h0.????-*.nc > input.txt
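The template branch above picks between an explicit variable list and an exclusion list. A minimal Python sketch of that selection logic follows. `build_var_args` is a hypothetical helper written for illustration (zppy does this in the Jinja template, not in Python), and the flags are copied from the bash file above rather than checked against the NCO documentation.

```python
def build_var_args(vars_csv: str, excluded: list[str]) -> list[str]:
    """Return the ncclimo variable arguments for a ts task.

    Mirrors the template branch quoted above: a non-empty `vars` is passed
    through with -v; an empty `vars` means "all variables" minus `excluded`.
    """
    if vars_csv != "":
        return ["-v", vars_csv]
    return ["--xcl_var", "-v", ",".join(excluded)]


# Exclusion list exactly as it appears in the bash file above.
excluded = [
    "PCT_LANDUNIT_tmp", "TLAKE_tmp", "LAKEICEFRAC_tmp", "SOILLIQ_ICE_tmp",
    "W_SCALAR_tmp", "T_SCALAR_tmp", "SOILICE_ICE_tmp", "SOILPSI_tmp",
    "O_SCALAR_tmp", "H2OSOI_tmp",
]

print(build_var_args("", excluded))         # exclusion branch (vars = "")
print(build_var_args("FSH,TSA", excluded))  # explicit-list branch
```

Keeping the exclusion names in one list makes it easier to compare them against the variable names actually present in the input files when an exclusion does not seem to take effect.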
@chengzhuzhang @czender Please see my comment at #697 (comment) further diving into this. |
Thanks @forsyth2. I have verified that NCO 5.3.2 works as expected, for me, on Chrysalis with those files. This includes the exclusion list handling. I'm wondering if you inadvertently used the wrong exclusion list in your tests? You used (from above): |
@chengzhuzhang If we're adding a static example cfg, we should consider adding the auto-generated test files to the |
@chengzhuzhang It took 2 hours to run all 300+ land variables on I'll check on the status in the morning, and if all good, I'll request reviews of both the cfg itself and the output from you, Chris, and Wuyin. |
5e888bd
to
0917f59
Compare
# Once E3SM Unified 1.11.0 is released, you can use this line instead:
# environment_commands = "source /lcrc/soft/climate/e3sm-unified/load_latest_e3sm_unified_chrysalis.sh"
environment_commands = "source /lcrc/soft/climate/e3sm-unified/test_e3sm_unified_1.11.0rc13_chrysalis.sh"
TODO: before merging, just make this the environment_commands = "source /lcrc/soft/climate/e3sm-unified/load_latest_e3sm_unified_chrysalis.sh" line.
# plot_names (plot names should now be explicitly set via the plots_atm/ice/lnd/ocn parameters)
active = True
climo_years ="1985-2014",
environment_commands = "source /gpfs/fs1/home/ac.forsyth2/miniforge3/etc/profile.d/conda.sh; conda activate zi_plots_lnd_20250326"
TODO: before merging, remove this line.
@chengzhuzhang @golaz @wlin7 This is ready for review. The example cfg itself can be found at https://github.com/E3SM-Project/zppy/pull/694/files. Note there are 2 Output:
Web output:
# scratch
active = True
walltime = "02:00:00"
years = "1985:2014:30",
Consider adding a note about the elapsed time for this 30-year run, and a caution about needing to adjust the wall-time limit for longer simulations.
For tc_analysis specifically, or in general at the top of the cfg?
For tc_analysis here. Can you find the elapsed time from the log files, so folks can have a rough estimate?
tc_analysis specifically is 3296 seconds = 54.93 minutes:
> grep "Elapsed time" *.o*
climo_atm_monthly_180x360_aave_1985-2014.o719881:Elapsed time 0m40s
climo_atm_monthly_180x360_aave_1985-2014.o719881:Elapsed time: 44 seconds
climo_atm_monthly_diurnal_8xdaily_180x360_aave_1985-2014.o719882:Elapsed time 2m2s
climo_atm_monthly_diurnal_8xdaily_180x360_aave_1985-2014.o719882:Elapsed time: 133 seconds
climo_land_monthly_climo_1985-2014.o719883:Elapsed time 1m21s
climo_land_monthly_climo_1985-2014.o719883:Elapsed time: 83 seconds
e3sm_diags_atm_monthly_180x360_aave_model_vs_obs_1985-2014.o719893:Elapsed time: 4473 seconds
e3sm_diags_atm_monthly_180x360_aave_mvm_model_vs_model_1985-2014_vs_1985-2014.o719894:Elapsed time: 1460 seconds
e3sm_diags_lnd_monthly_mvm_lnd_model_vs_model_1985-2014_vs_0051-0100.o719895:Elapsed time: 1855 seconds
e3sm_to_cmip_atm_monthly_180x360_aave_1985-2014-0030.o719890:Elapsed time: 71 seconds
e3sm_to_cmip_land_monthly_1985-2014-0030.o719891:Elapsed time: 74 seconds
global_time_series_1985-2014.o719959:Elapsed time: 24475 seconds
ilamb_1985-2014.o719898:Elapsed time: 1472 seconds
mpas_analysis_ts_1985-2014_climo_1985-2014.o719896:Elapsed time: 1272 seconds
tc_analysis_1985-2014.o719892:Elapsed time: 3296 seconds
ts_atm_daily_180x360_aave_1985-2014-0030.o719886:Elapsed time 1m18s
ts_atm_daily_180x360_aave_1985-2014-0030.o719886:Elapsed time: 84 seconds
ts_atm_monthly_180x360_aave_1985-2014-0030.o719884:Elapsed time 3m1s
ts_atm_monthly_180x360_aave_1985-2014-0030.o719884:Elapsed time: 192 seconds
ts_atm_monthly_glb_1985-2014-0030.o719888:Elapsed time 0m46s
ts_atm_monthly_glb_1985-2014-0030.o719888:Elapsed time: 51 seconds
ts_land_monthly_1985-2014-0030.o719885:Elapsed time 1m43s
ts_land_monthly_1985-2014-0030.o719885:Elapsed time: 114 seconds
ts_lnd_monthly_glb_1985-2014-0030.o719889:Elapsed time 7m24s
ts_lnd_monthly_glb_1985-2014-0030.o719889:Elapsed time: 449 seconds
ts_rof_monthly_1985-2014-0030.o719887:Elapsed time 0m27s
ts_rof_monthly_1985-2014-0030.o719887:Elapsed time: 32 seconds
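To turn this kind of grep output into rough wall-time guidance, the sketch below parses a few of the lines above and converts seconds to minutes. It is an illustration only: the regular expression is an assumption about the log format based on the lines shown here, and it is not part of zppy.

```python
import re

# A few lines copied from the grep output above.
log_lines = """\
tc_analysis_1985-2014.o719892:Elapsed time: 3296 seconds
global_time_series_1985-2014.o719959:Elapsed time: 24475 seconds
e3sm_diags_atm_monthly_180x360_aave_model_vs_obs_1985-2014.o719893:Elapsed time: 4473 seconds
ts_atm_monthly_glb_1985-2014-0030.o719888:Elapsed time 0m46s
""".splitlines()

# Match only the "Elapsed time: N seconds" form; the "0m46s" form is skipped.
pattern = re.compile(r"^(?P<job>.+)\.o\d+:Elapsed time: (?P<sec>\d+) seconds$")

elapsed = {}
for line in log_lines:
    match = pattern.match(line)
    if match:
        elapsed[match["job"]] = int(match["sec"])

# Print the slowest jobs first.
for job, seconds in sorted(elapsed.items(), key=lambda kv: -kv[1]):
    print(f"{job}: {seconds} s = {seconds / 60:.2f} min")
```

By this conversion, tc_analysis takes about 55 minutes, while global_time_series at 24475 seconds runs for roughly 6.8 hours, the longest job in the list.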
input_subdir = "archive/lnd/hist"
mapping_file = "glb"
# vars = "FSH,RH2M,LAISHA,LAISUN,QINTR,QOVER,QRUNOFF,QSOIL,QVEGE,QVEGT,SOILWATER_10CM,TSA,H2OSNO,TOTLITC,CWDC,SOIL1C,SOIL2C,SOIL3C,SOIL4C,WOOD_HARVESTC,TOTVEGC,NBP,GPP,AR,HR"
vars = "" # This will tell zppy to use all available variables
Consider adding a note about the elapsed time of this option for this 30-year run, and a caution about needing to adjust the wall-time limit for longer simulations.
# you can do something like the following:
# environment_commands = "source /home/ac.zhang40/y/etc/profile.d/conda.sh; conda activate e3sm_diags_dev"
# `e3sm_diags` is largely driven by which e3sm_diags sets are requested:
sets="lat_lon","zonal_mean_xy","zonal_mean_2d","polar","cosp_histogram","meridional_mean_2d","annual_cycle_zonal_mean","qbo","diurnal_cycle","zonal_mean_2d_stratosphere","aerosol_aeronet","tropical_subseasonal","tc_analysis", "tropical_subseasonal",
Fix alignment
@golaz I think that is just how it looks in the diff view. If you look at the file in https://github.com/E3SM-Project/zppy/blob/0917f59e9d48a2f7d246ed99b50eae0b352ed9a2/examples/post.v3.LR.historical.zppy_v3.cfg, it is aligned.
That commit (2eb9c6c) appears fine by visual inspection. I'll run the cfg.
Thank you. Sorry for the trouble.
Still waiting on the global_time_series job, but there are a couple of issues with e3sm_diags:
> cd /lcrc/group/e3sm/ac.forsyth2/E3SMv3_20250403_try1/v3.LR.historical_0051/post/scripts/
> grep -v "OK" *status
e3sm_diags_atm_monthly_180x360_aave_mvm_model_vs_model_1985-2014_vs_0451-0500.status:ERROR (9)
e3sm_diags_lnd_monthly_mvm_lnd_model_vs_model_1985-2014_vs_0451-0500.status:ERROR (2)
global_time_series_1985-2014.status:RUNNING 721767
The first e3sm_diags error:
File "/lcrc/group/e3sm/ac.forsyth2/E3SMv3_20250403_try1/v3.LR.historical_0051/post/scripts/tmp.e3sm_diags_atm_monthly_180x360_aave_mvm_model_vs_model_1985-2014_vs_0451-0500.721764.pgFA/e3sm.py", line 15
ref_start_yr = 0451
This can be resolved by changing the parameters in the cfg.
The second:
cp: cannot stat '/lcrc/group/e3sm2/ac.zhang40/E3SMv3/v3.LR.piControl_451-500/post/lnd/native/clim/50yr/20231209.v3.LR.piControl-spinup.chrysalis_*_0451??_0500??_climo.nc': No such file or directory
but:
> cd /lcrc/group/e3sm2/ac.zhang40/E3SMv3/v3.LR.piControl_451-500/post/lnd/native/clim/50yr/
v3.LR.piControl_01_045101_050001_climo.nc v3.LR.piControl_10_045110_050010_climo.nc
v3.LR.piControl_02_045102_050002_climo.nc v3.LR.piControl_11_045111_050011_climo.nc
v3.LR.piControl_03_045103_050003_climo.nc v3.LR.piControl_12_045112_050012_climo.nc
v3.LR.piControl_04_045104_050004_climo.nc v3.LR.piControl_ANN_045101_050012_climo.nc
v3.LR.piControl_05_045105_050005_climo.nc v3.LR.piControl_DJF_045101_050012_climo.nc
v3.LR.piControl_06_045106_050006_climo.nc v3.LR.piControl_JJA_045106_050008_climo.nc
v3.LR.piControl_07_045107_050007_climo.nc v3.LR.piControl_MAM_045103_050005_climo.nc
v3.LR.piControl_08_045108_050008_climo.nc v3.LR.piControl_SON_045109_050011_climo.nc
v3.LR.piControl_09_045109_050009_climo.nc
@chengzhuzhang It seems like the data is in fact there??
EDIT: oh I see, -spinup.chrysalis is missing. Do we want that or not?
ref_start_yr = 0451
ref_final_yr = 0500
ref_years = "0451-0500",
The leading zeros will have to be removed for correct parsing.
ref_final_yr = 0451
ref_start_yr = 0500
ref_years = "0451-0500",
The leading zeros will have to be removed for correct parsing.
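The parse failure on `ref_start_yr = 0451` comes from Python's integer-literal rules rather than from anything specific to zppy. The short sketch below reproduces it with `compile`; the file name `<e3sm.py>` is only a label for the error message.

```python
# Python 3 rejects decimal integer literals with leading zeros,
# which is why the generated e3sm.py fails to parse.
try:
    compile("ref_start_yr = 0451", "<e3sm.py>", "exec")
    parsed = True
except SyntaxError:
    parsed = False
print(parsed)  # False

# Without the leading zero the same line compiles.
compile("ref_start_yr = 451", "<e3sm.py>", "exec")

# If a zero-padded string is still needed (for example in a file name),
# format it explicitly instead of writing it as an integer literal.
ref_start_yr = 451
print(f"{ref_start_yr:04d}")  # 0451
```

Quoted strings such as `ref_years = "0451-0500"` are unaffected, since the restriction applies only to bare integer literals.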
reference_data_path = "/lcrc/group/e3sm2/ac.zhang40/E3SMv3/v3.LR.piControl_451-500/post/lnd/native/clim"
ref_name = "20231209.v3.LR.piControl-spinup.chrysalis"
The source of the second diags error
The ref_name should be "v3.LR.piControl"; I must have missed this when updating.
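The mismatch can be checked without touching the file system. The sketch below rebuilds the glob from the `cp` error message and tests it against two of the file names listed above. `climo_pattern` is a hypothetical helper, and the pattern layout is inferred from the error message rather than taken from the zppy source.

```python
from fnmatch import fnmatch

# Two of the files listed in the reference directory above.
files = [
    "v3.LR.piControl_01_045101_050001_climo.nc",
    "v3.LR.piControl_ANN_045101_050012_climo.nc",
]


def climo_pattern(ref_name: str, start: str, end: str) -> str:
    """Build the glob seen in the cp error: <ref_name>_*_<start>??_<end>??_climo.nc"""
    return f"{ref_name}_*_{start}??_{end}??_climo.nc"


old = climo_pattern("20231209.v3.LR.piControl-spinup.chrysalis", "0451", "0500")
new = climo_pattern("v3.LR.piControl", "0451", "0500")

print(any(fnmatch(f, old) for f in files))  # False
print(all(fnmatch(f, new) for f in files))  # True
```

With the old `ref_name` nothing matches, which is consistent with the "No such file or directory" error even though the climatology files exist.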
Actually, I'm extremely confused why Even the longest runtimes reported in the earlier thread #694 (comment) are 3 sec/var*year. This is only 25 land vars x 30 years of data. That's 750 var x years x 3 = absolute max of 2,250 seconds = 37.5 minutes. I checked the output:
So, it's definitely the same job. |
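The upper-bound estimate in the comment above can be written out explicitly. The numbers are the ones quoted there; nothing here is measured.

```python
# Figures quoted in the comment above.
n_vars = 25            # land variables requested
n_years = 30           # 1985-2014
sec_per_var_year = 3   # slowest rate reported in the earlier thread

var_years = n_vars * n_years
max_seconds = var_years * sec_per_var_year
print(var_years, max_seconds, max_seconds / 60)  # 750 2250 37.5
```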
You should try using a freshly created output directory that only includes the requested variables. I'm looking at /lcrc/group/e3sm/ac.forsyth2/E3SMv3_20250403_try1/v3.LR.historical_0051/post/lnd/glb/ts/monthly/5yr, and it still includes all the land variables. My guess is that the biggest performance bottleneck comes from reading in all the variables that reside in this directory.
Ah thank you @chengzhuzhang, this is a big help. That directory has that many variables because of |
Table below constructed from runs discussed in #694 (comment), #694 (comment)
Currently running with changes in a618c64.
I thought I had the code requesting specific variables, so this confused me. My current best guess for the relevant code is this block in
Is that |
I had this question too and posed it to Tom when we had a meeting. @tomvothecoder will check whether there is a performance problem here.
Updated table below; last two rows show clearly that the performance issue is related to having too much output from the
The docs seem to suggest so:
It does allow "List of file paths" though. |
I think there is a bottleneck with I added a GitHub issue with my findings and a possible solution: E3SM-Project/zppy-interfaces#21 |
Thanks Tom, I was just coming to that same conclusion. I have code changes I'm testing now. |
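One way to avoid reading every file in a crowded `ts` directory is to build an explicit list of paths for only the requested variables. The sketch below is illustrative only: `select_ts_files` is a hypothetical helper, and the `<VAR>_<YYYYMM>_<YYYYMM>.nc` naming is an assumption about how the time-series files are named, not something confirmed in this thread. The actual fix is tracked in E3SM-Project/zppy-interfaces#21.

```python
import tempfile
from pathlib import Path


def select_ts_files(directory: str, requested: list[str]) -> list[str]:
    """Return sorted paths of .nc files whose variable prefix is requested.

    Assumes names like FSH_198501_201412.nc, i.e. <VAR>_<YYYYMM>_<YYYYMM>.nc.
    """
    wanted = set(requested)
    selected = []
    for path in sorted(Path(directory).glob("*.nc")):
        # Strip the two trailing date fields to recover the variable name,
        # so names containing underscores (SOILWATER_10CM) still work.
        parts = path.stem.rsplit("_", 2)
        if len(parts) == 3 and parts[0] in wanted:
            selected.append(str(path))
    return selected


# Demo with empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in (
        "FSH_198501_201412.nc",
        "SOILWATER_10CM_198501_201412.nc",
        "TLAKE_198501_201412.nc",
    ):
        (Path(tmp) / name).touch()
    demo = [Path(p).name for p in select_ts_files(tmp, ["FSH", "SOILWATER_10CM"])]
print(demo)
```

The returned list could then be handed to a reader that accepts a list of file paths, instead of a directory-wide glob that opens every variable.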
@chengzhuzhang As for this example cfg itself, it looks like the jobs finished without errors:
cd /lcrc/group/e3sm/ac.forsyth2/E3SMv3_20250403_try2/v3.LR.historical_0051/post/scripts
grep -v "OK" *status
# No errors
See www results. If that looks good to you, I think we can call this example cfg done and merge it. (I do want to remove the extra
The diags are all generated and I don't see a problem from a brief review!
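The `grep -v "OK" *status` check used throughout this thread can also be scripted. The sketch below is a rough Python equivalent for one-line status files. `non_ok_statuses` is a hypothetical helper, and the demo files are made up to mimic the status strings shown earlier.

```python
import tempfile
from pathlib import Path


def non_ok_statuses(scripts_dir: str) -> dict[str, str]:
    """Map each *status file name to its contents when "OK" is absent.

    Rough equivalent of `grep -v "OK" *status` for one-line status files.
    """
    problems = {}
    for path in sorted(Path(scripts_dir).glob("*status")):
        text = path.read_text().strip()
        if "OK" not in text:
            problems[path.name] = text
    return problems


# Demo with made-up status files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "ilamb_1985-2014.status").write_text("OK\n")
    (Path(tmp) / "global_time_series_1985-2014.status").write_text("RUNNING 721767\n")
    report = non_ok_statuses(tmp)
print(report)
```

An empty dictionary corresponds to the "No errors" result above; any entry names a job that is still running or has failed.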
I just made bcad046 to clean up some parameters for what users will actually want to use. @chengzhuzhang, I think if you mark approved on this, I can merge it.
The commit looks good!
@forsyth2 The branch protection is good. Let's see if @tomvothecoder or @golaz can approve.
In general yes, I agree, but it is an obstacle if it's a very small change or a non-user-facing change like #701.
Oh it appears I can approve this one. I can't approve #701 though, probably because I'm the author there?
Summary
Objectives: add an example cfg to help users transition to zppy v3, which has user-facing changes that are not backward compatible.