Probtest santis #308

Open: wants to merge 40 commits into base: main

40 commits
82fd0aa
Add probtest
AnnikaLau Jul 16, 2025
aad0625
Add step by step
AnnikaLau Jul 17, 2025
ad25535
Improve instructions
AnnikaLau Jul 17, 2025
c9e3340
Add sections
AnnikaLau Jul 17, 2025
ea1d786
fix according to changes in icon
AnnikaLau Jul 18, 2025
e6454d1
Fix
AnnikaLau Jul 21, 2025
6e6aaf1
Merge branch 'main' into probtest_santis
AnnikaLau Jul 21, 2025
7c5a680
fix setup
AnnikaLau Jul 21, 2025
3bc7648
suggested fixes
AnnikaLau Jul 21, 2025
9a95365
fix replace comment
AnnikaLau Jul 21, 2025
329a3c4
General comment about replacing instead
AnnikaLau Jul 21, 2025
dada481
fixes
AnnikaLau Jul 21, 2025
138b8cf
Add and fix links
mjaehn Jul 22, 2025
84e868c
Merge branch 'main' of github.com:C2SM/c2sm.github.io into probtest_s…
mjaehn Jul 25, 2025
0c6da9e
Move out-of-source info to compile and run
mjaehn Jul 25, 2025
2f77dfd
Some restructuring
mjaehn Jul 29, 2025
ae77b06
Minor additions
mjaehn Jul 29, 2025
a55750a
Merge branch 'main' into probtest_santis
AnnikaLau Aug 7, 2025
143bac0
Fix broken link
AnnikaLau Aug 7, 2025
8d32a90
fix sentence
AnnikaLau Aug 7, 2025
6fc8aee
fix typo
AnnikaLau Aug 7, 2025
80d921b
Add UENV_VERSION
AnnikaLau Aug 8, 2025
97b4d62
Fix links
AnnikaLau Aug 11, 2025
668e1f5
Fix links
AnnikaLau Aug 11, 2025
4cb3fba
Merge branch 'main' into probtest_santis
AnnikaLau Aug 11, 2025
b447739
use actual paths
AnnikaLau Aug 13, 2025
d8f0958
Add test case to CI
AnnikaLau Aug 14, 2025
ee26387
link to new branch (fixed version of probtest_container_wrapper
AnnikaLau Aug 14, 2025
0ba7289
Rename Large Use Case to Validate Custom Namelist
AnnikaLau Aug 15, 2025
776f319
Fix description
AnnikaLau Aug 18, 2025
ea40f34
Make more general
AnnikaLau Aug 18, 2025
ee3f513
Update instructions
AnnikaLau Aug 18, 2025
b8a252d
Add link to Run Probtest on Säntis
AnnikaLau Aug 18, 2025
9de4033
Change buildbot to CI
AnnikaLau Aug 18, 2025
7bd148c
Merge branch 'main' into probtest_santis
AnnikaLau Aug 18, 2025
9ac10cc
Update introduction to Validate Custom Namelist
AnnikaLau Aug 18, 2025
c7cbf82
Update information
AnnikaLau Aug 18, 2025
da7caad
move balfrin relevant infromation to balfrin
AnnikaLau Aug 18, 2025
036152a
Comment about out-of-source builds
AnnikaLau Aug 18, 2025
965249d
Update graph
AnnikaLau Aug 19, 2025
4 changes: 2 additions & 2 deletions docs/best_practices/data_handling.md
@@ -9,7 +9,7 @@
* [Forschungsdatenmanagement und Datenerhalt :material-open-in-new:](https://documentation.library.ethz.ch/display/DD/Forschungsdatenmanagement+und+Datenerhalt){:target="_blank"}
* [ETH Guidelines for Research Integrity :material-open-in-new:](https://doi.org/10.3929/ethz-b-000179298){:target="_blank"}
* [ETH Research Collection :material-open-in-new:](https://www.research-collection.ethz.ch){:target="_blank"}
-* [Add your ORCID ID to your ETH account :material-open-in-new:](https://documentation.library.ethz.ch/display/RC/Assign+ORCID+iD){:target="_blank"}
+* [Author profil and assign ORCID iD :material-open-in-new:](https://unlimited.ethz.ch/spaces/RC/pages/194119877/Author+profil+and+assign+ORCID+iD){:target="_blank"}
* [ETH Guidelines for data management plans :material-open-in-new:](https://unlimited.ethz.ch/pages/viewpage.action?pageId=194127962){:target="_blank"}

## ETH Contacts
@@ -49,6 +49,6 @@
* [IAC internal documentation on NetCDF in general :material-open-in-new:](https://wiki.iac.ethz.ch/IT/LinuxNetCDF){:target="_blank"} (!with IAC login only)
* [Climate and Forecast (CF) conventions :material-open-in-new:](http://cfconventions.org){:target="_blank"}
* [Python CF checker :material-open-in-new:](https://github.com/cedadev/cf-checker){:target="_blank"}
-* [CMIP6 controlled vocabularies :material-open-in-new:](https://cmor.llnl.gov/mydoc_cmor3_CV/){:target="_blank"}
+* [Controlled Vocabularies (CVs) for use in CMIP6 :material-open-in-new:](https://wcrp-cmip.github.io/CMIP6_CVs/){:target="_blank"}


2 changes: 1 addition & 1 deletion docs/glossary/index.md
@@ -21,7 +21,7 @@
| **ECMWF** | European Centre for Medium-range Weather Forecasts | [ECMWF website :material-open-in-new:](https://www.ecmwf.int/en/about){:target="_blank"} |
| **ED** | Executive Director ||
| **ERA5** | ECMWF ReAnalysis v5 | [ERA5 website :material-open-in-new:](https://www.ecmwf.int/en/forecasts/dataset/ecmwf-reanalysis-v5){:target="_blank"} |
-| **ESGF** | Earth System Grid Federation | [ESGF website :material-open-in-new:](https://esgf.llnl.gov){:target="_blank"} |
+| **ESGF** | Earth System Grid Federation | [ESGF website :material-open-in-new:](https://esgf.github.io/index.html){:target="_blank"} |
| **EXCLAIM** | EXtreme scale Computing and data platform for cLoud-resolving weAther and clImate Modeling | [EXCLAIM website :material-open-in-new:](https://exclaim.ethz.ch){:target="_blank"} |
| **EXTPAR** | EXTernal PARameters for numerical weather prediction and climate application | [EXTPAR documentation :material-open-in-new:](https://c2sm.github.io/extpar/){:target="_blank"} |
| **FPS** | Flagship Pilot Studies ||
3 changes: 2 additions & 1 deletion docs/models/icon/SUMMARY.md
@@ -1,4 +1,5 @@
* [ICON](index.md)
* [Compile and Run](compile_and_run.md)
-* [Large Use Cases](large_use_cases.md)
+* [Validate Custom Namelist](validate_custom_namelist.md)
+* [Run Probtest on Säntis](probtest.md)
* [ICON-CLM](icon-clm.md)
44 changes: 38 additions & 6 deletions docs/models/icon/compile_and_run.md
@@ -36,25 +36,57 @@ Clone the ICON repository:

### Säntis

-!!! info "Last update: 2025-05-22"
+!!! info "Last update: 2025-08-18"

Säntis is regularly maintained by CSCS. In addition, the [uenvs](../../alps/uenvs.md) are updated irregularly. Therefore, some of the information provided here may be out of date. Please use the [C2SM support forum :material-open-in-new:](https://github.com/C2SM/Tasks-Support/discussions){:target="_blank"} in case of questions regarding building ICON on Säntis.

-Run the following after navigating into ICON root folder (replace `cpu` by `gpu` if applicable):
+Run the following after navigating into the ICON root folder (replace `gpu` by `cpu` if applicable):

```console
UENV_VERSION=$(cat config/cscs/SANTIS_ENV_TAG)
-uenv run ${UENV_VERSION} -- ./config/cscs/santis.cpu.nvhpc
+uenv run ${UENV_VERSION} -- ./config/cscs/santis.gpu.nvhpc
```

!!! Note

-If you have never used a uenv on Säntis, you need to create a uenv repo first: `uenv repo create`.
+If you have never used a uenv on Säntis, you need to create a uenv repo first:
+```
+uenv repo create
+```

+In case you are using the uenv version for the first time, you need to pull the image first:
+```
+uenv image pull $UENV_VERSION
+```


-In case you are using the uenv version for the first time, you need to pull the image first: `uenv image pull $UENV_VERSION`.
+#### Building out-of-source

-For out-of-source builds navigate into the build folder and adapt the path to the configure wrapper above.
+Out-of-source builds are useful if you want to have two or more compiled versions of ICON in the same repository.
+To achieve that, you simply need to create separate folders in the ICON root folder
+and run the configure wrapper from there.
+
+For example, if you want to compile ICON both for `cpu` and `gpu`, create those directories:
+
+```bash
+mkdir nvhpc_cpu
+mkdir nvhpc_gpu
+```
+
+Then, navigate into the corresponding folder and compile with:
+
+=== "`cpu`"
+```bash
+UENV_VERSION=$(cat config/cscs/SANTIS_ENV_TAG)
+cd nvhpc_cpu
+uenv run ${UENV_VERSION} -- ./../config/cscs/santis.cpu.nvhpc
+```
+=== "`gpu`"
+```bash
+UENV_VERSION=$(cat config/cscs/SANTIS_ENV_TAG)
+cd nvhpc_gpu
+uenv run ${UENV_VERSION} -- ./../config/cscs/santis.gpu.nvhpc
+```
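
The `cpu` and `gpu` variants differ only in the target name, so both configure commands can be generated from one loop. The following is a minimal dry-run sketch, assuming it is started from the ICON root folder; `configure_command` is a hypothetical helper (not part of ICON) that only prints the commands instead of executing them.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print the out-of-source configure command for each target.
# configure_command is a hypothetical helper, not part of ICON.
configure_command() {
    local target=$1
    # Single quotes keep ${UENV_VERSION} literal in the printed command.
    printf 'cd nvhpc_%s && uenv run ${UENV_VERSION} -- ./../config/cscs/santis.%s.nvhpc\n' \
        "${target}" "${target}"
}

for target in cpu gpu; do
    configure_command "${target}"
done
```

Printing the commands first makes it easy to confirm the relative path to the configure wrapper before running anything inside the uenv.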

### Euler

139 changes: 0 additions & 139 deletions docs/models/icon/large_use_cases.md

This file was deleted.

119 changes: 119 additions & 0 deletions docs/models/icon/probtest.md
@@ -0,0 +1,119 @@
# Run Probtest on Säntis

Use Probtest to verify whether your test case produces consistent results on GPU. It compares a GPU test run to a CPU ensemble with perturbed input conditions.

## 1. Compile ICON
Compile ICON for CPU and for GPU as [out-of-source builds](compile_and_run.md#building-out-of-source). The build directories must be subdirectories of the ICON root folder; otherwise, the Probtest container cannot access the data.

## 2. Set Up the Probtest Container and Environment on Säntis
To run Probtest for ICON on Säntis, use the prebuilt container available on Docker Hub ([Probtest Container :material-open-in-new:](https://github.com/MeteoSwiss/probtest?tab=readme-ov-file#probtest-container){:target="_blank"}). ICON provides the wrapper script [`probtest_container_wrapper.py` :material-open-in-new:](https://gitlab.dkrz.de/icon/icon-nwp/-/blob/add_test_cases_santis/scripts/cscs_ci/probtest_container_wrapper.py?ref_type=heads){:target="_blank"}.

!!! note
If your ICON version doesn’t include this script, add it to `scripts/cscs_ci/probtest_container_wrapper.py`, along with the appropriate [PROBTEST_TAG :material-open-in-new:](https://gitlab.dkrz.de/icon/icon/-/blob/main/run/tolerance/PROBTEST_TAG?ref_type=heads){:target="_blank"} under `run/tolerance/PROBTEST_TAG` and [yaml_experiment_test_processor.py :material-open-in-new:](https://gitlab.dkrz.de/icon/icon/-/blob/main/scripts/experiments/yaml_experiment_test_processor.py?ref_type=heads){:target="_blank"} under `scripts/experiments/yaml_experiment_test_processor.py` (replace if already available).


### When Setting Up ICON from Scratch
In your ICON root directory, import the container:

```console
PROBTEST_TAG=$(cat run/tolerance/PROBTEST_TAG)
enroot import docker://c2sm/probtest:${PROBTEST_TAG}
```

Create a TOML configuration file and export `EDF_PATH` (used when running the container):
```console
echo "image = \"$(pwd)/c2sm+probtest+${PROBTEST_TAG}.sqsh\"" > probtest.toml
echo "mounts = [ \"$(pwd)\" ]" >> probtest.toml
echo "workdir = \"$(pwd)\"" >> probtest.toml
echo "writable = true" >> probtest.toml
export EDF_PATH=$(pwd)
```
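
For reference, the same four-line `probtest.toml` can be written in one step with a heredoc, which makes the resulting file content easier to read. This is a minimal sketch: `write_probtest_toml` is a hypothetical helper (not part of ICON or Probtest), and the demonstration uses a throwaway directory and a placeholder tag rather than a real `PROBTEST_TAG`.

```shell
#!/usr/bin/env bash
# Sketch: write probtest.toml with a heredoc instead of four echo calls.
# write_probtest_toml is a hypothetical helper, not part of ICON or Probtest.
write_probtest_toml() {
    local root=$1 tag=$2
    cat > "${root}/probtest.toml" <<EOF
image = "${root}/c2sm+probtest+${tag}.sqsh"
mounts = [ "${root}" ]
workdir = "${root}"
writable = true
EOF
}

# Demonstration in a throwaway directory with a placeholder tag.
demo_dir=$(mktemp -d)
write_probtest_toml "${demo_dir}" "example-tag"
cat "${demo_dir}/probtest.toml"
```

In the ICON root directory, the equivalent call would be `write_probtest_toml "$(pwd)" "${PROBTEST_TAG}"`.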

Create and activate a Python environment:
```console
python3 -m venv .venv
source .venv/bin/activate
pip install pyyaml pandas click toml
```

### Every Time You Reconnect to the Server
If the container and environment are already set up, simply re-run:
```console
export EDF_PATH=$(pwd)
source .venv/bin/activate
```

Set the experiment name, e.g.:
```console
export EXPERIMENT=c2sm_clm_r13b03_seaice
```

Export the required environment variables:
```console
export BB_NAME=santis_cpu_nvhpc
export UENV_VERSION=$(cat config/cscs/SANTIS_ENV_TAG)
```
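
Because these variables are lost whenever the shell session ends, a quick check before starting a long ensemble run can save time. The sketch below reports which of the four variables set in this section are missing; `check_probtest_env` is a hypothetical helper name, not something provided by ICON or Probtest.

```shell
#!/usr/bin/env bash
# Sketch: report which of the variables used in this guide are unset or empty.
# check_probtest_env is a hypothetical helper, not part of ICON or Probtest.
check_probtest_env() {
    local missing=0 name
    for name in EDF_PATH EXPERIMENT BB_NAME UENV_VERSION; do
        # ${!name} is Bash indirect expansion: the value of the variable named by $name.
        if [ -z "${!name:-}" ]; then
            echo "missing: ${name}" >&2
            missing=1
        fi
    done
    return "${missing}"
}

if check_probtest_env; then
    echo "environment looks complete"
fi
```

The function returns a non-zero status if any variable is missing, so it can also guard a run, e.g. `check_probtest_env && ./make_runscripts $EXPERIMENT`.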

## 3. Run Perturbed Ensemble on CPU
Navigate to your CPU build directory, then generate and run a 10-member ensemble (this may take some time):
```console
cd nvhpc_cpu
./make_runscripts $EXPERIMENT
uenv run ${UENV_VERSION} -- python3 scripts/cscs_ci/probtest_container_wrapper.py ensemble $EXPERIMENT --build-dir $(pwd) --member-ids $(seq -s, 1 10)
```

This generates:

- `stats_${EXPERIMENT}_<member_id>.csv`
- `${EXPERIMENT}_reference.csv`
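
The `--member-ids` argument used above is a comma-separated list, which `$(seq -s, 1 10)` produces on the command line. As an illustration of what is actually passed to the wrapper, the sketch below builds the same list in plain Bash; `member_ids` is a hypothetical helper introduced only for this example.

```shell
#!/usr/bin/env bash
# Sketch: build a comma-separated list of member IDs "first,...,last".
# member_ids is a hypothetical helper, not part of ICON or Probtest.
member_ids() {
    local first=$1 last=$2 ids=() i
    for ((i = first; i <= last; i++)); do
        ids+=("$i")
    done
    # With IFS set to a comma, "${ids[*]}" joins the elements with commas.
    local IFS=,
    echo "${ids[*]}"
}

echo "--member-ids $(member_ids 1 10)"
# prints: --member-ids 1,2,3,4,5,6,7,8,9,10
```

The same helper yields the list for an extended ensemble, e.g. `member_ids 11 49` for members 11 to 49.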

## 4. Generate Tolerance from Ensemble

Create reference and tolerance files using the 10 ensemble members:
```console
python3 scripts/cscs_ci/probtest_container_wrapper.py tolerance $EXPERIMENT --build-dir $(pwd) --member-ids $(seq -s, 1 10)
```

This generates:

- `${EXPERIMENT}_tolerance.csv`

## 5. Run the Test Case on GPU and Collect Statistics
Navigate to your GPU build folder and run the same test case, e.g.:
```console
cd ../nvhpc_gpu
./make_runscripts $EXPERIMENT
cd run && sbatch --uenv ${UENV_VERSION} ./exp.${EXPERIMENT}.run && cd ..
```

Navigate back to the ICON root folder and collect the GPU statistics:
```console
cd ..
python3 scripts/cscs_ci/probtest_container_wrapper.py stats $EXPERIMENT --stats-file-path stats_gpu.csv --build-dir nvhpc_gpu
```

This saves the GPU stats as `stats_gpu.csv` in your ICON root directory.

## 6. Check GPU Statistics Against Reference and Tolerance

From your ICON root directory, run the check using the generated reference and tolerance:
```console
python3 scripts/cscs_ci/probtest_container_wrapper.py check $EXPERIMENT --input-file-cur stats_gpu.csv --input-file-ref nvhpc_cpu/${EXPERIMENT}_reference.csv --tolerance-file-name nvhpc_cpu/${EXPERIMENT}_tolerance.csv --build-dir $(pwd)
```
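
The check needs three input files: the GPU statistics, the reference, and the tolerance. A missing or empty file usually means an earlier step did not finish. The sketch below verifies the inputs before the check is started; `require_files` is a hypothetical helper, and the demonstration uses throwaway files in place of the real CSV files.

```shell
#!/usr/bin/env bash
# Sketch: fail early if an input file for the check is missing or empty.
# require_files is a hypothetical helper, not part of ICON or Probtest.
require_files() {
    local status=0 path
    for path in "$@"; do
        # -s is true only if the file exists and has a size greater than zero.
        if [ ! -s "${path}" ]; then
            echo "missing or empty: ${path}" >&2
            status=1
        fi
    done
    return "${status}"
}

# Demonstration with a throwaway file standing in for a real CSV file.
demo_dir=$(mktemp -d)
echo "placeholder" > "${demo_dir}/stats_gpu.csv"
if require_files "${demo_dir}/stats_gpu.csv"; then
    echo "inputs present"
fi
if ! require_files "${demo_dir}/tolerance.csv" 2>/dev/null; then
    echo "tolerance file not found"
fi
```

From the ICON root directory, the corresponding call would be `require_files stats_gpu.csv nvhpc_cpu/${EXPERIMENT}_reference.csv nvhpc_cpu/${EXPERIMENT}_tolerance.csv`.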

## 7. Increase Ensemble Size if Validation Fails
A 10-member ensemble may not capture the full variability, so a correct GPU run can fail the check. From your CPU build directory, increase the ensemble to 49 members for better coverage.

Run additional members (11–49):
```console
cd nvhpc_cpu
./make_runscripts $EXPERIMENT
uenv run ${UENV_VERSION} -- python3 scripts/cscs_ci/probtest_container_wrapper.py ensemble $EXPERIMENT --build-dir $(pwd) --member-ids $(seq -s, 11 49)
```

Regenerate reference and tolerance using all 49 members:
```console
python3 scripts/cscs_ci/probtest_container_wrapper.py tolerance $EXPERIMENT --build-dir $(pwd) --member-ids $(seq -s, 1 49)
```

*If the test still fails, the GPU result is likely incorrect.*