Set of tools for compressing netCDF files with Zarr.
The tools use the following compression libraries:
- Numcodecs: Zarr native library [documentation]
- numcodecs-wasm: Compression for codecs compiled to WebAssembly [documentation]
- EBCC: Error Bounded Climate Compressor [documentation]
System Prerequisites
- C/C++ compiler toolchain (required to build mpi4py)
- MPI implementation (required for mpi4py)
- ecCodes library for GRIB files
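A quick way to confirm these prerequisites before installing is to look for their command-line tools on `PATH`. The sketch below is only an illustration: the helper names `have` and `check_prereq` are hypothetical, and the tool names (`cc`/`gcc`/`clang`, `mpicc`/`mpiexec`, `codes_info`) are the conventional ones shipped with common compilers, MPI implementations, and ecCodes, which may differ on your system.

```shell
# Return 0 if the named command is on PATH, non-zero otherwise.
have() {
  command -v "$1" >/dev/null 2>&1
}

# Print one status line for a prerequisite.
# $1 = human-readable label, remaining args = candidate command names.
check_prereq() {
  label=$1
  shift
  for tool in "$@"; do
    if have "$tool"; then
      echo "ok: $label ($tool)"
      return 0
    fi
  done
  echo "missing: $label"
  return 1
}

status=0
check_prereq "C compiler" cc gcc clang || status=1
check_prereq "MPI" mpicc mpiexec || status=1
check_prereq "ecCodes" codes_info || status=1
echo "prerequisite check exit status: $status"
```

A non-zero status only means a tool was not found under these names; the library may still be installed elsewhere, for example inside a uenv.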
On Santis@ALPS:
export UENV_NAME="prgenv-gnu/24.11:v2"
On Balfrin@ALPS:
export UENV_NAME="netcdf-tools/2024:v1"
Then:
uenv image pull $UENV_NAME
uenv start --view=default $UENV_NAME
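The per-system choice of image can also be scripted. The following is a minimal sketch, assuming that login-node host names start with `santis` or `balfrin`; that prefix convention and the helper name `select_uenv` are assumptions, not part of the toolkit.

```shell
# Map a host name to the uenv image listed for that system.
# $1 = host name; returns 1 if the host is not recognised.
select_uenv() {
  case "$1" in
    santis*)  echo "prgenv-gnu/24.11:v2" ;;
    balfrin*) echo "netcdf-tools/2024:v1" ;;
    *)        return 1 ;;
  esac
}

if UENV_NAME=$(select_uenv "$(uname -n)"); then
  export UENV_NAME
  echo "Using uenv image: $UENV_NAME"
else
  echo "No uenv image known for this host; assuming a local install."
fi
```

With `UENV_NAME` exported this way, the `uenv image pull` and `uenv start` commands apply unchanged.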
Once the above is complete (needed only on Santis; not required for a local install):
git clone git@github.com:C2SM/data-compression.git
python -m venv venv
source venv/bin/activate
bash install_dc_toolkit.sh
--------------------------------------------------------------------------------
Usage: dc_toolkit --help # List of available commands
Usage: dc_toolkit COMMAND --help # Documentation per command
Example:
# Positional arguments: command, netCDF file to compress, directory for the compressed file(s).
# --field-to-compress names the netCDF field to compress.
dc_toolkit evaluate_combos \
    netCDF_files/tigge_pl_t_q_dx=2_2024_08_02.nc \
    ./dump \
    --field-to-compress t
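To evaluate several fields of the same file, the command can be generated once per field. This is a sketch under stated assumptions: the field `q` is guessed from the example file name, the per-field output directory `./dump/<field>` is an illustrative layout, and `build_cmd` is a hypothetical helper that only prints the command instead of running it.

```shell
# Fields to evaluate; "q" is an assumption based on the example file name.
fields="t q"
input="netCDF_files/tigge_pl_t_q_dx=2_2024_08_02.nc"

# Print the dc_toolkit command line for one field (dry run, nothing is executed).
# $1 = field name
build_cmd() {
  printf 'dc_toolkit evaluate_combos %s ./dump/%s --field-to-compress %s\n' \
    "$input" "$1" "$1"
}

for f in $fields; do
  build_cmd "$f"
done
```

Review the printed commands first; piping the output to `sh` would then run them one after another.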
--------------------------------------------------------------------------------
Two user interfaces have been implemented to make file compression more user-friendly. Both UIs provide functionality for computing compressor similarity metrics and for compressing files.
Beyond the functionality shared by both UIs, the web UI allows users to download similarity-metric plots and tweak parameters more dynamically.
If launching from Santis, connect with SSH port forwarding so that port 8501 is reachable on your local machine:
ssh -L 8501:localhost:8501 santis
dc_toolkit run_web_ui_vcluster \
--user_account "YOUR_USER_ACCOUNT" \
--uenv_image UENV_NAME \
--uploaded_file "PATH_TO_FILE" \
--time "00:15:00" \
--nodes "1" --ntasks-per-node "72"
Local versions, both web and non-web, are also available:
dc_toolkit run_local_ui
dc_toolkit run_web_ui