Merged
Commits
83 commits
8c65e64
add scripts for dataset creation and training
VeraChristina May 21, 2025
19eb068
add basic suite to test dataset creation and training
VeraChristina Jun 16, 2025
d660f8d
use venv and uv
VeraChristina Jun 26, 2025
06086f0
update test config folder structure
VeraChristina Jun 27, 2025
31319be
update to new wellies version
VeraChristina Jul 7, 2025
a1d619e
add workflow drafts
VeraChristina Jul 8, 2025
8561dde
add secrets and pull request trigger for debugging
VeraChristina Jul 8, 2025
e56cdc0
update secrets
VeraChristina Jul 8, 2025
cca80ad
debug
VeraChristina Jul 8, 2025
8b828f7
add print results step
VeraChristina Jul 8, 2025
8847483
debug
VeraChristina Jul 8, 2025
f0ff171
remove print results step
VeraChristina Jul 8, 2025
304dd27
debug
VeraChristina Jul 8, 2025
e9c7eb2
add print results step
VeraChristina Jul 8, 2025
d4d774b
small fix in print results step
VeraChristina Jul 8, 2025
e0c25eb
clean only after printing
VeraChristina Jul 8, 2025
b5be3a6
always run print, comment out clean
VeraChristina Jul 8, 2025
0795c3e
comment out clean
VeraChristina Jul 8, 2025
f9141f8
Minor debug in print action
corentincarton Jul 8, 2025
49ebf50
Update system_level_test.yaml
corentincarton Jul 8, 2025
d1845d7
add top level group and did minor cleanup
corentincarton Jul 9, 2025
69176a6
debug action
corentincarton Jul 9, 2025
74706f3
update steps triggers
corentincarton Jul 9, 2025
6403c4a
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 9, 2025
5512a01
minor change in structure
corentincarton Jul 9, 2025
a49a578
merge conflicts
corentincarton Jul 9, 2025
3787b0d
fix bug in action
corentincarton Jul 9, 2025
c2091f4
debug run task in action
corentincarton Jul 9, 2025
846364f
debug run task in action
corentincarton Jul 9, 2025
d7c6445
debug run task in action
corentincarton Jul 9, 2025
fef3b7a
debug run task in action
corentincarton Jul 9, 2025
9c1fa97
debug print in action
corentincarton Jul 9, 2025
3cae9cc
add summary step
corentincarton Jul 9, 2025
75917c7
debug summary step
corentincarton Jul 9, 2025
d00b2fe
debug summary step
corentincarton Jul 9, 2025
d7835ed
debug summary step
corentincarton Jul 9, 2025
85d2dd4
debug summary step
corentincarton Jul 9, 2025
a80cc3e
debug summary step
corentincarton Jul 9, 2025
34cd8a0
debug summary step
corentincarton Jul 9, 2025
198f0ef
replace dashes in suite names
VeraChristina Jul 9, 2025
656d0ea
update test user for action
corentincarton Jul 10, 2025
7b6e197
Merge branch 'test/system-level-prototype-ecflow' of github.com:ecmwf…
corentincarton Jul 10, 2025
688de19
update path to print tool in action
corentincarton Jul 10, 2025
2d08fab
add deploy family to get dataset configs
VeraChristina Jul 10, 2025
c79f62e
adding link to anemoi repos branch in user config for tests
corentincarton Jul 10, 2025
96e1df1
fix format in tests summary
corentincarton Jul 10, 2025
86ccda1
only get configs folder as static data
VeraChristina Jul 16, 2025
c2e1fee
make branches configurable
VeraChristina Jul 21, 2025
8e113db
remove pull request trigger
VeraChristina Jul 21, 2025
aa60e2e
delay clean up until summary is complete
VeraChristina Jul 21, 2025
59878fc
remove config_name from workflow options
VeraChristina Jul 21, 2025
fe4f00b
add cron trigger and pull request trigger for debugging
VeraChristina Jul 21, 2025
3129287
add submission params for training on gpu
VeraChristina Jul 22, 2025
8b53627
move suite to prepml workdir and point to uv cache
VeraChristina Jul 23, 2025
6240a9a
debug
VeraChristina Jul 23, 2025
53f0702
debug
VeraChristina Jul 23, 2025
c0fa1cc
debug
VeraChristina Jul 23, 2025
4277d27
debug
VeraChristina Jul 23, 2025
11c2fef
debug
VeraChristina Jul 23, 2025
f9ff8e4
debug
VeraChristina Jul 23, 2025
7b49402
add lam use case
VeraChristina Jul 31, 2025
a6fe016
use explicit training configs
VeraChristina Aug 1, 2025
abd3502
add docs
VeraChristina Aug 7, 2025
6c56781
remove pull request trigger
VeraChristina Aug 7, 2025
7aa69bf
Merge branch 'main' into test/system-level-prototype-ecflow
VeraChristina Aug 7, 2025
1170a21
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Aug 7, 2025
44b92ec
Merge branch 'main' into test/system-level-prototype-ecflow
VeraChristina Aug 8, 2025
6f6243f
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Aug 8, 2025
b3a2c06
improve readability
VeraChristina Aug 13, 2025
042eb7d
add README for local build and deploy
VeraChristina Aug 13, 2025
b50cb3d
address feedback
VeraChristina Aug 14, 2025
4051848
use pathlib
VeraChristina Aug 14, 2025
9e40aae
set triggers for training tasks in main family
VeraChristina Aug 15, 2025
0869631
add cleanup
VeraChristina Aug 15, 2025
5f28e32
add small check if created dataset exists
VeraChristina Aug 26, 2025
e7c6b43
add basic check for checkpoints
VeraChristina Aug 29, 2025
992a843
make anemoi commands configurable
VeraChristina Aug 29, 2025
9435afe
update docs
VeraChristina Aug 29, 2025
508d004
Update path to tracksuite-print (#54)
corentincarton Sep 2, 2025
66e6de0
add nightly workflow which checks for recent commits
VeraChristina Sep 3, 2025
4f07ebb
typo and remove pr trigger
VeraChristina Sep 3, 2025
1f847a8
Merge branch 'main' into test/system-level-prototype-ecflow
VeraChristina Sep 3, 2025
9deee33
make task_config entries more generic
VeraChristina Sep 3, 2025
64 changes: 64 additions & 0 deletions .github/actions/deploy/action.yaml
@@ -0,0 +1,64 @@
name: deploy-suite
description: |
  A GitHub action to build a suite and deploy it to an ecflow server.
inputs:
  troika_user:
    description: User used to submit troika job.
    required: true
  sbatch_options:
    description: List of SBATCH directives.
    required: false
  site:
    description: HPC site name.
    required: false
    default: hpc-batch
  github_token:
    description: GitHub token.
    required: true
  ecflow_host:
    description: ecflow server hostname.
    required: true
  ecflow_port:
    description: ecflow server port.
    required: true
  suite_name:
    description: Name of the suite.
    required: true
  build_options:
    description: List of options to pass to the build script.
    required: false
    default: "limit_workers=1 "
runs:
  using: composite
  steps:
    - name: Build and deploy suite to ecflow server
      uses: ecmwf-actions/reusable-workflows/ci-hpc-generic@v2
      with:
        template: |
          set -eux
          echo "Job is running on ${{ runner.os }}"

          # Load modules of interest
          module load ecflow
          module load wellies

          # Clone the repository
          PACKAGE_NAME=anemoi-docs
          PACKAGE_BRANCH=${{ github.head_ref || github.ref_name }}
          git clone -b $PACKAGE_BRANCH https://${{ inputs.github_token }}@github.com/ecmwf/${PACKAGE_NAME}.git
          cd $PACKAGE_NAME
          cd tests/system-level

          # Set ecflow server variables
          export ECF_HOST=${{ inputs.ecflow_host }}
          export ECF_PORT=${{ inputs.ecflow_port }}

          # Build the suite definition file
          BUILD_DIR=${SCRATCHDIR}/pyflow/${{ inputs.suite_name }}/build
          ./build.sh -s name=${{ inputs.suite_name }} ${{ inputs.build_options }} -y

          # Deploy the suite. This fails if a suite with the same name is already running; the user must deal with its zombies first.
          ecflow_client --replace=/anemoi_tests/${{ inputs.suite_name }} $HOME/pyflow/anemoi_tests/${{ inputs.suite_name }}/${{ inputs.suite_name }}.def
        sbatch_options: ${{ inputs.sbatch_options }}
        troika_user: ${{ inputs.troika_user }}
        site: ${{ inputs.site }}
51 changes: 51 additions & 0 deletions .github/workflows/nightly_ci_hpc.yaml
@@ -0,0 +1,51 @@
name: nightly-system-level-test

on:
  schedule:
    - cron: '0 23 * * 0' # Sundays at 23:00 UTC (day-of-week is 0, so weekly, not nightly); checks the main branches
  # pull_request:
  #   types: [opened, synchronize, reopened]
  #   branches:
  #     - main
  #     - develop


jobs:
  check-commits:
    runs-on: ubuntu-latest
    name: Check latest commits
    outputs:
      should_run: ${{ steps.check-commits.outputs.should_run }}
    steps:
      - name: Check for recent commits on main in anemoi repos
        id: check-commits
        run: |
          REPOS=("anemoi-core" "anemoi-datasets" "anemoi-docs")
          SHOULD_RUN=false

          for REPO in "${REPOS[@]}"; do
            echo "Checking repository: $REPO"

            # Fetch the latest commit date on the main branch from GitHub API
            LATEST_COMMIT_DATE=$(curl -s "https://api.github.com/repos/ecmwf/$REPO/commits/main" | jq -r '.commit.committer.date')

            if [ "$(date -d "$LATEST_COMMIT_DATE" +%s)" -gt "$(date -d "24 hours ago" +%s)" ]; then
              echo "Recent commit found in $REPO: $LATEST_COMMIT_DATE"
              SHOULD_RUN=true
            fi
          done

          echo "should_run=$SHOULD_RUN" >> "$GITHUB_OUTPUT"

  system_level_test:
    needs: check-commits
    if: needs.check-commits.outputs.should_run == 'true'
    uses: ./.github/workflows/system_level_test.yaml
    with:
      suite_name: "nightly"
      build_options: ""
    secrets:
      gh_token: ${{ secrets.GH_REPO_READ_TOKEN }}
      troika_user: ${{ secrets.HPC_SYSTEM_TEST_USER }}
      ecflow_host: ${{ secrets.HPC_SYSTEM_TEST_HOST }}
      ecflow_port: ${{ secrets.HPC_SYSTEM_TEST_PORT }}
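The check-commits step above comes down to comparing a commit timestamp against a cutoff. Below is a minimal sketch of that comparison that runs offline. The helper name `is_recent` and its second and third arguments are hypothetical and not part of the workflow; GNU `date -d` is assumed, as the workflow itself relies on it on `ubuntu-latest`.

```shell
# Hypothetical helper (not in the workflow): succeeds if an ISO 8601
# timestamp is newer than a cutoff given in hours before a reference time.
# Assumes GNU date (the -d option).
is_recent() {
    commit_date=$1                  # e.g. the .commit.committer.date field from the API
    window_hours=${2:-24}           # size of the look-back window
    now_epoch=${3:-$(date +%s)}     # reference time, injectable for testing

    commit_epoch=$(date -d "$commit_date" +%s) || return 2
    cutoff_epoch=$((now_epoch - window_hours * 3600))
    [ "$commit_epoch" -gt "$cutoff_epoch" ]
}

# Usage with a fixed reference time, so the result does not depend on the clock.
now=$(date -d "2025-09-03T23:00:00Z" +%s)
if is_recent "2025-09-03T10:15:00Z" 24 "$now"; then
    echo "should_run=true"          # this branch is taken
else
    echo "should_run=false"
fi
```

Note that the cron expression has `0` in the day-of-week field, so the schedule fires once a week while the look-back window is 24 hours. If the weekly schedule is intended (an assumption about intent), a 168-hour window would match it.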
61 changes: 61 additions & 0 deletions .github/workflows/on_demand_ci_hpc.yaml
@@ -0,0 +1,61 @@
name: on-demand-system-level-test

on:
  workflow_dispatch:
    inputs:
      anemoi_docs_branch:
        description: Branch of anemoi-docs used to deploy the anemoi_tests suite. Defaults to the ref the workflow runs on.
        required: false
      anemoi_datasets_branch:
        description: Branch of anemoi-datasets used to build the datasets environment for testing.
        required: false
        default: "main"
      anemoi_training_branch:
        description: Branch of anemoi-training used to build the training environment for testing.
        required: false
        default: "main"


jobs:
  prepare:
    runs-on: ubuntu-latest
    outputs:
      suite_name: ${{ steps.normalize.outputs.suite_name }}
      build_options: ${{ steps.set_build_options.outputs.build_options }}
    steps:
      - name: Normalize suite_name
        id: normalize
        run: echo "suite_name=${GITHUB_TRIGGERING_ACTOR//-/_}" >> "$GITHUB_OUTPUT"
        env:
          GITHUB_TRIGGERING_ACTOR: ${{ github.triggering_actor }}
      - name: Set build options
        id: set_build_options
        run: |
          build_options=""

          if [[ "${{ github.event_name }}" == "workflow_dispatch" ]]; then
            docs_branch="${{ github.event.inputs.anemoi_docs_branch }}"
            datasets_branch="${{ github.event.inputs.anemoi_datasets_branch }}"
            training_branch="${{ github.event.inputs.anemoi_training_branch }}"

            # Default to ref name if docs branch not specified
            if [[ -z "$docs_branch" ]]; then
              docs_branch="${{ github.ref_name }}"
            fi

            build_options="anemoi_docs_branch=$docs_branch anemoi_datasets_branch=$datasets_branch anemoi_training_branch=$training_branch"
          fi

          echo "build_options=$build_options" >> "$GITHUB_OUTPUT"

  system_level_test:
    needs: prepare
    uses: ./.github/workflows/system_level_test.yaml
    with:
      suite_name: ${{ needs.prepare.outputs.suite_name }}
      build_options: ${{ needs.prepare.outputs.build_options }}
    secrets:
      gh_token: ${{ secrets.GH_REPO_READ_TOKEN }}
      troika_user: ${{ secrets.HPC_SYSTEM_TEST_USER }}
      ecflow_host: ${{ secrets.HPC_SYSTEM_TEST_HOST }}
      ecflow_port: ${{ secrets.HPC_SYSTEM_TEST_PORT }}
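The two steps of the prepare job above can be tried locally without GitHub. The sketch below mirrors them as plain shell functions; the function names and the sample values are hypothetical, and only the dash replacement and the `key=value` layout are taken from the workflow.

```shell
# Hypothetical helpers mirroring the two steps of the prepare job.

# Replace dashes with underscores, as the "Normalize suite_name" step does.
normalize_suite_name() {
    printf '%s\n' "$1" | tr '-' '_'
}

# Assemble the key=value string that is passed on as build_options.
# An empty docs branch falls back to the ref name given as fourth argument.
make_build_options() {
    docs_branch=$1
    datasets_branch=$2
    training_branch=$3
    ref_name=$4

    if [ -z "$docs_branch" ]; then
        docs_branch=$ref_name
    fi
    printf 'anemoi_docs_branch=%s anemoi_datasets_branch=%s anemoi_training_branch=%s\n' \
        "$docs_branch" "$datasets_branch" "$training_branch"
}

normalize_suite_name "some-user-name"
# prints: some_user_name

make_build_options "" main main feature/x
# prints: anemoi_docs_branch=feature/x anemoi_datasets_branch=main anemoi_training_branch=main
```

Because the suite name is derived from the triggering user, two on-demand runs started by the same user target the same suite path under /anemoi_tests.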
146 changes: 146 additions & 0 deletions .github/workflows/system_level_test.yaml
@@ -0,0 +1,146 @@
name: system-level-test
description: |
  A GitHub workflow to build, deploy and run a suite on an ecflow server,
  and check that it completes successfully.

on:
  workflow_call:
    inputs:
      build_options:
        required: true
        type: string
      suite_name:
        required: true
        type: string
    secrets:
      gh_token:
        required: true
      troika_user:
        required: true
      ecflow_host:
        required: true
      ecflow_port:
        required: true

jobs:
  deploy:
    runs-on: hpc
    name: Deploy
    steps:
      - uses: actions/checkout@v4
      - name: Build and deploy the suite
        uses: ./.github/actions/deploy
        with:
          sbatch_options: |
            #SBATCH --job-name=anemoi_test_build_deploy
            #SBATCH --time=00:10:00
            #SBATCH --qos=nf
          site: hpc-batch
          troika_user: ${{ secrets.troika_user }}
          github_token: ${{ secrets.gh_token }}
          ecflow_host: ${{ secrets.ecflow_host }}
          ecflow_port: ${{ secrets.ecflow_port }}
          suite_name: ${{ inputs.suite_name }}
          build_options: ${{ inputs.build_options }}
  run:
    runs-on: hpc
    name: Run
    needs: deploy
    steps:
      - uses: ecmwf/reusable-workflows/ci-hpc-generic@v2
        name: Run suite from ecflow server
        with:
          template: |
            set -eux
            echo "Job is running on ${{ runner.os }}"

            # Load modules of interest
            module load ecflow

            # Set ecflow server variables
            export ECF_HOST=${{ secrets.ecflow_host }}
            export ECF_PORT=${{ secrets.ecflow_port }}

            # Run the suite
            ecflow_client --begin=/anemoi_tests
            ecflow_client --resume=/anemoi_tests/${{ inputs.suite_name }}
          sbatch_options: |
            #SBATCH --job-name=anemoi_test_build
            #SBATCH --time=00:10:00
            #SBATCH --qos=nf
          troika_user: ${{ secrets.troika_user }}
          site: hpc-batch
  monitor:
    runs-on: hpc
    name: Monitor
    needs: run
    steps:
      - name: Check that the suite is complete
        uses: ecmwf/reusable-workflows/hpc/ecflow/wait-for-ecflow-suite-to-complete@v2
        with:
          sbatch_options: |
            #SBATCH --job-name=anemoi_test_monitor
            #SBATCH --time=01:00:00
            #SBATCH --qos=nf
          site: hpc-batch
          troika_user: ${{ secrets.troika_user }}
          suite_name: anemoi_tests/${{ inputs.suite_name }}
          ecflow_host: ${{ secrets.ecflow_host }}
          ecflow_port: ${{ secrets.ecflow_port }}
          delay: 60
          timeout: 3600
  print:
    runs-on: hpc
    name: Print
    needs:
      - deploy
      - monitor
    if: always() && needs.deploy.result == 'success'
    steps:
      - name: Print results
        uses: ecmwf/reusable-workflows/ci-hpc-generic@v2
        with:
          template: |
            set -eux
            echo "Job is running on ${{ runner.os }}"

            # Load modules of interest
            module load wellies/1.2.0

            mkdir -p /scratch/${{ secrets.troika_user }}/anemoi_tests/${{ inputs.suite_name }}
            /home/mlx/bin/tracksuite-print /anemoi_tests/${{ inputs.suite_name }} --host ${{ secrets.ecflow_host }} -f html &> /scratch/${{ secrets.troika_user }}/anemoi_tests/${{ inputs.suite_name }}/summary.md
          sbatch_options: |
            #SBATCH --job-name=anemoi_test_print_results
            #SBATCH --time=00:10:00
            #SBATCH --qos=nf
          site: hpc-batch
          troika_user: ${{ secrets.troika_user }}
  summary:
    runs-on: hpc
    name: Summary
    needs: print
    if: always() && needs.print.result == 'success'
    steps:
      - name: Print summary
        run: |
          scp ${{ secrets.troika_user }}@hpc-batch:/scratch/${{ secrets.troika_user }}/anemoi_tests/${{ inputs.suite_name }}/summary.md $GITHUB_STEP_SUMMARY
  clean:
    runs-on: hpc
    name: Clean
    needs:
      - monitor
      - summary
    if: always() && needs.monitor.result == 'success'
    steps:
      - name: Clean the suite
        uses: ecmwf/reusable-workflows/hpc/ecflow/remove-suite@v2
        with:
          sbatch_options: |
            #SBATCH --job-name=anemoi_test_clean
            #SBATCH --time=00:10:00
            #SBATCH --qos=nf
          site: hpc-batch
          troika_user: ${{ secrets.troika_user }}
          suite_name: anemoi_tests/${{ inputs.suite_name }}
          ecflow_host: ${{ secrets.ecflow_host }}
          ecflow_port: ${{ secrets.ecflow_port }}
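The `if:` conditions on the print, summary and clean jobs above decide what still happens after a failed suite. The sketch below is a small hypothetical model of those three conditions, not code from the workflow; the function name is made up, and it assumes the print job succeeds whenever it runs.

```shell
# Hypothetical model of the if: conditions on the print, summary and clean jobs.
# Arguments are job results as GitHub reports them:
# success, failure, cancelled or skipped.
plan_followup_jobs() {
    deploy_result=$1
    monitor_result=$2

    # print: always() && needs.deploy.result == 'success'
    if [ "$deploy_result" = success ]; then print=run; else print=skip; fi

    # summary: always() && needs.print.result == 'success'
    # (assumption of this model: print succeeds whenever it runs)
    summary=$print

    # clean: always() && needs.monitor.result == 'success'
    if [ "$monitor_result" = success ]; then clean=run; else clean=skip; fi

    echo "print=$print summary=$summary clean=$clean"
}

plan_followup_jobs success failure   # prints: print=run summary=run clean=skip
plan_followup_jobs success success   # prints: print=run summary=run clean=run
plan_followup_jobs failure skipped   # prints: print=skip summary=skip clean=skip
```

In this model a suite that was deployed but did not complete still gets its summary published, while the suite itself is not removed from the server, which presumably keeps it available for inspection.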
1 change: 1 addition & 0 deletions .gitignore
@@ -3,3 +3,4 @@ _build/
?.*
~*
*.egg-info/
__pycache__/