AZURE DMS BATCH

YAML Configuration System - The Power of Templates

The YAML configuration and template system is what makes Azure DMS Batch uniquely powerful and flexible.

With just a simple YAML file, you can define and submit complex Azure Batch jobs without writing any code:

# Basic job configuration
resource_group: my_resource_group
job_name: test_job
batch_account_name: my_batch_account
storage_account_name: my_storage_account
template_name: win_dsm2

# VM configuration
vm_size: standard_ds2_v2
num_hosts: 1

# Command to execute
command: 'echo "This is a test job"'

Submit your job with a single command:

dmsbatch submit-job --file my_config.yml
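When several configuration files need to be submitted, the same command can be scripted. The following is a minimal sketch, not part of the package: the helper names `submit_command` and `submit_all` are hypothetical, and the only CLI invocation used is the `dmsbatch submit-job --file` form shown above.

```python
import subprocess
from pathlib import Path


def submit_command(config_path):
    """Build the argument list for the submit command shown above."""
    return ["dmsbatch", "submit-job", "--file", str(config_path)]


def submit_all(config_dir, dry_run=True):
    """Collect a submit command for every *.yml file in config_dir.

    With dry_run=True the commands are only returned, not executed.
    """
    commands = [submit_command(p) for p in sorted(Path(config_dir).glob("*.yml"))]
    if not dry_run:
        for cmd in commands:
            # Requires dmsbatch to be installed and on PATH.
            subprocess.run(cmd, check=True)
    return commands


print(submit_command("my_config.yml"))
```

Keeping `dry_run=True` as the default makes it easy to review the commands before any job is actually submitted.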

Key Benefits:

  1. Simple Job Definition - Define all aspects of your job in a clean, readable YAML format
  2. Pre-built Templates - Use specialized templates for different workloads (DSM2, SCHISM, MPI, etc.)
  3. Minimal Configuration - Override only the parameters you need; inherit sensible defaults from templates
  4. Dynamic Substitution - Use variable references and custom tags for flexible configurations
  5. Template-Based Architecture - Standardized approach for different model types
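To illustrate the idea behind template defaults, overrides, and variable substitution, here is a small conceptual sketch in Python. It is not the dmsbatch implementation: the default values, the `{key}` reference syntax, and the function name `resolve_config` are assumptions made for illustration only.

```python
import re

# Hypothetical template defaults; the real win_dsm2 template may define different keys.
TEMPLATE_DEFAULTS = {
    "vm_size": "standard_ds2_v2",
    "num_hosts": 1,
    "command": 'echo "running {job_name}"',
}


def resolve_config(defaults, overrides):
    """Merge user overrides onto template defaults, then expand {key} references."""
    merged = {**defaults, **overrides}

    def expand(value):
        if not isinstance(value, str):
            return value
        # Replace {key} with the merged value; unknown keys are left untouched.
        return re.sub(
            r"\{(\w+)\}",
            lambda m: str(merged.get(m.group(1), m.group(0))),
            value,
        )

    return {key: expand(value) for key, value in merged.items()}


config = resolve_config(TEMPLATE_DEFAULTS, {"job_name": "test_job", "num_hosts": 2})
print(config["command"])    # echo "running test_job"
print(config["num_hosts"])  # 2
```

The user supplies only `job_name` and `num_hosts`; everything else is inherited from the template.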

Documentation:

Azure Batch runs for Models

A model is a process that reads its input from files and environment variables, runs an executable, and writes its output to files.

(input(s) --> EXE --> output(s))

Azure Batch Job Architecture

Azure Batch runs a model, i.e., an executable that runs independently on a set of input files and environment variables and produces a set of output files.
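The input --> EXE --> output pattern can be tried locally before moving to Azure. The sketch below uses a Python one-liner as a stand-in for the model executable; the file names `input.txt` and `output.txt`, the `SCALE` variable, and the `run_model` helper are all hypothetical.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def run_model(exe_args, workdir, env_overrides):
    """Run an executable in workdir with extra environment variables set."""
    env = {**os.environ, **env_overrides}
    return subprocess.run(exe_args, cwd=workdir, env=env, check=True)


# Stand-in "model": reads input.txt, scales the value by $SCALE, writes output.txt.
MODEL = (
    "import os; "
    "x = float(open('input.txt').read()); "
    "open('output.txt', 'w').write(str(x * float(os.environ['SCALE'])))"
)

with tempfile.TemporaryDirectory() as workdir:
    Path(workdir, "input.txt").write_text("2.5")
    run_model([sys.executable, "-c", MODEL], workdir, {"SCALE": "4"})
    result = Path(workdir, "output.txt").read_text()

print(result)  # 10.0
```

A batch task follows the same shape: stage the input files, set the environment, run the executable, and collect the output files.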

Setup package

Use the environment.yml with conda to create an environment called azure

conda env create -f environment.yml

or

pip install -r requirements.txt

Git clone this project

git clone https://github.com/CADWRDeltaModeling/azure_dms_batch.git

Change directory to the location of this project and then install using

pip install --no-deps -e .

Setup Azure

Setup can be done with az commands. Here we set up a Batch account with associated storage.

Login with your Azure credentials

az login

Create a resource group in the desired location

See the Azure docs for details. To use the commands below, replace each angle-bracketed placeholder (including the brackets) with your own value.

az group create --name <resource_group_name> --location <location_name>

az storage account create --resource-group <resource_group_name> --name <storage_account_name> --location <location_name> --sku Standard_LRS

az batch account create --name <batch_account_name> --storage-account <storage_account_name> --resource-group <resource_group_name> --location <location_name>
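Azure restricts storage account and Batch account names to 3-24 characters made up of lowercase letters and digits, so it can save a failed command to check the names first. The helper below is a hypothetical convenience, not part of this package.

```python
import re

# Storage and Batch account names: 3-24 characters, lowercase letters and digits only.
_ACCOUNT_NAME = re.compile(r"[a-z0-9]{3,24}")


def is_valid_account_name(name):
    """Return True if name satisfies the account naming rule above."""
    return _ACCOUNT_NAME.fullmatch(name) is not None


print(is_valid_account_name("dmsbatchstore01"))     # True
print(is_valid_account_name("my_storage_account"))  # False: underscores are not allowed
```

Placeholder names containing underscores, such as `my_storage_account`, need to be replaced with names that follow this rule.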

You can also create the Batch account and its associated storage account in the Azure portal, as explained here: https://docs.microsoft.com/en-us/azure/batch/batch-account-create-portal

VM sizes available

This is needed later when deciding what machine sizes to use

az batch location list-skus --location <location_name> --output table

You can also browse the availability by region as not all VMs are available in every region

This page is to guide selection of VMs by different attributes
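The table output is convenient for browsing. For scripted selection, the same command can be run with `--output json` and filtered in Python. The sketch below assumes each entry has a `name` and a `capabilities` list of `name`/`value` pairs that includes `vCPUs`; check this against the actual output for your location. The sample data is illustrative, not real command output.

```python
import json


def skus_with_min_vcpus(skus_json, min_vcpus):
    """Return SKU names whose 'vCPUs' capability is at least min_vcpus."""
    names = []
    for sku in json.loads(skus_json):
        caps = {c["name"]: c["value"] for c in sku.get("capabilities", [])}
        if int(caps.get("vCPUs", 0)) >= min_vcpus:
            names.append(sku["name"])
    return names


# Illustrative data only; the JSON shape is an assumption.
SAMPLE = json.dumps([
    {"name": "Standard_DS2_v2", "capabilities": [{"name": "vCPUs", "value": "2"}]},
    {"name": "Standard_HB120rs_v2", "capabilities": [{"name": "vCPUs", "value": "120"}]},
])

print(skus_with_min_vcpus(SAMPLE, 16))  # ['Standard_HB120rs_v2']
```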

OS Images available

set AZ_BATCH_ACCOUNT=<batch_account_name>
set AZ_BATCH_ACCESS_KEY=<batch_account_key>
set AZ_BATCH_ENDPOINT=<batch_account_url>
az batch pool supported-images list --output table

A sample output is included for quick reference
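The `az batch pool` command above relies on the three environment variables being set. A quick check like the following can report any that are missing; the function name `missing_batch_env` is hypothetical.

```python
import os

# Variable names taken from the commands above.
REQUIRED = ("AZ_BATCH_ACCOUNT", "AZ_BATCH_ACCESS_KEY", "AZ_BATCH_ENDPOINT")


def missing_batch_env(environ=None):
    """Return the required variable names that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED if not environ.get(name)]


print(missing_batch_env({"AZ_BATCH_ACCOUNT": "mybatch"}))
# ['AZ_BATCH_ACCESS_KEY', 'AZ_BATCH_ENDPOINT']
```

Calling `missing_batch_env()` with no argument checks the current process environment.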

Tools

Azure lets you do most things through the command-line interface (CLI) or the web console. However, I have found the following desktop apps useful for working with these services.

Batch Explorer is a desktop tool for managing batch jobs, pools and application packages

Storage Explorer is a desktop tool for working with storage containers

Sample Configuration Files

For all new projects, we recommend using the YAML configuration system instead of writing Python code directly:

# Submit a job using a YAML configuration file
dmsbatch submit-job --file my_config.yml

This approach is much simpler than the notebook examples and provides all the same capabilities. Several example configuration files are available in the sample_configs directory.

SCHISM specific runs

See the detailed documentation for SCHISM specific run setup in README-schism-batch.md

MPI runs

Note: For MPI workloads, YAML configuration is now the recommended approach. See the SCHISM-specific configuration guide and the template system documentation for details.

Parameterized runs

Note: For information on how to submit parameterized runs, see the architecture documentation.

An example notebook for PTM batch runs that vary based on environment variables demonstrates this capability. It also shows an example where a large file needs to be uploaded and shared with all the running tasks.
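As a rough illustration of runs that vary by environment variables, the sketch below expands a parameter grid into one environment dictionary per task. The parameter names and the `task_environments` helper are hypothetical and are not taken from the PTM notebook.

```python
from itertools import product


def task_environments(parameters):
    """Yield one environment-variable dict per combination of parameter values."""
    names = list(parameters)
    for values in product(*(parameters[n] for n in names)):
        # Environment variable values must be strings.
        yield {name: str(value) for name, value in zip(names, values)}


# Hypothetical parameter names, for illustration only.
envs = list(task_environments({"RELEASE_MONTH": [1, 2], "PARTICLE_COUNT": [1000]}))
print(envs)
# [{'RELEASE_MONTH': '1', 'PARTICLE_COUNT': '1000'}, {'RELEASE_MONTH': '2', 'PARTICLE_COUNT': '1000'}]
```

Each dictionary would be attached to one task, so the same executable runs once per parameter combination.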

BeoPEST runs

Note: For information on how BeoPEST is implemented, see the architecture documentation.

This notebook shows an implementation of the BeoPEST run scheme.

Sample Notebooks

See the sample notebooks for examples. The samples explain each step and can be used as templates for writing your own batch run.

See the simplest example notebook for running dsm2 hydro and outputting its version

See the slightly more involved example notebook for running dsm2 hydro with input and output file handling. It uploads the input files as a zip and, at the end of the run, uploads the output directory next to the uploaded input files.

Note: While these notebooks demonstrate how to use the Azure DMS Batch API directly, we recommend using the YAML configuration system for new projects.

References

Documentation

Azure Documentation

MPI specific

Azure Batch MPI

Cluster configuration options

Intel MPI

Azure settings for Intel MPI

Intel MPI Pre-requisites