A template for modular data workflows using snakemake, part of the clio toolset.
To familiarise yourself with clio data modules:
- Check the auto-generated minimal example. You can find it in tests/integration/Snakefile.
- Read about the clio approach in our documentation.
- Read about snakemake modularisation in their documentation.
We recommend using pixi as your package manager. Once installed, do the following:
- Install the templater tool copier.
    pixi global install copier
- Use copier to build a project with this template. A new module will be created in the directory you choose. We recommend using the module name as the directory name.
    copier copy https://github.com/calliope-project/data-module-template.git ./path/to/<module_name>
  If your terminal does not have access to copier, you may need to update your PATH variable to include ~/.pixi/bin.
- Answer some questions so we can pre-fill licensing, citation files, etc.
- Initialise the pixi project environment of your new module.
    cd ./path/to/<module_name>  # navigate to the new project
    pixi install --all  # install the project environment
- Extra: run the auto-generated example module!
    cd tests/integration  # go to the integration test...
    pixi run snakemake --use-conda  # run it!
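If the copier command is not found after installing it, the PATH update mentioned in the second step might look like the sketch below. It assumes a POSIX-compatible shell and that pixi placed its global executables in ~/.pixi/bin, as stated above; the change only lasts for the current terminal session.

```shell
# Prepend pixi's global bin directory to PATH for the current shell session.
# Assumption: pixi installed copier under ~/.pixi/bin (the location named above).
export PATH="$HOME/.pixi/bin:$PATH"

# Check whether copier is now reachable from this terminal.
if command -v copier >/dev/null 2>&1; then
    echo "copier found"
else
    echo "copier not found; check your pixi installation"
fi
```

To make the change persistent, the same export line is usually added to your shell's startup file (for example ~/.bashrc or ~/.zshrc, depending on the shell you use).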
- Standardised layout compliant with the snakemake workflow catalogue's listing requirements.
  - resources/: files needed for the module's processes.
    - user/: files that should be provided by users. Document them well!
    - automatic/: files that the module downloads or prepares in intermediate steps.
  - results/: files generated by the module's algorithms that are relevant to the user.
- Pre-made integration setup for your module.
- Continuous Integration (CI) settings, ready for pre-commit.ci.
- GitHub Actions to automate chores during pull requests and releases.
- Pre-made pytest setup.
- Documentation setup, ready for Read the Docs or GitHub Pages.
Important
A few things to be aware of.
- Modules do not work like regular snakemake workflows.
  - The primary way to test them should be external (calling module:, passing resources, and requesting results). Check the pre-made example in tests/integration for more info.
  - Internal access (e.g., calling the all: rule) may not work, as the module may not have the necessary resources/ to execute properly.
- Please be sure to maintain the following files to ensure clio compatibility. These are:
  - INTERFACE.yaml: a simple description of the module's input/output structure.
  - config/config.yaml: a basic functioning example of how to configure this module.
  - workflow/internal/config.schema.yaml: the module's configuration schema, used by snakemake for validation.
  - AUTHORS / CITATION.cff / LICENSE: licensing and attribution of this module's code and methods.
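As a quick sanity check, the files listed above can be verified with a few lines of shell. The sketch below is not part of the template; check_module_files is a hypothetical helper name, and it only tests that each file exists, not that its contents are valid.

```shell
# Hypothetical helper (not shipped with the template): report which of the
# clio-related files listed above are missing from a module directory.
check_module_files() {
    module_dir="${1:-.}"   # default to the current directory
    missing=0
    for f in \
        INTERFACE.yaml \
        config/config.yaml \
        workflow/internal/config.schema.yaml \
        AUTHORS \
        CITATION.cff \
        LICENSE
    do
        if [ ! -f "$module_dir/$f" ]; then
            echo "missing: $f"
            missing=1
        fi
    done
    # Return 0 when every file is present, 1 otherwise.
    return "$missing"
}
```

Run from the root of a module as `check_module_files .`, or pass the module's path as the first argument. A non-zero exit status means at least one file was not found.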