Cookiecutter template for Snakemake workflows and Quarto documentation.
This project template is inspired by the cookiecutter snakemake project template and the suggested project structure for workflowr.
This template now uses Quarto for documentation and static site generation, supporting .qmd, .Rmd, and .ipynb notebooks.
Install cookiecutter:
pip install cookiecutter
Create a new Snakemake workflow project from this template:
cookiecutter https://github.com/bfairkun/cookiecutter-wflowR-smk.git
After you answer the prompts, cookiecutter creates a project with the following directory structure:
{{ cookiecutter.repo_name }}/
├── analysis
│   ├── about.qmd
│   ├── index.qmd
│   ├── license.qmd
│   └── _quarto.yml
├── code
│   ├── config
│   │   ├── config.yaml
│   │   └── samples.tsv
│   ├── envs
│   │   ├── {{ cookiecutter.repo_name }}.yaml
│   │   ├── jupyter.yml
│   │   ├── myenv.yaml
│   │   └── r_essentials.yml
│   ├── module_workflows
│   ├── README.md
│   ├── rules
│   │   ├── common.smk
│   │   └── other.smk
│   ├── scripts
│   │   └── common
│   │       └── __init__.py
│   ├── Snakefile
│   └── snakemake_profiles
│       └── slurm
│           ├── cluster-config.yaml
│           ├── config.yaml
│           ├── slurm-jobscript.sh
│           ├── slurm-status.py
│           ├── slurm-submit.py
│           └── slurm_utils.py
├── {{ cookiecutter.repo_name }}.Rproj
├── data
│   └── README.md
├── docs
│   └── assets
├── output
│   └── README.md
├── README.md
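To sanity-check a freshly generated project, one option is to confirm that the top-level directories from the tree above exist. The following is a minimal sketch, not part of the template itself; the function name `missing_dirs` and the `EXPECTED_DIRS` constant are hypothetical names introduced for illustration.

```python
from pathlib import Path

# Top-level directories taken from the directory tree above.
# This tuple is an illustrative assumption; adjust it if your template differs.
EXPECTED_DIRS = ("analysis", "code", "data", "docs", "output")


def missing_dirs(project_root):
    """Return the expected top-level directories that are absent from project_root."""
    root = Path(project_root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]
```

Calling `missing_dirs(".")` from the project root returns an empty list when all five directories are present.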
- Optionally track the entire project with `git init` in the newly created project root.
- Use the `code` directory to create a reproducible Snakemake pipeline which does heavy-lifting analysis to be run on a cluster environment from `code` as the working directory. The `code/.gitignore` makes it easy to git track all the code, but ignore tracking the potentially large files in the `code` directory. As you write the Snakemake pipeline, it is ok to create large untracked files which are too big to push to GitHub. Use Snakemake to do heavy lifting (e.g., download NGS data, align, etc.) and process the data to small files that can be easily tracked and pushed to GitHub. The cookiecutter will optionally create a conda environment for Snakemake with some basic NGS processing software in `code/envs/{{ cookiecutter.repo_name }}.yaml`. If you need to create additional rule-specific conda environments for Snakemake, they should also be saved in `code/envs/`. Using snakemake modules is a nice way to start a workflow, as in this example where I can include the code for a submodule as a nested git submodule. See the `code/README.md` for more on my Snakemake project template.
- Write the Snakemake pipeline to output smaller processed files (e.g., gene x sample count tables for RNA-seq) to `output` where they will be tracked by git.
- Raw data (that should never be directly edited) that is small enough to track with git should go in `data`.
- Use the `analysis` directory to write Quarto (`.qmd`), Rmarkdown (`.Rmd`), and Jupyter notebook (`.ipynb`) files to document your thoughts and analysis of processed data. If these notebook files only read in the data files tracked in `output` or `data`, it should be easy for anyone to edit or re-run your notebooks by cloning this repo (without needing to run Snakemake or do the computationally intensive things).
    - To render notebooks into a static site and host on GitHub, use Quarto to render `.qmd`, `.Rmd`, and `.ipynb` files into HTML and place them into `docs`. The `docs/assets` directory can be used to save images that can also be referenced in notebooks and their rendered HTMLs. To enable GitHub Pages hosting, add the project to GitHub and modify the project settings to build a site from the `/docs` folder (in the "Pages" section of project settings).
        - The `analysis/_quarto.yml` file configures the Quarto site.
        - The `analysis/index.qmd` file lists all notebooks in the `analysis/` directory.
        - To render the site, run the following in the `analysis/` directory: `quarto render`
        - The rendered HTML will appear in the `docs/` directory, ready for GitHub Pages.
    - Occasionally, you may write notebooks that need access to large untracked files output by Snakemake. In this case, it is helpful to follow a naming convention to specify which notebooks need access to these large files, so it is clear what notebooks can be run simply by cloning the repo, versus needing access to large untracked files.
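Since `output` and `data` are meant to hold only files small enough to track with git, it can help to scan them for oversized files before committing. The sketch below is one possible approach and is not shipped with the template. The names `oversized_files` and `report` are hypothetical, and the 50 MiB default is an assumption based on GitHub's documented behaviour at the time of writing (a warning above 50 MiB, rejection above 100 MiB).

```python
from pathlib import Path

# Assumed default threshold (50 MiB); change it to suit your own limits.
DEFAULT_LIMIT_BYTES = 50 * 1024 * 1024


def oversized_files(directories, limit_bytes=DEFAULT_LIMIT_BYTES):
    """Return (path, size_in_bytes) pairs for files larger than limit_bytes, largest first."""
    found = []
    for directory in directories:
        base = Path(directory)
        if not base.is_dir():
            continue  # skip directories that do not exist
        for path in base.rglob("*"):
            if path.is_file():
                size = path.stat().st_size
                if size > limit_bytes:
                    found.append((path, size))
    return sorted(found, key=lambda item: item[1], reverse=True)


def report(directories=("output", "data"), limit_bytes=DEFAULT_LIMIT_BYTES):
    """Print one line per oversized file, with its size in MiB."""
    for path, size in oversized_files(directories, limit_bytes):
        print(f"{path}: {size / 1024**2:.1f} MiB")
```

Running `report()` from the project root prints any files in `output` or `data` that exceed the threshold. Files it lists are candidates for moving under `code`, where `code/.gitignore` keeps large files untracked.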
Start scripting and documenting your project!