Skip to content

bfairkun/cookiecutter-quarto-smk

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

58 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

cookiecutter-snakemake-workflow

Cookiecutter template for Snakemake workflows and Quarto documentation.
This project template is inspired by the cookiecutter snakemake project template and the suggested project structure for workflowr.
This template now uses Quarto for documentation and static site generation, supporting .qmd, .Rmd, and .ipynb notebooks.


USAGE

Install cookiecutter:

pip install cookiecutter

Start creating a snakemake-workflow from cookiecutter:

cookiecutter https://github.com/bfairkun/cookiecutter-wflowR-smk.git

After filling the prompts, this will create a project template with the following directory structure:

{{ cookiecutter.repo_name }}/
├── analysis
│   ├── about.qmd
│   ├── index.qmd
│   ├── license.qmd
│   └── _quarto.yml
├── code
│   ├── config
│   │   ├── config.yaml
│   │   └── samples.tsv
│   ├── envs
│   │   ├── {{ cookiecutter.repo_name }}.yaml
│   │   ├── jupyter.yml
│   │   ├── myenv.yaml
│   │   └── r_essentials.yml
│   ├── module_workflows
│   ├── README.md
│   ├── rules
│   │   ├── common.smk
│   │   └── other.smk
│   ├── scripts
│   │   └── common
│   │       └── __init__.py
│   ├── Snakefile
│   └── snakemake_profiles
│       └── slurm
│           ├── cluster-config.yaml
│           ├── config.yaml
│           ├── slurm-jobscript.sh
│           ├── slurm-status.py
│           ├── slurm-submit.py
│           └── slurm_utils.py
├── {{ cookiecutter.repo_name }}.Rproj
├── data
│   └── README.md
├── docs
│   └── assets
├── output
│   └── README.md
├── README.md

Guidelines for project organization

  • Optionally track the entire project with git init in the newly created project root.
  • Use the code directory to create a reproducible Snakemake pipeline which does heavy-lifting analysis to be run on a cluster environment from code as the working directory. The code/.gitignore makes it easy to git track all the code, but ignore tracking the potentially large files in the code directory. As you write the Snakemake pipeline, it is ok to create large untracked files which are too big to push to GitHub. Use Snakemake to do heavy lifting (e.g., download NGS data, align, etc.) and process the data to small files that can be easily tracked and pushed to GitHub. The cookiecutter will optionally create a conda environment for Snakemake with some basic NGS processing software in code/envs/{{ cookiecutter.repo_name }}.yaml. If you need to create additional rule-specific conda environments for Snakemake, they should also be saved in code/envs/. Using snakemake modules is a nice way to start a workflow, as in this example where I can include the code for a submodule as a nested git submodule. See the code/README.md for more on my Snakemake project template.
  • Write the Snakemake pipeline to output smaller processed files (e.g., gene x sample count tables for RNA-seq) to output where they will be tracked by git.
  • Raw data (that should never be directly edited) that is small enough to track with git should go in data.
  • Use the analysis directory to write Quarto (.qmd), Rmarkdown (.Rmd), and Jupyter notebook (.ipynb) files to document your thoughts and analysis of processed data. If these notebook files only read in the data files tracked in output or data, it should be easy for anyone to edit or re-run your notebooks by cloning this repo (without needing to run Snakemake or do the computationally intensive things).
    • To render notebooks into a static site and host on GitHub, use Quarto to render .qmd, .Rmd, and .ipynb files into HTML and place them into docs. The docs/assets directory can be used to save images that can also be referenced in notebooks and their rendered HTMLs. To enable GitHub Pages hosting, add the project to GitHub and modify the project settings to build a site from the /docs folder (in the "Pages" section of project settings).
    • Occasionally, you may write notebooks that need access to large untracked files output by Snakemake. In this case, it is helpful to follow a naming convention to specify which notebooks need access to these large files, so it is clear what notebooks can be run simply by cloning the repo, versus needing access to large untracked files.

Quarto Usage

  • The analysis/_quarto.yml file configures the Quarto site.

  • The analysis/index.qmd file lists all notebooks in the analysis/ directory.

  • To render the site, run the following in the analysis/ directory:

    quarto render
    
  • The rendered HTML will appear in the docs/ directory, ready for GitHub Pages.


Start scripting and documenting your project!

About

cookiecutter template of workflowR combined with snakemake pipeline in code

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors