https://github.com/lucas-diedrich/snakemake-learning
GitHub Repository for the snakemake learn session at the @MannLabs Group Retreat 2025
- Host: GitHub
- URL: https://github.com/lucas-diedrich/snakemake-learning
- Owner: lucas-diedrich
- License: mit
- Created: 2025-05-23T17:40:55.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2025-07-11T09:25:06.000Z (7 months ago)
- Last Synced: 2025-09-05T08:53:57.686Z (5 months ago)
- Topics: tutorial
- Language: Python
- Homepage:
- Size: 1.06 MB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
- Citation: CITATION.cff
# snakemake-learning
GitHub Repository for the hands-on snakemake learn session at the MannLabs Group Retreat 2025
Snakemake is a Python-based workflow manager designed to make your life easier when analysing large datasets. It **enforces reproducibility** and **enables scalability**.
### Tutorial overview
In this tutorial, we will
1. read in a dataset (here: a small image)
2. process it with a simple function (here: apply different image transformations to it)
3. generate a plot as output (here: histograms of pixel intensities)
4. generate a snakemake report.
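
To make steps 1 to 3 concrete, here is a minimal pure-Python sketch of the same idea on a tiny hand-written "image". It is not the code used in this repository: the pixel values, the transformation names, and the helper functions are all assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical 3x3 grayscale "image" with 8-bit pixel values (0-255).
IMAGE = [
    [0, 64, 128],
    [64, 128, 255],
    [128, 255, 255],
]


def invert(image):
    """Map every pixel value v to 255 - v."""
    return [[255 - v for v in row] for row in image]


def threshold(image, cutoff=128):
    """Set pixels >= cutoff to 255 and all others to 0."""
    return [[255 if v >= cutoff else 0 for v in row] for row in image]


def histogram(image):
    """Count how often each pixel intensity occurs."""
    return Counter(v for row in image for v in row)


# Hypothetical set of transformations; the workflow's real ones may differ.
TRANSFORMS = {
    "identity": lambda image: image,
    "invert": invert,
    "threshold": threshold,
}

# Steps 1-3: take the image, transform it, summarise the pixel intensities.
histograms = {name: histogram(fn(IMAGE)) for name, fn in TRANSFORMS.items()}

for name, counts in histograms.items():
    print(name, dict(sorted(counts.items())))
```

In the workflow, each of these steps becomes a separate rule, so Snakemake can rerun only the parts whose inputs changed.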

## Installation
1. Using the command line, go into your favorite directory (`cd /path/to/my/favorite/directory`)
2. Clone this repository
```shell
git clone https://github.com/lucas-diedrich/snakemake-learning.git
```
(or download it via `Code > Download ZIP`, and unzip it locally)
3. Go into the directory
```shell
cd snakemake-learning
```
4. Create a `mamba`/`conda` environment with snakemake based on the `environment.yaml` file and activate it
```shell
mamba env create -n snakemake-env -f environment.yaml && mamba activate snakemake-env
# OR conda env create -n snakemake-env -f environment.yaml && conda activate snakemake-env
```
5. Check if the installation was successful
```shell
snakemake --version
# Example output (your version may differ): 9.5.1
```
## Tutorial
### 1. Snakemake - Introduction
See the slides in `./docs`
### 2. Check out the workflow
Run the following command in the root directory (`.`) to see the whole task graph.
```shell
# --dag: Directed acyclic graph
snakemake --dag
```
Use the following command to inspect how the rules depend on one another (the rule graph is simpler than the task graph, especially for large workflows):
```shell
# --rulegraph: Show dependencies between rules
snakemake --rulegraph
```
```mermaid
---
title: Rule Graph
---
flowchart TB
id0[all]
id1[plot_histogram]
id2[transform_image]
id3[save_image]
style id0 fill:#CD5C5C,stroke-width:2px,color:#333333
style id1 fill:#F08080,stroke-width:2px,color:#333333
style id2 fill:#FA8072,stroke-width:2px,color:#333333
style id3 fill:#E9967A,stroke-width:2px,color:#333333
id1 --> id0
id2 --> id1
id3 --> id2
```
Both commands print the graph as text in the Graphviz DOT format. You can paste the output into this [Graphviz online editor](https://dreampuf.github.io/GraphvizOnline/) to view it.
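
If you prefer to inspect the DOT text programmatically, the sketch below extracts the rule-to-rule edges from it. The `SAMPLE_DOT` string is a simplified, hand-written stand-in: the real `snakemake --rulegraph` output contains additional attributes (colors, styles) and may differ between versions, so treat the regular expressions as assumptions rather than a complete DOT parser.

```python
import re

# Simplified, hand-written stand-in for `snakemake --rulegraph` output.
SAMPLE_DOT = """
digraph snakemake_dag {
    0[label = "all"];
    1[label = "plot_histogram"];
    2[label = "transform_image"];
    3[label = "save_image"];
    1 -> 0
    2 -> 1
    3 -> 2
}
"""

# Node lines look like `0[label = "all", ...]`, edge lines like `1 -> 0`.
NODE = re.compile(r'^\s*(\d+)\s*\[label\s*=\s*"([^"]+)"')
EDGE = re.compile(r"^\s*(\d+)\s*->\s*(\d+)")


def parse_rule_edges(dot_text):
    """Return (upstream_rule, downstream_rule) pairs found in DOT text."""
    labels = {}
    edges = []
    for line in dot_text.splitlines():
        node = NODE.match(line)
        edge = EDGE.match(line)
        if node:
            labels[node.group(1)] = node.group(2)
        elif edge:
            edges.append((edge.group(1), edge.group(2)))
    # Translate numeric node ids into rule names.
    return [(labels[src], labels[dst]) for src, dst in edges]


for upstream, downstream in parse_rule_edges(SAMPLE_DOT):
    print(f"{upstream} -> {downstream}")
```

Each printed pair reads "the left rule produces input for the right rule".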
### 3. Run the full workflow
Go into the `./workflow` directory and run:
```shell
snakemake --cores 2 --use-conda
```
The output can be found in the `./results` directory.
### 4. Generate the report
Go into the `./workflow` directory and run:
```shell
snakemake --report ../results/report.html
```
The output can be found in the `./results` directory.
## Run on a Slurm HPC cluster
You can run this workflow on a high-performance computing (HPC) cluster (_here using the Slurm workload manager_). In this case, one Slurm job acts as a scheduler that submits individual rule executions as separate Slurm jobs. The `snakemake-executor-plugin-slurm` automatically handles the scheduling and submission of dependent jobs. Please check out the script `./workflow/snakemake.sbatch` and the official [snakemake slurm plugin documentation](https://snakemake.github.io/snakemake-plugin-catalog/plugins/executor/slurm.html#snakemake-executor-plugin-slurm) to learn more about the relevant flags and settings.
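
To build intuition for what "scheduling dependent jobs" means, here is a conceptual sketch using Python's standard-library `graphlib`. It is not how the Slurm plugin is implemented: it only shows the order in which the rules from the rule graph become ready once their dependencies have finished. The `DEPENDENCIES` mapping is transcribed from the rule graph, and `submission_waves` is a hypothetical helper name.

```python
from graphlib import TopologicalSorter

# rule -> set of rules it depends on (transcribed from the rule graph).
DEPENDENCIES = {
    "all": {"plot_histogram"},
    "plot_histogram": {"transform_image"},
    "transform_image": {"save_image"},
    "save_image": set(),
}


def submission_waves(dependencies):
    """Group rules into waves; a wave can start once the previous one is done."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # rules whose dependencies finished
        waves.append(ready)
        sorter.done(*ready)  # pretend these jobs completed successfully
    return waves


for number, wave in enumerate(submission_waves(DEPENDENCIES), start=1):
    print(f"wave {number}: {', '.join(wave)}")
```

In a real run, Snakemake schedules individual jobs (one per rule and set of wildcard values) rather than whole rules, so independent jobs of the same rule can run in parallel on the cluster.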
### Execution
Install the environment:
```shell
# Create an empty environment, then install the packages listed in environment.yaml
conda create -n snakemake-env -y
conda env update -n snakemake-env --file environment.yaml
```
Additionally install the `snakemake-executor-plugin-slurm`:
```shell
pip install snakemake-executor-plugin-slurm
```
Then submit the provided workflow script on the cluster:
```shell
cd workflow/
sbatch snakemake.sbatch
```
## Exercises
*To further deepen your understanding after the workshop.*
### 1. Scale the workflow to other images
The script `create-data.py` can take image names (that are part of the `skimage` package) as arguments.
```shell
python scripts/create-data.py --image-name <image-name> --output <output-path>
```
Modify the workflow so that it additionally runs on other `skimage` example datasets, e.g. `colorwheel`, `cat`, `logo`.
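
As a starting point, the sketch below shows how a list of image names and a list of transformations multiply into a list of target files, which is the idea behind Snakemake's `expand()` helper. The transformation names and the file name pattern are hypothetical and will not match the ones used in this workflow; adapt them to the actual `Snakefile`.

```python
from itertools import product

IMAGES = ["colorwheel", "cat", "logo"]

# Hypothetical transformation names and output pattern, for illustration only.
TRANSFORMATIONS = ["identity", "invert"]
PATTERN = "results/{image}/{transformation}_histogram.png"


def expand_targets(pattern, images, transformations):
    """Fill the pattern with every combination of image and transformation."""
    return [
        pattern.format(image=image, transformation=transformation)
        for image, transformation in product(images, transformations)
    ]


targets = expand_targets(PATTERN, IMAGES, TRANSFORMATIONS)
for target in targets:
    print(target)
```

Three images and two transformations give six target files; adding an image name to the list is enough to request its outputs as well.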
### 2. Add a rule
Add a new rule that generates an aggregated plot in which the image and its modifications are shown in the top row and the associated histograms in the bottom row.
### 3. Prettify the report
Explore ways to customize the report using the reStructuredText (RST) format.
## References
- **Snakemake homepage + Documentation** [snakemake.readthedocs.io](https://snakemake.readthedocs.io/en/stable/index.html)
- **Publication** Mölder F, Jablonski KP, Letcher B et al. Sustainable data analysis with Snakemake [version 2; peer review: 2 approved]. F1000Research 2021, 10:33 (https://doi.org/10.12688/f1000research.29032.2)