https://github.com/dienerlab/pipelines

A monorepo containing all of our Nextflow pipelines.

metagenomics metatranscriptomics microbiome nextflow

Host: GitHub
URL: https://github.com/dienerlab/pipelines
Owner: dienerlab
License: apache-2.0
Created: 2024-03-03T12:55:25.000Z (over 2 years ago)
Default Branch: main
Last Pushed: 2026-06-01T08:37:00.000Z (11 days ago)
Last Synced: 2026-06-01T10:21:54.262Z (11 days ago)
Topics: metagenomics, metatranscriptomics, microbiome, nextflow
Language: Nextflow
Homepage: https://dienerlab.github.io/pipelines
Size: 1.2 MB
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md

README

[![checks](https://github.com/dienerlab/pipelines/actions/workflows/lint.yml/badge.svg)](https://github.com/dienerlab/pipelines/actions/workflows/lint.yml)

[![docs](https://github.com/dienerlab/pipelines/actions/workflows/docs.yml/badge.svg)](https://dienerlab.github.io/pipelines)

# :hammer: :triangular_ruler: Pipelines

This repo contains various analysis pipelines for the lab. Here are the basic
rules:

- each folder contains the pipelines for one combination of analysis and data type
- pipelines are [nextflow](https://www.nextflow.io/) workflows
- each pipeline comes with a list of conda environment files that manage the required software

## Data layout

Pipelines will usually operate from a top level project
directory structured in the following way:

```
[project root]
├─ [pipeline].nf
├─ data
│  ├─ raw
│  │  ├─ sample1_R1.fastq.gz
│  │  ├─ sample1_R2.fastq.gz
│  │  └─ ...
│  ├─ figures
│  │  ├─ fig1.png
│  │  └─ ...
│  └─ ...
└─ refs
   ├─ eggnog
   └─ kraken2
```

The initial raw data lives in `data/raw`, and all analysis artifacts should
be written into `data/` as well. Figures go into `data/figures/`.
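
As a minimal sketch, the layout above could be created from the shell as follows. The project name `my_project` is a placeholder, and the read-pair check assumes the `_R1`/`_R2` file naming shown in the tree; individual pipelines may expect something different.

```bash
# Hypothetical project name; replace with your own project root.
project="my_project"

# Directories from the tree above (figures kept under data/ as drawn there).
mkdir -p "$project/data/raw" "$project/data/figures" \
         "$project/refs/eggnog" "$project/refs/kraken2"

# Report any forward read in data/raw that lacks its reverse mate.
for r1 in "$project"/data/raw/*_R1.fastq.gz; do
    [ -e "$r1" ] || continue                 # glob matched nothing
    r2="${r1%_R1.fastq.gz}_R2.fastq.gz"      # expected name of the mate file
    [ -e "$r2" ] || echo "missing mate for $(basename "$r1")" >&2
done
```

On a freshly created project the loop prints nothing, because `data/raw` is still empty.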

## Setup

The first step is to copy or symlink the pipeline files into the top-level project
directory. After that you can set up a conda environment that includes all software
for the pipeline (see the individual pipelines for variations on this).

```bash
conda env create -f conda.yml
```
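
The copy-or-symlink step that precedes the environment setup could look roughly like the helper below. This is only a sketch: `link_pipeline` is a hypothetical function, not part of this repo, and the paths in the usage comment are placeholders.

```bash
# link_pipeline SRC_DIR [DEST_DIR]
# Symlink every file in a pipeline folder into the project directory.
link_pipeline() {
    local src dest f
    # Resolve to an absolute path so the links stay valid from anywhere.
    src="$(cd "$1" && pwd)" || return 1
    dest="${2:-.}"
    for f in "$src"/*; do
        [ -e "$f" ] || continue      # empty folder: glob matched nothing
        ln -sfn "$f" "$dest/"        # replace a stale link of the same name
    done
}

# Usage (placeholder paths):
# link_pipeline /path/to/pipelines/metagenomics /path/to/project
```

Symlinks keep the project in step with later updates to the pipelines checkout, while copying pins the exact version that was used for an analysis.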

Either activate the environment (usually named after the pipeline):

```bash
conda activate metagenomics
```

or run the pipeline with the `-with-conda /my/envs/metagenomics` option (required for HPC).

## Nextflow Configuration

You may also create a [nextflow config](https://www.nextflow.io/docs/latest/config.html) either in the project
directory as `nextflow.config` or in your user HOME as `~/.nextflow/config`. A template config is
[included in this repo](nextflow.config). If you are a lab member please use the [optimized
version from the wiki](https://github.com/dienerlab/internal/wiki/configs).

To install it as a global configuration, first create the config directory on the server:

```bash
# -p avoids an error if the directory already exists
mkdir -p ~/.nextflow
```

After that edit and copy the config:

```bash
cp /path/to/pipelines/nextflow.config ~/.nextflow/config
```

Add in your token if you want to use [Nextflow Tower](https://tower.nf) to track your pipeline.

For SLURM, replace the partition name `default` with the name of your SLURM partition.
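
The partition substitution could be sketched as below. It assumes the config sets the partition with a line like `queue = 'default'`, which may not match the template exactly, so check the file you copied. The partition name `compute` and the file names are placeholders.

```bash
# Hypothetical partition name; `sinfo -s` lists the partitions on your cluster.
partition="compute"

# Stand-in config so the sketch is self-contained. The real template may
# differ; only the `queue = 'default'` line is assumed here.
cat > example.config <<'EOF'
profiles {
    slurm {
        process.executor = 'slurm'
        process.queue = 'default'
    }
}
EOF

# Write a copy with the partition replaced, then inspect it before installing.
sed "s/queue = 'default'/queue = '${partition}'/" example.config > example.slurm.config
grep "queue" example.slurm.config
```

Writing to a new file rather than editing in place leaves the original untouched until the result has been checked.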

## Run the pipeline

After setup you can test the pipeline with:

```bash
nextflow run [WORKFLOW].nf -profile local -resume
```

By default this will use up to 12 CPUs and 128 GB of RAM unless specified otherwise in your personal [nextflow config](https://www.nextflow.io/docs/latest/config.html#scope-executor).
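
To lower those limits for a single run without touching the global config, one option is a small extra config file. The sketch below assumes the limits are controlled through the `executor` scope linked above; the file name `limits.config` and the values are placeholders.

```bash
# Hypothetical limits for a small workstation; adjust to your machine.
cat > limits.config <<'EOF'
// Sketch: cap the total resources the local executor may use.
executor {
    cpus = 4
    memory = '16 GB'
}
EOF

# Show the file that would be passed to Nextflow.
cat limits.config
```

The file could then be supplied with Nextflow's `-c` option, for example `nextflow -c limits.config run [WORKFLOW].nf -profile local -resume`.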
