https://github.com/angelovangel/nxf-alignment

Nextflow pipeline to process ONT adaptive sampling runs (basecalling, mapping, statistics)
Host: GitHub
URL: https://github.com/angelovangel/nxf-alignment
Owner: angelovangel
License: mit
Created: 2025-12-02T06:42:50.000Z (6 months ago)
Default Branch: main
Last Pushed: 2026-01-13T13:00:33.000Z (5 months ago)
Last Synced: 2026-01-13T21:22:40.748Z (5 months ago)
Topics: basecalling, nanopore, nextflow-pipeline, sequencing
Language: Python
Homepage:
Size: 174 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# nxf-alignment

A Nextflow workflow for basecalling (ONT only), aligning, and variant calling for long-read sequencing data (ONT + HiFi).

## Features

- **Basecalling**: Uses Dorado for basecalling (and demultiplexing) with optional adaptive sampling support (ONT)

- **Alignment**: Aligns reads to a reference genome using Dorado aligner (modifications are preserved, ONT or HiFi data)

- **Coverage Analysis**: Calculates per-region coverage statistics with thresholds (1x, 10x, 20x, 30x)

- **SNP Variant Calling**: Uses Clair3 or DeepVariant for SNP variant calling (ONT or HiFi data)

- **Structural Variant Calling**: Uses Sniffles2 for structural variant calling (ONT or HiFi data)

- **Variant Annotation**: Uses snpEff for variant annotation (ONT or HiFi data)

- **Base Modification Analysis**: Uses modkit for base modification analysis (ONT or HiFi data)

- **Interactive HTML Report**: Generates an interactive report with read statistics, coverage, variant and annotation metrics
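The per-region coverage thresholds listed above can be approximated with a short sketch. This is not necessarily how the pipeline computes them: it assumes per-base depth in the three-column format of `samtools depth -a` (chromosome, 1-based position, depth) and a BED file whose optional fourth column names the region. `coverage_thresholds` is a hypothetical helper name.

```bash
# Hypothetical helper: percentage of bases at >=1x/10x/20x/30x per BED region.
# $1 = BED file (0-based, half-open; optional 4th column = region name)
# stdin = per-base depth: chrom, 1-based position, depth (samtools depth -a format)
coverage_thresholds() {
  awk -v OFS='\t' '
    NR == FNR {                               # first file: store the BED regions
      n++; rchr[n] = $1; rbeg[n] = $2; rend[n] = $3
      rname[n] = (NF >= 4) ? $4 : $1 ":" $2 "-" $3
      next
    }
    {                                         # stdin: count each position per region
      for (i = 1; i <= n; i++) {
        if ($1 == rchr[i] && $2 > rbeg[i] && $2 <= rend[i]) {
          tot[i]++
          if ($3 >= 1)  c1[i]++
          if ($3 >= 10) c10[i]++
          if ($3 >= 20) c20[i]++
          if ($3 >= 30) c30[i]++
        }
      }
    }
    END {
      print "region", "bases", "pct_1x", "pct_10x", "pct_20x", "pct_30x"
      for (i = 1; i <= n; i++) {
        t = tot[i] + 0
        if (t == 0) { print rname[i], 0, "NA", "NA", "NA", "NA"; continue }
        printf "%s\t%d\t%.1f\t%.1f\t%.1f\t%.1f\n", rname[i], t,
          100 * c1[i] / t, 100 * c10[i] / t, 100 * c20[i] / t, 100 * c30[i] / t
      }
    }
  ' "$1" -
}
```

For example: `samtools depth -a -b regions.bed reads.align.bam | coverage_thresholds regions.bed`. The loop over all regions for every position keeps the sketch short but scales poorly with many regions.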

>Note: You can also import this workflow into EPI2ME; see the [EPI2ME documentation](https://epi2me.nanoporetech.com/).

## Requirements

- **Nextflow** >= 23.04

- **Docker** 

- **NVIDIA GPU** (for basecalling and variant calling)

## Quick Start

#### Basic Workflow (Basecalling + Alignment + Variant Calling)

For an adaptive sampling run, only the accepted reads are basecalled, based on the decisions file produced by MinKNOW (passed with `--asfile`).

```bash
# --asfile is optional: basecall only the reads accepted during adaptive sampling
# --bed is optional: if provided, the report contains coverage analysis per region from the BED file
nextflow run angelovangel/nxf-alignment \
  --pod5 /path/to/pod5/dir \
  --asfile /path/to/AS_decisions.csv \
  --model hac \
  --bed /path/to/regions.bed \
  --ref /path/to/ref.fasta \
  --snp \
  --annotate
```

#### Barcoded run (Basecalling + Alignment)

For a barcoded run, provide a [samplesheet](#sample-sheet-barcoded-runs) with `--samplesheet` and the kit name with `--kit`.

```bash
nextflow run angelovangel/nxf-alignment \
  --pod5 /path/to/pod5/dir \
  --model hac,5mC_5hmC \
  --bed /path/to/regions.bed \
  --ref /path/to/ref.fasta \
  --kit SQK-RBK114-96 \
  --samplesheet /path/to/samplesheet.csv
```

>Note: The sample name is obtained from the pod5 file (the sample ID entered in MinKNOW). If another sample name is desired, use the `--samplename` parameter. For barcoded runs, sample names are taken from the samplesheet.

#### Skip Basecalling (Align Existing BAM/FASTQ + SNP/SV Variant Calling)

If basecalling has already been performed, run the pipeline with the `--reads` parameter instead of `--pod5`. The reads can be in any HTS format (e.g. BAM or FASTQ), and a directory of reads can also be given. If the reads contain base modifications, use the `--mods` parameter to perform base modification analysis, which creates a summary of modified and unmodified base counts.

```bash
# --reads can also be a directory with reads
nextflow run angelovangel/nxf-alignment \
  --reads /path/to/reads.bam \
  --ref /path/to/ref.fasta \
  --bed /path/to/regions.bed \
  --snp \
  --sv \
  --annotate \
  --mods
```
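The modified/unmodified summary produced with `--mods` can be illustrated by a sketch over a modkit `pileup` bedMethyl file. It assumes modkit's column layout (column 4 = modification code, 10 = valid coverage, 12 = modified count, 13 = canonical count) and applies a minimum-coverage cut in the spirit of `--mods_filter`. `mod_summary` is a hypothetical helper, and the pipeline's own `reads.summary.tsv` may be built differently.

```bash
# Hypothetical helper: modified vs unmodified call counts per modification code.
# $1 = minimum valid coverage (cf. --mods_filter); stdin = modkit pileup bedMethyl.
# Assumed columns: 4 = mod code, 10 = Nvalid_cov, 12 = Nmod, 13 = Ncanonical.
mod_summary() {
  awk -v mincov="$1" -v OFS='\t' '
    NF >= 13 && $10 + 0 >= mincov + 0 {      # skip sites below the coverage cut
      nmod[$4] += $12
      ncan[$4] += $13
    }
    END {
      print "mod_code", "n_mod", "n_canonical", "pct_mod"
      for (m in nmod) {
        t = nmod[m] + ncan[m]
        pct = (t > 0) ? 100 * nmod[m] / t : 0
        printf "%s\t%d\t%d\t%.1f\n", m, nmod[m], ncan[m], pct
      }
    }
  '
}
```

For example: `mod_summary 5 < reads.bedmethyl`. Splitting on any whitespace means the sketch works whether modkit writes the trailing count columns tab- or space-separated.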

#### Skip alignment (basecalling only)

Basecalling (for single-sample and barcoded runs) can also be performed without alignment, using the `--basecall` or `--report` parameters.

`--basecall` runs basecalling only (plus demultiplexing for barcoded runs); `--report` runs basecalling and then generates the report.

```bash
# --kit and --samplesheet are for barcoded runs only
nextflow run angelovangel/nxf-alignment \
  --pod5 /path/to/pod5/dir \
  --model hac \
  --kit SQK-RBK114-96 \
  --samplesheet /path/to/samplesheet.csv \
  --report
```

## Parameters

#### Core Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `pod5` | path | - | Directory containing POD5 files (required if not using `--reads`) |
| `reads` | path | `null` | Path to input BAM/FASTQ file(s) or directory (skips basecalling) |
| `ref` | path | - | Reference genome in FASTA format (required unless `--basecall` or `--report` is used) |
| `basecall` | boolean | `false` | Run the pipeline up to basecalling only |
| `report` | boolean | `false` | Run basecalling and reporting only (skips alignment and variant calling) |
| `model` | string | `fast` | Dorado basecalling model, see [available models](https://software-docs.nanoporetech.com/dorado/latest/models/list/) |
| `outdir` | string | `results` | Output directory for results |

#### Optional (advanced) Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `samplename` | string | `null` | Sample name for non-barcoded runs (if not provided, the sample name is obtained from the pod5 file) |
| `asfile` | path | `null` | Adaptive sampling decisions CSV (if using AS filtering) |
| `herro` | boolean | `null` | Enable HERRO read correction. The corrected reads are written to `00-basecall` but are NOT used for alignment. |
| `kit` | string | `null` | Barcoding kit name (e.g., `SQK-NBD111-96`). Required for barcoded runs |
| `samplesheet` | path | `null` | Sample sheet CSV or XLSX with columns `sample` and `barcode`. Required for barcoded runs |
| `bed` | path | `null` | BED file with target regions (auto-generated from the reference if not provided) |
| `snp` | boolean | `false` | Perform SNP variant calling using Clair3 or DeepVariant |
| `snp_caller` | string | `clair3` | SNP variant caller (`clair3` or `deepvariant`; use only with `--snp`) |
| `clair3_platform` | string | `ont` | Platform for Clair3 (`ont` or `hifi`) |
| `clair3_model` | string | `r1041_e82_400bps_hac_v500` | Clair3 model |
| `deepvariant_model` | string | `ONT_R104` | DeepVariant model |
| `sv` | boolean | `false` | Perform SV calling using Sniffles2 |
| `phase` | boolean | `false` | Perform SNP phasing using WhatsHap (use only with `--snp`; only diploid samples are supported) |
| `annotate` | boolean | `false` | Annotate SNP variants using snpEff (use only with `--snp`) |
| `anno_db` | string | `hg38` | snpEff database used for annotation |
| `anno_filterQ` | int | `20` | Minimum variant quality; SNP variants with lower quality are filtered out before annotation |
| `mods` | boolean | `false` | Perform base modification analysis using modkit (`--ref` is required) |
| `mods_filter` | int | `5` | Minimum coverage for base modification calls |
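The dependencies stated in the tables above (`--ref` unless `--basecall` or `--report`, `--annotate` and `--phase` only with `--snp`, `--mods` with `--ref`) can be checked before launching a long run. The sketch below is a hypothetical pre-flight helper, not part of the pipeline: shell variables such as `POD5` and `SNP` stand in for the Nextflow parameters, and requiring `--kit` and `--samplesheet` together is an assumption based on both being marked as required for barcoded runs.

```bash
# Hypothetical pre-flight check mirroring the constraints in the parameter tables.
# Variables stand in for the Nextflow parameters of the same (lower-case) name;
# boolean flags are set to "true". The pipeline's own validation may differ.
check_params() {
  if [ -z "$POD5" ] && [ -z "$READS" ]; then
    echo "error: provide --pod5 or --reads" >&2; return 1
  fi
  if [ -z "$REF" ] && [ "$BASECALL" != true ] && [ "$REPORT" != true ]; then
    echo "error: --ref is required unless --basecall or --report is used" >&2; return 1
  fi
  if [ "$SNP" != true ] && { [ "$ANNOTATE" = true ] || [ "$PHASE" = true ]; }; then
    echo "error: --annotate and --phase can only be used with --snp" >&2; return 1
  fi
  case "${SNP_CALLER:-clair3}" in
    clair3|deepvariant) ;;
    *) echo "error: --snp_caller must be clair3 or deepvariant" >&2; return 1 ;;
  esac
  if [ "$MODS" = true ] && [ -z "$REF" ]; then
    echo "error: --mods requires --ref" >&2; return 1
  fi
  # assumption: a barcoded run needs both --kit and --samplesheet
  if { [ -n "$KIT" ] && [ -z "$SAMPLESHEET" ]; } || { [ -z "$KIT" ] && [ -n "$SAMPLESHEET" ]; }; then
    echo "error: --kit and --samplesheet must be given together" >&2; return 1
  fi
  return 0
}
```

For example: `POD5=/path/to/pod5/dir REF=/path/to/ref.fasta SNP=true ANNOTATE=true check_params && echo "parameters look consistent"`.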

#### Profiles

Predefined parameter sets for common use cases; select one with `-profile`:

| Profile | Description |
|---------|-------------|
| `standard` | Standard workflow with Docker GPU support |
| `dev` | Workflow for testing on Apple Silicon |
| `singularity` | Workflow with Singularity GPU support |
| `revio` | Workflow optimized for HiFi Revio (use with `--reads` to skip basecalling) |

## Output Structure

```
results/
├── 00-basecall/
│   ├── reads.bam                       # Basecalled reads
│   ├── reads.bam.bai                   # BAM index
│   └── processed/                      # Per-sample BAMs (if barcoded)
│       ├── sample_1.bam
│       └── sample_1.bam.bai
├── 01-align/
│   ├── reads.align.bam                 # Aligned reads
│   ├── reads.align.bam.bai             # BAM index
│   ├── reads.align.ht.bam              # Haplotagged aligned reads
│   └── reads.align.ht.bam.bai          # Haplotagged aligned BAM index
├── 02-coverage/
│   ├── reads.hist.tsv                  # Coverage histogram
│   └── reads.bigwig                    # Coverage bigwig
├── 03-variants/
│   ├── reads.snp.vcf                   # SNP variants
│   ├── reads.sv.vcf                    # SV variants
│   └── reads.ann.vcf                   # Annotated variants
├── 04-modifications/
│   ├── reads.bedmethyl                 # Output of modkit pileup
│   └── reads.summary.tsv               # Base modification summary
├── nxf-alignment-report.html           # Workflow report
├── nxf-alignment-execution-summary.txt # Workflow execution summary
└── variants-annotation-report.html     # Variants annotation report
```

## Input Files

#### BED File (Optional)

Tab-separated file defining target regions:

```
chr1    1000    5000    GENE_A
chr1    8000    12000   GENE_B
```

If not provided, the workflow auto-generates a BED file covering the entire reference.
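A minimal equivalent of the auto-generated BED writes one 0-based, half-open interval per sequence in the reference. This sketch assumes a plain (uncompressed) FASTA and is not the pipeline's own implementation; `fasta_to_bed` is a hypothetical name.

```bash
# Hypothetical helper: one BED interval per FASTA sequence, covering it end to end.
# $1 = uncompressed FASTA; BED intervals are 0-based and half-open, so [0, length).
fasta_to_bed() {
  awk -v OFS='\t' '
    /^>/ {
      if (name != "") print name, 0, len      # flush the previous sequence
      name = substr($1, 2); len = 0           # ID = header text up to the first space
      next
    }
    { gsub(/[ \t\r]/, ""); len += length($0) }  # count bases, ignoring whitespace
    END { if (name != "") print name, 0, len }
  ' "$1"
}
```

For example: `fasta_to_bed /path/to/ref.fasta > genome.bed`. If a `samtools faidx` index already exists, `awk -v OFS='\t' '{print $1, 0, $2}' ref.fasta.fai` gives the same intervals without rereading the FASTA.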

#### Sample Sheet (Barcoded Runs)

A CSV (or XLSX) file with at least the columns `sample` and `barcode`:

```
sample,barcode
sample_1,barcode01
sample_2,barcode02
```
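A quick check of a CSV sample sheet before a barcoded run can catch a missing column or a barcode assigned twice. This hypothetical `check_samplesheet` helper only handles simple CSV files (no XLSX, no quoted commas) and does not check barcode naming; the pipeline's own validation may be stricter.

```bash
# Hypothetical helper: sanity-check a simple CSV sample sheet (no quoted commas, no XLSX).
# Requires `sample` and `barcode` header columns and rejects duplicate barcodes or names.
check_samplesheet() {
  awk -F',' '
    { sub(/\r$/, "") }                        # tolerate Windows line endings
    NR == 1 {
      for (i = 1; i <= NF; i++) {
        h = $i; gsub(/^[ "]+|[ "]+$/, "", h)  # trim spaces and quotes around names
        if (h == "sample") s = i
        if (h == "barcode") b = i
      }
      if (!s || !b) { print "error: header needs sample and barcode columns" > "/dev/stderr"; bad = 1; exit }
      next
    }
    NF == 0 { next }                          # ignore blank lines
    seen_b[$b]++ { print "error: duplicate barcode " $b > "/dev/stderr"; bad = 1 }
    seen_s[$s]++ { print "error: duplicate sample " $s > "/dev/stderr"; bad = 1 }
    END { exit bad }
  ' "$1"
}
```

For example: `check_samplesheet /path/to/samplesheet.csv && echo "sample sheet OK"`.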

#### Adaptive Sampling Decisions File (Optional)

This file is generated by MinKNOW during an adaptive sampling run and can be found under `runfolder/adaptive_sampling/AS_decisions.csv`. Pass it to the pipeline with `--asfile`.
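Selecting the accepted reads from the decisions file can be sketched as follows. The column names `read_id` and `decision`, and treating `stop_receiving` as the accepted decision, are assumptions about the MinKNOW file format; the pipeline may apply a different rule. `accepted_read_ids` is a hypothetical helper that prints each accepted read ID once.

```bash
# Hypothetical helper: print each accepted read ID once from an adaptive sampling CSV.
# $1 = decisions CSV; $2 = decision value treated as accepted (assumed: stop_receiving).
# Assumes the header contains `read_id` and `decision` columns.
accepted_read_ids() {
  awk -F',' -v keep="${2:-stop_receiving}" '
    { sub(/\r$/, "") }                        # tolerate Windows line endings
    NR == 1 {
      for (i = 1; i <= NF; i++) {
        if ($i == "read_id") r = i
        if ($i == "decision") d = i
      }
      if (!r || !d) { print "error: read_id/decision columns not found" > "/dev/stderr"; exit 1 }
      next
    }
    $d == keep && !seen[$r]++ { print $r }    # first occurrence of each accepted read
  ' "$1"
}
```

For example: `accepted_read_ids AS_decisions.csv > accepted_ids.txt`. A newline-delimited ID list like this is the kind of input Dorado's `--read-ids` option accepts for basecalling a subset of reads.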

## Enabling Docker GPU Support

If you see the error "could not select device driver with capabilities gpu", additional Docker setup is required: the `nvidia-container-toolkit` must be installed and running on your system. See the [EPI2ME installation guide](https://epi2me.nanoporetech.com/epi2me-docs/installation/) and the [NVIDIA Container Toolkit install guide](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html) for details.

## Citation

If you use this workflow, please cite:

- **Dorado**: https://github.com/nanoporetech/dorado

- **BEDTools**: Quinlan & Hall, 2010

- **Nextflow**: Di Tommaso et al., 2017