https://github.com/nbisweden/pipelines-nextflow
A set of workflows written in Nextflow for Genome Annotation.
- Host: GitHub
- URL: https://github.com/nbisweden/pipelines-nextflow
- Owner: NBISweden
- License: gpl-3.0
- Created: 2020-01-31T09:29:13.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2024-07-01T13:23:26.000Z (almost 2 years ago)
- Last Synced: 2025-04-12T09:46:45.356Z (about 1 year ago)
- Topics: genome-annotation, nextflow, workflow
- Language: Nextflow
- Homepage:
- Size: 338 KB
- Stars: 45
- Watchers: 36
- Forks: 18
- Open Issues: 20
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Citation: CITATION.cff
# NBIS Annotation service pipelines
## Table of Contents
* [Overview](#overview)
* [Citation](#citation)
* [Installation and Usage](#installation-and-usage)
## Overview
This Nextflow workflow is a compilation of several subworkflows for different stages of
genome annotation. Specifically:
* [Annotation preprocessing](./subworkflows/annotation_preprocessing/README.md)
* [Transcript assembly](./subworkflows/transcript_assembly/README.md)
* [Abinitio Training](./subworkflows/abinitio_training/README.md)
* [Functional annotation](./subworkflows/functional_annotation/README.md)
where the overall genome annotation process is:
```mermaid
graph TD
preprocessing[Annotation Preprocessing] --> evidenceAlignment[Evidence alignment]
transcriptAssembly[Transcript Assembly] --> evidenceAlignment
evidenceAlignment --> evidenceMaker[Evidence-based Maker]
denovoRepeatLibrary[De novo Repeat Library] ---> evidenceMaker
transcriptAssembly --> pasa[PASA]
preprocessing --> pasa
pasa --> evidenceMaker
evidenceMaker --> abinitioTraining[Abinitio Training]
abinitioTraining --> abinitioMaker[Abinitio-based Maker]
evidenceMaker --> abinitioMaker
pasa --> functionalAnnotation[Functional Annotation]
abinitioMaker --> functionalAnnotation
functionalAnnotation --> EMBLmyGFF3
```
The subworkflow is selected using the `subworkflow` parameter.
## Citation
If you use these pipelines in your work, please acknowledge NBIS within your
communication according to this example: "Support by NBIS (National Bioinformatics
Infrastructure Sweden) is gratefully acknowledged."
[DOI: 10.5281/zenodo.5195586](https://doi.org/10.5281/zenodo.5195586)
### Acknowledgments
These workflows were based on the Bpipe workflows written by
Marc Höppner (\@marchoeppner) and Jacques Dainat (\@Juke34).
Thank you to everyone who contributes to this project.
### Maintainers
* Mahesh Binzer-Panchal (\@mahesh-panchal)
    * *Expertise*: Nextflow workflow development
* Jacques Dainat (\@Juke34)
    * *Expertise*: Genome annotation, Nextflow workflow development
* Lucile Soler (\@LucileSol)
    * *Expertise*: Genome annotation
## Installation and Usage
Requirements:
* Nextflow
* A container platform (recommended) such as Singularity or Docker, or the
conda/mamba package manager if a container platform is not available.
If containers or conda/mamba are unavailable, then tool dependencies
must be accessible from your `PATH`.
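A quick way to see which of these requirements are already available is to probe the `PATH`. The following is a minimal sketch and not part of the workflow itself; the helper name `have` and the exact list of tool names checked are assumptions made for illustration.

```bash
# Report which of the tools listed above can be found on the PATH.
# `have` is a hypothetical helper, not something shipped with the workflow.
have() {
    command -v "$1" >/dev/null 2>&1
}

for tool in nextflow singularity docker conda mamba; do
    if have "$tool"; then
        echo "found:   $tool"
    else
        echo "missing: $tool"
    fi
done
```

Only Nextflow plus one of the software managers needs to be found; the other entries can be reported as missing without that being a problem.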
### Nextflow
Install Nextflow directly:
```bash
curl -s https://get.nextflow.io | bash
mv ./nextflow ~/bin
```
Alternatively, installation can be managed with conda (or mamba) in its own conda environment:
```bash
conda create -c conda-forge -c bioconda -n nextflow-env nextflow
conda activate nextflow-env
```
See [Nextflow: Get started - installation](https://www.nextflow.io/docs/latest/getstarted.html#installation) for further details.
### General Usage
A workflow is run in the following way:
```bash
nextflow run NBISweden/pipelines-nextflow \
    [-profile <profile1>[,<profile2>,...] ] \
    [-c workflow.config ] \
    [-resume] \
    -params-file workflow_parameters.yml
```
where `-profile` selects one or more predefined profiles (see [available profiles](#profiles)), and
`-c workflow.config` loads a custom configuration that overrides existing process settings, such as the
number of cpus, time allocation, memory, output prefixes and tool command-line options. The default
settings are defined in `nextflow.config`, which is loaded automatically. The
`-params-file` option takes a YAML formatted file listing workflow parameters, e.g.
```yaml
subworkflow: 'annotation_preprocessing'
genome: '/path/to/genome'
busco_lineage:
- 'eukaryota_odb10'
- 'bacteria_odb10'
outdir: '/path/to/save/results'
```
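The parameter file can also be written and sanity-checked from the shell before a run is launched. The sketch below reuses the example values above (the paths are placeholders). The choice of `subworkflow`, `genome` and `outdir` as the keys to check is an assumption made for illustration; the subworkflow documentation is the authority on which parameters are actually required.

```bash
# Write the example parameters to a file (paths are placeholders).
cat > workflow_parameters.yml <<'EOF'
subworkflow: 'annotation_preprocessing'
genome: '/path/to/genome'
busco_lineage:
  - 'eukaryota_odb10'
  - 'bacteria_odb10'
outdir: '/path/to/save/results'
EOF

# Assumption: these three top-level keys are the minimum worth checking here.
missing=0
for key in subworkflow genome outdir; do
    if ! grep -q "^${key}:" workflow_parameters.yml; then
        echo "missing parameter: ${key}" >&2
        missing=1
    fi
done

if [ "$missing" -eq 0 ]; then
    echo "workflow_parameters.yml contains all checked keys"
fi
```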
> **Note**
> If running on a compute cluster infrastructure, `nextflow` must be able to communicate
> with the workload manager at all times, otherwise tasks will be cancelled.
> The best way to do this is to run `nextflow` using a `screen` or `tmux`
> terminal.
>
> E.g. Screen
>
> ```bash
> # Open a named screen terminal session
> screen -S my_nextflow_run
> # load nextflow with conda
> conda activate nextflow-env
> # run nextflow
> nextflow run <workflow> -c <config> -profile <profile>
> # "Detach" screen terminal by pressing Ctrl-a, then d
>
> # list screen sessions
> screen -ls
> # "Attach" screen session
> screen -r my_nextflow_run
> ```
#### Profiles
* **uppmax**: A profile for the Uppmax clusters. Tasks are submitted to the SLURM workload manager,
executed within Singularity (unless otherwise noted), and use the `$SNIC_TMP` scratch space.
*Note*: The workflow parameter `project` is mandatory when using Uppmax clusters.
* **conda**: A general purpose profile that uses conda to manage software dependencies.
* **mamba**: A general purpose profile that uses mamba to manage software dependencies.
* **docker**: A general purpose profile that uses docker to manage software dependencies.
* **singularity**: A general purpose profile that uses singularity to manage software dependencies.
* **nbis**: A profile for the NBIS annotation cluster. Tasks are submitted to the SLURM workload
manager, and use the disk space `/scratch` for task execution. Software should be managed using one
of the general purpose profiles above.
* **gitpod**: A profile to set local executor settings in the Gitpod environment.
* **test**: A profile supplying test data to check if the workflows run on your system.
* **pipeline_report**: Adds a folder in the `outdir` that includes workflow execution reports.
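Nextflow accepts several profile names separated by commas, which is how a cluster profile such as `nbis` is paired with one of the general purpose profiles for software management. The sketch below only assembles and prints such a command so that it can be reviewed before running. The particular combination of profiles and the `join_profiles` helper are assumptions chosen for illustration.

```bash
# Join profile names with commas, the separator Nextflow expects for -profile.
# `join_profiles` is a hypothetical helper used only in this sketch.
join_profiles() (
    IFS=,
    echo "$*"
)

# Assumed combination: cluster settings + container engine + execution reports.
profiles=$(join_profiles nbis singularity pipeline_report)

# Print the command instead of launching it, so it can be reviewed first.
echo "nextflow run NBISweden/pipelines-nextflow -profile ${profiles} -params-file workflow_parameters.yml"
```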
##### Uppmax profile good practices
> **Note**
>
> Nextflow is enabled using the module system on Uppmax.
>
> ```bash
> module load bioinfo-tools Nextflow
> ```
>
> The following configuration in your `workflow.config` is recommended when running workflows on Uppmax.
>
> ```nextflow
> // Set your work directory to a folder in your project directory under nobackup
> workDir = '/proj//nobackup/work'
> // Restart workflows from last successful execution (i.e. use cached results where possible).
> resume = true
> // Add any overriding process directives here, e.g.,
> process {
>     withName: 'BLAST_BLASTN' {
>         cpus = 12
>         time = 2.d
>     }
> }
> ```
##### NBIS profile good practices
> **Note**
>
> Both Singularity and conda are installed; however, Singularity is
> preferred for speed and reproducibility.
>
> ```bash
> module load Singularity
> ```
>
> The following configuration in your `workflow.config` is recommended when running workflows on
> the annotation cluster.
>
> ```nextflow
> // Set your work directory to a folder on the /active partition
> workDir = '/active//nobackup/work'
> // Restart workflows from last successful execution (i.e. use cached results where possible).
> resume = true
> // Add any overriding process directives here, e.g.,
> process {
>     withName: 'BLAST_BLASTN' {
>         cpus = 12
>         time = 2.d
>     }
> }
> // Use a shared cache folder for Singularity images
> singularity.cacheDir = '/active/nxf_singularity_cachedir'
> // If using conda, use a shared cache for conda environments
> conda.cacheDir = '/active/nxf_conda_cachedir'
> // Use mamba for speed over conda
> conda.useMamba = true
> ```
>
> Project results should be published to `/projects`, work directories should be on
> `/active`, while computations are performed on the local `/scratch` partitions.