https://github.com/sequana/nanomerge

Pipeline utility to perform merge of barcoded nanopore runs

.. image:: https://badge.fury.io/py/sequana-nanomerge.svg
    :target: https://pypi.python.org/pypi/sequana_nanomerge

.. image:: http://joss.theoj.org/papers/10.21105/joss.00352/status.svg
    :target: http://joss.theoj.org/papers/10.21105/joss.00352
    :alt: JOSS (journal of open source software) DOI

.. image:: https://github.com/sequana/nanomerge/actions/workflows/main.yml/badge.svg
    :target: https://github.com/sequana/nanomerge/actions/workflows

.. image:: https://coveralls.io/repos/github/sequana/nanomerge/badge.svg?branch=main
    :target: https://coveralls.io/github/sequana/nanomerge?branch=main

.. image:: https://img.shields.io/badge/python-3.8%20%7C%203.9%20%7C3.10-blue.svg
    :target: https://pypi.python.org/pypi/sequana
    :alt: Python 3.8 | 3.9 | 3.10

This is the **nanomerge** pipeline from the `Sequana `_ project

:Overview: merges fastq files generated by a Nanopore run and generates raw data QC.
:Input: individual fastq files generated by nanopore demultiplexing
:Output: merged fastq files for each barcode (or unique sample)
:Status: production
:Citation: Cokelaer et al. (2017), 'Sequana': a Set of Snakemake NGS pipelines, Journal of Open Source Software, 2(16), 352, doi:10.21105/joss.00352

Installation
~~~~~~~~~~~~

You can install the package using pip::

    pip install sequana_nanomerge --upgrade

An optional requirement is pycoQC, which can be installed with conda/mamba using e.g.::

    conda install pycoQC

You will also need graphviz installed.

Usage
~~~~~

::

    sequana_nanomerge --help

If your data is barcoded, the files are usually in sub-directories named barcoded/barcodeXY,
so you will need to use a pattern (--input-pattern) such as ``*/*.gz``::

    sequana_nanomerge --input-directory DATAPATH/barcoded --samplesheet samplesheet.csv \
        --summary summary.txt --input-pattern '*/*fastq.gz'

Otherwise, all fastq files are in DATAPATH/, so the input pattern can just be ``*.fastq.gz``::

    sequana_nanomerge --input-directory DATAPATH --samplesheet samplesheet.csv \
        --summary summary.txt --input-pattern '*fastq.gz'

The --summary option is optional and takes as input the output of albacore/guppy demultiplexing, usually a file called sequencing_summary.txt.

Note that the difference between the two commands is the extra ``*/`` before the ``*.fastq.gz`` pattern, since barcoded files are in individual subdirectories.
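
To preview which files a given pattern would select before launching the pipeline, the pattern can be tried against the input directory. The following is a minimal Python sketch, assuming the pattern is resolved as a glob relative to the input directory (an assumption, not taken from the pipeline code). The helper ``matching_fastq`` and the toy file names are hypothetical.

```python
import tempfile
from pathlib import Path


def matching_fastq(input_directory, input_pattern):
    """Return the sorted relative paths under input_directory matching the glob.

    Hypothetical helper: assumes the pattern is a glob relative to the
    input directory, which may differ from what the pipeline does.
    """
    root = Path(input_directory)
    return sorted(p.relative_to(root).as_posix() for p in root.glob(input_pattern))


# Build a toy barcoded layout: one sub-directory per barcode.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("barcode01/a.fastq.gz", "barcode01/b.fastq.gz", "barcode02/c.fastq.gz"):
        (root / name).parent.mkdir(parents=True, exist_ok=True)
        (root / name).touch()

    # The extra */ descends one level into the barcode sub-directories.
    nested = matching_fastq(root, "*/*fastq.gz")
    # Without it, only files directly inside the input directory are matched.
    flat = matching_fastq(root, "*fastq.gz")

print(nested)
print(flat)
```

With this layout, the pattern without ``*/`` selects nothing, which is the typical symptom of using the non-barcoded pattern on a barcoded run.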

In both cases, the command creates a directory with the pipeline and configuration file. You will then need to execute the pipeline::

    cd nanomerge
    sh nanomerge.sh # for a local run

This launches a snakemake pipeline. If you are familiar with snakemake, you can
retrieve the pipeline itself and its configuration files and then execute the pipeline yourself with specific parameters::

    snakemake -s nanomerge.rules --configfile config.yaml --cores 4 --stats stats.txt

Or use the `sequanix `_ interface.

The sample sheet must be a CSV file, whether your data is barcoded or not::

    barcode,project,sample
    barcode01,main,A
    barcode02,main,B
    barcode03,main,C

For a non-barcoded run, you must provide a file in which the barcode column is either left empty::

    barcode,project,sample
    ,main,A

or removed altogether::

    project,sample
    main,A
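
The three sample sheet layouts above can be read in a uniform way. The following is a sketch using Python's ``csv`` module, not the parser used by the pipeline. ``read_samplesheet`` is a hypothetical helper that treats a missing barcode column like an empty one and accepts either ``sample`` or ``samplename`` as the column name (the latter is mentioned in the changelog for version 1.0.1).

```python
import csv
import io


def read_samplesheet(text):
    """Parse sample sheet text into a list of dicts (hypothetical helper).

    A missing or empty barcode column yields an empty barcode string.
    """
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append(
            {
                "barcode": (row.get("barcode") or "").strip(),
                "project": (row.get("project") or "").strip(),
                "sample": (row.get("sample") or row.get("samplename") or "").strip(),
            }
        )
    return rows


barcoded = read_samplesheet(
    "barcode,project,sample\nbarcode01,main,A\nbarcode02,main,B\nbarcode03,main,C\n"
)
empty_barcode = read_samplesheet("barcode,project,sample\n,main,A\n")
no_barcode = read_samplesheet("project,sample\nmain,A\n")

# Both non-barcoded layouts describe the same single sample.
print(empty_barcode == no_barcode)
print([row["barcode"] for row in barcoded])
```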

Usage with apptainer:
~~~~~~~~~~~~~~~~~~~~~~~~~

With apptainer, initiate the working directory as follows::

    sequana_nanomerge --use-apptainer

Images are downloaded into the working directory, but you can store them in a shared directory instead, e.g.::

    sequana_nanomerge --use-apptainer --apptainer-prefix ~/.sequana/apptainers

and then::

    cd nanomerge
    sh nanomerge.sh

If you decide to use snakemake manually, do not forget to add the apptainer options::

    snakemake -s nanomerge.rules --configfile config.yaml --cores 4 --stats stats.txt \
        --use-apptainer --apptainer-prefix ~/.sequana/apptainers --apptainer-args "-B /home:/home"

Requirements
~~~~~~~~~~~~

This pipeline uses the following executables, which are optional:

- pycoQC
- dot

.. image:: https://raw.githubusercontent.com/sequana/nanomerge/main/sequana_pipelines/nanomerge/dag.png

Details
~~~~~~~~~

This pipeline runs **nanomerge** in parallel on the input fastq files (paired or not).
A brief sequana summary report is also produced.
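
Conceptually, the merge comes down to concatenating all fastq files that belong to one barcode (the changelog mentions ``find`` + ``cat`` for this step). The following is a minimal Python sketch of the same idea, relying on the fact that concatenated gzip files form a valid gzip file. It is not the pipeline's code; ``merge_fastq_gz`` and the file names are hypothetical.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def merge_fastq_gz(input_files, output_file):
    """Concatenate gzip-compressed fastq files byte for byte (hypothetical helper).

    No decompression is needed: a sequence of gzip members is itself
    a valid gzip stream.
    """
    with open(output_file, "wb") as out:
        for path in sorted(input_files):
            with open(path, "rb") as handle:
                shutil.copyfileobj(handle, out)


with tempfile.TemporaryDirectory() as tmp:
    # Toy barcode directory holding two small compressed fastq chunks.
    barcode_dir = Path(tmp) / "barcode01"
    barcode_dir.mkdir()
    for i in range(2):
        with gzip.open(barcode_dir / f"chunk_{i}.fastq.gz", "wt") as fh:
            fh.write(f"@read{i}\nACGT\n+\n!!!!\n")

    # Merge the chunks into a single file named after the sample (assumed naming).
    merged = Path(tmp) / "A.fastq.gz"
    merge_fastq_gz(barcode_dir.glob("*.fastq.gz"), merged)

    # Reading the merged file back yields the reads of both chunks.
    with gzip.open(merged, "rt") as fh:
        merged_text = fh.read()

print(merged_text.count("@read"))
```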

Rules and configuration details
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Here is the `latest documented configuration file `_
to be used with the pipeline. Each rule used in the pipeline may have a section in the configuration file.

Changelog
~~~~~~~~~

========= ====================================================================
Version   Description
========= ====================================================================
1.5.1     * Fix wrappers tag
1.5.0     * Refactoring to use Click
1.4.0     * Sub-sampling was biased in v1.3.0. Use stratified sampling to
            correctly sample large files. Also add a --promethion option
            that automatically sub-samples 10% of the data
          * Add summary table
1.3.0     * Handle large promethion runs by using a sub-sample of the
            sequencing summary file (--sample of pycoQC still loads the
            entire file in memory)
1.2.0     * Handle large promethion runs by using find+cat instead of just
            cat to cope with a very large number of input files
1.1.0     * Add subsample option and set it to 1,000,000 reads to handle
            large runs such as promethion
1.0.1     * CSV can now handle sample or samplename column name in
            samplesheet
          * Fix the pyco file paths, update requirements and doc
1.0.0     Stable release ready for production
0.0.1     **First release.**
========= ====================================================================