https://github.com/sequana/nanomerge

Pipeline utility to perform merge of barcoded nanopore runs

Host: GitHub
URL: https://github.com/sequana/nanomerge
Owner: sequana
License: bsd-3-clause
Created: 2023-03-17T16:06:20.000Z (about 3 years ago)
Default Branch: main
Last Pushed: 2023-12-20T09:19:18.000Z (over 2 years ago)
Last Synced: 2025-10-09T05:38:54.745Z (8 months ago)
Language: Python
Size: 323 KB
Stars: 2
Watchers: 1
Forks: 3
Open Issues: 0
Metadata Files:
- Readme: README.rst
- License: LICENSE


README

          
.. image:: https://badge.fury.io/py/sequana-nanomerge.svg
    :target: https://pypi.python.org/pypi/sequana_nanomerge

.. image:: http://joss.theoj.org/papers/10.21105/joss.00352/status.svg
    :target: http://joss.theoj.org/papers/10.21105/joss.00352
    :alt: JOSS (journal of open source software) DOI

.. image:: https://github.com/sequana/nanomerge/actions/workflows/main.yml/badge.svg
    :target: https://github.com/sequana/nanomerge/actions/workflows

.. image:: https://coveralls.io/repos/github/sequana/nanomerge/badge.svg?branch=main
    :target: https://coveralls.io/github/sequana/nanomerge?branch=main

.. image:: https://img.shields.io/badge/python-3.8%20%7C%203.9%20%7C3.10-blue.svg
    :target: https://pypi.python.org/pypi/sequana
    :alt: Python 3.8 | 3.9 | 3.10

This is the **nanomerge** pipeline from the `Sequana `_ project.

:Overview: merges the fastq files generated by a Nanopore run and generates raw-data QC.

:Input: individual fastq files generated by nanopore demultiplexing

:Output: merged fastq files for each barcode (or unique sample)

:Status: production

:Citation: Cokelaer et al, (2017), ‘Sequana’: a Set of Snakemake NGS pipelines, Journal of Open Source Software, 2(16), 352, JOSS DOI doi:10.21105/joss.00352

Installation
~~~~~~~~~~~~

You can install the package using pip::

    pip install sequana_nanomerge --upgrade

An optional requirement is pycoQC, which can be installed with conda or mamba, e.g.::

    conda install pycoQC

You will also need graphviz installed (it provides the ``dot`` executable).

Usage
~~~~~

::

    sequana_nanomerge --help

If your data is barcoded, the files are usually in sub-directories named barcoded/barcodeXY, so you will need an input pattern (``--input-pattern``) such as ``*/*.gz``::

    sequana_nanomerge --input-directory DATAPATH/barcoded --samplesheet samplesheet.csv \
        --summary summary.txt --input-pattern '*/*fastq.gz'

Otherwise, all fastq files are in DATAPATH/, so the input pattern can simply be ``*.fastq.gz``::

    sequana_nanomerge --input-directory DATAPATH --samplesheet samplesheet.csv \
        --summary summary.txt --input-pattern '*fastq.gz'

The ``--summary`` option is optional and takes as input the output of albacore/guppy demultiplexing, usually a file called sequencing_summary.txt.

Note that the difference between the two commands is the extra ``*/`` before the ``*.fastq.gz`` pattern, since barcoded files are in individual subdirectories.
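
The effect of the two patterns can be illustrated with Python's pathlib. This is a minimal sketch assuming that ``--input-pattern`` behaves like a standard glob evaluated relative to ``--input-directory``; the helper ``matching_files`` and the file names are hypothetical and not part of sequana_nanomerge:

```python
import tempfile
from pathlib import Path


def matching_files(input_directory, input_pattern):
    """Return the files under input_directory matched by input_pattern.

    Hypothetical helper using pathlib glob semantics; the pipeline's own
    file discovery may differ in detail.
    """
    root = Path(input_directory)
    return sorted(p.relative_to(root).as_posix() for p in root.glob(input_pattern))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Barcoded layout: one sub-directory per barcode (made-up file names).
    for barcode in ("barcode01", "barcode02"):
        (root / barcode).mkdir()
        (root / barcode / f"{barcode}_0.fastq.gz").touch()
    # A top-level file, as in a non-barcoded layout.
    (root / "reads_0.fastq.gz").touch()

    print(matching_files(root, "*/*fastq.gz"))
    # ['barcode01/barcode01_0.fastq.gz', 'barcode02/barcode02_0.fastq.gz']
    print(matching_files(root, "*fastq.gz"))
    # ['reads_0.fastq.gz']
```

With the barcoded pattern only files one directory down are picked up, while the flat pattern ignores the barcode sub-directories entirely.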

In both cases, the command creates a working directory (nanomerge) containing the pipeline and its configuration file. You then need to execute the pipeline::

    cd nanomerge
    sh nanomerge.sh  # for a local run

This launches a snakemake pipeline. If you are familiar with snakemake, you can retrieve the pipeline itself and its configuration file and then execute the pipeline yourself with specific parameters::

    snakemake -s nanomerge.rules --configfile config.yaml --cores 4 --stats stats.txt

Or use the `sequanix `_ interface.

Whether your data is barcoded or not, the sample sheet must be a CSV file such as::

    barcode,project,sample
    barcode01,main,A
    barcode02,main,B
    barcode03,main,C

For a non-barcoded run, provide a file in which the barcode column is left empty::

    barcode,project,sample
    ,main,A

or removed altogether::

    project,sample
    main,A
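
As an illustration only, the sketch below reads the sample-sheet layouts above with Python's csv module. The helper ``read_samplesheet`` is hypothetical (it is not the pipeline's own parser); it accepts either a ``sample`` or a ``samplename`` column, as mentioned in the 1.0.1 changelog entry, and maps a missing or empty barcode to ``None``:

```python
import csv
import io


def read_samplesheet(handle):
    """Return {barcode or None: (project, sample)} for a sample sheet.

    Hypothetical helper, not the pipeline's parser. A missing or empty
    barcode column is treated as a non-barcoded run (key None).
    """
    samples = {}
    for row in csv.DictReader(handle):
        # Accept either column name for the sample.
        sample = row.get("sample") or row.get("samplename")
        if not sample:
            raise ValueError(f"no sample name in row {row}")
        barcode = (row.get("barcode") or "").strip() or None
        if barcode in samples:
            raise ValueError(f"barcode {barcode!r} listed twice")
        samples[barcode] = (row["project"], sample)
    return samples


barcoded = "barcode,project,sample\nbarcode01,main,A\nbarcode02,main,B\n"
print(read_samplesheet(io.StringIO(barcoded)))
# {'barcode01': ('main', 'A'), 'barcode02': ('main', 'B')}

print(read_samplesheet(io.StringIO("project,sample\nmain,A\n")))
# {None: ('main', 'A')}
```

Rejecting duplicated barcodes is a design choice of this sketch: it catches copy-paste errors early instead of silently overwriting a sample.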

Usage with apptainer:

~~~~~~~~~~~~~~~~~~~~~~~~~

With apptainer, initiate the working directory as follows::

    sequana_nanomerge --use-apptainer

Images are downloaded into the working directory, but you can store them in a global directory instead, e.g.::

    sequana_nanomerge --use-apptainer --apptainer-prefix ~/.sequana/apptainers

and then::

    cd nanomerge
    sh nanomerge.sh

If you decide to run snakemake manually, do not forget to add the apptainer options::

    snakemake -s nanomerge.rules --configfile config.yaml --cores 4 --stats stats.txt --use-apptainer --apptainer-prefix ~/.sequana/apptainers --apptainer-args "-B /home:/home"

Requirements
~~~~~~~~~~~~

This pipeline requires the following executables, which are optional:

- pycoQC

- dot
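
To check beforehand which of these executables are available, a small sketch based on ``shutil.which`` can be used. The helper ``check_executables`` is hypothetical, and it only inspects the current ``PATH`` (for instance, the active conda environment):

```python
import shutil

# Executables named in the Requirements list above.
OPTIONAL_EXECUTABLES = ("pycoQC", "dot")


def check_executables(names=OPTIONAL_EXECUTABLES):
    """Return {name: True/False} depending on whether name is on the PATH."""
    return {name: shutil.which(name) is not None for name in names}


for name, found in check_executables().items():
    print(f"{name}: {'found' if found else 'missing'}")
```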

.. image:: https://raw.githubusercontent.com/sequana/nanomerge/main/sequana_pipelines/nanomerge/dag.png

Details
~~~~~~~

This pipeline runs **nanomerge** in parallel on the input fastq files (paired or not). 

A brief sequana summary report is also produced.
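
The merge step amounts to concatenating the gzipped fastq files of each barcode. The sketch below assumes gzip-compressed inputs and relies on the fact that concatenated gzip members form a valid gzip file, in the spirit of the find+cat approach listed in the changelog; ``merge_barcode`` is a hypothetical helper, not the pipeline's actual rule:

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def merge_barcode(input_files, output_file):
    """Concatenate gzipped fastq files into a single gzipped fastq file.

    Concatenated gzip members are themselves a valid gzip stream, so the
    inputs are copied byte for byte without being decompressed.
    """
    with open(output_file, "wb") as out:
        # Sort for a reproducible read order in the merged file.
        for path in sorted(input_files):
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)


with tempfile.TemporaryDirectory() as tmp:
    barcode_dir = Path(tmp) / "barcode01"
    barcode_dir.mkdir()
    # Two tiny, made-up input chunks for one barcode.
    for i, seq in enumerate(["ACGT", "TTGA"]):
        with gzip.open(barcode_dir / f"part_{i}.fastq.gz", "wt") as fh:
            fh.write(f"@read{i}\n{seq}\n+\nIIII\n")

    merged = Path(tmp) / "barcode01.fastq.gz"
    merge_barcode(barcode_dir.glob("*.fastq.gz"), merged)
    with gzip.open(merged, "rt") as fh:
        print(fh.read().count("@read"))  # → 2
```

Copying compressed bytes avoids a decompress/recompress cycle, which matters for large runs with many input files.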

Rules and configuration details
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Here is the `latest documented configuration file `_ to be used with the pipeline. Each rule used in the pipeline may have a section in the configuration file.

Changelog
~~~~~~~~~

========= ====================================================================
Version   Description
========= ====================================================================
1.5.1     * Fix wrappers tag
1.5.0     * refactoring to use Click
1.4.0     * subsampling was biased in v1.3.0. Using stratified sampling to
            correctly sample large files. Also set a --promethion option that
            automatically subsamples 10% of the data
          * add summary table
1.3.0     * handle large PromethION runs by using a subsample of the
            sequencing summary file (--sample of pycoQC still loads the entire
            file in memory)
1.2.0     * handle large PromethION runs by using find+cat instead of just
            cat to cope with a very large number of input files.
1.1.0     * add subsample option, set to 1,000,000 reads, to handle large
            runs such as PromethION ones
1.0.1     * CSV can now handle a sample or samplename column name in the samplesheet.
          * Fix the pyco file paths, update requirements and doc
1.0.0     Stable release ready for production
0.0.1     **First release.**
========= ====================================================================
