https://github.com/nanoporetech/isonclust2

A tool for de novo clustering of long transcriptomic reads
cdna rna rna-seq transcriptomics

Host: GitHub
URL: https://github.com/nanoporetech/isonclust2
Owner: nanoporetech
License: other
Created: 2019-04-05T12:37:31.000Z (over 6 years ago)
Default Branch: master
Last Pushed: 2022-10-02T14:36:23.000Z (about 3 years ago)
Last Synced: 2025-04-06T08:02:14.560Z (6 months ago)
Topics: cdna, rna, rna-seq, transcriptomics
Language: C++
Homepage:
Size: 643 KB
Stars: 15
Watchers: 15
Forks: 3
Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE.md

![ONT_logo](/ONT_logo.png)

-----------------------------

isONclust2 - a tool for de novo clustering of long transcriptomic reads
=======================================================================

[![install with bioconda](https://anaconda.org/bioconda/isonclust2/badges/installer/conda.svg)](https://anaconda.org/bioconda/isonclust2) [![CircleCI](https://circleci.com/gh/nanoporetech/isONclust2.svg?style=svg)](https://circleci.com/gh/nanoporetech/isONclust2)

`isONclust2` is a tool for clustering long transcriptomic reads into gene families.

The tool is based on the approach pioneered by [isONclust](https://github.com/ksahlin/isONclust), using minimizers and occasional pairwise alignment.
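To make the idea of minimizers concrete, the following Python sketch picks the lexicographically smallest k-mer from every window of `w` consecutive k-mers and counts the minimizers two reads share. It is an illustration only: the function names, the lexicographic ordering, and the window definition are assumptions, it ignores read strandedness, and it does not reproduce the `isONclust2` C++ implementation. The defaults `k=11` and `w=15` mirror the defaults listed in the help message.

```python
def minimizers(seq, k=11, w=15):
    """Return sorted (position, kmer) pairs, one minimizer per window.

    Assumption: a window is w consecutive k-mers and the minimizer is the
    lexicographically smallest k-mer (leftmost on ties).
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return []
    picked = set()
    # Reads with fewer than w k-mers are treated as a single window.
    for start in range(max(1, len(kmers) - w + 1)):
        window = kmers[start:start + w]
        # min() returns the first smallest element, i.e. the leftmost one.
        offset = min(range(len(window)), key=lambda j: window[j])
        picked.add((start + offset, window[offset]))
    return sorted(picked)


def shared_minimizers(read_a, read_b, k=11, w=15):
    """Count distinct minimizer k-mers present in both reads."""
    a = {kmer for _, kmer in minimizers(read_a, k, w)}
    b = {kmer for _, kmer in minimizers(read_b, k, w)}
    return len(a & b)
```

Because adjacent windows usually select the same k-mer, a read is represented by far fewer minimizers than k-mers, which is what makes comparing a read against many cluster representatives cheap.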

`isONclust2` is implemented in C++, which makes it fast enough to cluster large transcriptomic datasets produced on PromethION P24 and P48 devices. The tool is not a re-implementation of the original `isONclust` approach, as it deals with the strandedness of the reads and provides further optional features. 

**WARNING: In order to be able to handle large datasets, `isONclust2` splits the input data into batches which have to be processed in a specified order to obtain the results. Hence, the use of `isONclust2` as a standalone tool is highly discouraged and one should always use it through the de novo transcriptomics pipeline at [https://github.com/nanoporetech/pipeline-nanopore-denovo-isoforms](https://github.com/nanoporetech/pipeline-nanopore-denovo-isoforms).**

Getting Started
===============

## Installation

The best way to install `isONclust2` is from bioconda:

- Make sure you have [miniconda3](https://docs.conda.io/en/latest/miniconda.html) installed.

- Install the tool by issuing `conda install -c bioconda isonclust2`

## Compiling from source

- Clone the repository: `git clone --recursive https://github.com/nanoporetech/isONclust2.git`

- Make sure you have cmake v3.1 or later.

- Issue `cd isONclust2; mkdir build; cd build; cmake ..; make -j`

- The resulting binary is statically linked and has no library dependencies. Symlink or copy it into a directory on your `PATH` to use the tool.

## Usage

### Help message

```

isONclust2 version: v2.3-a0e5b32

Available subcommands: sort, cluster, dump, info, help, version

sort - sort reads and write out batches:
        -B --batch-size         Batch size in kilobases (default: 50000).
        -M --batch-max-seq      Maximum number of sequences per batch (default: 3000).
        -k --kmer-size          Kmer size (default: 11).
        -w --window-size        Window size (default: 15).
        -m --min-shared         Minimum number of minimizers shared between read and cluster (default: 5).
        -q --min-qual           Minimum average quality value (default: 7.0).
        -x --mode               Clustering mode:
                                * sahlin (default): use minimizers first, alignment second
                                * fast: use minimizers only
                                * furious: always use alignment
        -g --low-cons-size      Use all sequences for consensus below this size (default: 20).
        -c --max-cons-size      Maximum number of sequences used for consensus (default: 150).
        -P --cons-period        Do not recalculate consensus after this many sequences added (default: 500).
        -r --mapped-threshold   Minimum mapped fraction of read to be included in cluster (default: 0.65).
        -a --aligned-threshold  Minimum aligned fraction of read to be included in cluster (default: 0.2).
        -f --min-fraction       Minimum fraction of minimizers shared compared to best hit, in order to continue mapping (default: 0.8).
        -p --min-prob-no-hits   Minimum probability for i consecutive minimizers to be different between read and representative (default: 0.1).
        -F --min-cls-size       Skip clusters smaller than this in the left batch (default: 3).
        -o --outfolder          Output folder (default: ./isONclust2_batches).
        -h --help               Print help.
        -v --verbose            Verbose output.
        -d --debug              Print debug info.
        [positional argument]   Input fastq file (required).

cluster - cluster and/or merge batches:
        -l --left-batch        Left input batch (mandatory).
        -r --right-batch       Right input batch (optional).
        -o --outfile           Output batch.
        -x --mode              Clustering mode:
                               * sahlin (default): use minimizers first, alignment second
                               * fast: use minimizers only
                               * furious: use alignment only
        -A --spoa-algo         spoa alignment algorithm:
                               * 0 (default): local
                               * 1 : global
                               * 2 : semi-global
        -z --min-purge         Purge minimizer database from output batch.
        -j --keep-seq          Do not purge non-representative sequences from output batches.
        -F --min-cls-size      Skip clusters smaller than this in the left batch.
        -v --verbose           Verbose output.
        -Q --quiet             Suppress progress bar.
        -d --debug             Print debug info.
        -h --help              Print help.

dump - dump clustered batch:
        -o --outdir            Output directory.
        -i --index             Index of sorted reads.
        -v --verbose           Verbose output.
        -d --debug             Print debug info.
        -h --help              Print help.

info:
        -h --help              Print help.
        [positional argument]  Input serialized batch (required).

help - print help message

version - print version

```
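The `-B` and `-M` options of `sort` bound each batch by total size in kilobases and by number of sequences. The Python sketch below shows one way such a split could work. It assumes that reads are already sorted and that a batch is closed as soon as adding the next read would exceed either limit. The function name and this closing rule are hypothetical and are not taken from the `isONclust2` source.

```python
def split_into_batches(read_lengths, batch_size_kb=50000, batch_max_seq=3000):
    """Group read indices into consecutive batches.

    Assumption: a batch holds at most batch_size_kb * 1000 bases and at most
    batch_max_seq reads; a single read longer than the base limit still gets
    a batch of its own.
    """
    max_bases = batch_size_kb * 1000
    batches, current, current_bases = [], [], 0
    for idx, length in enumerate(read_lengths):
        too_many_bases = current_bases + length > max_bases
        too_many_reads = len(current) >= batch_max_seq
        if current and (too_many_bases or too_many_reads):
            batches.append(current)
            current, current_bases = [], 0
        current.append(idx)
        current_bases += length
    if current:
        batches.append(current)
    return batches
```

For example, `split_into_batches([100] * 5, batch_size_kb=1, batch_max_seq=2)` groups five short reads into batches of at most two reads each.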

### A minimal example

```bash

# sort reads and write out batches:
isONclust2 sort -B 50000 -v ens500.fq

# initial clustering of individual batches:
isONclust2 cluster -v -l isONclust2_batches/sorted/batches/isONbatch_0.cer -o b0.cer
isONclust2 cluster -v -l isONclust2_batches/sorted/batches/isONbatch_1.cer -o b1.cer
isONclust2 cluster -v -l isONclust2_batches/sorted/batches/isONbatch_2.cer -o b2.cer

# merge cluster batches:
isONclust2 cluster -v -l b0.cer -r b1.cer -o b_0_1.cer
isONclust2 cluster -v -l b_0_1.cer -r b2.cer -o b_0_1_2.cer

# dump final results:
isONclust2 dump -v -i sorted/sorted_reads_idx.cer -o results b_0_1_2.cer

```
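With more than a few batches, writing these commands by hand is error-prone, since each merge must take the output of the previous one as its left batch. The Python sketch below generates the command list for any number of batches, following the pattern of the example: cluster each batch on its own into `b<i>.cer`, then merge from left to right. The function name and the strictly left-to-right merge order are assumptions for illustration; the de novo isoforms pipeline may schedule the merges differently.

```python
def plan_commands(n_batches, batch_dir="isONclust2_batches/sorted/batches"):
    """Return (commands, final_batch) for clustering and merging n batches.

    Assumption: batches are merged strictly left to right, as in the
    three-batch example; file names follow the same naming pattern.
    """
    if n_batches < 1:
        raise ValueError("at least one batch is required")
    commands = []
    # Initial clustering of each individual batch.
    for i in range(n_batches):
        commands.append(
            f"isONclust2 cluster -v -l {batch_dir}/isONbatch_{i}.cer -o b{i}.cer"
        )
    # Merge: the output of each step is the left batch of the next one.
    left, merged_ids = "b0.cer", ["0"]
    for i in range(1, n_batches):
        merged_ids.append(str(i))
        out = "b_" + "_".join(merged_ids) + ".cer"
        commands.append(f"isONclust2 cluster -v -l {left} -r b{i}.cer -o {out}")
        left = out
    return commands, left


if __name__ == "__main__":
    cmds, final = plan_commands(3)
    print("\n".join(cmds))
    print("final batch to pass to 'isONclust2 dump':", final)
```

The script only prints the commands and does not run them, so the plan can be inspected before anything is executed.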

Help
====

## Acknowledgements

This software was built in collaboration with [Kristoffer Sahlin](https://www.scilifelab.se/researchers/kristoffer-sahlin/) and [Paul Medvedev](http://medvedevgroup.com/).

## Licence and Copyright

(c) 2020 Oxford Nanopore Technologies Ltd.

This Source Code Form is subject to the terms of the Mozilla Public License, v. 2.0. If a copy of the MPL was not distributed with this file, You can obtain one at http://mozilla.org/MPL/2.0/.

## FAQs and tips

## References and Supporting Information

See the post announcing the transcriptomics tools at the Nanopore Community [here](https://community.nanoporetech.com/posts/new-transcriptomics-analys).

## Research Release

Research releases are provided as technology demonstrators to provide early access to features or stimulate Community development of tools. Support for this software will be minimal and is only provided directly by the developers. Feature requests, improvements, and discussions are welcome and can be implemented by forking and pull requests. However, much as we would like to rectify every issue and piece of feedback users may have, the developers may have limited resources for support of this software. Research releases may be unstable and subject to rapid iteration by Oxford Nanopore Technologies.
