https://github.com/nanoporetech/isonclust2
A tool for de novo clustering of long transcriptomic reads
https://github.com/nanoporetech/isonclust2
cdna rna rna-seq transcriptomics
Last synced: 6 months ago
JSON representation
A tool for de novo clustering of long transcriptomic reads
- Host: GitHub
- URL: https://github.com/nanoporetech/isonclust2
- Owner: nanoporetech
- License: other
- Created: 2019-04-05T12:37:31.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2022-10-02T14:36:23.000Z (about 3 years ago)
- Last Synced: 2025-04-06T08:02:14.560Z (6 months ago)
- Topics: cdna, rna, rna-seq, transcriptomics
- Language: C++
- Homepage:
- Size: 643 KB
- Stars: 15
- Watchers: 15
- Forks: 3
- Open Issues: 3
-
Metadata Files:
- Readme: README.md
- License: LICENSE.md
Awesome Lists containing this project
README

-----------------------------
isONclust2 - a tool for de novo clustering of long transcriptomic reads
=======================================================================[](https://anaconda.org/bioconda/isonclust2) [](https://circleci.com/gh/nanoporetech/isONclust2)
`isONclust2` is a tool for clustering long transcriptomic reads into gene families.
The tool is based on the approach pioneered by [isONclust](https://github.com/ksahlin/isONclust), using minimizers and occasional pairwise alignment.`isONclust2` is implemented in C++, which makes it fast enough to cluster large transcriptomic datasets produced on PromethION P24 and P48 devices. The tool is not a re-implementation of the original `isONclust` approach, as it deals with the strandedness of the reads and provides further optional features.
**WARNING: In order to be able to handle large datasets, `isONclust2` splits the input data into batches which have to be processed in a specified order to obtain the results. Hence, the use of `isONclust2` as a standalone tool is highly discouraged and one should always use it through the de novo transcriptomics pipeline at [https://github.com/nanoporetech/pipeline-nanopore-denovo-isoforms](https://github.com/nanoporetech/pipeline-nanopore-denovo-isoforms).**
Getting Started
===============## Installation
The best way to install `isONclust2` is from bioconda:
- Make sure you have [miniconda3](https://docs.conda.io/en/latest/miniconda.html) installed.
- Install the tool by issuing `conda install -c bioconda isonclust2`## Compiling from source
- Clone the repository: `git clone --recursive https://github.com/nanoporetech/isONclust2.git`
- Make sure you have cmake v3.1 or later.
- Issue `cd isONclust2; mkdir build; cd build; cmake ..; make -j`
- The produced binary is static with no library dependencies. Link this under your path to use the tool.## Usage
### Help message
```
isONclust2 version: v2.3-a0e5b32
Available subcommands: sort, cluster, dump, info, help, versionsort - sort reads and write out batches:
-B --batch-size Batch size in kilobases (default: 50000)
-M --batch-max-seq Maximum number of sequences per batch (default: 3000).
-k --kmer-size Kmer size (default: 11).
-w --window-size Window size (default: 15).
-m --min-shared Minimum number of minimizers shared between read and cluster (default: 5).
-q --min-qual Minimum average quality value (default: 7.0).
-x --mode Clustering mode:
* sahlin (default): use minimizers first, alignment second
* fast: use minimizers only
* furious: always use alignment
-g --low-cons-size Use all sequences for consensus below this size (default: 20).
-c --max-cons-size Maximum number of sequences used for consensus (default: 150).
-P --cons-period Do not recalculate consensus after this many seuqences added (default: 500).
-r --mapped-threshold Minmum mapped fraction of read to be included in cluster (default: 0.65).
-a --aligned-threshold Minimum aligned fraction of read to be included in cluster (default: 0.2).
-f --min-fraction Minimum fraction of minimizers shared compared to best hit, in order to continue mapping (default: 0.8).
-p --min-prob-no-hits Minimum probability for i consecutive minimizers to be different between read and representative (default: 0.1)
-F --min-cls-size Skip clusters smaller than this in the left batch (default: 3).
-o --outfolder Output folder (default: ./isONclust2_batches).
-h --help Print help.
-v --verbose Verbose output.
-d --debug Print debug info.
[positional argument] Input fastq file (required).cluster - cluster and/or merge batches:
-l --left-batch Left input batch (mandatory).
-r --right-batch Right input batch (optional).
-o --outfile Output batch.
-x --mode Clustering mode:
* sahlin (default): use minimizers first, alignment second
* fast: use minimizers only
* furious: use alignment only
-A --spoa-algo spoa alignment algorithm:
* 0 (default): local
* 1 : global
* 1 : semi-global
-z --min-purge Purge minimizer database from output batch.
-j --keep-seq Do not purge non-representative sequences from output batches.
-F --min-cls-size Skip clusters smaller than this in the left batch.
-v --verbose Verbose output.
-Q --quiet Supress progress bar.
-d --debug Print debug info.
-h --help Print help.dump - dump clustered batch:
-o --outdir Output directory.
-i --index Index of sorted reads.
-v --verbose Verbose output.
-d --debug Print debug info.
-h --help Print help.info:
-h --help Print help.
[positional argument] Input serialized batch (required).help - print help message
version - print version
```### A minimal example
```bash
# sort reads and write out batches:
isONclust2 sort -B 50000 -v ens500.fq
# initial clustering of individual batches:
isONclust2 cluster -v -l isONclust2_batches/sorted/batches/isONbatch_0.cer -o b0.cer
isONclust2 cluster -v -l isONclust2_batches/sorted/batches/isONbatch_1.cer -o b1.cer
isONclust2 cluster -v -l isONclust2_batches/sorted/batches/isONbatch_2.cer -o b1.cer
# merge cluster batches:
isONclust2 cluster -v -l b0.cer -r b1.cer -o b_0_1.cer
isONclust2 cluster -v -l b_0_1.cer -r b2.cer -o b_0_1_2.cer
# dump final results:
isONclust2 dump -v -i sorted/sorted_reads_idx.cer -o results b_0_1_2.cer
```Help
====## Acknowledgements
This software was built in collaboration with [Kristoffer Sahlin](https://www.scilifelab.se/researchers/kristoffer-sahlin/) and [Paul Medvedev](http://medvedevgroup.com/).
## Licence and Copyright
(c) 2020 Oxford Nanopore Technologies Ltd.
This Source Code Form is subject to the terms of the Mozilla Public
License, v. 2.0. If a copy of the MPL was not distributed with this
file, You can obtain one at http://mozilla.org/MPL/2.0/.## FAQs and tips
## References and Supporting Information
See the post announcing the transcriptomics tools at the Nanopore Community [here](https://community.nanoporetech.com/posts/new-transcriptomics-analys).
## Research Release
Research releases are provided as technology demonstrators to provide early access to features or stimulate Community development of tools. Support for this software will be minimal and is only provided directly by the developers. Feature requests, improvements, and discussions are welcome and can be implemented by forking and pull requests. However much as we would like to rectify every issue and piece of feedback users may have, the developers may have limited resource for support of this software. Research releases may be unstable and subject to rapid iteration by Oxford Nanopore Technologies.