{"id":25596861,"url":"https://github.com/nanoporetech/isonclust2","last_synced_at":"2025-04-13T02:31:22.150Z","repository":{"id":85586371,"uuid":"179678588","full_name":"nanoporetech/isONclust2","owner":"nanoporetech","description":"A tool for de novo clustering of long transcriptomic reads","archived":false,"fork":false,"pushed_at":"2022-10-02T14:36:23.000Z","size":658,"stargazers_count":15,"open_issues_count":3,"forks_count":3,"subscribers_count":15,"default_branch":"master","last_synced_at":"2025-04-06T08:02:14.560Z","etag":null,"topics":["cdna","rna","rna-seq","transcriptomics"],"latest_commit_sha":null,"homepage":"","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nanoporetech.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-04-05T12:37:31.000Z","updated_at":"2025-02-10T14:46:19.000Z","dependencies_parsed_at":"2023-03-13T05:57:05.966Z","dependency_job_id":null,"html_url":"https://github.com/nanoporetech/isONclust2","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nanoporetech%2FisONclust2","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nanoporetech%2FisONclust2/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nanoporetech%2FisONclust2/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nanoporetech%2FisONclust2/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/nan
oporetech","download_url":"https://codeload.github.com/nanoporetech/isONclust2/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248657795,"owners_count":21140841,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cdna","rna","rna-seq","transcriptomics"],"created_at":"2025-02-21T12:34:55.431Z","updated_at":"2025-04-13T02:31:22.143Z","avatar_url":"https://github.com/nanoporetech.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"![ONT_logo](/ONT_logo.png)\n\n-----------------------------\n\nisONclust2 - a tool for de novo clustering of long transcriptomic reads\n=======================================================================\n\n[![install with bioconda](https://anaconda.org/bioconda/isonclust2/badges/installer/conda.svg)](https://anaconda.org/bioconda/isonclust2) [![CircleCI](https://circleci.com/gh/nanoporetech/isONclust2.svg?style=svg)](https://circleci.com/gh/nanoporetech/isONclust2)\n\n`isONclust2` is a tool for clustering long transcriptomic reads into gene families.\nThe tool is based on the approach pioneered by [isONclust](https://github.com/ksahlin/isONclust), using minimizers and occasional pairwise alignment.\n\n`isONclust2` is implemented in C++, which makes it fast enough to cluster large transcriptomic datasets produced on PromethION P24 and P48 devices. The tool is not a re-implementation of the original `isONclust` approach, as it deals with the strandedness of the reads and provides further optional features. 
\n\n**WARNING: In order to be able to handle large datasets, `isONclust2` splits the input data into batches which have to be processed in a specified order to obtain the results. Hence, the use of `isONclust2` as a standalone tool is highly discouraged and one should always use it through the de novo transcriptomics pipeline at [https://github.com/nanoporetech/pipeline-nanopore-denovo-isoforms](https://github.com/nanoporetech/pipeline-nanopore-denovo-isoforms).**\n\nGetting Started\n===============\n\n## Installation\n\nThe best way to install `isONclust2` is from bioconda:\n\n- Make sure you have [miniconda3](https://docs.conda.io/en/latest/miniconda.html) installed.\n- Install the tool by issuing `conda install -c bioconda isonclust2`\n\n## Compiling from source\n\n- Clone the repository: `git clone --recursive https://github.com/nanoporetech/isONclust2.git`\n- Make sure you have cmake v3.1 or later.\n- Issue `cd isONclust2; mkdir build; cd build; cmake ..; make -j`\n- The produced binary is static with no library dependencies. 
Link this under your path to use the tool.\n\n## Usage\n\n### Help message\n\n```\nisONclust2 version: v2.3-a0e5b32\nAvailable subcommands: sort, cluster, dump, info, help, version\n\nsort - sort reads and write out batches:\n        -B --batch-size        Batch size in kilobases (default: 50000)\n        -M --batch-max-seq     Maximum number of sequences per batch (default: 3000).\n        -k --kmer-size         Kmer size (default: 11).\n        -w --window-size       Window size (default: 15).\n        -m --min-shared        Minimum number of minimizers shared between read and cluster (default: 5).\n        -q --min-qual          Minimum average quality value (default: 7.0).\n        -x --mode  Clustering mode:\n                   * sahlin (default): use minimizers first, alignment second\n                   * fast: use minimizers only\n                   * furious: always use alignment\n        -g --low-cons-size     Use all sequences for consensus below this size (default: 20).\n        -c --max-cons-size     Maximum number of sequences used for consensus (default: 150).\n        -P --cons-period       Do not recalculate consensus after this many seuqences added (default: 500).\n        -r --mapped-threshold  Minmum mapped fraction of read to be     included in cluster (default: 0.65).\n        -a --aligned-threshold Minimum aligned fraction of read to be included in cluster (default: 0.2).\n        -f --min-fraction      Minimum fraction of minimizers shared compared to best hit, in order to continue mapping (default: 0.8).\n        -p --min-prob-no-hits  Minimum probability for i consecutive    minimizers to be different between read and representative (default: 0.1)\n        -F --min-cls-size      Skip clusters smaller than this in the left batch (default: 3).\n        -o --outfolder         Output folder (default:  ./isONclust2_batches).\n        -h --help              Print help.\n        -v --verbose           Verbose output.\n        -d --debug           
  Print debug info.\n        [positional argument]  Input fastq file (required).\n\ncluster - cluster and/or merge batches:\n        -l --left-batch        Left input batch (mandatory).\n        -r --right-batch       Right input batch (optional).\n        -o --outfile           Output batch.\n        -x --mode  Clustering mode:\n                   * sahlin (default): use minimizers first, alignment second\n                   * fast: use minimizers only\n                   * furious: use alignment only\n        -A --spoa-algo  spoa alignment algorithm:\n                   * 0 (default): local\n                   * 1 : global\n                   * 1 : semi-global\n        -z --min-purge         Purge minimizer database from output batch.\n        -j --keep-seq          Do not purge non-representative sequences from output batches.\n        -F --min-cls-size      Skip clusters smaller than this in the left batch.\n        -v --verbose           Verbose output.\n        -Q --quiet             Supress progress bar.\n        -d --debug             Print debug info.\n        -h --help              Print help.\n\ndump - dump clustered batch:\n        -o --outdir            Output directory.\n        -i --index             Index of sorted reads.\n        -v --verbose           Verbose output.\n        -d --debug             Print debug info.\n        -h --help              Print help.\n\ninfo:\n        -h --help              Print help.\n        [positional argument]  Input serialized batch (required).\n\nhelp - print help message\n\nversion - print version\n```\n\n### A minimal example\n\n```bash\n# sort reads and write out batches:\nisONclust2 sort -B 50000 -v ens500.fq\n# initial clustering of individual batches:\nisONclust2 cluster -v -l isONclust2_batches/sorted/batches/isONbatch_0.cer -o b0.cer\nisONclust2 cluster -v -l isONclust2_batches/sorted/batches/isONbatch_1.cer -o b1.cer\nisONclust2 cluster -v -l isONclust2_batches/sorted/batches/isONbatch_2.cer -o b2.cer\n# 
merge cluster batches:\nisONclust2 cluster -v -l b0.cer -r b1.cer -o b_0_1.cer\nisONclust2 cluster -v -l b_0_1.cer -r b2.cer -o b_0_1_2.cer\n# dump final results:\nisONclust2 dump -v -i sorted/sorted_reads_idx.cer -o results b_0_1_2.cer\n```\n\nHelp\n====\n\n## Acknowledgements\n\nThis software was built in collaboration with [Kristoffer Sahlin](https://www.scilifelab.se/researchers/kristoffer-sahlin/) and [Paul Medvedev](http://medvedevgroup.com/).\n\n## Licence and Copyright\n\n(c) 2020 Oxford Nanopore Technologies Ltd.\n\nThis Source Code Form is subject to the terms of the Mozilla Public\nLicense, v. 2.0. If a copy of the MPL was not distributed with this\nfile, You can obtain one at http://mozilla.org/MPL/2.0/.\n\n## FAQs and tips\n\n\n## References and Supporting Information\n\nSee the post announcing the transcriptomics tools at the Nanopore Community [here](https://community.nanoporetech.com/posts/new-transcriptomics-analys).\n\n## Research Release\n\nResearch releases are provided as technology demonstrators to provide early access to features or stimulate Community development of tools. Support for this software will be minimal and is only provided directly by the developers. Feature requests, improvements, and discussions are welcome and can be implemented by forking and pull requests. However much as we would like to rectify every issue and piece of feedback users may have, the developers may have limited resource for support of this software. Research releases may be unstable and subject to rapid iteration by Oxford Nanopore Technologies.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnanoporetech%2Fisonclust2","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnanoporetech%2Fisonclust2","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnanoporetech%2Fisonclust2/lists"}