{"id":26575602,"url":"https://github.com/sortmerna/sortmerna","last_synced_at":"2025-04-06T14:11:27.403Z","repository":{"id":15025388,"uuid":"17751144","full_name":"sortmerna/sortmerna","owner":"sortmerna","description":"SortMeRNA: next-generation sequence filtering and alignment tool","archived":false,"fork":false,"pushed_at":"2024-04-13T20:07:47.000Z","size":93218,"stargazers_count":224,"open_issues_count":53,"forks_count":68,"subscribers_count":16,"default_branch":"master","last_synced_at":"2024-04-14T10:49:25.540Z","etag":null,"topics":["alignment","bioinformatics","cpp","metatranscriptomics","ngs","python","sequencing"],"latest_commit_sha":null,"homepage":"https://sortmerna.readthedocs.io","language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sortmerna.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"COPYING","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":"AUTHORS","dei":null}},"created_at":"2014-03-14T15:42:40.000Z","updated_at":"2024-04-15T12:00:55.930Z","dependencies_parsed_at":"2022-07-16T03:16:15.725Z","dependency_job_id":"8de64341-1f52-43f7-8939-9536f4285892","html_url":"https://github.com/sortmerna/sortmerna","commit_stats":{"total_commits":1175,"total_committers":12,"mean_commits":97.91666666666667,"dds":0.2612765957446809,"last_synced_commit":"30531938729f47167bebaf7f63a0e909e9fbd61e"},"previous_names":["biocore/sortmerna"],"tags_count":22,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sortmerna%2Fsortmerna","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sortmerna%2Fsortmerna/tags","releases_url":"https://repos.ecosyst
e.ms/api/v1/hosts/GitHub/repositories/sortmerna%2Fsortmerna/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sortmerna%2Fsortmerna/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sortmerna","download_url":"https://codeload.github.com/sortmerna/sortmerna/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247492513,"owners_count":20947544,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["alignment","bioinformatics","cpp","metatranscriptomics","ngs","python","sequencing"],"created_at":"2025-03-23T02:19:44.873Z","updated_at":"2025-04-06T14:11:27.381Z","avatar_url":"https://github.com/sortmerna.png","language":"C++","readme":"# sortmerna\n\nSortMeRNA is a local sequence alignment tool for filtering, mapping and clustering.\n\nThe core algorithm is based on approximate seeds and allows for sensitive analysis of NGS reads.\nThe main application of SortMeRNA is filtering rRNA from metatranscriptomic data.\nSortMeRNA takes as input files of reads (fasta, fastq, fasta.gz, fastq.gz) and one or multiple\nrRNA database file(s), and sorts apart aligned and rejected reads into two files. 
SortMeRNA works\nwith Illumina, Ion Torrent and PacBio data, and can produce SAM and BLAST-like alignments.\n\nSortMeRNA is also available through [QIIME v1.9.1](http://qiime.org) and\nthe [nf-core RNA-Seq pipeline v.3.9](https://nf-co.re/rnaseq/3.9).\n\n## Table of Contents\n\n- [Getting Started](#getting-started)\n  - [Using Conda package](#using-conda-package)\n  - [Using GitHub release binaries on Linux](#using-github-release-binaries-on-linux)\n  - [Running](#running)\n    - [Execution trace](#execution-trace)\n- [Building from sources](#building-from-sources)\n- [User Manual](#user-manual)\n- [Databases](#databases)\n- [Taxonomies](#taxonomies)\n- [Citation](#citation)\n- [Contributors](#contributors)\n- [Support](#support)\n\n\n## Getting Started\n\nSortMeRNA 4 is C++17 compliant and mostly uses standard libraries. It uses CMake as the build system and can be built and run on all major operating systems, including Linux, Windows, and Mac, on AMD64 and ARM64 processors.\n\n### Using Conda package\n\nInstall conda - [official docs](https://docs.conda.io/projects/conda/en/latest/user-guide/install/linux.html)\n```\nwget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh\nbash Miniconda3-latest-Linux-x86_64.sh\n```\nThe conda packages for SortMeRNA versions before 4.3.7 were hosted on Bioconda. Starting with 4.3.7, the packages are hosted on conda-forge.\nAn empty 4.3.7 package erroneously made its way to Bioconda and should be ignored until it is removed from Bioconda.\n\nThe conda-forge build is currently still waiting to be merged. Until it is ready, the local installation package can be used:\n```\n# == only for 4.3.7 until ready on conda-forge ==\n# download the conda-build package into a directory of your choice e.g. 
Downloads/\nwget https://github.com/sortmerna/sortmerna/releases/download/v4.3.7/sortmerna-4.3.7-conda-linux-64.tar.bz2 -P ~/Downloads/\n\n# create a new environment and install SortMeRNA in it\nconda create --name sortmerna\nconda activate sortmerna\nconda install ~/Downloads/sortmerna-4.3.7-conda-linux-64.tar.bz2\n\nwhich sortmerna  # check the installed binary e.g. miniforge3/envs/sortmerna/bin/sortmerna \nsortmerna -h\n```\nFor versions older than 4.3.7, add the following conda channels per the [Bioconda guidelines](https://bioconda.github.io):\n```\nconda config --add channels defaults\nconda config --add channels bioconda\nconda config --add channels conda-forge\nconda config --set channel_priority strict\n\n\nconda search sortmerna\n  Loading channels: done\n  # Name                       Version           Build  Channel\n  sortmerna                        2.0               0  bioconda\n  ...\n  sortmerna                      4.3.4               0  bioconda\n  ...\n  sortmerna                      4.3.6               0  bioconda\n  ...\n  sortmerna                      4.3.7      hdbdd923_1  bioconda \u003c- (!) 
ignore - corrupt, see instructions above\n\n# create a new environment and install SortMeRNA in it\nconda create --name sortmerna_env\nconda activate sortmerna_env\nconda install sortmerna\nwhich sortmerna\n  /home/biocodz/miniconda3/envs/sortmerna_env/bin/sortmerna\n\n# test the installation\nsortmerna --version\n  SortMeRNA version 4.3.6\n  Build Date: Aug 16 2022\n  sortmerna_build_git_sha:@db8c1983765f61986b46ee686734749eda235dcc@\n  sortmerna_build_git_date:@2022/08/16 11:42:59@\n\n# view help\nsortmerna -h\n```\n\n### Using GitHub release binaries on Linux\n\nVisit [SortMeRNA GitHub Releases](https://github.com/biocore/sortmerna/releases).\n\nThe Linux distribution is a shell script with an embedded installation archive.\n\nRun the following bash commands:\n\n```\npushd ~\n\n# get the distro\nwget https://github.com/biocore/sortmerna/releases/download/v4.3.6/sortmerna-4.3.6-Linux.sh\n\n# view the installer usage\nbash sortmerna-4.3.6-Linux.sh --help\n    Options: [defaults in brackets after descriptions]\n      --help            print this message\n      --version         print cmake installer version\n      --prefix=dir      directory in which to install\n      --include-subdir  include the sortmerna-4.3.6-Linux subdirectory\n      --exclude-subdir  exclude the sortmerna-4.3.6-Linux subdirectory\n      --skip-license    accept license\n\n# run the installer\nbash sortmerna-4.3.6-Linux.sh --skip-license\n  sortmerna Installer Version: 4.3.6, Copyright (c) Clarity Genomics\n  This is a self-extracting archive.\n  The archive will be extracted to: $HOME/sortmerna\n  \n  Using target directory: /home/biocodz/sortmerna\n  Extracting, please wait...\n  \n  Unpacking finished successfully\n\n# check the installed binaries\nls -lrt /home/biocodz/sortmerna/bin/\nsortmerna\n\n# set PATH\nexport PATH=$HOME/sortmerna/bin:$PATH\n\n# test the installation\nsortmerna --version\n  SortMeRNA version 4.3.6\n  Build Date: Jul 17 2021\n  
sortmerna_build_git_sha:@921fa40256760ea2d44c49b21eb326afda748d5e@\n  sortmerna_build_git_date:@2022/08/16 10:59:31@\n\n# view help\nsortmerna -h\n```\n\n### Running\n\n* The only required options are `--ref` and `--reads`\n* Any option can also be specified using a single dash, e.g. `-ref` and `-reads`\n* Both plain `fasta/fastq` and gzip-compressed `fasta.gz/fastq.gz` files are accepted\n* File extensions `.fastq, .fastq.gz, .fq, .fq.gz, .fasta, ...` are optional. The format and compression are automatically recognized\n* Relative paths are accepted\n\nFor example:\n\n```\n# single reference and single reads file\nsortmerna --ref REF_PATH --reads READS_PATH\n\n# for multiple references use multiple '--ref'\nsortmerna --ref REF_PATH_1 --ref REF_PATH_2 --ref REF_PATH_3 --reads READS_PATH\n\n# for paired reads use '--reads' twice\nsortmerna --ref REF_PATH_1 --ref REF_PATH_2 --ref REF_PATH_3 --reads READS_PATH_1 --reads READS_PATH_2\n\n```\n\nMore examples can be found in [test.jinja](https://github.com/biocore/sortmerna/blob/master/scripts/test.jinja) and [run.py](https://github.com/biocore/sortmerna/blob/master/scripts/run.py).\n\n#### Execution trace\n\nHere is a [sample execution trace](https://sortmerna.readthedocs.io/en/latest/trace4.3.2.html).  \n\n`IMPORTANT`\n- A progressing execution trace, showing the number of reads processed so far, indicates a normally running program. \n- A non-progressing trace means a problem. 
Please kill the process (do not wait for two days) and file an issue [here](https://github.com/biocore/sortmerna/issues).  \n- Please provide the execution trace when filing issues.\n\n[Sample execution statistics](https://github.com/biocore/sortmerna/wiki/sample-execution-statistics) are provided to give an idea of what the execution time might be.\n\n## Building from sources\n\n[Build instructions](https://sortmerna.readthedocs.io/en/latest/building.html)\n\n## User Manual\n\nSee the [SortMeRNA Read The Docs project](https://sortmerna.readthedocs.io/en/latest/index.html).\n\nIf you need a PDF, any modern browser can print web pages to PDF.\n\n## Databases\n\nPlease use [database.tar.gz](https://github.com/biocore/sortmerna/releases/download/v4.3.4/database.tar.gz) from release 4.3.4.\n\nWe recommend using smr_v4.3_default_db.fasta.\n\nOriginal source databases (clustering parameters given below):\n* Silva 138 SSURef NR99 (16S, 18S)\n* Silva 132 LSURef (23S, 28S)\n* RFAM v14.1 (5S, 5.8S)\n\nThe databases differ in the percent identity used to cluster the sequences for each kingdom and rRNA component.\n\nSpecifically:\n\n* smr_v4.3_fast_db.fasta\n  * bac-16S 85%, 5S \u0026 5.8S seeds, rest 90% (benchmark accuracy: 99.888%)\n* smr_v4.3_default_db.fasta\n  * bac-16S 90%, 5S \u0026 5.8S seeds, rest 95% (benchmark accuracy: 99.899%)\n* smr_v4.3_sensitive_db.fasta\n  * all 97% (benchmark accuracy: 99.907%)\n* smr_v4.3_sensitive_db_rfam_seeds.fasta\n  * all 97%, except the RFAM database, which includes the full seed database sequences\n\nAccuracy (based on sensitivity and selectivity) is very good for all databases; however, the \"sensitive\" databases run at least 2x slower.\n\n## Taxonomies\n\nThe archive `data/rRNA_databases/silva_ids_acc_tax.tar.gz` contains SILVA taxonomy strings (extracted from the XML file generated by ARB)\nfor each of the reference sequences in the representative databases. 
The files contain three tab-separated columns:\nthe reference sequence ID, the accession number, and the taxonomy.\n\n## Citation\n\nIf you use SortMeRNA, please cite:\nKopylova E., Noé L. and Touzet H., \"SortMeRNA: Fast and accurate filtering of ribosomal RNAs in metatranscriptomic data\", Bioinformatics (2012), doi: 10.1093/bioinformatics/bts611.\n\n## Contributors\n\nSee [AUTHORS](./AUTHORS) for a list of contributors to this project.\n\n## Support\n\nFor questions and comments, feel free to file an [issue](https://github.com/sortmerna/sortmerna/issues), or start a [discussion](https://github.com/sortmerna/sortmerna/discussions).\n\t\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsortmerna%2Fsortmerna","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsortmerna%2Fsortmerna","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsortmerna%2Fsortmerna/lists"}