{"id":27303119,"url":"https://github.com/tseemann/sixess","last_synced_at":"2025-04-12T02:50:08.962Z","repository":{"id":146773055,"uuid":"80079696","full_name":"tseemann/sixess","owner":"tseemann","description":"🔬🐛 Rapid 16s rRNA identification from isolate FASTQ files","archived":false,"fork":false,"pushed_at":"2018-04-24T11:49:53.000Z","size":38688,"stargazers_count":23,"open_issues_count":3,"forks_count":2,"subscribers_count":7,"default_branch":"master","last_synced_at":"2024-06-13T00:03:07.697Z","etag":null,"topics":["bacteria","fastq","genomics","species"],"latest_commit_sha":null,"homepage":"","language":"Shell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tseemann.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2017-01-26T02:34:42.000Z","updated_at":"2023-10-23T19:31:19.000Z","dependencies_parsed_at":null,"dependency_job_id":"e407444e-26de-43f2-aad6-c03c6d93cd1e","html_url":"https://github.com/tseemann/sixess","commit_stats":{"total_commits":37,"total_committers":2,"mean_commits":18.5,"dds":"0.027027027027026973","last_synced_commit":"dfd1c0050b730cb5125f6ebac4b80cd392043aca"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tseemann%2Fsixess","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tseemann%2Fsixess/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tseemann%2Fsixess/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tseemann%2Fsixess/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/
hosts/GitHub/owners/tseemann","download_url":"https://codeload.github.com/tseemann/sixess/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248509093,"owners_count":21115935,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bacteria","fastq","genomics","species"],"created_at":"2025-04-12T02:50:08.359Z","updated_at":"2025-04-12T02:50:08.948Z","avatar_url":"https://github.com/tseemann.png","language":"Shell","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![Build Status](https://travis-ci.org/tseemann/sixess.svg?branch=master)](https://travis-ci.org/tseemann/sixess) [![License: GPL v3](https://img.shields.io/badge/License-GPL%20v3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0) [](#lang-au)\n\n:warning: **THIS SOFTWARE IS STILL UNDER DEVELOPMENT - USE AT OWN RISK**\n\n# sixess\nRapid 16s rDNA from isolate FASTQ files\n\n## Introduction\n\n`sixess` is a command-line software tool to identify \nbacterial species based on 16S rDNA sequence directly\nfrom WGS FASTQ data. 
It includes databases from \nNCBI (default), RDP and SILVA.\n\n## Quick start\n\n```\n# just give it sequences!\n% sixess R1.fastq.gz\nStaphylococcus epidermidis\n\n# sometimes there is no match\n% sixess /dev/null\nNo matches\n\n# give it as many sequence files as needed\n% sixess R1.fq R2.fq\nEnterococcus faecium\n\n# we provide different databases you can choose\n% sixess -d RDP contigs.fa\nBacillus cereus\n\n# you can pipe to stdin too\n% bzcat chernobyl.fq.bz2 | sixess -\nDeinococcus radiodurans\n```\n\n## Installation\n\n### Source\n```\ncd $HOME\ngit clone https://github.com/tseemann/sixess\nexport PATH=$HOME/sixess/bin:$PATH\n```\n### Homebrew\n```\nbrew install brewsci/bio/sixess  # COMING SOON\n```\n### Bioconda\n```\nconda install -c bioconda -c conda-forge sixess  # COMING SOON\n```\n\n## Usage\n\n### Input\n\nThe input can be one or more sequence files, or `-` denoting `stdin`.\nThe input data can be FASTQ or FASTA, and may be `.gz` compressed.\nAny read length is accepted, even whole chromosomes.\n\n### Output\n\nThe output is a *single line* to `stdout`.\nIf a match was found, it will be `Genus species`.\nIf no prediction could be made, it will be `No matches`.\n\n### Options\n\n```\n  -q        Quiet mode, no output\n  -p DIR    Database folder (/home/tseemann/git/sixess/db)\n  -d FILE   Database {NCBI RDP SILVA.gz} (NCBI)\n  -t NUM    CPU threads (1)\n  -m FILE   Save alignments to FILE in PAF format\n  -V        Print version and exit\n```\n\n* `-q` enables \"quiet mode\", which prints to stderr only for errors\n* `-p` is the location of the sequence databases\n* `-d` selects the database; databases can be `.gz` compressed (see [Databases](#databases))\n* `-t` increases threads; 3 is the suggested value for `minimap2`\n* `-m` allows you to save the PAF output of `minimap2`\n* `-V` prints the version and exits, *e.g.* `sixess 1.0`\n\n## Databases\n\n### NCBI (bundled, default)\n\nThe [NCBI 16S ribosomal RNA 
project](https://www.ncbi.nlm.nih.gov/refseq/targetedloci/)\ncontains curated RefSeq entries for bacterial and archaeal 16S ribosomal RNA.\nIt has ~20,000 entries.\n\n```\nesearch -db nucleotide -query '33175[BioProject] OR 33317[BioProject]' \\\n  | efetch -db nuccore -format fasta \\\n  \u003e $(dirname $(which sixess))/../db/NCBI\n```\n\n### RDP (bundled)\n\nBacterial 16S rDNA sequences for \"type strains\" \nfrom the [RDP](https://rdp.cme.msu.edu/) database\nare included. These are denoted with `(T)` in the\nFASTA headers. It contains ~10,000 entries.\n\n```\nwget --no-check-certificate https://rdp.cme.msu.edu/download/current_Bacteria_unaligned.fa.gz\ngunzip -c current_Bacteria_unaligned.fa.gz \\\n  | bioawk -cfastx '/\\(T\\)/{print \"\u003e\" $name \" \" $comment \"\\n\" toupper($seq)}' \\\n  \u003e $(dirname $(which sixess))/../db/RDP\n```\n\n### SILVA (bundled)\n\n[SILVA](https://www.arb-silva.de/)\nis a comprehensive online resource for quality-checked and \naligned ribosomal RNA sequence data.\nThe filtered version of the aligned 16S/18S/SSU database\ncontains ~100,000 entries.\n\n```\n# replace \"132\" with latest version as needed\nwget https://www.arb-silva.de/fileadmin/silva_databases/release_132/Exports/SILVA_132_SSURef_Nr99_tax_silva.fasta.gz\n# -c writes the decompressed data to stdout so it can be piped\ngunzip -c SILVA_132_SSURef_Nr99_tax_silva.fasta.gz \\\n  | bioawk -cfastx \\\n    '$comment ~ /^Bacteria;|^Archaea;/ \\\n    \u0026\u0026 $comment !~ /(;unidentified|Mitochondria;|;Chloroplast|;uncultured| sp\\.)/ \\\n    { sub(/^.*;/,\"\",$comment);\n      gsub(\"U\",\"T\",$seq);\n      print \"\u003e\" $name \" \" $comment \"\\n\" $seq }' \\\n  | seqtk seq -l 60 -U \\\n  \u003e SILVA.tmp1\ncd-hit-est -i SILVA.tmp1 -o SILVA.tmp2 -c 1.0 -T 0 -M 2000 -d 250\ncp SILVA.tmp2 $(dirname $(which sixess))/../db/SILVA\nrm -f SILVA.tmp1 SILVA.tmp2 SILVA.tmp2.clstr\n```\n\n## Custom databases\n\nAssuming you have a FASTA file of 16S DNA sequences\ncalled, say, `/home/alex/GG.fa`, you can do this:\n\n### Global installation\n\n```\ncp /home/alex/GG.fa $(dirname $(which 
sixess))/../db/GG\nsixess -d GG R1.fastq.gz\n```\n\n### Local installation\n\n```\nsixess -p /home/alex/data -d GG.fa R1.fastq.gz\n```\n\n## Algorithm\n\n1. Identify reads that look like 16S (`minimap2`)\n2. Count how many reads hit each 16S sequence (possibly weighted)\n3. Choose the top hit and report it\n\n## Feedback\n\nReport bugs and give suggestions on the\n[Issues page](https://github.com/tseemann/sixess/issues).\n\n## License\n\n[GPL Version 3](https://raw.githubusercontent.com/tseemann/sixess/master/LICENSE)\n\n## Author\n\n[Torsten Seemann](http://tseemann.github.io)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftseemann%2Fsixess","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftseemann%2Fsixess","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftseemann%2Fsixess/lists"}