{"id":37073490,"url":"https://github.com/crosenth/moose","last_synced_at":"2026-01-14T08:37:34.922Z","repository":{"id":57443119,"uuid":"97652073","full_name":"crosenth/moose","owner":"crosenth","description":"dna/rna alignment classifier","archived":false,"fork":false,"pushed_at":"2025-09-05T18:55:49.000Z","size":1096,"stargazers_count":0,"open_issues_count":2,"forks_count":1,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-09-05T20:47:42.148Z","etag":null,"topics":["alignments","bioinformatics","blast","data-analysis","data-science","kmer-counting","moose-classifier","pandas"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/crosenth.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2017-07-18T23:23:51.000Z","updated_at":"2025-09-05T18:55:21.000Z","dependencies_parsed_at":"2022-09-26T17:21:33.433Z","dependency_job_id":"c0aee481-b69a-4ec2-8f10-3fb002bb6f81","html_url":"https://github.com/crosenth/moose","commit_stats":{"total_commits":164,"total_committers":4,"mean_commits":41.0,"dds":0.03658536585365857,"last_synced_commit":"51c3caf5055d2f9e8c2fc563de62f3951c84ee26"},"previous_names":[],"tags_count":16,"template":false,"template_full_name":null,"purl":"pkg:github/crosenth/moose","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/crosenth%2Fmoose","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/crosenth%2Fmoose/ta
gs","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/crosenth%2Fmoose/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/crosenth%2Fmoose/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/crosenth","download_url":"https://codeload.github.com/crosenth/moose/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/crosenth%2Fmoose/sbom","scorecard":{"id":309461,"data":{"date":"2025-08-11","repo":{"name":"github.com/crosenth/moose","commit":"c4bae9cc2ac08f265fc846eadf31cf619c0890ec"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":3.8,"checks":[{"name":"Maintained","score":6,"reason":"8 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 6","details":null,"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"Dangerous-Workflow","score":10,"reason":"no dangerous workflow patterns detected","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"Token-Permissions","score":0,"reason":"detected GitHub workflow tokens with excessive permissions","details":["Warn: no topLevel permission defined: .github/workflows/unnittests.yml:1","Info: no jobLevel write permissions found"],"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing 
workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"Code-Review","score":1,"reason":"Found 4/26 approved changesets -- score normalized to 1","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Pinned-Dependencies","score":0,"reason":"dependency not pinned by hash detected -- score normalized to 0","details":["Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/unnittests.yml:20: update your workflow using https://app.stepsecurity.io/secureworkflow/crosenth/moose/unnittests.yml/master?enable=pin","Warn: GitHub-owned GitHubAction not pinned by hash: .github/workflows/unnittests.yml:24: update your workflow using https://app.stepsecurity.io/secureworkflow/crosenth/moose/unnittests.yml/master?enable=pin","Warn: pipCommand not pinned by hash: .github/workflows/unnittests.yml:28","Info:   0 out of   2 GitHub-owned GitHubAction dependencies pinned","Info:   0 out of   1 pipCommand dependencies pinned"],"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build 
process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"License","score":0,"reason":"license file not detected","details":["Warn: project does not have a license file"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"Vulnerabilities","score":10,"reason":"0 existing vulnerabilities detected","details":null,"documentation":{"short":"Determines if the project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project 
cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Branch-Protection","score":0,"reason":"branch protection not enabled on development/release branches","details":["Warn: branch protection not enabled for branch 'master'"],"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}},{"name":"SAST","score":0,"reason":"SAST tool is not run on all commits -- score normalized to 0","details":["Warn: 0 commits out of 8 are checked with a SAST tool"],"documentation":{"short":"Determines if the project uses static code analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}}]},"last_synced_at":"2025-08-17T22:53:36.996Z","repository_id":57443119,"created_at":"2025-08-17T22:53:36.997Z","updated_at":"2025-08-17T22:53:36.997Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28414667,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-14T08:31:27.429Z","status":"ssl_error","status_checked_at":"2026-01-14T08:31:19.098Z","response_time":107,"last_error":"SSL_read: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["alignments","bioinformatics","blast","data-analysis","data-science","kmer-counting","moose-classifier","pandas"],"created_at":"2026-01-14T08:37:34.163Z","updated_at":"2026-01-14T08:37:34.909Z","avatar_url":"https://github.com/crosenth.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# moose\n\nA tool for taxonomically selecting, grouping and summarizing pairwise\nalignment classifications into something more concise and readable.\n\n## authors\n\n* [Noah Hoffman](https://github.com/nhoffman)\n* [Tim Holland](https://github.com/tholland)\n* [Daniel Hoogestraat](https://github.com/dhoogest)\n* [Tyler Land](https://github.com/tyleraland)\n* [Steve Salipante](mailto:stevesal@uw.edu)\n* [Chris Rosenthal](mailto:crosenth@gmail.com)\n\n## about\n\nMoose groups pairwise alignments by taxonomy and alignment scores.  It works\nsafely with large data sets by using pandas, the Python Data Analysis Library.\n\n## dependencies\n\n* Python \u003e= 3.7\n* [Pandas](https://pandas.pydata.org/) \u003e= 2.0.2\n\n## installation\n\nMoose can be installed in two ways:\n\nFrom PyPI:\n\n```\n% pip install moose_classifier\n```\n\nOr cloned from GitHub:\n\n```\n% git clone https://github.com/crosenth/moose.git\n% cd moose\n% python setup.py install\n```\n\n## examples\n\nThe following examples use results from 16S sequences aligned against a local\nNCBI nt database.\n
For instructions on creating a local BLAST nt database, see\nthe NCBI walkthroughs\n[here](https://www.ncbi.nlm.nih.gov/sites/books/NBK537770/) and\n[here](https://www.ncbi.nlm.nih.gov/sites/books/NBK279688/).\n\nThe simplest example pipes blastn output in the format \"10 qaccver saccver pident staxid\" into\nthe classifier and outputs a table of species-level taxonomy results:\n\n```\n% blastn -db nt -outfmt \"10 qaccver saccver pident staxid\" -query sequences.fasta | classify --columns qaccver,saccver,pident,staxid\nspecimen,assignment_id,assignment,best_rank,max_percent,min_percent,min_threshold,reads,clusters,pct_reads\nquery1,0,Homo sapiens,species,99.67,99.67,0.00,1,1,100.00\nquery10,0,Actinobacteria*;uncultured bacterium*,species,100.00,93.21,0.00,1,1,100.00\nquery11,0,Bacteroidetes*;uncultured bacterium*/organism,species,100.00,91.26,0.00,1,1,100.00\nquery12,0,Apteryx australis*;Bacteria*;Firmicutes*,species,100.00,98.26,0.00,1,1,100.00\nquery13,0,Dikarya*;uncultured bacterium/eukaryote,species,100.00,82.88,0.00,1,1,100.00\nquery14,0,Saccharomyces cerevisiae*;uncultured eukaryote,species,100.00,99.00,0.00,1,1,100.00\nquery2,0,Homo sapiens;Pan troglodytes,species,97.07,95.40,0.00,1,1,100.00\nquery6,0,Bacteria*;Escherichia coli;Staphylococcus,species,100.00,98.62,0.00,1,1,100.00\nquery7,0,Bacteroidetes*;uncultured bacterium*/organism*,species,100.00,91.61,0.00,1,1,100.00\nquery8,0,Bacteria*;Escherichia coli;Staphylococcus,species,100.00,98.62,0.00,1,1,100.00\nquery9,0,Bacteria*;uncultured organism*,species,100.00,98.62,0.00,1,1,100.00\n```\n\nThis example shows the bare minimum information required to simplify and group\nalignment results: a query sequence id (qaccver), a subject sequence id (saccver), a\npercent identity (pident) and a subject taxonomy id (staxid). If the staxid\ncolumn is unavailable, an accession-to-taxonomy-id map file can be supplied with\nthe `--seq-info` argument.\n
Results are output in CSV format.\n\nBy sending the `blastn` results to a standalone file, we can take a closer look\nat what happened.  For the purposes of this walkthrough, the CSV output is\ndisplayed as a nicely formatted table:\n\n```\n% blastn -db nt -outfmt \"10 qaccver saccver pident staxid\" -query sequences.fasta -out blast.csv\n% wc --lines blast.csv\n1084 blast.csv\n% classify --columns qaccver,saccver,pident,staxid blast.csv\n|----------+---------------+------------------------------------------------+-----------+-------------+-------------+---------------+-------+----------+-----------|\n| specimen | assignment_id | assignment                                     | best_rank | max_percent | min_percent | min_threshold | reads | clusters | pct_reads |\n|----------+---------------+------------------------------------------------+-----------+-------------+-------------+---------------+-------+----------+-----------|\n| query1   | 0             | Homo sapiens                                   | species   | 99.67       | 99.67       | 0.00          | 1     | 1        | 100.00    |\n| query10  | 0             | Actinobacteria*;uncultured bacterium*          | species   | 100.00      | 93.21       | 0.00          | 1     | 1        | 100.00    |\n| query11  | 0             | Bacteroidetes*;uncultured bacterium*/organism  | species   | 100.00      | 91.26       | 0.00          | 1     | 1        | 100.00    |\n| query12  | 0             | Apteryx australis*;Bacteria*;Firmicutes*       | species   | 100.00      | 98.26       | 0.00          | 1     | 1        | 100.00    |\n| query13  | 0             | Dikarya*;uncultured bacterium/eukaryote        | species   | 100.00      | 82.88       | 0.00          | 1     | 1        | 100.00    |\n| query14  | 0             | Saccharomyces cerevisiae*;uncultured eukaryote | species   | 100.00      | 99.00       | 0.00          | 1     | 1        | 100.00    |\n| query2   | 0             | Homo sapiens;Pan troglodytes                   | species   | 97.07       | 95.40       | 0.00          | 1     | 1        | 100.00    |\n| query6   | 0             | Bacteria*;Escherichia coli;Staphylococcus      | species   | 100.00      | 98.62       | 0.00          | 1     | 1        | 100.00    |\n| query7   | 0             | Bacteroidetes*;uncultured bacterium*/organism* | species   | 100.00      | 91.61       | 0.00          | 1     | 1        | 100.00    |\n| query8   | 0             | Bacteria*;Escherichia coli;Staphylococcus      | species   | 100.00      | 98.62       | 0.00          | 1     | 1        | 100.00    |\n| query9   | 0             | Bacteria*;uncultured organism*                 | species   | 100.00      | 98.62       | 0.00          | 1     | 1        | 100.00    |\n|----------+---------------+------------------------------------------------+-----------+-------------+-------------+---------------+-------+----------+-----------|\n```\n\nThe 1,084 lines of blast results are grouped taxonomically, with a single row\nper query sequence.\n\nTaxonomy grouping is accomplished with a lineages table that can be specified\nusing the `--lineages` argument.  If a lineages file is not supplied, one is\ngenerated automatically from NCBI taxonomy data by default.  The lineages table\nbuilt by classify can be saved to a file with the `--lineages-out` argument,\nwhich speeds up subsequent classify runs:\n\n```\nclassify --columns qaccver,saccver,pident,staxid --lineages-out lineages.csv --specimen one blast.csv\n```\n\n### Taxonomy grouping\n\nClassifications are taxonomically grouped according to\n`--max-group-size`, which defaults to 3.  Classification names start\nat the species level by default and are recursively regrouped at higher\ntaxonomic ranks until `--max-group-size` is satisfied.
\n\nBy increasing the `--max-group-size 5`:\n```\nclassify --columns qaccver,saccver,pident,staxid --lineages lineages.csv --max-group-size 5 blast.csv\n|----------+---------------+---------------------------------------------------------------------------------------------------------------------+-----------+-------------+-------------+---------------+-------+----------+-----------|\n| specimen | assignment_id | assignment                                                                                                          | best_rank | max_percent | min_percent | min_threshold | reads | clusters | pct_reads |\n|----------+---------------+---------------------------------------------------------------------------------------------------------------------+-----------+-------------+-------------+---------------+-------+----------+-----------|\n| query1   | 0             | Homo sapiens                                                                                                        | species   | 99.67       | 99.67       | 0.00          | 1     | 1        | 100.00    |\n| query10  | 0             | Actinomycetales bacterium 'ARUP UnID 260'*;Corynebacterium*;uncultured actinobacterium/bacterium*                   | species   | 100.00      | 93.21       | 0.00          | 1     | 1        | 100.00    |\n| query11  | 0             | Prevotella*;uncultured Bacteroidales bacterium*;uncultured Bacteroidetes bacterium;uncultured bacterium*/organism   | species   | 100.00      | 91.26       | 0.00          | 1     | 1        | 100.00    |\n| query12  | 0             | Apteryx australis*;Bacilli*;Staphylococcus*;bacterium*;uncultured Firmicutes bacterium*;uncultured bacterium*       | species   | 100.00      | 98.26       | 0.00          | 1     | 1        | 100.00    |\n| query13  | 0             | Saccharomycetales*;Xanthophyllomyces dendrorhous;uncultured bacterium/eukaryote                                     | species   | 100.00      | 82.88       | 0.00          
| 1     | 1        | 100.00    |\n| query14  | 0             | Saccharomyces cerevisiae*;uncultured eukaryote                                                                      | species   | 100.00      | 99.00       | 0.00          | 1     | 1        | 100.00    |\n| query2   | 0             | Homo sapiens;Pan troglodytes                                                                                        | species   | 97.07       | 95.40       | 0.00          | 1     | 1        | 100.00    |\n| query6   | 0             | Escherichia coli;Staphylococcus;bacterium CulaenoE10F;human oral bacterium C20;uncultured bacterium*                | species   | 100.00      | 98.62       | 0.00          | 1     | 1        | 100.00    |\n| query7   | 0             | Prevotella*;uncultured Bacteroidales bacterium*;uncultured Bacteroidetes bacterium*;uncultured bacterium*/organism* | species   | 100.00      | 91.61       | 0.00          | 1     | 1        | 100.00    |\n| query8   | 0             | Escherichia coli;Staphylococcus;bacterium CulaenoE10F;human oral bacterium C20;uncultured bacterium*                | species   | 100.00      | 98.62       | 0.00          | 1     | 1        | 100.00    |\n| query9   | 0             | Bacteria*;Escherichia coli;Staphylococcus;uncultured organism*                                                      | species   | 100.00      | 98.62       | 0.00          | 1     | 1        | 100.00    |\n|----------+---------------+---------------------------------------------------------------------------------------------------------------------+-----------+-------------+-------------+---------------+-------+----------+-----------|\n```\n\nAnd\n\n```\nclassify --columns qaccver,saccver,pident,staxid --lineages lineages.csv --max-group-size 9 
blast.csv\n|----------+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+-------------+-------------+---------------+-------+----------+-----------|\n| specimen | assignment_id | assignment                                                                                                                                                                    | best_rank | max_percent | min_percent | min_threshold | reads | clusters | pct_reads |\n|----------+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+-------------+-------------+---------------+-------+----------+-----------|\n| query1   | 0             | Homo sapiens                                                                                                                                                                  | species   | 99.67       | 99.67       | 0.00          | 1     | 1        | 100.00    |\n| query10  | 0             | Actinomycetales bacterium 'ARUP UnID 260'*;Corynebacterium*;uncultured actinobacterium/bacterium*                                                                             | species   | 100.00      | 93.21       | 0.00          | 1     | 1        | 100.00    |\n| query11  | 0             | Prevotella amnii/bivia;Prevotella sp. 
3-5;uncultured Bacteroidales bacterium*;uncultured Bacteroidetes bacterium;uncultured Prevotella sp.*;uncultured bacterium*/organism    | species   | 100.00      | 91.26       | 0.00          | 1     | 1        | 100.00    |\n| query12  | 0             | Apteryx australis*;Bacilli*;Staphylococcus*;bacterium*;uncultured Firmicutes bacterium*;uncultured bacterium*                                                                 | species   | 100.00      | 98.26       | 0.00          | 1     | 1        | 100.00    |\n| query13  | 0             | Saccharomycetales*;Xanthophyllomyces dendrorhous;uncultured bacterium/eukaryote                                                                                               | species   | 100.00      | 82.88       | 0.00          | 1     | 1        | 100.00    |\n| query14  | 0             | Saccharomyces cerevisiae*;uncultured eukaryote                                                                                                                                | species   | 100.00      | 99.00       | 0.00          | 1     | 1        | 100.00    |\n| query2   | 0             | Homo sapiens;Pan troglodytes                                                                                                                                                  | species   | 97.07       | 95.40       | 0.00          | 1     | 1        | 100.00    |\n| query6   | 0             | Escherichia coli;Staphylococcus;bacterium CulaenoE10F;human oral bacterium C20;uncultured bacterium*                                                                          | species   | 100.00      | 98.62       | 0.00          | 1     | 1        | 100.00    |\n| query7   | 0             | Prevotella amnii/bivia*;Prevotella sp. 
3-5;uncultured Bacteroidales bacterium*;uncultured Bacteroidetes bacterium*;uncultured Prevotella sp.*;uncultured bacterium*/organism* | species   | 100.00      | 91.61       | 0.00          | 1     | 1        | 100.00    |\n| query8   | 0             | Escherichia coli;Staphylococcus;bacterium CulaenoE10F;human oral bacterium C20;uncultured bacterium*                                                                          | species   | 100.00      | 98.62       | 0.00          | 1     | 1        | 100.00    |\n| query9   | 0             | Escherichia coli;Staphylococcus;bacterium CulaenoE10F;human oral bacterium C20;uncultured bacterium*/organism*                                                                | species   | 100.00      | 98.62       | 0.00          | 1     | 1        | 100.00    |\n|----------+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+-------------+-------------+---------------+-------+----------+-----------|\n```\n\nAnd using `--max-group-size 1`:\n\n```\nclassify --columns qaccver,saccver,pident,staxid --lineages lineages.csv --max-group-size 1 blast.csv\n|----------+---------------+--------------+--------------+-------------+-------------+---------------+-------+----------+-----------|\n| specimen | assignment_id | assignment   | best_rank    | max_percent | min_percent | min_threshold | reads | clusters | pct_reads |\n|----------+---------------+--------------+--------------+-------------+-------------+---------------+-------+----------+-----------|\n| query1   | 0             | Homo sapiens | species      | 99.67       | 99.67       | 0.00          | 1     | 1        | 100.00    |\n| query10  | 0             | Bacteria*    | superkingdom | 100.00      | 93.21       | 0.00          | 1     | 1        | 100.00    |\n| query11  | 0             | root*        | root         | 100.00      
| 91.26       | 0.00          | 1     | 1        | 100.00    |\n| query12  | 0             | root*        | root         | 100.00      | 98.26       | 0.00          | 1     | 1        | 100.00    |\n| query13  | 0             | root*        | root         | 100.00      | 82.88       | 0.00          | 1     | 1        | 100.00    |\n| query14  | 0             | Eukaryota*   | superkingdom | 100.00      | 99.00       | 0.00          | 1     | 1        | 100.00    |\n| query2   | 0             | Homininae    | subfamily    | 97.07       | 95.40       | 0.00          | 1     | 1        | 100.00    |\n| query6   | 0             | Bacteria*    | superkingdom | 100.00      | 98.62       | 0.00          | 1     | 1        | 100.00    |\n| query7   | 0             | root*        | root         | 100.00      | 91.61       | 0.00          | 1     | 1        | 100.00    |\n| query8   | 0             | Bacteria*    | superkingdom | 100.00      | 98.62       | 0.00          | 1     | 1        | 100.00    |\n| query9   | 0             | root*        | root         | 100.00      | 98.62       | 0.00          | 1     | 1        | 100.00    |\n|----------+---------------+--------------+--------------+-------------+-------------+---------------+-------+----------+-----------|\n```\n\nWith the `--specimen` argument, all query sequences can be grouped under a single specimen name to further simplify the results:\n\n```\nclassify --columns qaccver,saccver,pident,staxid --lineages lineages.csv --max-group-size 1 --specimen one blast.csv\n|----------+---------------+--------------+--------------+-------------+-------------+---------------+-------+----------+-----------|\n| specimen | assignment_id | assignment   | best_rank    | max_percent | min_percent | min_threshold | reads | clusters | pct_reads |\n|----------+---------------+--------------+--------------+-------------+-------------+---------------+-------+----------+-----------|\n| one      | 0             | root*        | root         | 100.00      | 82.88       | 0.00          | 5     | 5        | 45.45     |\n| one      | 1             | Bacteria*    | superkingdom | 100.00      | 93.21       | 0.00          | 3     | 3        | 27.27     |\n| one      | 2             | Homo sapiens | species      | 99.67       | 99.67       | 0.00          | 1     | 1        | 9.09      |\n| one      | 3             | Homininae    | subfamily    | 97.07       | 95.40       | 0.00          | 1     | 1        | 9.09      |\n| one      | 4             | Eukaryota*   | superkingdom | 100.00      | 99.00       | 0.00          | 1     | 1        | 9.09      |\n|----------+---------------+--------------+--------------+-------------+-------------+---------------+-------+----------+-----------|\n```\n\nIf `--columns` is not specified, the classifier will check for a header\ncontaining at least the qseqid, sseqid and pident columns.  If there is no header,\nthe standard blast outfmt 6 columns are assumed.\n\n### Alignment selection\nTODO\n\n### Rank thresholds\n\nThe Moose classifier accepts dynamic thresholds for any taxonomic rank\nvia the `--rank-thresholds` argument in order to provide the best possible\nclassification. An example input looks like this:\n\n```\n% cl rank_thresholds.csv\n|--------+------+---------+--------+-------+-------+--------+-------+---------|\n| tax_id | root | kingdom | phylum | class | order | family | genus | species |\n|--------+------+---------+--------+-------+-------+--------+-------+---------|\n| 1      | 75.0 | 75.0    | 80.0   | 90.0  | 93.0  | 95.0   | 97.0  | 99.0    |\n|--------+------+---------+--------+-------+-------+--------+-------+---------|\n```\n\nAny tax_id can be specified in a rank thresholds file.  If a tax_id is not\npresent in the rank thresholds file, the classifier will work its way up the\nreference sequence's taxonomic lineage to find the rank thresholds to assign.\n\nThe classifier uses the rank thresholds table to select the best hits available\nfor classification at the lowest possible rank.\n\n
Example usage:\n\n```\nclassify --columns qaccver,saccver,pident,staxid --lineages lineages.csv --rank-thresholds rank_thresholds.csv --specimen one blast.csv\n|----------+---------------+-----------------------------------------------------------------------------------+-----------+-------------+-------------+---------------+-------+----------+-----------|\n| specimen | assignment_id | assignment                                                                        | best_rank | max_percent | min_percent | min_threshold | reads | clusters | pct_reads |\n|----------+---------------+-----------------------------------------------------------------------------------+-----------+-------------+-------------+---------------+-------+----------+-----------|\n| one      | 0             | Bacteroidetes*;uncultured bacterium*/organism*                                    | species   | 100.00      | 99.27       | 99.00         | 2     | 2        | 18.18     |\n| one      | 1             | Bacteria*;Escherichia coli;Staphylococcus                                         | species   | 100.00      | 99.31       | 99.00         | 2     | 2        | 18.18     |\n| one      | 2             | Saccharomyces cerevisiae*;uncultured eukaryote                                    | species   | 100.00      | 99.33       | 99.00         | 1     | 1        | 9.09      |\n| one      | 3             | Homo sapiens                                                                      | species   | 99.67       | 99.67       | 99.00         | 1     | 1        | 9.09      |\n| one      | 4             | Homo                                                                              | genus     | 97.07       | 97.07       | 97.00         | 1     | 1        | 9.09      |\n| one      | 5             | Clavispora lusitaniae*                                                            | species   | 100.00      | 100.00      | 99.00         | 1     | 1        | 9.09      |\n| one      | 6             | 
Bacteria*;uncultured organism*                                                    | species   | 100.00      | 99.31       | 99.00         | 1     | 1        | 9.09      |\n| one      | 7             | Apteryx australis*;Bacteria*;Firmicutes*                                          | species   | 100.00      | 99.31       | 99.00         | 1     | 1        | 9.09      |\n| one      | 8             | Actinomycetales bacterium 'ARUP UnID 260'*;Corynebacterium*;uncultured bacterium* | species   | 100.00      | 99.64       | 99.00         | 1     | 1        | 9.09      |\n|----------+---------------+-----------------------------------------------------------------------------------+-----------+-------------+-------------+---------------+-------+----------+-----------|\n```\n\nThere are a few things to notice when using a rank thresholds table.\nFirst, the min_threshold column now reports the lowest rank threshold used\nfor hit selection (it was 0.00 in the earlier example without thresholds).\nSecond, the classifications differ because hits below the min_threshold are\ndropped.\n\nLastly, the genus-level Homo classification was not rolled into the\nHomo sapiens classification because the rank thresholds table determined that\nthe best hits available for that query sequence could only be classified at the\ngenus level: its best hit (97.07% identity) met the 97.0 genus threshold but\nnot the 99.0 species threshold.  This is despite the fact that the Homo\nclassification was derived from Homo sapiens reference sequences.  In effect,\nthe rank thresholds table defines classification uncertainty: despite the hits\nto Homo sapiens reference sequences, the classifier could not determine that\nthe query sequence was in fact Homo sapiens, only that it was of genus-level\nHomo origin.\n\n### Specimen map\n\nA three-column specimen,qseqid,weight file can be supplied using the\n`--specimen-map` argument.  
An example might look like this:\n\n```\ncat specimen_map.csv\none,query1,100\none,query2,95\none,query6,75\none,query7,70\none,query8,65\none,query9,60\none,query10,55\none,query11,50\none,query12,45\none,query13,40\none,query14,35\nclassify --columns qaccver,saccver,pident,staxid --lineages lineages.csv --rank-thresholds rank_thresholds.csv --specimen-map specimen_map.csv blast.csv | cl\n|----------+---------------+-----------------------------------------------------------------------------------+-----------+-------------+-------------+---------------+-------+----------+-----------|\n| specimen | assignment_id | assignment                                                                        | best_rank | max_percent | min_percent | min_threshold | reads | clusters | pct_reads |\n|----------+---------------+-----------------------------------------------------------------------------------+-----------+-------------+-------------+---------------+-------+----------+-----------|\n| one      | 0             | Bacteria*;Escherichia coli;Staphylococcus                                         | species   | 100.00      | 99.31       | 99.00         | 140   | 2        | 20.29     |\n| one      | 1             | Bacteroidetes*;uncultured bacterium*/organism*                                    | species   | 100.00      | 99.27       | 99.00         | 120   | 2        | 17.39     |\n| one      | 2             | Homo sapiens                                                                      | species   | 99.67       | 99.67       | 99.00         | 100   | 1        | 14.49     |\n| one      | 3             | Homo                                                                              | genus     | 97.07       | 97.07       | 97.00         | 95    | 1        | 13.77     |\n| one      | 4             | Bacteria*;uncultured organism*                                                    | species   | 100.00      | 99.31       | 99.00         | 60    | 1        | 8.70      
|\n| one      | 5             | Actinomycetales bacterium 'ARUP UnID 260'*;Corynebacterium*;uncultured bacterium* | species   | 100.00      | 99.64       | 99.00         | 55    | 1        | 7.97      |\n| one      | 6             | Apteryx australis*;Bacteria*;Firmicutes*                                          | species   | 100.00      | 99.31       | 99.00         | 45    | 1        | 6.52      |\n| one      | 7             | Clavispora lusitaniae*                                                            | species   | 100.00      | 100.00      | 99.00         | 40    | 1        | 5.80      |\n| one      | 8             | Saccharomyces cerevisiae*;uncultured eukaryote                                    | species   | 100.00      | 99.33       | 99.00         | 35    | 1        | 5.07      |\n|----------+---------------+-----------------------------------------------------------------------------------+-----------+-------------+-------------+---------------+-------+----------+-----------|\n```\n\nThe classifier interprets each qseqid in the specimen map file as part of its\nspecimen and counts it by its weight: `reads` is the sum of the weights of the\nqseqids in an assignment, and `pct_reads` is that sum divided by the specimen's\ntotal weight (for the first row, 140 / 690 = 20.29%).  
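The weighting can be checked with a short Python sketch. It assumes each query already has an assignment (moose derives these from the BLAST hits); the per-query assignments and shortened labels such as `E. coli group` are hypothetical, inferred from the totals in the table above, and `summarize` is an invented helper. It sums weights into `reads`, counts qseqids into `clusters`, and divides by the specimen's total weight for `pct_reads`; qseqids without an assignment fall into `[no blast result]`.

```python
from collections import defaultdict

# Specimen 'one' weights, copied from specimen_map.csv above.
WEIGHTS = {
    'query1': 100, 'query2': 95, 'query6': 75, 'query7': 70, 'query8': 65,
    'query9': 60, 'query10': 55, 'query11': 50, 'query12': 45,
    'query13': 40, 'query14': 35,
}

# Hypothetical per-query assignments (abbreviated labels) inferred from the
# example output; moose computes the real ones from the BLAST hits.
ASSIGNMENTS = {
    'query1': 'Homo sapiens', 'query2': 'Homo',
    'query6': 'E. coli group', 'query8': 'E. coli group',
    'query7': 'Bacteroidetes group', 'query11': 'Bacteroidetes group',
    'query9': 'uncultured organism group', 'query10': 'Corynebacterium group',
    'query12': 'Firmicutes group', 'query13': 'Clavispora lusitaniae',
    'query14': 'Saccharomyces group',
}


def summarize(weights, assignments):
    # Sum weights (reads) and count qseqids (clusters) per assignment, then
    # express each total as a percentage of the specimen's total weight.
    reads = defaultdict(int)
    clusters = defaultdict(int)
    for qseqid, weight in weights.items():
        label = assignments.get(qseqid, '[no blast result]')
        reads[label] += weight
        clusters[label] += 1
    total = sum(reads.values())
    rows = [
        (label, reads[label], clusters[label], round(100 * reads[label] / total, 2))
        for label in reads
    ]
    # Largest assignments first, as in the classify output.
    return sorted(rows, key=lambda row: row[1], reverse=True)


for row in summarize(WEIGHTS, ASSIGNMENTS):
    print(row)
```

The first printed row is ('E. coli group', 140, 2, 20.29), matching the first row of the table above.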
If a qseqid is not included in the blast.csv results then a \nclassification of `[no blast result]` will be assigned:\n\n```\ncat specimen_map.csv\none,query1,100\none,query2,95\none,query3,90\none,query4,85\none,query5,80\none,query6,75\none,query7,70\none,query8,65\none,query9,60\none,query10,55\none,query11,50\none,query12,45\none,query13,40\none,query14,35\nclassify --columns qaccver,saccver,pident,staxid --lineages lineages.csv --rank-thresholds rank_thresholds.csv --specimen-map specimen_map.csv blast.csv\n|----------+---------------+-----------------------------------------------------------------------------------+-----------+-------------+-------------+---------------+-------+----------+-----------|\n| specimen | assignment_id | assignment                                                                        | best_rank | max_percent | min_percent | min_threshold | reads | clusters | pct_reads |\n|----------+---------------+-----------------------------------------------------------------------------------+-----------+-------------+-------------+---------------+-------+----------+-----------|\n| one      | 0             | [no blast result]                                                                 |           |             |             |               | 255   | 3        | 26.98     |\n| one      | 1             | Bacteria*;Escherichia coli;Staphylococcus                                         | species   | 100.00      | 99.31       | 99.00         | 140   | 2        | 14.81     |\n| one      | 2             | Bacteroidetes*;uncultured bacterium*/organism*                                    | species   | 100.00      | 99.27       | 99.00         | 120   | 2        | 12.70     |\n| one      | 3             | Homo sapiens                                                                      | species   | 99.67       | 99.67       | 99.00         | 100   | 1        | 10.58     |\n| one      | 4             | Homo                                          
                                    | genus     | 97.07       | 97.07       | 97.00         | 95    | 1        | 10.05     |\n| one      | 5             | Bacteria*;uncultured organism*                                                    | species   | 100.00      | 99.31       | 99.00         | 60    | 1        | 6.35      |\n| one      | 6             | Actinomycetales bacterium 'ARUP UnID 260'*;Corynebacterium*;uncultured bacterium* | species   | 100.00      | 99.64       | 99.00         | 55    | 1        | 5.82      |\n| one      | 7             | Apteryx australis*;Bacteria*;Firmicutes*                                          | species   | 100.00      | 99.31       | 99.00         | 45    | 1        | 4.76      |\n| one      | 8             | Clavispora lusitaniae*                                                            | species   | 100.00      | 100.00      | 99.00         | 40    | 1        | 4.23      |\n| one      | 9             | Saccharomyces cerevisiae*;uncultured eukaryote                                    | species   | 100.00      | 99.33       | 99.00         | 35    | 1        | 3.70      |\n|----------+---------------+-----------------------------------------------------------------------------------+-----------+-------------+-------------+---------------+-------+----------+-----------|\n```\n\nMultiple specimens can be specified, and the same qseqid may appear in more\nthan one specimen with a different weight in each:\n\n```\ncat specimen_map.csv\none,query1,100\none,query2,95\none,query3,90\none,query4,85\none,query5,80\none,query6,75\none,query7,70\none,query8,65\none,query9,60\none,query10,55\none,query11,50\none,query12,45\none,query13,40\none,query14,35\ntwo,query3,1000\ntwo,query6,500\ntwo,query1,25\ntwo,query8,900\nclassify --columns qaccver,saccver,pident,staxid --lineages lineages.csv --rank-thresholds rank_thresholds.csv --specimen-map specimen_map.csv 
blast.csv\n|----------+---------------+-----------------------------------------------------------------------------------+-----------+-------------+-------------+---------------+-------+----------+-----------|\n| specimen | assignment_id | assignment                                                                        | best_rank | max_percent | min_percent | min_threshold | reads | clusters | pct_reads |\n|----------+---------------+-----------------------------------------------------------------------------------+-----------+-------------+-------------+---------------+-------+----------+-----------|\n| one      | 0             | [no blast result]                                                                 |           |             |             |               | 255   | 3        | 26.98     |\n| one      | 1             | Bacteria*;Escherichia coli;Staphylococcus                                         | species   | 100.00      | 99.31       | 99.00         | 140   | 2        | 14.81     |\n| one      | 2             | Bacteroidetes*;uncultured bacterium*/organism*                                    | species   | 100.00      | 99.27       | 99.00         | 120   | 2        | 12.70     |\n| one      | 3             | Homo sapiens                                                                      | species   | 99.67       | 99.67       | 99.00         | 100   | 1        | 10.58     |\n| one      | 4             | Homo                                                                              | genus     | 97.07       | 97.07       | 97.00         | 95    | 1        | 10.05     |\n| one      | 5             | Bacteria*;uncultured organism*                                                    | species   | 100.00      | 99.31       | 99.00         | 60    | 1        | 6.35      |\n| one      | 6             | Actinomycetales bacterium 'ARUP UnID 260'*;Corynebacterium*;uncultured bacterium* | species   | 100.00      | 99.64       | 99.00         | 55    | 1  
      | 5.82      |\n| one      | 7             | Apteryx australis*;Bacteria*;Firmicutes*                                          | species   | 100.00      | 99.31       | 99.00         | 45    | 1        | 4.76      |\n| one      | 8             | Clavispora lusitaniae*                                                            | species   | 100.00      | 100.00      | 99.00         | 40    | 1        | 4.23      |\n| one      | 9             | Saccharomyces cerevisiae*;uncultured eukaryote                                    | species   | 100.00      | 99.33       | 99.00         | 35    | 1        | 3.70      |\n| two      | 0             | Bacteria*;Escherichia coli;Staphylococcus                                         | species   | 100.00      | 99.31       | 99.00         | 1400  | 2        | 57.73     |\n| two      | 1             | [no blast result]                                                                 |           |             |             |               | 1000  | 1        | 41.24     |\n| two      | 2             | Homo sapiens                                                                      | species   | 99.67       | 99.67       | 99.00         | 25    | 1        | 1.03      |\n|----------+---------------+-----------------------------------------------------------------------------------+-----------+-------------+-------------+---------------+-------+----------+-----------|\n```\n\nIf a query sequence is not included in the specimen map but is returned in\nthe blast.csv results, it is added as its own specimen, named after its qseqid\nand counted with a weight of 1:\n\n```\ncat specimen_map.csv\none,query1,100\none,query4,85\none,query5,80\none,query6,75\none,query7,70\nclassify --columns qaccver,saccver,pident,staxid --lineages lineages.csv --rank-thresholds rank_thresholds.csv --specimen-map specimen_map.csv 
blast.csv\n|----------+---------------+-----------------------------------------------------------------------------------+-----------+-------------+-------------+---------------+-------+----------+-----------|\n| specimen | assignment_id | assignment                                                                        | best_rank | max_percent | min_percent | min_threshold | reads | clusters | pct_reads |\n|----------+---------------+-----------------------------------------------------------------------------------+-----------+-------------+-------------+---------------+-------+----------+-----------|\n| one      | 0             | [no blast result]                                                                 |           |             |             |               | 165   | 2        | 40.24     |\n| one      | 1             | Homo sapiens                                                                      | species   | 99.67       | 99.67       | 99.00         | 100   | 1        | 24.39     |\n| one      | 2             | Bacteria*;Escherichia coli;Staphylococcus                                         | species   | 100.00      | 99.31       | 99.00         | 75    | 1        | 18.29     |\n| one      | 3             | Bacteroidetes*;uncultured bacterium*/organism*                                    | species   | 100.00      | 99.27       | 99.00         | 70    | 1        | 17.07     |\n| query10  | 0             | Actinomycetales bacterium 'ARUP UnID 260'*;Corynebacterium*;uncultured bacterium* | species   | 100.00      | 99.64       | 99.00         | 1     | 1        | 100.00    |\n| query11  | 0             | Bacteroidetes*;uncultured bacterium*/organism                                     | species   | 100.00      | 99.27       | 99.00         | 1     | 1        | 100.00    |\n| query12  | 0             | Apteryx australis*;Bacteria*;Firmicutes*                                          | species   | 100.00      | 99.31       | 99.00         | 1     | 1  
      | 100.00    |\n| query13  | 0             | Clavispora lusitaniae*                                                            | species   | 100.00      | 100.00      | 99.00         | 1     | 1        | 100.00    |\n| query14  | 0             | Saccharomyces cerevisiae*;uncultured eukaryote                                    | species   | 100.00      | 99.33       | 99.00         | 1     | 1        | 100.00    |\n| query2   | 0             | Homo                                                                              | genus     | 97.07       | 97.07       | 97.00         | 1     | 1        | 100.00    |\n| query8   | 0             | Bacteria*;Escherichia coli;Staphylococcus                                         | species   | 100.00      | 99.31       | 99.00         | 1     | 1        | 100.00    |\n| query9   | 0             | Bacteria*;uncultured organism*                                                    | species   | 100.00      | 99.31       | 99.00         | 1     | 1        | 100.00    |\n|----------+---------------+-----------------------------------------------------------------------------------+-----------+-------------+-------------+---------------+-------+----------+-----------|\n```\n\n### Copy numbers\n\nMultiple copies of the target gene may be present in a species' genome, which\ncan distort the relative abundance of a classification.  The Moose classifier\naccepts a two-column CSV file of copy numbers via the `--copy-numbers`\nargument:\n\n```\n|--------+-------|\n| tax_id | count |\n|--------+-------|\n```\n\nThe reads of each final classification are divided by the count listed for its\ntax_id in this file, and the result is reported in the `corrected` column of\nthe output.  This is useful for adjusting the relative abundance of species\nwhen, for example, doing 16S rRNA classifications.\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcrosenth%2Fmoose","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcrosenth%2Fmoose","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcrosenth%2Fmoose/lists"}