{"id":28569294,"url":"https://github.com/wang-q/nwr","last_synced_at":"2025-06-10T17:11:21.800Z","repository":{"id":53899237,"uuid":"461995967","full_name":"wang-q/nwr","owner":"wang-q","description":"`nwr` is a command line tool for working with NCBI taxonomy, Newick files and assembly reports","archived":false,"fork":false,"pushed_at":"2025-05-06T19:52:33.000Z","size":21783,"stargazers_count":7,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-05-06T20:38:18.330Z","etag":null,"topics":["assembly","bioinformatics","evolution","ncbi","newick-format","phylogenetic-trees","rust"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/wang-q.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2022-02-21T19:05:20.000Z","updated_at":"2025-05-06T19:52:36.000Z","dependencies_parsed_at":"2023-12-15T19:58:44.409Z","dependency_job_id":"c98a9bd0-4084-4534-91b7-986d2a1d3a58","html_url":"https://github.com/wang-q/nwr","commit_stats":{"total_commits":356,"total_committers":4,"mean_commits":89.0,"dds":"0.016853932584269704","last_synced_commit":"d981d3f2f69386feb4c4f98e298e589701bc4935"},"previous_names":[],"tags_count":45,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wang-q%2Fnwr","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wang-q%2Fnwr/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wang-q%2Fnwr/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wang-q%2Fnwr/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/wang-q","download_url":"https://codeload.github.com/wang-q/nwr/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wang-q%2Fnwr/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":258866590,"owners_count":22769975,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["assembly","bioinformatics","evolution","ncbi","newick-format","phylogenetic-trees","rust"],"created_at":"2025-06-10T17:11:21.086Z","updated_at":"2025-06-10T17:11:21.791Z","avatar_url":"https://github.com/wang-q.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# nwr\n\n[![Publish](https://github.com/wang-q/nwr/actions/workflows/publish.yml/badge.svg)](https://github.com/wang-q/nwr/actions)\n[![Build](https://github.com/wang-q/nwr/actions/workflows/build.yml/badge.svg)](https://github.com/wang-q/nwr/actions)\n[![Codecov](https://img.shields.io/codecov/c/github/wang-q/nwr/master.svg)](https://codecov.io/github/wang-q/nwr?branch=master)\n[![Crates.io](https://img.shields.io/crates/v/nwr.svg)](https://crates.io/crates/nwr)\n![](https://img.shields.io/crates/d/nwr?label=downloads%20%28crates.io%29)\n[![Lines of code](https://www.aschey.tech/tokei/github/wang-q/nwr)](https://github.com//wang-q/nwr)\n\n`nwr` is a command line tool for working with **N**CBI taxonomy, Ne**W**ick files and assembly\n**R**eports, written in Rust.\n\n## Install\n\nCurrent release: 0.8.5\n\n```shell\ncargo install nwr\n\n# or\ncargo install --path . --force # --offline\n\n# Concurrent tests may trigger sqlite locking\ncargo test -- --test-threads=1\n\n# build under WSL 2\nmkdir -p /tmp/cargo\nexport CARGO_TARGET_DIR=/tmp/cargo\ncargo build\n\n# build for CentOS 7\n# rustup target add x86_64-unknown-linux-gnu\n# pip3 install cargo-zigbuild\ncargo zigbuild --target x86_64-unknown-linux-gnu.2.17 --release\nll $CARGO_TARGET_DIR/x86_64-unknown-linux-gnu/release/\n\n```\n\n## `nwr help`\n\n```console\n$ nwr help\n`nwr` is a command line tool for working with NCBI taxonomy, Newick files and assembly reports\n\nUsage: nwr [COMMAND]\n\nCommands:\n  download     Download the latest releases of `taxdump` and assembly reports\n  txdb         Init the taxonomy database\n  ardb         Init the assembly database\n  info         Information of Taxonomy ID(s) or scientific name(s)\n  lineage      Output the lineage of the term\n  member       List members (of certain ranks) under ancestral term(s)\n  append       Append fields of higher ranks to a TSV file\n  restrict     Restrict taxonomy terms to ancestral descendants\n  common       Output the common tree of terms\n  template     Create dirs, data and scripts for a phylogenomic research\n  kb           Prints docs (knowledge bases)\n  seqdb        Init the seq database\n  data         Newick data commands\n  ops          Newick operation commands\n  viz          Newick visualization commands\n  mat          Distance matrix commands\n  build        Build tree from distance matrix\n  plot         Plot commands\n  pl-condense  Pipeline - condense subtrees based on taxonomy\n  help         Print this message or the help of the given subcommand(s)\n\nOptions:\n  -h, --help     Print help\n  -V, --version  Print version\n\nSubcommand groups:\n\n* Database\n    * download / txdb / ardb\n* Taxonomy\n    * info / lineage / member / append / restrict / common\n* Assembly\n    * template / kb / seqdb\n* Newick\n    * data label / data stat / data distance\n    * Operations\n        * ops order / ops rename / ops replace / ops topo / ops subtree /\n          ops prune / ops  reroot\n    * Visualization\n        * viz indent / viz comment / viz tex\n    * pl-condense\n* Distance matrix\n    * mat pair / mat phylip / mat format / mat subset / mat compare\n* Build tree\n    * build upgma / build nj\n* Plots\n    * plot hh / plot venn\n\n```\n\n## Examples\n\n### Usage of each command\n\nFor practical uses of `nwr` and other awesome companions, follow this [page](doc/ncbi_ar.md).\n\n```shell\nnwr download\n\nnwr txdb\n\nnwr info \"Homo sapiens\" 4932\n\nnwr lineage \"Homo sapiens\"\nnwr lineage 4932\n\nnwr restrict \"Vertebrata\" -c 2 -f tests/nwr/taxon.tsv\n##sci_name       tax_id\n#Human   9606\n\nnwr member \"Homo\"\n\nnwr append tests/nwr/taxon.tsv -c 2 -r species -r family --id\n\nnwr ardb\nnwr ardb --genbank\n\nnwr common \"Escherichia coli\" 4932 Drosophila_melanogaster 9606 Mus_musculus\n\n# rm ~/.nwr/*.dmp\n\n```\n\n### Development\n\n```shell\ncargo test --color=always --package nwr --test cli_nwr command_template -- --show-output\n\n# debug mode has a slow connection\ncargo run --release --bin nwr download\n\n# tests/nwr/\ncargo run --bin nwr txdb -d tests/nwr/\n\ncargo run --bin nwr info -d tests/nwr/ --tsv Viruses \"Actinophage JHJ-1\" \"Bacillus phage bg1\"\n\ncargo run --bin nwr common -d tests/nwr/ \"Actinophage JHJ-1\" \"Bacillus phage bg1\"\n\ncargo run --bin nwr template tests/assembly/Trichoderma.assembly.tsv --ass -o stdout\n\n```\n\n### seqdb\n\n```shell\nexport SPECIES=\"$HOME/data/Archaea/Protein/Sulfolobus_acidocaldarius\"\n\ncargo run --bin nwr seqdb -d ${SPECIES} --init --strain\n\ncargo run --bin nwr seqdb -d ${SPECIES} \\\n    --size \u003c(\n        hnsm size ${SPECIES}/pro.fa.gz\n    ) \\\n    --clust\n\ncargo run --bin nwr seqdb -d ${SPECIES} \\\n    --anno \u003c(\n        gzip -dcf \"${SPECIES}\"/anno.tsv.gz\n    ) \\\n    --asmseq \u003c(\n        gzip -dcf \"${SPECIES}\"/asmseq.tsv.gz\n    )\n\ncargo run --bin nwr seqdb -d ${SPECIES} --rep f1=\"${SPECIES}\"/fam88_cluster.tsv\n\necho \"\n    SELECT\n        *\n    FROM asm\n    WHERE 1=1\n    \" |\n    sqlite3 -tabs ${SEQ_DIR}/seq.sqlite\n\necho \"\n    SELECT\n        COUNT(distinct asm_seq.asm_id)\n    FROM asm_seq\n    WHERE 1=1\n    \" |\n    sqlite3 -tabs ${SEQ_DIR}/seq.sqlite\n\necho \"\n.header ON\n    SELECT\n        'species' AS species,\n        COUNT(distinct asm_seq.asm_id) AS strain,\n        COUNT(*) AS total,\n        COUNT(distinct rep_seq.seq_id) AS dedup,\n        COUNT(distinct rep_seq.rep_id) AS rep\n    FROM asm_seq\n    JOIN rep_seq ON asm_seq.seq_id = rep_seq.seq_id\n    WHERE 1=1\n    \" |\n    sqlite3 -tabs ${SEQ_DIR}/seq.sqlite\n\n\n```\n\n### Newick files and LaTeX\n\nFor more detailed usages, check [this file](tree/README.md).\n\n#### Get data from the tree\n\n```shell\n# List all names\nnwr data label tests/newick/hg38.7way.nwk\n\n# The intersection between the nodes in the tree and the provided\nnwr data label tests/newick/hg38.7way.nwk -r \"^ch\" -n Mouse -n foo\nnwr data label tests/newick/catarrhini.nwk -n Homo -n Pan -n Gorilla -M\n# Is Pongo the sibling of Homininae?\nnwr data label tests/newick/catarrhini.nwk -n Homininae -n Pongo -DM\n# All leaves belong to Hominidae\nnwr data label tests/newick/catarrhini.nwk -t Hominidae -I\n\nnwr data label tests/newick/catarrhini.nwk -c dup\nnwr data label tests/newick/catarrhini.comment.nwk -c full\n\nnwr data stat tests/newick/hg38.7way.nwk\n\n# Various distances\nnwr data distance -m root -I tests/newick/catarrhini.nwk\nnwr data distance -m parent -I tests/newick/catarrhini.nwk\nnwr data distance -m pairwise -I tests/newick/catarrhini.nwk\nnwr data distance -m lca -I tests/newick/catarrhini.nwk\n\nnwr data distance -m root -L tests/newick/catarrhini_topo.nwk\n\n# Phylip distance matrix\nnwr data distance -m phylip tests/newick/catarrhini.nwk\n\n```\n\n#### Operations of the tree\n\n```shell\necho \"((A,B),C);\" | nwr ops order --ndr stdin\nnwr ops order --nd tests/newick/hg38.7way.nwk\n\nnwr ops order --list tests/newick/abcde.list tests/newick/abcde.nwk\n\n# gene tree as the order of species tree\nnwr ops order tests/newick/pmxc.nwk \\\n    --list \u003c(nwr data label tests/newick/species.nwk)\n\nnwr ops rename tests/newick/abc.nwk -n C -r F -l A,B -r D\n\nnwr ops replace tests/newick/abc.nwk tests/newick/abc.replace.tsv\nnwr ops replace tests/newick/abc.nwk tests/newick/abc3.replace.tsv\n\nnwr ops topo tests/newick/catarrhini.nwk\n\n# The behavior is very similar to `nwr label`, but outputs a subtree instead of labels\nnwr ops subtree tests/newick/hg38.7way.nwk -n Human -n Rhesus -r \"^ch\" -M\n\n# Condense the subtree to a node\nnwr ops subtree tests/newick/hg38.7way.nwk -n Human -n Rhesus -r \"^ch\" -M -c Primates\n\nnwr ops subtree tests/newick/catarrhini.nwk -t Hominidae\n\nnwr ops prune tests/newick/catarrhini.nwk -n Homo -n Pan\n\necho \"((A:1,B:1)D:1,C:1)E;\" |\n    nwr ops reroot stdin -n B\nnwr ops reroot tests/newick/catarrhini_wrong.nwk -n Cebus\n\nnwr ops reroot tests/newick/bs.nw -n C\n\nnwr viz tex tests/newick/bs.nw | tectonic -\nmv texput.pdf bs.pdf\nnwr ops reroot tests/newick/bs.nw -n C | nwr viz tex stdin | tectonic -\nmv texput.pdf bs.reroot.pdf\n\nnwr pl-condense tests/newick/catarrhini.nwk -r family\n\n```\n\n#### Visualization of the tree\n\n```shell\nnwr viz indent tests/newick/hg38.7way.nwk --text \".   \"\n\necho \"((A,B),C);\" |\n    nwr viz comment stdin -n A -n C --color green |\n    nwr viz comment stdin -l A,B --dot\n\ntectonic doc/template.tex\n\necho \"((A[color=green],B)[dot=black],C[color=green]);\" |\n    nwr viz comment stdin -r \"color=\"\n\nnwr viz tex tests/newick/catarrhini.nwk -o output.tex\ntectonic output.tex\n\nnwr viz tex --bl tests/newick/hg38.7way.nwk |\n    tectonic - \u0026\u0026\n    mv texput.pdf hg38.7way.pdf\n\nnwr viz tex --forest --bare tests/newick/test.forest\n\nnwr viz common \"Escherichia coli\" 4932 Drosophila_melanogaster 9606 \"Mus musculus\" |\n    nwr viz tex --bare stdin\n\n```\n\n### Matrix commands\n\n```bash\nnwr mat phylip tests/mat/IBPA.fa.tsv\n\nnwr mat pair tests/mat/IBPA.phy\n\nnwr mat format tests/mat/IBPA.phy\n\nnwr mat subset tests/mat/IBPA.phy tests/mat/IBPA.list\n\nhnsm distance tests/mat/IBPA.fa -k 7 -w 1 |\n    nwr mat phylip stdin -o tests/mat/IBPA.71.phy\n\nnwr mat compare tests/mat/IBPA.phy tests/mat/IBPA.71.phy --method all\n# Sequences in matrices: 10 and 10\n# Common sequences: 10\n# Method  Score\n# pearson 0.935803\n# spearman        0.919631\n# mae     0.113433\n# cosine  0.978731\n# jaccard 0.759106\n# euclid  1.229844\n\n```\n\n### Build tree from distance matrix\n\n```shell\nnwr build upgma tests/build/wiki.phy |\n    nwr viz tex stdin --bl |\n    tectonic - \u0026\u0026\n    mv texput.pdf wiki.upgma.pdf\n\nnwr build nj tests/build/wiki-nj.phy |\n    nwr viz tex stdin --bl |\n    tectonic - \u0026\u0026\n    mv texput.pdf wiki.nj.pdf\n\nneighbor\n# tests/build/wiki-nj.phy\n# r\n# y\n# r\nnwr ops reroot outtree -n e\n\n```\n\n### Plots\n\n```shell\n# venn\nnwr plot venn \\\n    tests/plot/rocauc.result.tsv \\\n    tests/plot/mcox.05.result.tsv |\n    tectonic - \u0026\u0026\n    mv texput.pdf venn2.pdf\n\nnwr plot venn \\\n    tests/plot/rocauc.result.tsv \\\n    tests/plot/mcox.05.result.tsv \\\n    tests/plot/mcox.result.tsv |\n    tectonic - \u0026\u0026\n    mv texput.pdf venn3.pdf\n\nnwr plot venn \\\n    tests/plot/rocauc.result.tsv \\\n    tests/plot/rocauc.result.tsv \\\n    tests/plot/mcox.05.result.tsv \\\n    tests/plot/mcox.result.tsv |\n    tectonic - \u0026\u0026\n    mv texput.pdf venn4.pdf\n\nplotr venn tests/plot/rocauc.result.tsv tests/plot/mcox.05.result.tsv\n\ntectonic doc/venn4.tex\n\n# histo\nnwr plot hh tests/plot/hist.tsv -g 2 --bins 20 --xl \"\" --unit 0.5,1.5 |\n    tectonic - \u0026\u0026\n    mv texput.pdf hist.pdf\n\nnwr plot hh tests/plot/hist.tsv --bins 30 --xl \"\" --xmm 45,75 --unit 0.5,1.5 |\n    tectonic - \u0026\u0026\n    mv texput.pdf hist.pdf\n\ncargo run --bin nwr plot hh tests/plot/adomain.tsv -g 2 --bins 40 --xl \"\" --yl \"\" --unit 0.3,0.5 |\n    tectonic - \u0026\u0026\n    mv texput.pdf hist.pdf\n\ntectonic doc/heatmap.tex\n\n# nrps\ncargo run --bin nwr plot nrps tests/plot/srf.tsv --legend --color blue |\n    tectonic - \u0026\u0026\n    mv texput.pdf srf.pdf\n\ntectonic doc/nrps.tex\n\ntectonic doc/da.tex\n\n```\n\n## Database schema\n\n```shell\nbrew install k1LoW/tap/tbls\n\ntbls doc sqlite://./tests/nwr/taxonomy.sqlite doc/txdb\n\ntbls doc sqlite://./tests/nwr/ar_refseq.sqlite doc/ardb\n\n```\n\n[txdb](./doc/txdb/README.md)\n\n[ardb](./doc/ardb/README.md)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwang-q%2Fnwr","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwang-q%2Fnwr","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwang-q%2Fnwr/lists"}