{"id":17652264,"url":"https://github.com/zyxue/ncbitax2lin","last_synced_at":"2025-04-09T20:04:59.158Z","repository":{"id":43069835,"uuid":"56956100","full_name":"zyxue/ncbitax2lin","owner":"zyxue","description":"🐞 Convert NCBI taxonomy dump into lineages","archived":false,"fork":false,"pushed_at":"2025-04-08T20:34:24.000Z","size":404,"stargazers_count":144,"open_issues_count":3,"forks_count":30,"subscribers_count":8,"default_branch":"master","last_synced_at":"2025-04-09T20:04:54.899Z","etag":null,"topics":["lineage","ncbi","ncbi-taxonomy","pandas","python","taxdump","taxonomy"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/zyxue.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2016-04-24T05:41:45.000Z","updated_at":"2025-04-08T20:34:28.000Z","dependencies_parsed_at":"2022-07-16T19:16:11.047Z","dependency_job_id":null,"html_url":"https://github.com/zyxue/ncbitax2lin","commit_stats":null,"previous_names":[],"tags_count":8,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zyxue%2Fncbitax2lin","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zyxue%2Fncbitax2lin/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zyxue%2Fncbitax2lin/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zyxue%2Fncbitax2lin/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/zyxue","download_url":"https://codeload.github.com/zyxue/ncbitax2lin/tar.gz/refs/heads/master","host":{"name":"GitHub","url":
"https://github.com","kind":"github","repositories_count":248103868,"owners_count":21048245,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["lineage","ncbi","ncbi-taxonomy","pandas","python","taxdump","taxonomy"],"created_at":"2024-10-23T11:46:24.000Z","updated_at":"2025-04-09T20:04:59.147Z","avatar_url":"https://github.com/zyxue.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# NCBItax2lin\n\n[![Downloads](https://pepy.tech/badge/ncbitax2lin/week)](https://pepy.tech/project/ncbitax2lin)\n\nConvert NCBI taxonomy dump into lineages. An example for [human\n(tax_id=9606)](https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=9606)\nis like\n\n| tax_id | superkingdom | phylum   | class    | order    | family    | genus | species      | family1 | forma | genus1 | infraclass | infraorder  | kingdom | no rank            | no rank1     | no rank10            | no rank11 | no rank12 | no rank13 | no rank14 | no rank15     | no rank16 | no rank17 | no rank18 | no rank19 | no rank2  | no rank20 | no rank21 | no rank22 | no rank3  | no rank4      | no rank5   | no rank6      | no rank7   | no rank8     | no rank9      | parvorder  | species group | species subgroup | species1 | subclass | subfamily | subgenus | subkingdom | suborder    | subphylum | subspecies | subtribe | superclass | superfamily | superorder       | superorder1 | superphylum | tribe | varietas 
|\n|--------|--------------|----------|----------|----------|-----------|-------|--------------|---------|-------|--------|------------|-------------|---------|--------------------|--------------|----------------------|-----------|-----------|-----------|-----------|---------------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|---------------|------------|---------------|------------|--------------|---------------|------------|---------------|------------------|----------|----------|-----------|----------|------------|-------------|-----------|------------|----------|------------|-------------|------------------|-------------|-------------|-------|----------|\n| 9606   | Eukaryota    | Chordata | Mammalia | Primates | Hominidae | Homo  | Homo sapiens |         |       |        |            | Simiiformes | Metazoa | cellular organisms | Opisthokonta | Dipnotetrapodomorpha | Tetrapoda | Amniota   | Theria    | Eutheria  | Boreoeutheria |           |           |           |           | Eumetazoa |           |           |           | Bilateria | Deuterostomia | Vertebrata | Gnathostomata | Teleostomi | Euteleostomi | Sarcopterygii | Catarrhini |               |                  |          |          | Homininae |          |            | Haplorrhini | Craniata  |            |          |            | Hominoidea  | Euarchontoglires |             |             |       |          |\n\n### Install\n\nncbitax2lin supports python-3.7, python-3.8, and python-3.9.\n\n```\npip install -U ncbitax2lin\n```\n\n### Generate lineages\n\nFirst download taxonomy dump from NCBI:\n\n```bash\nwget -N ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz\nmkdir -p taxdump \u0026\u0026 tar zxf taxdump.tar.gz -C ./taxdump\n```\n\nThen, run ncbitax2lin\n\n```bash\nncbitax2lin --nodes-file taxdump/nodes.dmp --names-file taxdump/names.dmp\n```\n\nBy default, the generated lineages will be saved 
to\n`ncbi_lineages_[date_of_utcnow].csv.gz`. The output path can be overridden with the\n`--output` option.\n\n\n## FAQ\n\n**Q**: I have a large number of sequences with their corresponding accession\nnumbers from NCBI. How do I get their lineages?\n\n**A**: First, map the accession numbers (GI is deprecated) to tax IDs\nusing the `nucl_*accession2taxid.gz` files from\nftp://ftp.ncbi.nih.gov/pub/taxonomy/accession2taxid/. Second, trace each\nsequence's whole lineage from its tax ID. The tax-id-to-lineage mapping is\nwhat NCBItax2lin generates for you.\n\nIf you have any questions about this project, please feel free to create a new\n[issue](https://github.com/zyxue/ncbitax2lin/issues/new).\n\n## Note on `taxdump.tar.gz.md5`\n\nIt appears that NCBI periodically regenerates `taxdump.tar.gz` and\n`taxdump.tar.gz.md5` even when the content is unchanged. I am not sure how\nthe regeneration works, but `taxdump.tar.gz.md5` will differ simply because\nof a different timestamp.\n\n## Used in\n\n* Mahmoudabadi, G., \u0026 Phillips, R. (2018). A comprehensive and quantitative exploration of thousands of viral genomes. eLife, 7. https://doi.org/10.7554/eLife.31955\n* Dombrowski, N. et al. (2020) Undinarchaeota illuminate DPANN phylogeny and the impact of gene transfer on archaeal evolution, Nature Communications. Springer US, 11(1). doi: 10.1038/s41467-020-17408-w. https://www.nature.com/articles/s41467-020-17408-w\n* Schenberger Santos, A. R. et al. (2020) NAD+ biosynthesis in bacteria is controlled by global carbon/nitrogen levels via PII signaling, Journal of Biological Chemistry, 295(18), pp. 6165–6176. doi: 10.1074/jbc.RA120.012793. https://www.sciencedirect.com/science/article/pii/S0021925817482433\n* Villada, J. C., Duran, M. F. and Lee, P. K. H. (2020) Interplay between Position-Dependent Codon Usage Bias and Hydrogen Bonding at the 5' End of ORFeomes, mSystems, 5(4), pp. 1–18. doi: 10.1128/msystems.00613-20. 
https://msystems.asm.org/content/5/4/e00613-20\n* Byadgi, O. et al. (2020) Transcriptome analysis of Amyloodinium ocellatum tomonts revealed basic information on the major potential virulence factors, Genes, 11(11), pp. 1–12. doi: 10.3390/genes11111252. https://www.mdpi.com/2073-4425/11/11/1252\n\n## Development\n\n### Install dependencies\n\n```\npoetry shell\npoetry install\n```\n\n### Testing\n\n```\nmake format\nmake all\n```\n\n### Publish (administrators only)\n\n```\npoetry version [minor/major etc.]\npoetry publish --build -u __token__ --password pypi-\u003ctoken-from-pypi\u003e\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzyxue%2Fncbitax2lin","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzyxue%2Fncbitax2lin","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzyxue%2Fncbitax2lin/lists"}