{"id":24560501,"url":"https://github.com/ctb/2025-ncbi-rest-api","last_synced_at":"2025-04-19T14:44:17.444Z","repository":{"id":272072092,"uuid":"915441336","full_name":"ctb/2025-ncbi-rest-api","owner":"ctb","description":"Grabbing genome accessions \u0026 download info based on taxonomy, using the NCBI REST API, w00t","archived":false,"fork":false,"pushed_at":"2025-02-23T01:40:34.000Z","size":985,"stargazers_count":12,"open_issues_count":3,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-29T08:51:07.059Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ctb.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-01-11T21:13:37.000Z","updated_at":"2025-03-03T10:51:54.000Z","dependencies_parsed_at":"2025-01-11T22:23:04.186Z","dependency_job_id":"940d5ad6-c892-4ffa-ab45-b9e0b99f6f99","html_url":"https://github.com/ctb/2025-ncbi-rest-api","commit_stats":null,"previous_names":["ctb/2025-ncbi-rest-api"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ctb%2F2025-ncbi-rest-api","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ctb%2F2025-ncbi-rest-api/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ctb%2F2025-ncbi-rest-api/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ctb%2F2025-ncbi-rest-api/manifests","owner_url":"https://repos.ecosys
te.ms/api/v1/hosts/GitHub/owners/ctb","download_url":"https://codeload.github.com/ctb/2025-ncbi-rest-api/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249716842,"owners_count":21315068,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-01-23T07:17:01.908Z","updated_at":"2025-04-19T14:44:17.439Z","avatar_url":"https://github.com/ctb.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 2025-ncbi-rest-api - examples of using the NCBI Datasets API in Python\n\nThis repo contains demo and example code to use the\n[NCBI Datasets REST API](https://www.ncbi.nlm.nih.gov/datasets/docs/v2/api/rest-api/)\nto grab accessions of all (reference) genomes under a certain\ntaxonomic node, and save/retrieve/manipulate the resulting information\nfor fun and profit.\n\nThe Snakefile provides a few different examples, including the use of\nthe\n[sourmash directsketch plugin](https://github.com/sourmash-bio/sourmash_plugin_directsketch)\nto download all of the genomes in bulk.\n\nSpecifically, this repo contains code to:\n* Download a dataset zip for one or more accessions;\n* Retrieve genome accessions for all eukaryotic genomes.\n* Create \"subtracted\" lists for polyphyletic taxonomic nodes such as\n  invertebrates, non-bilateria, and \"other\" eukaryotes.\n* Download 10 fungal genome sequences.\n* Retrieve NCBI lineage information for a given taxid using pytaxonkit.\n\nand maybe more.\n\n## Running this code\n\nTo run, set your NCBI API key like so:\n\n```\nexport 
NCBI_API_KEY=foobarbaz\n```\n\nCreate a conda environment or otherwise install the things in\n`environment.yml`:\n\n```\nconda env create -n ncbi-rest-api -f environment.yml\nconda activate ncbi-rest-api\n```\n\nThen:\n\n```\nsnakemake -p\n```\n\nto do some basic things.\n\n## Appendix: getting an API key\n\nFollow [these instructions](https://www.ncbi.nlm.nih.gov/datasets/docs/v2/api/api-keys/).\n\n## Related repos\n\n* https://github.com/sourmash-bio/2025-sourmash-eukaryotic-databases:\n  Build eukaryotic databases for sourmash.\n* https://github.com/sourmash-bio/2025-sourmash-ncbi-viral-databases:\n  Build viral databases for sourmash.\n\n## Support\n\nI can't guarantee support for this code, of course, but odds are good\nthat if you find a bug or need a fix, it'll be useful to me and\nothers. Please\n[file an issue](https://github.com/ctb/2025-ncbi-rest-api/issues) with\nany questions or comments! And feel free to say hi over on Bluesky.\n\nC. Titus Brown, 1/26/2025\n\n[me on Bluesky](https://bsky.app/profile/titus.idyll.org)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fctb%2F2025-ncbi-rest-api","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fctb%2F2025-ncbi-rest-api","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fctb%2F2025-ncbi-rest-api/lists"}