{"id":16916913,"url":"https://github.com/althonos/pyskani","last_synced_at":"2025-08-31T23:36:58.194Z","repository":{"id":65782101,"uuid":"596749839","full_name":"althonos/pyskani","owner":"althonos","description":"PyO3 bindings and Python interface to skani, a method for fast genomic identity calculation using sparse chaining.","archived":false,"fork":false,"pushed_at":"2025-08-20T18:07:45.000Z","size":2983,"stargazers_count":27,"open_issues_count":0,"forks_count":2,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-08-20T18:51:20.705Z","etag":null,"topics":["ani","average-nucleotide-identity","bioinformatics","metagenomes","python-bindings","python-library","taxonomy"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/althonos.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"COPYING","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2023-02-02T21:15:23.000Z","updated_at":"2025-07-21T16:15:25.000Z","dependencies_parsed_at":"2024-10-27T12:15:31.968Z","dependency_job_id":"6777e099-653e-4320-89f9-deb635685a81","html_url":"https://github.com/althonos/pyskani","commit_stats":{"total_commits":74,"total_committers":1,"mean_commits":74.0,"dds":0.0,"last_synced_commit":"30668adb2eb98dcc056288d255394e892f5b6a0b"},"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"purl":"pkg:github/althonos/pyskani","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposito
ries/althonos%2Fpyskani","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/althonos%2Fpyskani/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/althonos%2Fpyskani/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/althonos%2Fpyskani/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/althonos","download_url":"https://codeload.github.com/althonos/pyskani/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/althonos%2Fpyskani/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":273054365,"owners_count":25037582,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-31T02:00:09.071Z","response_time":79,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ani","average-nucleotide-identity","bioinformatics","metagenomes","python-bindings","python-library","taxonomy"],"created_at":"2024-10-13T19:31:19.848Z","updated_at":"2025-08-31T23:36:58.184Z","avatar_url":"https://github.com/althonos.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 🐍⛓️🧬 Pyskani [![Stars](https://img.shields.io/github/stars/althonos/pyskani.svg?style=social\u0026maxAge=3600\u0026label=Star)](https://github.com/althonos/pyskani/stargazers)\n\n*[PyO3](https://pyo3.rs/) bindings and Python 
interface to [skani](https://github.com/bluenote-1577/skani), a method for fast genomic identity calculation using sparse chaining.*\n\n[![Actions](https://img.shields.io/github/actions/workflow/status/althonos/pyskani/test.yml?branch=main\u0026logo=github\u0026style=flat-square\u0026maxAge=300)](https://github.com/althonos/pyskani/actions)\n[![Coverage](https://img.shields.io/codecov/c/github/althonos/pyskani/main?style=flat-square\u0026maxAge=3600)](https://codecov.io/gh/althonos/pyskani/)\n[![License](https://img.shields.io/badge/license-MIT-blue.svg?style=flat-square\u0026maxAge=2678400)](https://choosealicense.com/licenses/mit/)\n[![PyPI](https://img.shields.io/pypi/v/pyskani.svg?style=flat-square\u0026maxAge=3600)](https://pypi.org/project/pyskani)\n[![Bioconda](https://img.shields.io/conda/vn/bioconda/pyskani?style=flat-square\u0026maxAge=3600\u0026logo=anaconda)](https://anaconda.org/bioconda/pyskani)\n[![AUR](https://img.shields.io/aur/version/python-pyskani?logo=archlinux\u0026style=flat-square\u0026maxAge=3600)](https://aur.archlinux.org/packages/python-pyskani)\n[![Wheel](https://img.shields.io/pypi/wheel/pyskani.svg?style=flat-square\u0026maxAge=3600)](https://pypi.org/project/pyskani/#files)\n[![Python Versions](https://img.shields.io/pypi/pyversions/pyskani.svg?style=flat-square\u0026maxAge=600)](https://pypi.org/project/pyskani/#files)\n[![Python 
Implementations](https://img.shields.io/pypi/implementation/pyskani.svg?style=flat-square\u0026maxAge=600\u0026label=impl)](https://pypi.org/project/pyskani/#files)\n[![Source](https://img.shields.io/badge/source-GitHub-303030.svg?maxAge=2678400\u0026style=flat-square)](https://github.com/althonos/pyskani/)\n[![Mirror](https://img.shields.io/badge/mirror-EMBL-009f4d?style=flat-square\u0026maxAge=2678400)](https://git.embl.de/larralde/pyskani/)\n[![Issues](https://img.shields.io/github/issues/althonos/pyskani.svg?style=flat-square\u0026maxAge=600)](https://github.com/althonos/pyskani/issues)\n[![Docs](https://img.shields.io/readthedocs/pyskani/latest?style=flat-square\u0026maxAge=600)](https://pyskani.readthedocs.io)\n[![Changelog](https://img.shields.io/badge/keep%20a-changelog-8A0707.svg?maxAge=2678400\u0026style=flat-square)](https://github.com/althonos/pyskani/blob/master/CHANGELOG.md)\n[![Downloads](https://img.shields.io/badge/dynamic/regex?url=https%3A%2F%2Fpepy.tech%2Fprojects%2Fpyskani\u0026search=%5B0-9%5D%2B.%5B0-9%5D%2B(k%7CM)\u0026style=flat-square\u0026label=downloads\u0026color=303f9f\u0026cacheSeconds=86400)](https://pepy.tech/project/pyskani)\n[![Paper](https://img.shields.io/badge/paper-nargab%2Flqaf095-darkblue?style=flat-square\u0026maxAge=2678400)](https://academic.oup.com/nargab/article/7/3/lqaf095/8196481)\n\n## 🗺️ Overview\n\n`skani`[\\[1\\]](#ref1) is a method developed by [Jim Shaw](https://jim-shaw-bluenote.github.io/)\nand [Yun William Yu](https://github.com/yunwilliamyu) for fast and robust\nmetagenomic sequence comparison through sparse chaining. It improves on\nFastANI by being more accurate and much faster, while requiring less memory.\n\n`pyskani` is a Python module, implemented using the [PyO3](https://pyo3.rs/)\nframework, that provides bindings to `skani`. 
It directly links to the\n`skani` code, which has the following advantages over CLI wrappers:\n\n- **pre-built wheels**: `pyskani` is distributed on PyPI and features\n  pre-built wheels for common platforms, including x86-64 and Arm64 UNIX.\n- **single dependency**: If your software or your analysis pipeline is\n  distributed as a Python package, you can add `pyskani` as a dependency to\n  your project, and stop worrying about the `skani` binary being present on\n  the end-user machine.\n- **sans I/O**: Everything happens in memory, in Python objects you control,\n  making it easier to pass your sequences to `skani` without having to write\n  them to a temporary file.\n\n*This library is still a work-in-progress, and in an experimental stage,\nbut it should already pack enough features to be used in a standard pipeline.*\n\n\n## 🔧 Installing\n\nPyskani can be installed directly from [PyPI](https://pypi.org/project/pyskani/),\nwhich hosts some pre-built CPython wheels for x86-64 Unix platforms, as well\nas the code required to compile from source with Rust:\n```console\n$ pip install pyskani\n```\n\u003c!-- Otherwise, pyskani is also available as a [Bioconda](https://anaconda.org/bioconda/pyskani)\npackage:\n```console\n$ conda install -c bioconda pyskani\n``` --\u003e\n\nIf you have to compile the package from source, all the required\nRust libraries are vendored in the source distribution, and a Rust compiler\nwill be set up automatically if there is none on the host machine.\n\n## 🔖 Citation\n\nIf you found Pyskani useful, please cite [our paper](https://academic.oup.com/nargab/article/7/3/lqaf095/8196481), as well as the original [skani paper](https://www.nature.com/articles/s41592-023-02018-3).\n\nTo cite Pyskani:\n\n\u003e Martin Larralde, Georg Zeller, Laura M. Carroll. 2025. PyOrthoANI, PyFastANI, and Pyskani: a suite of Python libraries for computation of average nucleotide identity. *NAR Genomics and Bioinformatics* 7(3):lqaf095. 
doi:10.1093/nargab/lqaf095.\n\nTo cite skani:\n\n\u003e Jim Shaw, Yun William Yu. 2023. Fast and robust metagenomic sequence comparison through sparse chaining with skani. *Nature Methods* 20(11):1661-1665. doi:10.1038/s41592-023-02018-3.\n\n## 💡 Examples\n\n### 📝 Creating a database\n\nA database can be created either in memory or using a folder on the machine\nfilesystem to store the sketches. Regardless of the storage, a database\ncan be used immediately for querying, or saved to a different location.\n\nHere is how to create a database in memory,\nusing [Biopython](https://github.com/biopython/biopython)\nto load the record:\n```python\nimport Bio.SeqIO\nimport pyskani\n\ndatabase = pyskani.Database()\nrecord = Bio.SeqIO.read(\"vendor/skani/test_files/e.coli-EC590.fasta\", \"fasta\")\ndatabase.sketch(\"E. coli EC590\", bytes(record.seq))\n```\n\nFor draft genomes, simply pass each contig as an additional argument to the\n`sketch` method, for which you can use the splat operator:\n```python\ndatabase = pyskani.Database()\nrecords = Bio.SeqIO.parse(\"vendor/skani/test_files/e.coli-o157.fasta\", \"fasta\")\nsequences = (bytes(record.seq) for record in records)\ndatabase.sketch(\"E. coli O157\", *sequences)\n```\n\n### 🗒️ Loading a database\n\nTo load a database created with either `skani` or `pyskani`, you can\nload all sketches into memory for fast querying:\n```python\ndatabase = pyskani.Database.load(\"path/to/sketches\")\n```\n\nOr load the files lazily to save memory, at the cost of slower querying:\n```python\ndatabase = pyskani.Database.open(\"path/to/sketches\")\n```\n\n### 🔎 Querying a database\n\nOnce a database has been created or loaded, use the `Database.query` method\nto compute ANI for some query genomes:\n```python\nrecord = Bio.SeqIO.read(\"vendor/skani/test_files/e.coli-K12.fasta\", \"fasta\")\nhits = database.query(\"E. coli K12\", bytes(record.seq))\n```\n\n## 🔎 See Also\n\nComputing ANI for closed genomes? 
You may also be interested in\n[`pyfastani`, a Python package for computing ANI](https://github.com/althonos/pyfastani)\nusing the [FastANI method](https://www.nature.com/articles/s41467-018-07641-9)\ndeveloped by [Chirag Jain](https://github.com/cjain7) *et al.*\n\n## 💭 Feedback\n\n### ⚠️ Issue Tracker\n\nFound a bug? Have an enhancement request? Head over to the\n[GitHub issue tracker](https://github.com/althonos/pyskani/issues) if you need\nto report or ask something. If you are filing a bug report, please include as\nmuch information as you can about the issue, and try to recreate the same bug\nin a simple, easily reproducible situation.\n\n### 🏗️ Contributing\n\nContributions are more than welcome! See\n[`CONTRIBUTING.md`](https://github.com/althonos/pyskani/blob/master/CONTRIBUTING.md)\nfor more details.\n\n\n## ⚖️ License\n\nThis library is provided under the [MIT License](https://choosealicense.com/licenses/mit/).\n\nThe `skani` code was written by [Jim Shaw](https://jim-shaw-bluenote.github.io/)\nand is distributed under the terms of the [MIT License](https://choosealicense.com/licenses/mit/)\nas well. See `vendor/skani/LICENSE` for more information. Source distributions\nof `pyskani` vendor additional sources under their own terms using\nthe [`cargo vendor`](https://doc.rust-lang.org/cargo/commands/cargo-vendor.html)\ncommand.\n\n*This project is in no way affiliated with, sponsored, or otherwise endorsed\nby the [original `skani` authors](https://jim-shaw-bluenote.github.io/).\nIt was developed by [Martin Larralde](https://github.com/althonos/) during his\nPhD project at the [European Molecular Biology Laboratory](https://www.embl.de/)\nin the [Zeller team](https://github.com/zellerlab).*\n\n## 📚 References\n\n- \u003ca id=\"ref1\"\u003e\\[1\\]\u003c/a\u003e Jim Shaw and Yun William Yu. 'Fast and robust metagenomic sequence comparison through sparse chaining with skani' (2023). Nature Methods. 
[doi:10.1038/s41592-023-02018-3](https://doi.org/10.1038/s41592-023-02018-3). [PMID:37735570](https://pubmed.ncbi.nlm.nih.gov/37735570/).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falthonos%2Fpyskani","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Falthonos%2Fpyskani","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falthonos%2Fpyskani/lists"}