https://github.com/althonos/pyskani
PyO3 bindings and Python interface to skani, a method for fast genomic identity calculation using sparse chaining.
ani average-nucleotide-identity bioinformatics metagenomes python-bindings python-library taxonomy
- Host: GitHub
- URL: https://github.com/althonos/pyskani
- Owner: althonos
- License: mit
- Created: 2023-02-02T21:15:23.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2024-12-06T16:00:52.000Z (3 months ago)
- Last Synced: 2024-12-28T14:32:07.599Z (about 2 months ago)
- Topics: ani, average-nucleotide-identity, bioinformatics, metagenomes, python-bindings, python-library, taxonomy
- Language: Rust
- Homepage:
- Size: 2.78 MB
- Stars: 25
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: COPYING
# Pyskani [Stars](https://github.com/althonos/pyskani/stargazers)
*[PyO3](https://pyo3.rs/) bindings and Python interface to [skani](https://github.com/bluenote-1577/skani), a method for fast genomic identity calculation using sparse chaining.*
[Actions](https://github.com/althonos/pyskani/actions)
[Coverage](https://codecov.io/gh/althonos/pyskani/)
[License](https://choosealicense.com/licenses/mit/)
[PyPI](https://pypi.org/project/pyskani)
[Bioconda](https://anaconda.org/bioconda/pyskani)
[AUR](https://aur.archlinux.org/packages/python-pyskani)
[Wheel](https://pypi.org/project/pyskani/#files)
[Python Versions](https://pypi.org/project/pyskani/#files)
[Python Implementations](https://pypi.org/project/pyskani/#files)
[Source](https://github.com/althonos/pyskani/)
[Mirror](https://git.embl.de/larralde/pyskani/)
[Issues](https://github.com/althonos/pyskani/issues)
[Docs](https://pyskani.readthedocs.io)
[Changelog](https://github.com/althonos/pyskani/blob/master/CHANGELOG.md)
[Downloads](https://pepy.tech/project/pyskani)

## Overview
`skani`[\[1\]](#ref1) is a method developed by [Jim Shaw](https://jim-shaw-bluenote.github.io/)
and [Yun William Yu](https://github.com/yunwilliamyu) for fast and robust
metagenomic sequence comparison through sparse chaining. It improves on
FastANI by being more accurate and much faster, while requiring less memory.

`pyskani` is a Python module, implemented using the [PyO3](https://pyo3.rs/)
framework, that provides bindings to `skani`. It directly links to the
`skani` code, which has the following advantages over CLI wrappers:

- **pre-built wheels**: `pyskani` is distributed on PyPI and features
pre-built wheels for common platforms, including x86-64 and Arm64 UNIX.
- **single dependency**: If your software or your analysis pipeline is
distributed as a Python package, you can add `pyskani` as a dependency to
your project, and stop worrying about the `skani` binary being present on
the end-user machine.
- **sans I/O**: Everything happens in memory, in Python objects you control,
making it easier to pass your sequences to `skani` without having to write
them to a temporary file.

*This library is still a work-in-progress, and in an experimental stage,
but it should already pack enough features to be used in a standard pipeline.*

## Installing
Pyskani can be installed directly from [PyPI](https://pypi.org/project/pyskani/),
which hosts some pre-built CPython wheels for x86-64 Unix platforms, as well
as the code required to compile from source with Rust:
```console
$ pip install pyskani
```

In the event you have to compile the package from source, all the required
Rust libraries are vendored in the source distribution, and a Rust compiler
will be set up automatically if there is none on the host machine.

## Citation
Pyskani is scientific software, and builds on top of `skani`. Please cite [`skani`](https://github.com/bluenote-1577/skani) if you are using it in
an academic work, for instance as:

> `pyskani`, a Python library binding to `skani` (Shaw & Yu, 2023).
## π‘ Examples
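The examples in this section read FASTA files with Biopython and hand each sequence to `pyskani` as `bytes`. Where Biopython is not available, a small standard-library reader could fill the same role. The following is only a sketch under stated assumptions: `read_fasta_contigs` is a hypothetical helper that is not part of `pyskani`, and it assumes plain, uncompressed FASTA input.

```python
import io
from typing import BinaryIO, List


def read_fasta_contigs(handle: BinaryIO) -> List[bytes]:
    """Return the sequence of every FASTA record in `handle` as `bytes`.

    Hypothetical helper for illustration only; it is not part of pyskani.
    """
    contigs: List[bytes] = []
    chunks: List[bytes] = []
    in_record = False
    for line in handle:
        line = line.strip()
        if line.startswith(b">"):
            # A header line starts a new record: flush the previous one.
            if in_record:
                contigs.append(b"".join(chunks))
            chunks = []
            in_record = True
        elif line and in_record:
            # Sequence lines of one record are concatenated together.
            chunks.append(line)
    if in_record:
        contigs.append(b"".join(chunks))
    return contigs


# Usage with an in-memory file; a real file would be opened with open(path, "rb").
example = io.BytesIO(b">contig1\nACGT\nACGT\n>contig2\nTTGA\n")
contigs = read_fasta_contigs(example)
print(contigs)  # [b'ACGTACGT', b'TTGA']
```

Each element of the returned list is one contig, so the list can be unpacked into the `sketch` method in the same way as the Biopython-based sequences used in the examples of this section.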
### π Creating a database
A database can be created either in memory or using a folder on the machine
filesystem to store the sketches. Independently of the storage, a database
can be used immediately for querying, or saved to a different location.

Here is how to create a database in memory,
using [Biopython](https://github.com/biopython/biopython)
to load the record:
```python
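import Bio.SeqIO
import pyskani

# Create an empty in-memory database and sketch a single complete genome.
database = pyskani.Database()
record = Bio.SeqIO.read("vendor/skani/test_files/e.coli-EC590.fasta", "fasta")
database.sketch("E. coli EC590", bytes(record.seq))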
```

For draft genomes, pass each contig as an additional argument to the `sketch`
method, for which you can use the splat operator:
```python
import Bio.SeqIO
import pyskani

database = pyskani.Database()
records = Bio.SeqIO.parse("vendor/skani/test_files/e.coli-o157.fasta", "fasta")
# Lazily convert every contig of the draft genome to bytes.
sequences = (bytes(record.seq) for record in records)
database.sketch("E. coli O157", *sequences)
```

### Loading a database
To load a database created with either `skani` or `pyskani`, you can load all
sketches into memory for fast querying:
```python
database = pyskani.Database.load("path/to/sketches")
```

Or load the files lazily to save memory, at the cost of slower querying:
```python
database = pyskani.Database.open("path/to/sketches")
```

### Querying a database
Once a database has been created or loaded, use the `Database.query` method
to compute ANI for some query genomes:
```python
import Bio.SeqIO

# `database` is a `pyskani.Database` created or loaded as shown above.
record = Bio.SeqIO.read("vendor/skani/test_files/e.coli-K12.fasta", "fasta")
hits = database.query("E. coli K12", bytes(record.seq))
```

## See Also
Computing ANI for closed genomes? You may also be interested in
[`pyfastani`, a Python package for computing ANI](https://github.com/althonos/pyfastani)
using the [FastANI method](https://www.nature.com/articles/s41467-018-07641-9)
developed by [Chirag Jain](https://github.com/cjain7) *et al.*

## Feedback
### Issue Tracker
Found a bug? Have an enhancement request? Head over to the
[GitHub issue tracker](https://github.com/althonos/pyskani/issues) if you need
to report or ask something. If you are filing a bug report, please include as
much information as you can about the issue, and try to recreate the same bug
in a simple, easily reproducible situation.

### Contributing
Contributions are more than welcome! See
[`CONTRIBUTING.md`](https://github.com/althonos/pyskani/blob/master/CONTRIBUTING.md)
for more details.

## License
This library is provided under the [MIT License](https://choosealicense.com/licenses/mit/).
The `skani` code was written by [Jim Shaw](https://jim-shaw-bluenote.github.io/)
and is distributed under the terms of the [MIT License](https://choosealicense.com/licenses/mit/)
as well. See `vendor/skani/LICENSE` for more information. Source distributions
of `pyskani` vendor additional sources under their own terms using
the [`cargo vendor`](https://doc.rust-lang.org/cargo/commands/cargo-vendor.html)
command.

*This project is in no way affiliated with, sponsored, or otherwise endorsed
by the [original `skani` authors](https://jim-shaw-bluenote.github.io/).
It was developed by [Martin Larralde](https://github.com/althonos/) during his
PhD project at the [European Molecular Biology Laboratory](https://www.embl.de/)
in the [Zeller team](https://github.com/zellerlab).*

## References
- \[1\] Jim Shaw and Yun William Yu. Fast and robust metagenomic sequence comparison through sparse chaining with skani (2023). Nature Methods. [doi:10.1038/s41592-023-02018-3](https://doi.org/10.1038/s41592-023-02018-3). [PMID:37735570](https://pubmed.ncbi.nlm.nih.gov/37735570/).