Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/althonos/pyskani
PyO3 bindings and Python interface to skani, a method for fast genomic identity calculation using sparse chaining.
ani average-nucleotide-identity bioinformatics metagenomes python-bindings python-library taxonomy
Last synced: about 2 months ago
- Host: GitHub
- URL: https://github.com/althonos/pyskani
- Owner: althonos
- License: mit
- Created: 2023-02-02T21:15:23.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2024-12-06T16:00:52.000Z (3 months ago)
- Last Synced: 2024-12-28T14:32:07.599Z (about 2 months ago)
- Topics: ani, average-nucleotide-identity, bioinformatics, metagenomes, python-bindings, python-library, taxonomy
- Language: Rust
- Homepage:
- Size: 2.78 MB
- Stars: 25
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: COPYING
# Pyskani

*[PyO3](https://pyo3.rs/) bindings and Python interface to [skani](https://github.com/bluenote-1577/skani), a method for fast genomic identity calculation using sparse chaining.*

[Stars](https://github.com/althonos/pyskani/stargazers) | [Actions](https://github.com/althonos/pyskani/actions) | [Coverage](https://codecov.io/gh/althonos/pyskani/) | [License](https://choosealicense.com/licenses/mit/) | [PyPI](https://pypi.org/project/pyskani) | [Bioconda](https://anaconda.org/bioconda/pyskani) | [AUR](https://aur.archlinux.org/packages/python-pyskani) | [Wheels](https://pypi.org/project/pyskani/#files) | [Source](https://github.com/althonos/pyskani/) | [Mirror](https://git.embl.de/larralde/pyskani/) | [Issues](https://github.com/althonos/pyskani/issues) | [Docs](https://pyskani.readthedocs.io) | [Changelog](https://github.com/althonos/pyskani/blob/master/CHANGELOG.md) | [Downloads](https://pepy.tech/project/pyskani)

## Overview
`skani`[\[1\]](#ref1) is a method developed by [Jim Shaw](https://jim-shaw-bluenote.github.io/)
and [Yun William Yu](https://github.com/yunwilliamyu) for fast and robust
metagenomic sequence comparison through sparse chaining. It improves on
FastANI by being more accurate and much faster, while requiring less memory.

`pyskani` is a Python module, implemented using the [PyO3](https://pyo3.rs/)
framework, that provides bindings to `skani`. It directly links to the
`skani` code, which has the following advantages over CLI wrappers:

- **pre-built wheels**: `pyskani` is distributed on PyPI and features
pre-built wheels for common platforms, including x86-64 and Arm64 UNIX.
- **single dependency**: If your software or your analysis pipeline is
distributed as a Python package, you can add `pyskani` as a dependency to
your project, and stop worrying about the `skani` binary being present on
the end-user machine.
- **sans I/O**: Everything happens in memory, in Python objects you control,
making it easier to pass your sequences to `skani` without having to write
them to a temporary file.

*This library is still a work in progress and in an experimental stage,
but it should already pack enough features to be used in a standard pipeline.*

## Installing
Pyskani can be installed directly from [PyPI](https://pypi.org/project/pyskani/),
which hosts some pre-built CPython wheels for x86-64 Unix platforms, as well
as the code required to compile from source with Rust:
```console
$ pip install pyskani
```

In the event you have to compile the package from source, all the required
Rust libraries are vendored in the source distribution, and a Rust compiler
will be set up automatically if there is none on the host machine.

## Citation
Pyskani is scientific software, and builds on top of `skani`. Please cite [`skani`](https://github.com/bluenote-1577/skani) if you are using it in
an academic work, for instance as:

> `pyskani`, a Python library binding to `skani` (Shaw & Yu, 2023).
## Examples

### Creating a database
A database can be created either in memory or using a folder on the machine's
filesystem to store the sketches. Regardless of the storage, a database
can be used immediately for querying, or saved to a different location.

Here is how to create a database in memory,
using [Biopython](https://github.com/biopython/biopython)
to load the record:
```python
import Bio.SeqIO
import pyskani

database = pyskani.Database()
record = Bio.SeqIO.read("vendor/skani/test_files/e.coli-EC590.fasta", "fasta")
database.sketch("E. coli EC590", bytes(record.seq))
```

For draft genomes, pass each contig as an additional argument to the `sketch`
method; the unpacking (splat) operator makes this easy:
```python
database = pyskani.Database()
records = Bio.SeqIO.parse("vendor/skani/test_files/e.coli-o157.fasta", "fasta")
sequences = (bytes(record.seq) for record in records)
database.sketch("E. coli O157", *sequences)
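
# Sans I/O alternative (hypothetical helper, not part of pyskani): split FASTA
# text already held in memory into one bytes object per contig, no temp file.
def fasta_contigs(text):
    records = text.split(">")[1:]  # drop anything before the first header
    return ["".join(s.strip() for s in r.splitlines()[1:]).encode() for r in records]
# e.g. database.sketch("E. coli O157", *fasta_contigs(fasta_text))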
```

### Loading a database
To load a database created with either `skani` or `pyskani`, you can
load all sketches into memory for fast querying:
```python
database = pyskani.Database.load("path/to/sketches")
```

Or load the files lazily to save memory, at the cost of slower querying:
```python
database = pyskani.Database.open("path/to/sketches")
```

### Querying a database
Once a database has been created or loaded, use the `Database.query` method
to compute ANI for some query genomes:
```python
record = Bio.SeqIO.read("vendor/skani/test_files/e.coli-K12.fasta", "fasta")
hits = database.query("E. coli K12", bytes(record.seq))
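# Hypothetical helper: pick the hit with the highest ANI (assumes each hit exposes an `identity` attribute).
def best_hit(hits):
    return max(hits, key=lambda hit: hit.identity, default=None)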
```

## See Also
Computing ANI for closed genomes? You may also be interested in
[`pyfastani`, a Python package for computing ANI](https://github.com/althonos/pyfastani)
using the [FastANI method](https://www.nature.com/articles/s41467-018-07641-9)
developed by [Chirag Jain](https://github.com/cjain7) *et al.*

## Feedback

### Issue Tracker
Found a bug? Have an enhancement request? Head over to the
[GitHub issue tracker](https://github.com/althonos/pyskani/issues) if you need
to report or ask something. If you are filing a bug report, please include as
much information as you can about the issue, and try to recreate the same bug
in a simple, easily reproducible situation.

### Contributing
Contributions are more than welcome! See
[`CONTRIBUTING.md`](https://github.com/althonos/pyskani/blob/master/CONTRIBUTING.md)
for more details.

## License
This library is provided under the [MIT License](https://choosealicense.com/licenses/mit/).
The `skani` code was written by [Jim Shaw](https://jim-shaw-bluenote.github.io/)
and is distributed under the terms of the [MIT License](https://choosealicense.com/licenses/mit/)
as well. See `vendor/skani/LICENSE` for more information. Source distributions
of `pyskani` vendor additional sources under their own terms using
the [`cargo vendor`](https://doc.rust-lang.org/cargo/commands/cargo-vendor.html)
command.

*This project is in no way affiliated with, sponsored, or otherwise endorsed
by the [original `skani` authors](https://jim-shaw-bluenote.github.io/).
It was developed by [Martin Larralde](https://github.com/althonos/) during his
PhD project at the [European Molecular Biology Laboratory](https://www.embl.de/)
in the [Zeller team](https://github.com/zellerlab).*

## References

- \[1\] Jim Shaw and Yun William Yu. Fast and robust metagenomic sequence comparison through sparse chaining with skani (2023). Nature Methods. [doi:10.1038/s41592-023-02018-3](https://doi.org/10.1038/s41592-023-02018-3). [PMID:37735570](https://pubmed.ncbi.nlm.nih.gov/37735570/).