https://github.com/sourmash-bio/sourmash
Quickly search, compare, and analyze genomic and metagenomic data sets.
- Host: GitHub
- URL: https://github.com/sourmash-bio/sourmash
- Owner: sourmash-bio
- License: other
- Created: 2016-04-09T17:35:30.000Z (about 10 years ago)
- Default Branch: latest
- Last Pushed: 2025-07-15T11:14:31.000Z (12 months ago)
- Last Synced: 2025-07-15T11:48:26.998Z (12 months ago)
- Topics: bioinformatics, fracminhash, hacktoberfest, kmer, minhash, python, rust, scaled-minhash, sketching, sourmash, taxonomic-classification, taxonomic-profiling
- Language: Python
- Homepage: http://sourmash.readthedocs.io/en/latest/
- Size: 47.2 MB
- Stars: 512
- Watchers: 16
- Forks: 84
- Open Issues: 808
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.rst
- Citation: CITATION.cff
- Codemeta: codemeta.json
- Zenodo: .zenodo.json
Awesome Lists containing this project
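- StarryDivineSky - sourmash-bio/sourmash - A k-mer analysis multitool that provides stable, robust programmatic and command-line APIs for a variety of sequence comparisons. (Other_Biomedicine / Network Services_Other)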
README
# sourmash
Quickly search, compare, and analyze genomic and metagenomic data sets.

Usage:

    sourmash sketch dna *.fq.gz
    sourmash compare *.sig -o distances.cmp -k 31
    sourmash plot distances.cmp

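
For intuition about what the `compare` step computes, here is a minimal Python sketch of an all-by-all Jaccard comparison over k-mer sets. It is not sourmash's implementation: it skips hashing, sketching, and reverse-complement handling, and `kmers`, `jaccard`, and `compare_all` are hypothetical helper names used only for illustration.

```python
from itertools import combinations


def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B| (0.0 if both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def compare_all(seqs: dict[str, str], k: int) -> list[list[float]]:
    """Build a symmetric all-by-all similarity matrix, in dict insertion order."""
    sets = [kmers(s, k) for s in seqs.values()]
    n = len(sets)
    matrix = [[1.0] * n for _ in range(n)]  # each input is identical to itself
    for i, j in combinations(range(n), 2):
        matrix[i][j] = matrix[j][i] = jaccard(sets[i], sets[j])
    return matrix


# Toy inputs (made up for this example); the command above uses k=31 on real data.
toy = {"a": "ACGTACGTGG", "b": "ACGTACGTCC", "c": "TTTTTTTTTT"}
matrix = compare_all(toy, k=4)
print(matrix[0][1])  # 0.5: "a" and "b" share 4 of 8 distinct 4-mers
```

A matrix like this is the kind of input a clustering or plotting step would consume.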
sourmash 1.0 is [published on JOSS](https://doi.org/10.21105/joss.06830); please cite that paper if you use sourmash (`doi: 10.21105/joss.06830`).
The latest major release is sourmash v4, which has several
command-line and Python incompatibilities with previous
versions. Please
[visit our migration guide](https://sourmash.readthedocs.io/en/latest/support.html#migrating-from-sourmash-v3-x-to-sourmash-4-x)
to upgrade!
----
sourmash is a k-mer analysis multitool, and we aim to provide stable, robust programmatic and command-line APIs for a variety of sequence comparisons. Some of our special sauce includes:
- `FracMinHash` sketching, which enables accurate comparisons (including ANI) between data sets of different sizes
- `sourmash gather`, a combinatorial k-mer approach for more accurate metagenomic profiling
Please see the [sourmash publications](https://sourmash.readthedocs.io/en/latest/publications.html#sourmash-fundamentals) for details.
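
As a rough illustration of the two ideas listed above, the following pure-Python sketch shows (1) FracMinHash-style subsampling, which keeps every k-mer hash that falls below a fixed fraction of the hash space, and (2) a greedy cover in the spirit of `sourmash gather`, which repeatedly picks the reference explaining the most not-yet-assigned query hashes. This is a simplified sketch under stated assumptions, not sourmash's code: the hash function (`blake2b`), the function names, and the toy inputs are all choices made for this example, and the real tool additionally handles thresholds, abundances, and reverse complements.

```python
import hashlib

MAX_HASH = 2**64  # size of the 64-bit hash space used in this sketch


def hash_kmer(kmer: str) -> int:
    """Map a k-mer to a 64-bit integer (stand-in hash; an assumption here)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def frac_sketch(seq: str, k: int, scaled: int) -> set[int]:
    """Keep hashes below MAX_HASH / scaled: roughly 1/scaled of distinct k-mers."""
    threshold = MAX_HASH // scaled
    hashes = (hash_kmer(seq[i:i + k]) for i in range(len(seq) - k + 1))
    return {h for h in hashes if h < threshold}


def containment(query: set[int], target: set[int]) -> float:
    """Fraction of the query sketch that also appears in the target sketch."""
    return len(query & target) / len(query) if query else 0.0


def greedy_gather(query: set[int], references: dict[str, set[int]]) -> list[tuple[str, int]]:
    """Repeatedly pick the reference covering the most still-unassigned hashes."""
    remaining = set(query)
    results = []
    while remaining:
        name, overlap = max(
            ((n, len(remaining & s)) for n, s in references.items()),
            key=lambda pair: pair[1],
            default=(None, 0),
        )
        if overlap == 0:  # nothing left that any reference can explain
            break
        results.append((name, overlap))
        remaining -= references[name]
    return results


# Hand-made hash sets (hypothetical) so the greedy steps are easy to follow.
refs = {"A": {1, 2, 3, 4}, "B": {3, 4, 5}, "C": {9}}
print(greedy_gather({1, 2, 3, 4, 5, 6}, refs))  # [('A', 4), ('B', 1)]
```

Because the subsampling rule depends only on each hash value, sketches built with the same `k` and `scaled` stay comparable regardless of how large the underlying data sets are, which is what makes containment between a small genome and a large metagenome meaningful.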
The name is a riff on [Mash](https://github.com/marbl/Mash),
combined with @ctb's love of whiskey.
([Sour mash](https://en.wikipedia.org/wiki/Sour_mash) is used in
making whiskey.)
Maintainers: [C. Titus Brown](mailto:titus@idyll.org) ([@ctb](http://github.com/ctb)), [Luiz C. Irber, Jr](mailto:luiz@sourmash.bio) ([@luizirber](http://github.com/luizirber)), and [N. Tessa Pierce-Ward](mailto:tessa@sourmash.bio) ([@bluegenes](http://github.com/bluegenes)).
sourmash was initially developed by the
[Lab for Data-Intensive Biology](http://ivory.idyll.org/lab/) at the
[UC Davis School of Veterinary Medicine](http://www.vetmed.ucdavis.edu),
and now includes contributions from the global research and developer
community.
## Installation
We recommend using conda-forge to install sourmash:
```
conda install -c conda-forge sourmash-minimal
```
This will install the latest stable version of sourmash 4.
You can also use pip to install sourmash:
```
pip install sourmash
```
A quickstart tutorial [is available](https://sourmash.readthedocs.io/en/latest/tutorials.html).
### Requirements
sourmash runs under Python 3.11 and later on Windows, macOS, and
Linux. The base requirements are screed, cffi, numpy, matplotlib, and
scipy. Conda will install everything necessary, and is
our recommended installation method (see below).
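
If you are not using conda, a quick way to check an existing environment against the requirements above is a small script like the following. It is only a convenience sketch: `python_is_supported` and `missing_requirements` are hypothetical helper names, and the check looks only for importable top-level modules, not for specific versions.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 11)  # minimum interpreter version stated above
BASE_REQUIREMENTS = ["screed", "cffi", "numpy", "matplotlib", "scipy"]


def python_is_supported(version_info=sys.version_info) -> bool:
    """True if the given interpreter version is at least MIN_PYTHON."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def missing_requirements(names=BASE_REQUIREMENTS) -> list[str]:
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python OK:", python_is_supported())
    print("Missing modules:", missing_requirements() or "none")
```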
### Installation with conda
conda-forge is a community-maintained channel for the
[conda](http://conda.pydata.org/docs/intro.html) package manager.
After [installing conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/),
you can install sourmash by running:
```bash
$ conda create -n sourmash_env -c conda-forge sourmash-minimal
$ conda activate sourmash_env
$ sourmash --help
```
which will install
[the latest released version](https://github.com/sourmash-bio/sourmash/releases).
## Support
For questions, please open an issue [on GitHub](https://github.com/sourmash-bio/sourmash/issues), or ask in our [chat](https://gitter.im/sourmash-bio/community?utm_source=share-link&utm_medium=link&utm_campaign=share-link).
## Development
Development happens on GitHub at
[sourmash-bio/sourmash](https://github.com/sourmash-bio/sourmash).
sourmash is developed in Python and Rust, and you will need a Rust
environment to build it; see [the developer notes](doc/developer.md)
for our suggested development setup.
After installation, `sourmash` is the main command-line entry point;
you can also run it with `python -m sourmash`. For a developer install
in a virtual environment, run `pip install -e /path/to/repo`.
The `sourmash/` directory contains the Python library and command-line interface code.
The `src/core/` directory contains the Rust library implementing core
functionality.
Tests require pytest and can be run with `make test`.
Please see [the developer notes](doc/developer.md) for more information
on getting set up with a development environment.
CTB
Jan 2024