https://github.com/ctb/sourmash_plugin_commonhash
sourmash plugin to filter hashes/k-mers by presence across many sketches
- Host: GitHub
- URL: https://github.com/ctb/sourmash_plugin_commonhash
- Owner: ctb
- License: bsd-3-clause
- Created: 2023-07-30T11:11:59.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2023-08-27T03:39:03.000Z (almost 3 years ago)
- Last Synced: 2025-04-15T17:15:39.046Z (about 1 year ago)
- Topics: sourmash
- Language: Python
- Homepage:
- Size: 121 KB
- Stars: 3
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# sourmash_plugin_commonhash
If you have sketched many samples and you want to remove "rare" k-mers
(those present in only one or a few samples), this plugin is for you!
Filtering them out helps reduce noise in Jaccard comparisons between samples.
See
[sourmash#2383](https://github.com/sourmash-bio/sourmash/issues/2383)
for an extended discussion!
Thanks to Taylor Reiter and Jessica Lumian for all their work on this!
## Installation
```
pip install sourmash_plugin_commonhash
```
## Usage
```
sourmash scripts commonhash <input sketches> -o commonhashes.zip
```
commonhash will output one filtered sketch for _each_ input sketch.
You can then use the various `sourmash sig` commands to union these
sketches, extract individual ones, etc.
### Example
```
sourmash scripts commonhash examples/*.sig.gz -o commonhash.zip
```
should yield:
```
...
Selecting k=31, DNA
Loaded 10587 hashes from 3 sketches in 3 files.
Of 10587 hashes, keeping 2529 that are in 2 or more samples.
Saved 3 signatures to 'commonhash.zip'
```
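To make the counts in that output concrete, here is a minimal sketch of the filtering idea using plain Python sets. It is an illustration only, not the plugin's actual implementation: sketches are modeled as sets of integer hashes, and the names `filter_common`, `jaccard`, and `min_samples` are hypothetical.

```python
from collections import Counter


def filter_common(sketches, min_samples=2):
    """Keep only hashes present in at least `min_samples` sketches.

    `sketches` maps a sample name to a set of integer hashes.
    Returns (filtered, keep): one filtered set per input sample,
    plus the set of hashes that passed the threshold.
    Illustrative only; this is not the plugin's code.
    """
    counts = Counter()
    for hashes in sketches.values():
        # Count each hash once per sample, however the sample stores it.
        counts.update(set(hashes))

    keep = {h for h, n in counts.items() if n >= min_samples}
    filtered = {name: set(hashes) & keep for name, hashes in sketches.items()}
    return filtered, keep


def jaccard(a, b):
    """Jaccard similarity of two hash sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy data standing in for three sketches.
sketches = {
    "sample_a": {1, 2, 3, 4},
    "sample_b": {2, 3, 5},
    "sample_c": {3, 4, 6},
}

filtered, keep = filter_common(sketches, min_samples=2)
total = len(set().union(*sketches.values()))
print(f"Of {total} hashes, keeping {len(keep)} that are in 2 or more samples.")

before = jaccard(sketches["sample_a"], sketches["sample_b"])
after = jaccard(filtered["sample_a"], filtered["sample_b"])
print(f"Jaccard(sample_a, sample_b): {before:.2f} before, {after:.2f} after")
```

Hashes 1, 5, and 6 each occur in a single sample, so they are dropped, and each input still yields its own (smaller) filtered set. In this toy data, removing those singletons raises the Jaccard similarity between `sample_a` and `sample_b`, because the hashes unique to one sample no longer inflate the union.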
## Support
We suggest filing issues in [the main sourmash issue tracker](https://github.com/sourmash-bio/sourmash/issues), as that receives more attention!
## Dev docs
`commonhash` is developed at https://github.com/ctb/sourmash_plugin_commonhash.
### Generating a release
Bump the version number in `pyproject.toml` and push.
Make a new release on GitHub.
Then pull, and run:
```
python -m build
```
followed by `twine upload dist/...`.