Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/edawson/mkmh
Generate kmers/minimizers/hashes/MinHash signatures, including with multiple kmer sizes.
kmer locality-sensitive-hashing minhash
Last synced: about 2 months ago
- Host: GitHub
- URL: https://github.com/edawson/mkmh
- Owner: edawson
- License: mit
- Created: 2016-06-16T14:32:52.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2021-01-09T01:42:34.000Z (almost 4 years ago)
- Last Synced: 2023-04-04T14:14:19.668Z (almost 2 years ago)
- Topics: kmer, locality-sensitive-hashing, minhash
- Language: C++
- Homepage:
- Size: 204 KB
- Stars: 23
- Watchers: 2
- Forks: 2
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE
# mkmh
Make kmers, minimizers, hashes, and MinHash sketches (with multiple k), and compare them.

![C/C++ CI for mkmh](https://github.com/edawson/mkmh/workflows/C/C++%20CI/badge.svg)
## Usage
To use mkmh functions in your code:
1. Include the header file in your code:
`#include "mkmh.hpp"`
2. Compile the library:
`cd mkmh && make lib`
3. Make sure the library and header are on the compiler's library and include search paths (e.g. in your makefile):
`g++ -o my_code my_code.cpp -L/path/to/mkmh -I/path/to/mkmh -lmkmh`
4. That's it!

## Available functionality
Convenience functions:
- Reverse complement a string
- Reverse a string
- Capitalize the characters of a string
- Check if a string contains only canonical DNA letters ("A", "a", "C", "c", "T", "t", "G", "g")

Substrings and transforms:
- Get the forward shingles of a string
- Get the kmers of size *k* of a string
- For multiple *k*, get the kmers of a string for all *k*
- Get the (*w*, *k*) minimizers of a string
- Calculate the 64-bit hashes of the kmers of a string (with either single or multiple *k* values)
- Get the MinHash sketch of a string (from either single or multiple *k* values), using either the top *s* hashes or the bottom *s* hashes.

Compare sets of shingles / kmers / minimizers / hashes:
- Take the union of two sets of kmers or hashes.
- Take the intersection of two sets of kmers or hashes.

Fun extras:
- Given a string and a set of query strings, sort the queries in order of percent similarity.

## Getting help
Please reach out through [GitHub](https://github.com/edawson/mkmh) by posting an issue (even if it's just feedback). Email is acceptable as a secondary medium.
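
## Illustrative sketch

To make the ideas under "Available functionality" concrete, here is a minimal, self-contained sketch of kmer extraction, 64-bit hashing, a bottom-*s* MinHash sketch, set comparison, and similarity-based sorting of queries. It does not use or reproduce the mkmh API: every function name below (`kmers_of`, `fnv1a64`, `bottom_sketch`, `jaccard`, `rank_queries`) is hypothetical, and FNV-1a is only a stand-in for whatever hash mkmh actually uses. The similarity is a plain Jaccard index over the two sketches, which is simpler than the merged-sketch estimator used by some MinHash tools.

```cpp
// Illustrative only: hypothetical helpers, NOT the mkmh API.
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <string>
#include <utility>
#include <vector>

// All substrings of length k, in positional order (empty if k == 0 or k > s.size()).
std::vector<std::string> kmers_of(const std::string& s, std::size_t k) {
    std::vector<std::string> out;
    if (k == 0 || k > s.size()) return out;
    for (std::size_t i = 0; i + k <= s.size(); ++i) out.push_back(s.substr(i, k));
    return out;
}

// 64-bit FNV-1a hash (assumption: a stand-in chosen for brevity).
std::uint64_t fnv1a64(const std::string& s) {
    std::uint64_t h = 14695981039346656037ULL;  // FNV offset basis
    for (unsigned char c : s) {
        h ^= c;
        h *= 1099511628211ULL;  // FNV prime
    }
    return h;
}

// Bottom-s sketch: the sketch_size smallest distinct kmer hashes, sorted ascending.
std::vector<std::uint64_t> bottom_sketch(const std::string& s, std::size_t k,
                                         std::size_t sketch_size) {
    std::vector<std::uint64_t> hashes;
    for (const std::string& km : kmers_of(s, k)) hashes.push_back(fnv1a64(km));
    std::sort(hashes.begin(), hashes.end());
    hashes.erase(std::unique(hashes.begin(), hashes.end()), hashes.end());
    if (hashes.size() > sketch_size) hashes.resize(sketch_size);
    return hashes;
}

// Intersection size over union size for two sorted, duplicate-free hash vectors.
double jaccard(const std::vector<std::uint64_t>& a, const std::vector<std::uint64_t>& b) {
    std::vector<std::uint64_t> inter, uni;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(inter));
    std::set_union(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(uni));
    if (uni.empty()) return 0.0;
    return static_cast<double>(inter.size()) / static_cast<double>(uni.size());
}

// Score each query against the reference and return them most similar first.
std::vector<std::pair<double, std::string>> rank_queries(
        const std::string& ref, const std::vector<std::string>& queries,
        std::size_t k, std::size_t sketch_size) {
    const std::vector<std::uint64_t> ref_sketch = bottom_sketch(ref, k, sketch_size);
    std::vector<std::pair<double, std::string>> ranked;
    for (const std::string& q : queries) {
        ranked.emplace_back(jaccard(ref_sketch, bottom_sketch(q, k, sketch_size)), q);
    }
    std::stable_sort(ranked.begin(), ranked.end(),
                     [](const std::pair<double, std::string>& x,
                        const std::pair<double, std::string>& y) {
                         return x.first > y.first;
                     });
    return ranked;
}
```

For example, `rank_queries("ACGT", {"GGGG", "ACGT", "AATT"}, 1, 100)` would place `"ACGT"` first, since its sketch is identical to the reference's. When `sketch_size` is at least the number of distinct kmers, the score is the exact Jaccard index of the two kmer hash sets; smaller sketches trade accuracy for memory.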