https://github.com/dnbaker/wmh
Weighted Minhash Code
- Host: GitHub
- URL: https://github.com/dnbaker/wmh
- Owner: dnbaker
- Created: 2021-01-31T22:04:29.000Z (about 4 years ago)
- Default Branch: main
- Last Pushed: 2021-02-21T00:14:02.000Z (almost 4 years ago)
- Last Synced: 2023-03-01T16:46:27.146Z (almost 2 years ago)
- Language: C++
- Size: 35.2 KB
- Stars: 5
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
Contains re-implementations of the weighted MinHash data structures from BagMinHash (Ertl, KDD 2018; https://dl.acm.org/doi/10.1145/3219819.3220089) and ProbMinHash (Ertl, 2019; https://arxiv.org/abs/1911.00675), along with Python bindings.
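For intuition about what the ProbMinHash sketches compute, here is a deliberately naive, quadratic-time sketch of P-MinHash, the straightforward algorithm that the ProbMinHash paper builds on and accelerates. It is not this repository's implementation: the dict input format, the blake2b-based hashing, and the helper names `pminhash` and `estimate_similarity` are hypothetical. Each register keeps the id that minimizes E_i / w_i, where E_i is an Exp(1) variate derived deterministically from (register, id); two signatures agree on a register with probability equal to the probability Jaccard similarity J_P.

```python
import hashlib
import math

# Illustrative only: a naive P-MinHash, NOT the wmh library's implementation.
# The hashing scheme and function names are hypothetical.


def _exp1(register: int, key) -> float:
    """Deterministic Exp(1) variate derived from (register, id) via blake2b."""
    digest = hashlib.blake2b(f"{register}:{key}".encode(), digest_size=8).digest()
    u = (int.from_bytes(digest, "little") >> 11) / 2.0**53  # uniform in [0, 1)
    return -math.log1p(-u)  # inverse-CDF transform; 1 - u > 0, so this never fails


def pminhash(weights: dict, m: int) -> list:
    """Register j stores the id minimizing E_ij / w_i over ids with positive weight."""
    signature = []
    for j in range(m):
        best_id, best_val = None, math.inf
        for key, w in weights.items():
            if w <= 0:
                continue  # zero-weight ids are outside the support
            val = _exp1(j, key) / w
            if val < best_val:
                best_id, best_val = key, val
        signature.append(best_id)
    return signature


def estimate_similarity(sig_a: list, sig_b: list) -> float:
    """Fraction of matching registers: an unbiased estimate of J_P."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


a = {1: 1.0, 2: 1.0}
b = {1: 1.0, 3: 1.0}
# The exact J_P for these two inputs is 1/3; the estimate should land nearby.
print(estimate_similarity(pminhash(a, 1000), pminhash(b, 1000)))
```

Because the argmin of E_i / w_i does not change when every weight is multiplied by the same positive constant, this sketch ignores overall weight scale, consistent with the README's note that PMinHash effectively normalizes all counts.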
Installation:
```bash
git clone --recursive https://github.com/dnbaker/wmh
cd wmh/python
python3 setup.py install
```

Use:
```python
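import numpy as np
import wmh

n = 1000
weights = np.random.rand(n)                 # one non-negative weight per element
ids = np.random.randint(100000, size=(n,))  # element identifiers in [0, 100000)
signature_size = 500                        # number of registers per signature

# One signature per sketch type: BagMinHash 1/2 and ProbMinHash 1/1a
sigs = [wmh.hash(weights, ids, m=signature_size, stype=sketcher)
        for sketcher in ("bmh1", "bmh2", "pmh1", "pmh1a")]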
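```

ProbMinHash (`pmh*`) effectively normalizes the weights into a probability distribution, so multiplying every weight by the same constant does not change the similarity it estimates. BagMinHash (`bmh*`) instead treats the (id, weight) pairs as elements of a weighted set, so absolute weights matter.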
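The difference is easiest to see in the exact similarities the two families target: BagMinHash estimates the weighted Jaccard similarity J_W = Σ min(a_i, b_i) / Σ max(a_i, b_i), which changes when one input is rescaled, while ProbMinHash estimates the probability Jaccard similarity J_P, which depends only on weight ratios. A minimal sketch, assuming inputs are dicts mapping id to weight; `weighted_jaccard` and `probability_jaccard` are hypothetical reference helpers, not part of the `wmh` package.

```python
# Exact (non-sketched) similarities for comparison; inputs are dicts id -> weight.
# These reference helpers are illustrative and not part of the wmh package.


def weighted_jaccard(a: dict, b: dict) -> float:
    """J_W = sum_i min(a_i, b_i) / sum_i max(a_i, b_i); missing ids have weight 0."""
    keys = a.keys() | b.keys()
    num = sum(min(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    den = sum(max(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    return num / den if den > 0 else 0.0


def probability_jaccard(a: dict, b: dict) -> float:
    """J_P = sum over shared ids i of 1 / sum_j max(a_j / a_i, b_j / b_i)."""
    keys = a.keys() | b.keys()
    total = 0.0
    for i in a.keys() & b.keys():
        if a[i] <= 0 or b[i] <= 0:
            continue  # only ids with positive weight in both inputs contribute
        total += 1.0 / sum(max(a.get(j, 0.0) / a[i], b.get(j, 0.0) / b[i]) for j in keys)
    return total


a = {1: 1.0, 2: 2.0, 3: 3.0}
doubled = {k: 2 * v for k, v in a.items()}
print(weighted_jaccard(a, doubled))               # 0.5
print(round(probability_jaccard(a, doubled), 9))  # 1.0
```

Doubling every weight halves J_W but leaves J_P at 1, which is what "effectively normalizes all counts" means in practice.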