https://github.com/dnbaker/wmh

Weighted Minhash Code

Contains re-implementations of the data structures from BagMinHash (https://dl.acm.org/doi/10.1145/3219819.3220089) and ProbMinHash (https://arxiv.org/abs/1911.00675), along with Python bindings.

Installation:
```bash
git clone --recursive https://github.com/dnbaker/wmh
cd wmh/python
python3 setup.py install
```

Use:
```python
import wmh
import numpy as np

weights = np.random.rand(1000)
ids = np.random.randint(100000, size=(1000,))  # random integer ids in [0, 100000)
signature_size = 500
sigs = [wmh.hash(weights, ids, m=signature_size, stype=sketcher)
        for sketcher in ("bmh1", "bmh2", "pmh1", "pmh1a")]
```

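PMinHash (the `pmh` sketchers) effectively normalizes all weights, so rescaling every weight in an input by the same factor does not change the comparison. BagMinHash (the `bmh` sketchers) instead treats each (id, weight) pair as a weighted element of a weighted set, so the absolute weights matter.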
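
To make the difference concrete, here is a minimal NumPy sketch of the two exact similarities these sketch families are generally understood to approximate: the weighted Jaccard similarity for `bmh`, and the probability Jaccard similarity (J_P) for `pmh`. It does not call `wmh` at all. The function names `weighted_jaccard` and `probability_jaccard` are illustrative and not part of this package, and the mapping from sketcher to similarity is an assumption based on the cited papers rather than something this README states.

```python
import numpy as np

def weighted_jaccard(w, v):
    """Weighted Jaccard: sum of element-wise minima over sum of maxima."""
    w, v = np.asarray(w, dtype=float), np.asarray(v, dtype=float)
    denom = np.maximum(w, v).sum()
    return np.minimum(w, v).sum() / denom if denom > 0 else 0.0

def probability_jaccard(w, v):
    """J_P: sum over shared ids i of 1 / sum_j max(w_j / w_i, v_j / v_i)."""
    w, v = np.asarray(w, dtype=float), np.asarray(v, dtype=float)
    shared = np.flatnonzero((w > 0) & (v > 0))  # ids present in both inputs
    return sum(1.0 / np.maximum(w / w[i], v / v[i]).sum() for i in shared)

# Two weight vectors over the same ids; v is w scaled by a constant.
w = np.array([1.0, 2.0, 0.0, 4.0])
v = 3.0 * w

wj = weighted_jaccard(w, v)     # sensitive to the scale difference
pj = probability_jaccard(w, v)  # unchanged by the scale difference
print(wj, pj)
```

With these inputs the weighted Jaccard similarity is 1/3, because every weight in `v` is three times larger, while J_P is 1 (up to floating-point rounding), because the two vectors describe the same distribution after normalization.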