https://github.com/mbrg/py-hyperminhash
HyperLogLog with intersection
- Host: GitHub
- URL: https://github.com/mbrg/py-hyperminhash
- Owner: mbrg
- License: mit
- Created: 2020-07-01T10:19:12.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2021-04-21T18:26:53.000Z (over 4 years ago)
- Last Synced: 2024-05-09T00:03:04.034Z (over 1 year ago)
- Topics: estimation, hyperloglog, minhash
- Language: Python
- Homepage:
- Size: 70.3 KB
- Stars: 4
- Watchers: 1
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
# HyperMinHash
This repository is a Python (>=3.6) port of the Go library [hyperminhash](https://github.com/axiomhq/hyperminhash):
> Besides being a compact and pretty speedy HyperLogLog implementation for cardinality counting, this modified HyperLogLog allows **intersection** and **similarity** estimation of different HyperLogLogs.
## Install
```
pip install hyperminhash
```

## Example Usage
```python
from hyperminhash import HyperMinHash

sk1 = HyperMinHash()
sk2 = HyperMinHash()

for i in range(10000):
    sk1.add(i)

print(len(sk1))
# 10001 (should be 10000)

for i in range(3333, 23333):
    sk2.add(i)

print(len(sk2))
# 19977 (should be 20000)

print(sk1.similarity(sk2))
# 0.284589082 (should be 0.2857326533)

print(sk1.intersection(sk2))
# 6623 (should be 6667)

sk1.merge(sk2)
print(sk1.cardinality())
# 23271 (should be 23333)
```
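
The "should be" values in the comments above can be reproduced exactly with plain Python sets, which may help when judging the sketch's error. The snippet below is a minimal sketch that does not use `hyperminhash` at all; the variable names are illustrative. The last step rests on an assumption about how the estimate is derived (similarity multiplied by the merged cardinality), which is consistent with the numbers printed above but is not stated in this README.

```python
# Exact reference values for the example above, computed with built-in sets.
a = set(range(10000))         # the items added to sk1
b = set(range(3333, 23333))   # the items added to sk2

exact_intersection = len(a & b)
exact_union = len(a | b)
exact_jaccard = exact_intersection / exact_union  # Jaccard similarity

print(len(a), len(b))            # 10000 20000
print(exact_intersection)        # 6667
print(exact_union)               # 23333
print(round(exact_jaccard, 10))  # 0.2857326533

# Assumption (not confirmed by this README): the intersection estimate is
# roughly similarity * cardinality of the merged sketch. Plugging in the
# estimates printed in the example above gives the reported intersection.
estimated_intersection = round(0.284589082 * 23271)
print(estimated_intersection)    # 6623
```

With these inputs the estimates land within about 1% of the exact values.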