https://github.com/wesdoyle/hyperloglog
Exploring HyperLogLog
- Host: GitHub
- URL: https://github.com/wesdoyle/hyperloglog
- Owner: wesdoyle
- Created: 2024-08-17T17:17:42.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-08-18T01:02:34.000Z (over 1 year ago)
- Last Synced: 2025-01-28T01:17:59.521Z (about 1 year ago)
- Topics: hll-algorithm, hyperloglog
- Language: Rust
- Homepage:
- Size: 25.2 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# HyperLogLog exploration
This repo explores implementations of the [HyperLogLog](https://www.wikipedia.org/wiki/HyperLogLog) algorithm, which estimates the cardinality (the number of distinct elements) of a dataset.
These are basic implementations for learning purposes.
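To make the idea concrete, here is a minimal Python sketch of the estimator. It is not the code from this repo: the class name, the choice of SHA-1 as the hash, and the default precision `p` are all assumptions made for illustration. It hashes each item to 64 bits, uses the top `p` bits to pick one of `2**p` registers, and stores the largest "rank" (leading zeros plus one) seen in the remaining bits.

```python
import hashlib
import math


class HyperLogLog:
    """Illustrative HyperLogLog with m = 2**p registers (hypothetical, not the repo's code)."""

    def __init__(self, p: int = 14) -> None:
        if not 4 <= p <= 24:
            raise ValueError("p must be between 4 and 24")
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def _alpha(self) -> float:
        # Bias-correction constants from the original HyperLogLog paper.
        if self.m == 16:
            return 0.673
        if self.m == 32:
            return 0.697
        if self.m == 64:
            return 0.709
        return 0.7213 / (1 + 1.079 / self.m)

    def add(self, item: str) -> None:
        # Use 64 bits of a SHA-1 digest as the hash (an arbitrary choice for this sketch).
        h = int.from_bytes(hashlib.sha1(item.encode("utf-8")).digest()[:8], "big")
        index = h >> (64 - self.p)              # top p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)   # remaining 64 - p bits
        # Rank = position of the leftmost 1-bit in `rest` (leading zeros + 1).
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[index]:
            self.registers[index] = rank

    def estimate(self) -> float:
        raw = self._alpha() * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        # Small-range correction: fall back to linear counting while registers are sparse.
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)
        return raw


hll = HyperLogLog(p=14)
for i in range(100_000):
    hll.add(f"item-{i}")
print(round(hll.estimate()))  # expected to land near 100000
```

Memory use depends only on `p`, not on how many items are added, which is the main trade-off against exact counting with a set.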
## Example using a text file on disk
```sh
> python3 ./py-version/hyperloglog.py ./data/complete_works_of_shakespeare.txt 16
Comparison of HyperLogLog vs Exact Counting
----------------------------------------------------------
Method          Count       Error (%)
----------------------------------------------------------
HyperLogLog     25838       0.37
Exact           25934       N/A
----------------------------------------------------------
```
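The `Error (%)` column is consistent with a plain relative error between the estimate and the exact count. The sketch below reproduces that figure from the numbers above. The helper names are hypothetical, and treating the trailing argument `16` as the precision `p` (so `2**16` registers) is an assumption; the README does not say what that argument means.

```python
import math


def relative_error_percent(estimate: float, exact: float) -> float:
    """Absolute difference between estimate and exact count, as a percentage of exact."""
    return abs(estimate - exact) / exact * 100


def expected_standard_error_percent(p: int) -> float:
    """Theoretical HyperLogLog standard error, about 1.04 / sqrt(m) with m = 2**p."""
    return 1.04 / math.sqrt(2 ** p) * 100


# Counts taken from the example output above.
print(f"{relative_error_percent(25838, 25934):.2f}")  # 0.37

# Assumption: the trailing CLI argument 16 is the precision p.
print(f"{expected_standard_error_percent(16):.2f}")   # 0.41
```

Under that assumption, the observed 0.37% error sits close to the roughly 0.41% standard error expected for `2**16` registers.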
## Example using a SQLite database on disk
This example uses the [Wikibooks dataset](https://www.kaggle.com/datasets/dhruvildave/wikibooks-dataset), licensed under [CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/).
### Python Version
```sh
> python3 ./py-version/hll_sqlite.py ./data/wikibooks.sqlite en body_text 20
```
### Rust Version
```sh
cargo run --release ../../data/wikibooks.sqlite en body_text 22
Comparison of HyperLogLog vs Exact Counting (SQLite)
----------------------------------------------------------
Method          Count       Time (s)    Error (%)
----------------------------------------------------------
HyperLogLog     1361262     19.68       0.01
Exact           1361142     12.87       N/A
----------------------------------------------------------
```
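For reference, an exact baseline over a SQLite text column could look roughly like the sketch below. This is a guess at the general shape, not the repo's implementation: it assumes `en` is a table name, `body_text` is a column name, and that the items being counted are whitespace-separated tokens. The function names and the in-memory demo data are hypothetical.

```python
import sqlite3
from typing import Iterator


def quote_ident(name: str) -> str:
    # SQLite identifiers cannot be bound as parameters, so quote them explicitly.
    return '"' + name.replace('"', '""') + '"'


def iter_tokens(conn: sqlite3.Connection, table: str, column: str) -> Iterator[str]:
    """Stream whitespace-separated tokens from one text column, row by row."""
    query = f"SELECT {quote_ident(column)} FROM {quote_ident(table)}"
    for (text,) in conn.execute(query):
        if text:  # skip NULL and empty rows
            yield from text.split()


def exact_distinct(conn: sqlite3.Connection, table: str, column: str) -> int:
    """Exact distinct count; memory grows with the number of distinct tokens."""
    return len(set(iter_tokens(conn, table, column)))


# Tiny in-memory stand-in for the real database (made-up rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE en (body_text TEXT)")
conn.executemany(
    "INSERT INTO en VALUES (?)",
    [("to be or not to be",), ("that is the question",), (None,)],
)
print(exact_distinct(conn, "en", "body_text"))  # 8
```

The same token stream could be fed to a HyperLogLog instead of a `set`, which keeps memory fixed at the cost of a small estimation error.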