https://github.com/guysmoilov/streaming-hll
Extension to Clearspring impl. of HLL++, which allows merging directly from a stream
- Host: GitHub
- URL: https://github.com/guysmoilov/streaming-hll
- Owner: guysmoilov
- License: apache-2.0
- Created: 2017-04-23T22:29:39.000Z (about 9 years ago)
- Default Branch: master
- Last Pushed: 2017-06-13T12:08:58.000Z (almost 9 years ago)
- Last Synced: 2025-02-09T08:28:56.617Z (over 1 year ago)
- Topics: hyperloglog
- Language: Java
- Size: 67.4 KB
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# streaming-hll
An extension to Clearspring's implementation of HLL++ (HyperLogLog++) that allows merging directly from a stream.
Their implementation can be found here: https://github.com/addthis/stream-lib
I would like to thank the original creators: their implementation is amazing, elegant, and written as simply as it can be.
The only extension needed here was to better handle bulk merging of HLL++ instances, as efficiently as possible.
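As a rough illustration of what merging from a stream involves, the sketch below folds serialized registers into a target one buffer at a time, keeping the larger value for each register (the standard HyperLogLog merge rule). Everything in it is an assumption made for illustration: the class `StreamMergeSketch`, its methods, and the one-byte-per-register layout are hypothetical, and none of it reflects stream-lib's actual serialization format or the internals of `StreamingHyperLogLogPlus`.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical illustration only; not the StreamingHyperLogLogPlus implementation. */
final class StreamMergeSketch {
    // Assumed layout: one byte per register. Real HLL++ encodings are more compact.
    private final byte[] registers;

    StreamMergeSketch(int p) {
        if (p < 4 || p > 18) {
            throw new IllegalArgumentException("p out of range: " + p);
        }
        registers = new byte[1 << p]; // m = 2^p registers
    }

    void setRegister(int index, int value) { registers[index] = (byte) value; }

    int getRegister(int index) { return registers[index]; }

    /** Serializes the registers in the assumed one-byte-per-register layout. */
    byte[] toBytes() { return registers.clone(); }

    /**
     * Merges a serialized sketch of the same precision straight from the stream.
     * Only a small fixed buffer is allocated; the source sketch is never rebuilt.
     */
    void mergeFrom(InputStream in) throws IOException {
        byte[] buffer = new byte[4096];
        int index = 0;
        while (index < registers.length) {
            int want = Math.min(buffer.length, registers.length - index);
            int read = in.read(buffer, 0, want);
            if (read < 0) {
                throw new EOFException("stream ended after " + index + " registers");
            }
            for (int i = 0; i < read; i++, index++) {
                // HyperLogLog merge rule: keep the maximum of the two registers.
                if (buffer[i] > registers[index]) {
                    registers[index] = buffer[i];
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        StreamMergeSketch target = new StreamMergeSketch(4);
        StreamMergeSketch source = new StreamMergeSketch(4);
        target.setRegister(0, 5);
        source.setRegister(0, 3);
        source.setRegister(1, 7);
        target.mergeFrom(new ByteArrayInputStream(source.toBytes()));
        System.out.println(target.getRegister(0) + " " + target.getRegister(1)); // prints "5 7"
    }
}
```

The point of the sketch is the memory profile: when many serialized instances are merged in bulk, the per-merge cost is a fixed-size buffer rather than a fully deserialized source object.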
## Example usage
    StreamingHyperLogLogPlus target = new StreamingHyperLogLogPlus(14);
    HyperLogLogPlus source = new HyperLogLogPlus(14, 16);
    // offer some items to source....
    // Merge the serialized source into target directly from the stream.
    target.add(new ByteArrayInputStream(source.getBytes()));
    assert target.cardinality() == source.cardinality();
## Why the old dependency version?
The dependency on Clearspring is pinned to 2.5.2 because that is the version Cassandra uses, at least in Cassandra 2.2.6.
Staying compatible with that version is critical to my own work.
There should be no reason this wouldn't work with newer versions of HLL++,
though at the time of writing I haven't tested that yet.