{"id":19459541,"url":"https://github.com/sundy-li/simple_hll","last_synced_at":"2025-04-25T07:31:59.220Z","repository":{"id":220764855,"uuid":"752529218","full_name":"sundy-li/simple_hll","owner":"sundy-li","description":"A simple HyperLogLog implementation in rust","archived":false,"fork":false,"pushed_at":"2024-09-29T01:26:47.000Z","size":31,"stargazers_count":3,"open_issues_count":0,"forks_count":2,"subscribers_count":3,"default_branch":"main","last_synced_at":"2024-10-30T22:27:44.530Z","etag":null,"topics":["cardinality","database","hyperloglog"],"latest_commit_sha":null,"homepage":"https://crates.io/crates/simple_hll","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sundy-li.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":["sundy-li"]}},"created_at":"2024-02-04T05:22:11.000Z","updated_at":"2024-09-29T01:26:50.000Z","dependencies_parsed_at":null,"dependency_job_id":"b1b8e935-d072-4b71-b853-241619d01b14","html_url":"https://github.com/sundy-li/simple_hll","commit_stats":null,"previous_names":["sundy-li/simple_hll"],"tags_count":0,"template":false,"template_full_name":"Xuanwo/formwork","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sundy-li%2Fsimple_hll","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sundy-li%2Fsimple_hll/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sundy-li%2Fsimple_hll/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposito
ries/sundy-li%2Fsimple_hll/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sundy-li","download_url":"https://codeload.github.com/sundy-li/simple_hll/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":223988610,"owners_count":17236926,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cardinality","database","hyperloglog"],"created_at":"2024-11-10T17:33:00.026Z","updated_at":"2024-11-10T17:33:00.131Z","avatar_url":"https://github.com/sundy-li.png","language":"Rust","funding_links":["https://github.com/sponsors/sundy-li"],"categories":[],"sub_categories":[],"readme":"# simple_hll \u0026emsp; [![Build Status]][actions] [![Latest Version]][crates.io]\n\n[Build Status]: https://img.shields.io/github/actions/workflow/status/sundy-li/simple_hll/ci.yml\n[actions]: https://github.com/sundy-li/simple_hll/actions?query=branch%3Amain\n[Latest Version]: https://img.shields.io/crates/v/simple_hll.svg\n[crates.io]: https://crates.io/crates/simple_hll\n\n`simple_hll` is a simple HyperLogLog implementation in Rust. 
It is designed to be simple to use and to take fewer bytes to store (with a sparse HyperLogLog representation).\n\n## Quick Start\n\n```rust\nuse simple_hll::HyperLogLog;\n\nlet mut hll = HyperLogLog::\u003c14\u003e::new();\nhll.add_object(\"hello\");\nhll.add_object(\"world\");\nhll.add_object(\"simple_hll\");\n\nprintln!(\"cardinality: {}\", hll.count());\n```\n\n## Serde\n\n`simple_hll` supports serde and borsh when the `serde_borsh` feature is enabled, so you can serialize and deserialize a HyperLogLog instance.\n\n```rust\n// `serde_json::to_vec` takes a reference to the value being serialized.\nlet val = serde_json::to_vec(\u0026hll)?;\n```\n\nTo reduce the serialized size, we introduce a sparse intermediate struct for the HyperLogLog instance. When the number of non-zero registers is below a threshold, the instance is serialized in sparse mode.\n\n```rust\nenum HyperLogLogVariant\u003cconst P: usize\u003e {\n    Empty,\n    Sparse { data: Vec\u003c(u16, u8)\u003e },\n    Full(Vec\u003cu8\u003e),\n}\n```\n\n## Non-Fixed Type\n\nUnlike other HyperLogLog implementations, we don't use a fixed item type `HyperLogLog\u003cT\u003e` for the HyperLogLog instance; instead, a const generic parameter specifies the precision. The precision `P` is the number of bits used for the register index, so the number of registers is `2^P`. Choosing `P` is a trade-off between accuracy and memory usage. The default precision is 14, which gives `2^14 = 16384` registers and a memory usage of about 16KB.\n\nThe reason is that in Databend or another DBMS, the `HyperLogLog` is stored inside the metadata. 
We avoid `HyperLogLog\u003cDatum\u003e` to keep things simple and to avoid the overhead of hashing the enum.\n\n## Contributing\n\nCheck out the [CONTRIBUTING.md](./CONTRIBUTING.md) guide for more details on getting started with contributing to this project.\n\n## Acknowledgements\n\nSome code and tests are borrowed from or inspired by:\n- [redis](https://github.com/redis/redis/blob/4930d19e70c391750479951022e207e19111eb55/src/hyperloglog.c)\n- [datafusion](https://github.com/apache/arrow-datafusion/blob/f203d863f5c8bc9f133f6dd9b2e34e57ac3cdddc/datafusion/physical-expr/src/aggregate/hyperloglog.rs)\n- [pdatastructs](https://github.com/crepererum/pdatastructs.rs/blob/3997ed50f6b6871c9e53c4c5e0f48f431405fc63/src/hyperloglog.rs)\n\nReference papers:\n- [New cardinality estimation algorithms for HyperLogLog sketches](https://arxiv.org/abs/1702.01284)\n\nThanks to the authors and contributors for their great work.\n\n#### License\n\n\u003csup\u003e\nLicensed under \u003ca href=\"./LICENSE\"\u003eApache License, Version 2.0\u003c/a\u003e.\n\u003c/sup\u003e","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsundy-li%2Fsimple_hll","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsundy-li%2Fsimple_hll","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsundy-li%2Fsimple_hll/lists"}