{"id":15715354,"url":"https://github.com/cloudflare/cardinality-estimator","last_synced_at":"2025-10-20T04:30:24.186Z","repository":{"id":237105875,"uuid":"791996282","full_name":"cloudflare/cardinality-estimator","owner":"cloudflare","description":"A crate for estimating the cardinality of distinct elements in a stream or dataset.","archived":false,"fork":false,"pushed_at":"2025-01-23T05:29:55.000Z","size":572,"stargazers_count":17,"open_issues_count":1,"forks_count":4,"subscribers_count":7,"default_branch":"main","last_synced_at":"2025-01-30T14:22:58.484Z","etag":null,"topics":["cardinality-estimation","distinct-elements","hyperloglog","probalistic-data-structures","sketches"],"latest_commit_sha":null,"homepage":"https://crates.io/crates/cardinality-estimator","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cloudflare.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-04-25T19:17:07.000Z","updated_at":"2025-01-23T05:29:51.000Z","dependencies_parsed_at":"2024-04-30T00:42:56.641Z","dependency_job_id":"f63b16d6-8af4-4f7a-a7ef-8f8a0f21e8e9","html_url":"https://github.com/cloudflare/cardinality-estimator","commit_stats":null,"previous_names":["cloudflare/cardinality-estimator"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cloudflare%2Fcardinality-estimator","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cloudflare%2Fcardinality-estimator/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/Git
Hub/repositories/cloudflare%2Fcardinality-estimator/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cloudflare%2Fcardinality-estimator/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cloudflare","download_url":"https://codeload.github.com/cloudflare/cardinality-estimator/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":237261624,"owners_count":19281275,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cardinality-estimation","distinct-elements","hyperloglog","probalistic-data-structures","sketches"],"created_at":"2024-10-03T21:41:04.736Z","updated_at":"2025-10-20T04:30:23.754Z","avatar_url":"https://github.com/cloudflare.png","language":"Rust","funding_links":[],"categories":["Rust"],"sub_categories":[],"readme":"# cardinality-estimator\n![build](https://img.shields.io/github/actions/workflow/status/cloudflare/cardinality-estimator/ci.yml?branch=main)\n[![docs.rs](https://docs.rs/cardinality-estimator/badge.svg)](https://docs.rs/cardinality-estimator)\n[![crates.io](https://img.shields.io/crates/v/cardinality-estimator.svg)](https://crates.io/crates/cardinality-estimator)\n[![License](https://img.shields.io/badge/license-Apache%202.0-blue)](LICENSE)\n\n`cardinality-estimator` is a Rust crate designed to estimate the number of distinct elements in a stream or dataset in an efficient manner.\nThis library uses HyperLogLog++ with an optimized low memory footprint and high accuracy approach, suitable for large-scale data analysis tasks.\nWe're using 
`cardinality-estimator` for large-scale machine learning, computing cardinality features across multiple dimensions of the request.\n\n## Overview\nOur `cardinality-estimator` is highly efficient in terms of memory usage, latency, and accuracy.\nThis is achieved by leveraging a combination of unique data structure design, efficient algorithms, and HyperLogLog++ for high cardinality ranges.\n\n## Getting Started\nTo use `cardinality-estimator`, add it to your `Cargo.toml` under `[dependencies]`:\n```toml\n[dependencies]\ncardinality-estimator = \"1.0.0\"\n```\nThen, import `cardinality-estimator` in your Rust program:\n```rust\nuse cardinality_estimator::CardinalityEstimator;\n\nlet mut estimator = CardinalityEstimator::\u003c12, 6\u003e::new();\nestimator.insert(\"test\");\nlet estimate = estimator.estimate();\n\nprintln!(\"estimate = {}\", estimate);\n```\n\nPlease refer to our [examples](examples) and [benchmarks](benches) in the repository for more complex scenarios.\n\n## Low memory footprint\nThe `cardinality-estimator` achieves low memory footprint by leveraging an efficient data storage format.\nThe data is stored in three different representations - `Small`, `Array`, and `HyperLogLog` - depending on the cardinality range.\nFor instance, for a cardinality of 0 to 2, only **8 bytes** of stack memory and 0 bytes of heap memory are used.\n\n## Low latency\nThe crate offers low latency by using auto-vectorization for slice operations via compiler hints to use SIMD instructions.\nThe number of zero registers and registers' harmonic sum are stored and updated dynamically as more data is inserted, resulting in fast estimate operations.\n\n## High accuracy\nThe cardinality-estimator achieves high accuracy by using precise counting for small cardinality ranges and HyperLogLog++ with LogLog-Beta bias correction for larger ranges.\nThis provides expected error rates as low as 0.02% for large cardinalities.\n\n## Benchmarks\n\nTo run benchmarks you first need to install 
`cargo-criterion` binary:\n```shell\ncargo install cargo-criterion\n```\n\nThen run benchmarks with JSON output format to save results for further analysis:\n```shell\nmake bench\n```\n\nWe've benchmarked `cardinality-estimator` against several other crates in the ecosystem:\n* [hyperloglog](https://crates.io/crates/hyperloglog)\n* [hyperloglogplus](https://crates.io/crates/hyperloglogplus)\n* [amadeus-streaming](https://crates.io/crates/amadeus-streaming)\n* [probabilistic-collections](https://crates.io/crates/probabilistic-collections)\n\nPlease note that the [hyperloglog](https://github.com/jedisct1/rust-hyperloglog/blob/1.0.2/src/lib.rs#L33) and [probabilistic-collections](https://gitlab.com/jeffrey-xiao/probabilistic-collections-rs/-/blob/da2a331e9679e4686bdcc772c369b639b9c33dee/src/hyperloglog.rs#L103) crates have a bug in the calculation of precision `p` based on the provided `probability`:\n* incorrect formula: `p = (1.04 / error_probability).powi(2).ln().ceil() as usize;`\n* corrected formula: `p = (1.04 / error_probability).powi(2).log2().ceil() as usize;`\n\nWe're continuously working to make `cardinality-estimator` the fastest, lightest, and most accurate tool for cardinality estimation in Rust.\n\nBenchmarks presented below were executed on a Linux laptop with a `13th Gen Intel(R) Core(TM) i7-13800H` processor and compiler flags set to `RUSTFLAGS=-C target-cpu=native`.\n\n### Memory usage\n![Cardinality Estimators Memory Usage](benches/memory_bytes.png)\n\nThe table below compares memory usage of different cardinality estimators.\nThe number in each cell represents `stack memory bytes / heap memory bytes / heap memory blocks` at each measured cardinality.\n\nOur `cardinality-estimator` achieves the lowest stack and heap memory allocations across all measured cardinalities.\n\nNote that the `hyperloglogplus` implementation has particularly high memory usage, especially for cardinalities above 256.\n\n| cardinality | cardinality_estimator | amadeus_streaming | 
probabilistic_collections | hyperloglog    | hyperloglogplus    |\n|-------------|-----------------------|-------------------|---------------------------|----------------|--------------------|\n| 0           | **8 / 0 / 0**         | 48 / 4096 / 1     | 128 / 4096 / 1            | 120 / 4464 / 2 | 160 / 0 / 0        |\n| 1           | **8 / 0 / 0**         | 48 / 4096 / 1     | 128 / 4096 / 1            | 120 / 4096 / 1 | 160 / 36 / 1       |\n| 2           | **8 / 0 / 0**         | 48 / 4096 / 1     | 128 / 4096 / 1            | 120 / 4096 / 1 | 160 / 36 / 1       |\n| 4           | **8 / 16 / 1**        | 48 / 4096 / 1     | 128 / 4096 / 1            | 120 / 4096 / 1 | 160 / 92 / 2       |\n| 8           | **8 / 48 / 2**        | 48 / 4096 / 1     | 128 / 4096 / 1            | 120 / 4096 / 1 | 160 / 188 / 3      |\n| 16          | **8 / 112 / 3**       | 48 / 4096 / 1     | 128 / 4096 / 1            | 120 / 4096 / 1 | 160 / 364 / 4      |\n| 32          | **8 / 240 / 4**       | 48 / 4096 / 1     | 128 / 4096 / 1            | 120 / 4096 / 1 | 160 / 700 / 5      |\n| 64          | **8 / 496 / 5**       | 48 / 4096 / 1     | 128 / 4096 / 1            | 120 / 4096 / 1 | 160 / 1400 / 13    |\n| 128         | **8 / 1008 / 6**      | 48 / 4096 / 1     | 128 / 4096 / 1            | 120 / 4096 / 1 | 160 / 3261 / 23    |\n| 256         | **8 / 4092 / 7**      | 48 / 4096 / 1     | 128 / 4096 / 1            | 120 / 4096 / 1 | 160 / 10361 / 43   |\n| 512         | **8 / 4092 / 7**      | 48 / 4096 / 1     | 128 / 4096 / 1            | 120 / 4096 / 1 | 160 / 38295 / 83   |\n| 1024        | **8 / 4092 / 7**      | 48 / 4096 / 1     | 128 / 4096 / 1            | 120 / 4096 / 1 | 160 / 146816 / 163 |\n| 2048        | **8 / 4092 / 7**      | 48 / 4096 / 1     | 128 / 4096 / 1            | 120 / 4096 / 1 | 160 / 207711 / 194 |\n| 4096        | **8 / 4092 / 7**      | 48 / 4096 / 1     | 128 / 4096 / 1            | 120 / 4096 / 1 | 160 / 207711 / 194 |\n| 8192        | **8 / 4092 
/ 7**      | 48 / 4096 / 1     | 128 / 4096 / 1            | 120 / 4096 / 1 | 160 / 207711 / 194 |\n| 16384       | **8 / 4092 / 7**      | 48 / 4096 / 1     | 128 / 4096 / 1            | 120 / 4096 / 1 | 160 / 207711 / 194 |\n| 32768       | **8 / 4092 / 7**      | 48 / 4096 / 1     | 128 / 4096 / 1            | 120 / 4096 / 1 | 160 / 207711 / 194 |\n| 65536       | **8 / 4092 / 7**      | 48 / 4096 / 1     | 128 / 4096 / 1            | 120 / 4096 / 1 | 160 / 207711 / 194 |\n| 131072      | **8 / 4092 / 7**      | 48 / 4096 / 1     | 128 / 4096 / 1            | 120 / 4096 / 1 | 160 / 207711 / 194 |\n| 262144      | **8 / 4092 / 7**      | 48 / 4096 / 1     | 128 / 4096 / 1            | 120 / 4096 / 1 | 160 / 207711 / 194 |\n| 524288      | **8 / 4092 / 7**      | 48 / 4096 / 1     | 128 / 4096 / 1            | 120 / 4096 / 1 | 160 / 207711 / 194 |\n| 1048576     | **8 / 4092 / 7**      | 48 / 4096 / 1     | 128 / 4096 / 1            | 120 / 4096 / 1 | 160 / 207711 / 194 |\n\n### Insert performance\n![Cardinality Estimators Insert Time](benches/insert_time.png)\n\nThe table below shows insert time in nanoseconds per element.\n\nOur `cardinality-estimator` demonstrates the lowest insert time for most of the cardinalities.\n\n|   cardinality | cardinality-estimator   | amadeus-streaming   |   probabilistic-collections |   hyperloglog | hyperloglogplus   |\n|---------------|-------------------------|---------------------|-----------------------------|---------------|-------------------|\n|             0 | **0.64**                | 88.12               |                       70.19 |         82.69 | 17.45             |\n|             1 | **2.42**                | 91.5                |                       80.2  |        131.86 | 60.65             |\n|             2 | **2.21**                | 44.3                |                       45.34 |         81.48 | 34.96             |\n|             4 | **6.9**                 | 25.59               |                       
24.85 |         54.38 | 36.22             |\n|             8 | **7.27**                | 15.62               |                       17.92 |         43.54 | 35.55             |\n|            16 | **6.99**                | 12.15               |                       14.44 |         37.24 | 33.4              |\n|            32 | **7.9**                 | 9.6                 |                       12.78 |         34.23 | 32.49             |\n|            64 | 10.14                   | **8.97**            |                       11.86 |         32.55 | 39.04             |\n|           128 | 15.47                   | **8.52**            |                       11.49 |         31.76 | 48.37             |\n|           256 | 13.42                   | **8.01**            |                       11.24 |         31.44 | 65.58             |\n|           512 | 9.92                    | **8.1**             |                       11.11 |         31.34 | 100.25            |\n|          1024 | 8.32                    | **8.14**            |                       12.52 |         31.73 | 171.71            |\n|          2048 | **7.31**                | 7.92                |                       12.52 |         32.03 | 120.71            |\n|          4096 | **7.11**                | 8.01                |                       11.04 |         32.73 | 63.5              |\n|          8192 | 8.81                    | **8.02**            |                       10.97 |         33.08 | 37.36             |\n|         16384 | 8.08                    | **8.01**            |                       11.03 |         32.75 | 22.24             |\n|         32768 | **6.55**                | 7.96                |                       11.01 |         32.37 | 13.3              |\n|         65536 | **5.35**                | 7.96                |                       10.96 |         31.95 | 8.41              |\n|        131072 | **4.48**                | 7.9                 |                       
10.97 |         31.71 | 5.71              |\n|        262144 | **3.91**                | 7.95                |                       10.95 |         31.52 | 4.26              |\n|        524288 | 3.58                    | 7.64                |                       10.95 |         31.47 | **3.47**          |\n|       1048576 | 3.35                    | 7.95                |                       10.95 |         31.47 | **3.04**          |\n\n### Estimate performance\n![Cardinality Estimators Estimate Time](benches/estimate_time.png)\n\nThe table below shows estimate time in nanoseconds per call.\n\nOur `cardinality-estimator` shows the lowest estimate time for most of the cardinalities, especially smaller cardinalities up to 128.\n\nNote that the `amadeus-streaming` implementation is also quite efficient at the estimate operation; however, it has higher memory usage, as indicated by the memory usage table above.\nThe `probabilistic-collections`, `hyperloglog` and `hyperloglogplus` implementations have much higher estimate time, especially for higher cardinalities.\n\n|   cardinality | cardinality-estimator   | amadeus-streaming   |   probabilistic-collections |   hyperloglog | hyperloglogplus   |\n|---------------|-------------------------|---------------------|-----------------------------|---------------|-------------------|\n|             0 | **0.18**                | 7.9                 |                     15576.4 |        125.03 | 24.89             |\n|             1 | **0.18**                | 9.19                |                     15619.8 |        134.3  | 64.62             |\n|             2 | **0.18**                | 9.18                |                     15615.5 |        134.4  | 70.51             |\n|             4 | **0.18**                | 9.2                 |                     15642.7 |        134.01 | 89.16             |\n|             8 | **0.18**                | 9.19                |                     15611.1 |        134.41 | 132.0             |\n|         
   16 | **0.18**                | 9.19                |                     15621.6 |        134.39 | 211.4             |\n|            32 | **0.18**                | 9.19                |                     15637.1 |        130.58 | 357.55            |\n|            64 | **0.18**                | 9.19                |                     15626   |        130.26 | 619.95            |\n|           128 | **0.18**                | 9.18                |                     15640.8 |        130.33 | 1134.12           |\n|           256 | 11.31                   | **9.09**            |                     15668   |        133.5  | 2205.7            |\n|           512 | 11.3                    | **9.09**            |                     15652   |        129.58 | 4334.05           |\n|          1024 | 11.31                   | **9.09**            |                     15687.1 |        129.79 | 8392.59           |\n|          2048 | 11.28                   | 9.11                |                     15680.4 |        129.8  | **8.08**          |\n|          4096 | **11.29**               | 38.63               |                     15803.4 |        129.49 | 4342.07           |\n|          8192 | **11.28**               | 38.98               |                     23285   |        129.51 | 4345.7            |\n|         16384 | **11.29**               | 38.17               |                     26950.7 |        132.96 | 4341.9            |\n|         32768 | **6.02**                | 10.86               |                     31168   |       7674.3  | 4334.98           |\n|         65536 | 6.05                    | **4.1**             |                     33123.8 |      40986.4  | 4327.48           |\n|        131072 | 6.02                    | **4.1**             |                     33772.4 |      42113.7  | 4327.29           |\n|        262144 | 6.02                    | **4.11**            |                     34711.7 |      43587    | 4329.63           |\n|        
524288 | 6.02                    | **4.1**             |                     36091.2 |      43582.8  | 4327.8            |\n|       1048576 | 6.02                    | **4.11**            |                     37877.1 |      45055.3  | 4327.37           |\n\n### Error rate\n![Cardinality Estimators Error Rate](benches/error_rate.png)\n\nThe table below shows the average absolute relative error across 100 runs of each estimator on random elements at a given cardinality.\n\nOur `cardinality-estimator` performs on par with the `amadeus-streaming` and `hyperloglog` estimators, and has an especially low error rate for cardinalities up to 128.\n\nNote that the `probabilistic-collections` implementation seems to have a bug in its estimate operation for cardinalities \u003e=32768.\n\n| cardinality | cardinality_estimator | amadeus_streaming | probabilistic_collections | hyperloglog | hyperloglogplus |\n|-------------|-----------------------|-------------------|---------------------------|-------------|-----------------|\n| 0           | **0.0000**            | **0.0000**        | **0.0000**                | **0.0000**  | **0.0000**      |\n| 1           | **0.0000**            | **0.0000**        | **0.0000**                | **0.0000**  | **0.0000**      |\n| 2           | **0.0000**            | **0.0000**        | **0.0000**                | **0.0000**  | **0.0000**      |\n| 4           | **0.0000**            | **0.0000**        | **0.0000**                | **0.0000**  | **0.0000**      |\n| 8           | **0.0000**            | **0.0000**        | **0.0000**                | **0.0000**  | **0.0000**      |\n| 16          | **0.0000**            | 0.0019            | 0.0013                    | 0.0025      | **0.0000**      |\n| 32          | **0.0000**            | 0.0041            | 0.0031                    | 0.0041      | **0.0000**      |\n| 64          | **0.0000**            | 0.0066            | 0.0086                    | 0.0078      | **0.0000**      |\n| 128     
    | **0.0000**            | 0.0123            | 0.0116                    | 0.0140      | **0.0000**      |\n| 256         | 0.0080                | 0.0097            | 0.0094                    | 0.0084      | **0.0000**      |\n| 512         | 0.0088                | 0.0100            | 0.0087                    | 0.0090      | **0.0000**      |\n| 1024        | 0.0080                | 0.0094            | 0.0101                    | 0.0095      | **0.0000**      |\n| 2048        | 0.0092                | 0.0093            | **0.0090**                | 0.0107      | 0.0100          |\n| 4096        | **0.0099**            | 0.0108            | 0.0113                    | 0.0114      | 0.0103          |\n| 8192        | 0.0096                | **0.0095**        | 0.0131                    | 0.0126      | 0.0109          |\n| 16384       | 0.0116                | **0.0107**        | 0.0204                    | 0.0229      | 0.0117          |\n| 32768       | 0.0125                | **0.0109**        | 1.46e14                   | 0.0437      | 0.0116          |\n| 65536       | 0.0132                | 0.0133            | 2.81e14                   | 0.0143      | **0.0118**      |\n| 131072      | **0.0116**            | 0.0121            | 1.41e14                   | 0.0128      | 0.0127          |\n| 262144      | 0.0137                | 0.0144            | 7.04e13                   | 0.0122      | **0.0116**      |\n| 524288      | 0.0138                | 0.0136            | 3.52e13                   | **0.0116**  | 0.0121          |\n| 1048576     | 0.0113                | 0.0124            | 1.76e13                   | 0.0141      | **0.0110**      |\n| **mean**    | 0.0064                | 0.0078            | 3.14e13                   | 0.0101      | **0.0052**      
|\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcloudflare%2Fcardinality-estimator","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcloudflare%2Fcardinality-estimator","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcloudflare%2Fcardinality-estimator/lists"}