{"id":22472202,"url":"https://github.com/ckampfe/prolly","last_synced_at":"2025-10-05T05:07:44.870Z","repository":{"id":57536696,"uuid":"94597540","full_name":"ckampfe/prolly","owner":"ckampfe","description":"Probabilistic data structures for Elixir","archived":false,"fork":false,"pushed_at":"2017-07-22T22:46:51.000Z","size":21,"stargazers_count":2,"open_issues_count":4,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-06-13T21:46:41.556Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Elixir","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ckampfe.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-06-17T03:15:34.000Z","updated_at":"2024-02-23T19:57:51.000Z","dependencies_parsed_at":"2022-08-29T00:40:42.119Z","dependency_job_id":null,"html_url":"https://github.com/ckampfe/prolly","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/ckampfe/prolly","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ckampfe%2Fprolly","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ckampfe%2Fprolly/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ckampfe%2Fprolly/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ckampfe%2Fprolly/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ckampfe","download_url":"https://codeload.github.com/ckampfe/prolly/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ckampfe%2Fprolly/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":278411261,"owners_count":25982368,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-05T02:00:06.059Z","response_time":54,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-12-06T12:12:37.608Z","updated_at":"2025-10-05T05:07:44.848Z","avatar_url":"https://github.com/ckampfe.png","language":"Elixir","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Prolly\n\nProbabilistic data structures\n\n[![Build Status](https://travis-ci.org/ckampfe/prolly.svg?branch=master)](https://travis-ci.org/ckampfe/prolly)\n\n## Installation\n\nThis package is [available in Hex](https://hex.pm/packages/prolly), and can be\ninstalled by adding `prolly` to your list of dependencies in `mix.exs`:\n\n```elixir\ndef deps do\n  [{:prolly, \"~\u003e 0.2\"}]\nend\n```\n\n## Use\n\nFor examples and use, see [the documentation](https://hexdocs.pm/prolly/api-reference.html).\n\n## Datastructures\n\n- [x] CountMinSketch\n- [x] Bloom filter\n- [x] HyperLogLog\n- [ ] K-Minimum Values\n\n## Rationale\n\nThe goals of this library are, in order:\n\n1. Correctness\n2. Readability\n3. Performance\n\nThere are probably other implementations of these data structures in Elixir or Erlang -- or C, for that matter\n-- that are more performant. That's ok.\n\nI would rather this library be more digestible and self-evidently correct than the other way around.\nThat's not to say performance doesn't matter. These kinds of datastructures are useful only insofar as they are performant,\nso this library will do its best to realize that goal while still being the most approachable of the bunch.\n\n## Benchmarks\n\nTo run the benchmarks:\n\n```\n$ mix deps.get \u0026\u0026 mix deps.compile \u0026\u0026 mix compile\n$ mix run benchmark.exs\n```\n\nBenchmarks as of `20170618`:\n\n```\nxcxk066$\u003e mix run benchmark.exs\nOperating System: macOS\nCPU Information: Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz\nNumber of Available Cores: 8\nAvailable memory: 17.179869184 GB\nElixir 1.4.4\nErlang 19.3\nBenchmark suite executing with the following configuration:\nwarmup: 5.00 s\ntime: 10.00 s\nparallel: 1\ninputs: none specified\nEstimated total run time: 8.00 min\n\n\nBenchmarking bloom filter possible_member? 1000...\nBenchmarking bloom filter possible_member? 10000...\nBenchmarking bloom filter possible_member? 100000...\nBenchmarking bloom filter possible_member? 1000000...\nBenchmarking bloom filter update 1000...\nBenchmarking bloom filter update 10000...\nBenchmarking bloom filter update 100000...\nBenchmarking bloom filter update 1000000...\nBenchmarking hll phash2 m=16 count 1000...\nBenchmarking hll phash2 m=16 count 10000...\nBenchmarking hll phash2 m=16 count 100000...\nBenchmarking hll phash2 m=16 count 1000000...\nBenchmarking hll phash2 m=16 update 1000...\nBenchmarking hll phash2 m=16 update 10000...\nBenchmarking hll phash2 m=16 update 100000...\nBenchmarking hll phash2 m=16 update 1000000...\nBenchmarking hll phash2 m=64 count 1000...\nBenchmarking hll phash2 m=64 count 10000...\nBenchmarking hll phash2 m=64 count 100000...\nBenchmarking hll phash2 m=64 count 1000000...\nBenchmarking hll phash2 m=64 update 1000...\nBenchmarking hll phash2 m=64 update 10000...\nBenchmarking hll phash2 m=64 update 100000...\nBenchmarking hll phash2 m=64 update 1000000...\nBenchmarking sketch get_count 1000...\nBenchmarking sketch get_count 10000...\nBenchmarking sketch get_count 100000...\nBenchmarking sketch get_count 1000000...\nBenchmarking sketch update 1000...\nBenchmarking sketch update 10000...\nBenchmarking sketch update 100000...\nBenchmarking sketch update 1000000...\n\nName                                            ips        average  deviation         median\nbloom filter possible_member? 1000         538.47 K        1.86 μs  ±2132.66%        2.00 μs\nbloom filter possible_member? 10000        526.57 K        1.90 μs  ±2363.16%        2.00 μs\nbloom filter possible_member? 100000       519.92 K        1.92 μs  ±2115.96%        2.00 μs\nbloom filter possible_member? 1000000      515.75 K        1.94 μs  ±2688.88%        2.00 μs\nhll phash2 m=16 count 1000                 365.48 K        2.74 μs  ±1171.81%        3.00 μs\nhll phash2 m=16 count 10000                342.62 K        2.92 μs  ±1253.85%        3.00 μs\nhll phash2 m=16 count 1000000              335.43 K        2.98 μs  ±1191.45%        3.00 μs\nhll phash2 m=16 count 100000               335.38 K        2.98 μs  ±1244.69%        3.00 μs\nbloom filter update 1000                   284.11 K        3.52 μs   ±852.57%        3.00 μs\nbloom filter update 10000                  273.51 K        3.66 μs   ±814.49%        3.00 μs\nbloom filter update 100000                 266.74 K        3.75 μs   ±863.87%        3.00 μs\nbloom filter update 1000000                259.53 K        3.85 μs   ±746.74%        4.00 μs\nsketch get_count 1000                      250.27 K        4.00 μs   ±817.72%        4.00 μs\nsketch get_count 10000                     245.67 K        4.07 μs   ±716.67%        4.00 μs\nsketch get_count 100000                    234.84 K        4.26 μs   ±785.77%        4.00 μs\nsketch get_count 1000000                   226.83 K        4.41 μs   ±661.33%        4.00 μs\nsketch update 1000                         176.35 K        5.67 μs   ±392.28%        5.00 μs\nsketch update 10000                        174.11 K        5.74 μs   ±390.15%        5.00 μs\nsketch update 100000                       165.37 K        6.05 μs   ±467.76%        6.00 μs\nhll phash2 m=16 update 100000              163.23 K        6.13 μs   ±403.00%        5.00 μs\nhll phash2 m=16 update 1000000             162.67 K        6.15 μs   ±385.00%        5.00 μs\nhll phash2 m=16 update 1000                157.02 K        6.37 μs   ±405.85%        6.00 μs\nhll phash2 m=16 update 10000               156.02 K        6.41 μs   ±413.49%        6.00 μs\nsketch update 1000000                      147.72 K        6.77 μs   ±347.44%        6.00 μs\nhll phash2 m=64 update 1000                143.84 K        6.95 μs   ±304.67%        6.00 μs\nhll phash2 m=64 update 1000000             142.57 K        7.01 μs   ±308.13%        6.00 μs\nhll phash2 m=64 update 100000              142.12 K        7.04 μs   ±328.42%        6.00 μs\nhll phash2 m=64 update 10000               137.58 K        7.27 μs   ±307.34%        7.00 μs\nhll phash2 m=64 count 10000                122.83 K        8.14 μs   ±226.07%        8.00 μs\nhll phash2 m=64 count 1000                 120.94 K        8.27 μs   ±248.02%        8.00 μs\nhll phash2 m=64 count 100000               120.79 K        8.28 μs   ±261.38%        8.00 μs\nhll phash2 m=64 count 1000000              120.31 K        8.31 μs   ±222.56%        8.00 μs\n\nComparison:\nbloom filter possible_member? 1000         538.47 K\nbloom filter possible_member? 10000        526.57 K - 1.02x slower\nbloom filter possible_member? 100000       519.92 K - 1.04x slower\nbloom filter possible_member? 1000000      515.75 K - 1.04x slower\nhll phash2 m=16 count 1000                 365.48 K - 1.47x slower\nhll phash2 m=16 count 10000                342.62 K - 1.57x slower\nhll phash2 m=16 count 1000000              335.43 K - 1.61x slower\nhll phash2 m=16 count 100000               335.38 K - 1.61x slower\nbloom filter update 1000                   284.11 K - 1.90x slower\nbloom filter update 10000                  273.51 K - 1.97x slower\nbloom filter update 100000                 266.74 K - 2.02x slower\nbloom filter update 1000000                259.53 K - 2.07x slower\nsketch get_count 1000                      250.27 K - 2.15x slower\nsketch get_count 10000                     245.67 K - 2.19x slower\nsketch get_count 100000                    234.84 K - 2.29x slower\nsketch get_count 1000000                   226.83 K - 2.37x slower\nsketch update 1000                         176.35 K - 3.05x slower\nsketch update 10000                        174.11 K - 3.09x slower\nsketch update 100000                       165.37 K - 3.26x slower\nhll phash2 m=16 update 100000              163.23 K - 3.30x slower\nhll phash2 m=16 update 1000000             162.67 K - 3.31x slower\nhll phash2 m=16 update 1000                157.02 K - 3.43x slower\nhll phash2 m=16 update 10000               156.02 K - 3.45x slower\nsketch update 1000000                      147.72 K - 3.65x slower\nhll phash2 m=64 update 1000                143.84 K - 3.74x slower\nhll phash2 m=64 update 1000000             142.57 K - 3.78x slower\nhll phash2 m=64 update 100000              142.12 K - 3.79x slower\nhll phash2 m=64 update 10000               137.58 K - 3.91x slower\nhll phash2 m=64 count 10000                122.83 K - 4.38x slower\nhll phash2 m=64 count 1000                 120.94 K - 4.45x slower\nhll phash2 m=64 count 100000               120.79 K - 4.46x slower\nhll phash2 m=64 count 1000000              120.31 K - 4.48x slower\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fckampfe%2Fprolly","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fckampfe%2Fprolly","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fckampfe%2Fprolly/lists"}