{"id":22646690,"url":"https://github.com/livewareproblems/hyper","last_synced_at":"2025-04-12T02:03:20.368Z","repository":{"id":40001022,"uuid":"350382717","full_name":"LivewareProblems/hyper","owner":"LivewareProblems","description":"Erlang implementation of HyperLogLog","archived":false,"fork":false,"pushed_at":"2024-06-25T16:47:57.000Z","size":446,"stargazers_count":16,"open_issues_count":7,"forks_count":1,"subscribers_count":2,"default_branch":"main","last_synced_at":"2024-09-19T23:18:57.443Z","etag":null,"topics":["erlang","hacktoberfest"],"latest_commit_sha":null,"homepage":"","language":"Erlang","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/LivewareProblems.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-03-22T14:54:59.000Z","updated_at":"2024-05-01T12:43:38.000Z","dependencies_parsed_at":"2024-06-18T13:08:49.516Z","dependency_job_id":"87460894-8889-433e-8a9f-149e497e4448","html_url":"https://github.com/LivewareProblems/hyper","commit_stats":null,"previous_names":[],"tags_count":8,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LivewareProblems%2Fhyper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LivewareProblems%2Fhyper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LivewareProblems%2Fhyper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/LivewareProblems%2Fhyper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/L
ivewareProblems","download_url":"https://codeload.github.com/LivewareProblems/hyper/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":228851036,"owners_count":17981425,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["erlang","hacktoberfest"],"created_at":"2024-12-09T07:27:17.959Z","updated_at":"2024-12-09T07:27:21.092Z","avatar_url":"https://github.com/LivewareProblems.png","language":"Erlang","funding_links":[],"categories":[],"sub_categories":[],"readme":"# HyperLogLog for Erlang\n\n![Hex.pm](https://img.shields.io/hexpm/v/hyper?style=flat-square)\n![Hex.pm](https://img.shields.io/hexpm/l/hyper?style=flat-square)\n\nThis is an implementation of the HyperLogLog algorithm in\nErlang. Using HyperLogLog you can estimate the cardinality of very\nlarge data sets using constant memory. The relative error is `1.04 /\nsqrt(2^P)`. When creating a new HyperLogLog filter, you provide the\nprecision P, allowing you to trade memory for accuracy. The union of\ntwo filters is lossless.\n\nIn practice this allows you to build efficient analytics systems. For\nexample, you can create a new filter in each mapper and feed it a\nportion of your dataset while the reducers simply union together all\nfilters they receive. 
The filter you end up with is exactly the same\nfilter you would get by sequentially inserting all data into a single\nfilter.\n\nIn addition to the base algorithm, we have implemented the new estimator\nbased on Mean Limit, as described in this great [paper by Otmar Ertl][].\nThis new estimator greatly improves the estimates for lower cardinalities while\nusing a single estimator for the whole range of cardinalities.\n\n## TODO\n\n- [x] Use rebar3\n- [x] Work on OTP 23\n- [x] Fix the estimator\n- [x] Fix `reduce_precision`\n- [x] Add `reduce_precision` for array, allowing unions\n- [ ] Better document the main module\n- [x] Move documentation to ExDoc\n- [x] Delete dead code\n- [x] Rework test suite to be nice to modify\n- [ ] Rework Intersection using this [paper by Otmar Ertl][]\n- [ ] Redo benchmarks\n\n## Usage\n\n```erlang\n1\u003e hyper:insert(\u003c\u003c\"foobar\"\u003e\u003e, hyper:insert(\u003c\u003c\"quux\"\u003e\u003e, hyper:new(4))).\n{hyper,4,\n       {hyper_binary,{dense,\u003c\u003c0,0,0,0,0,0,0,0,64,0,0,0\u003e\u003e,\n                            [{8,1}],\n                            1,16}}}\n\n2\u003e hyper:card(v(-1)).\n2.136502281992361\n```\n\nThe error introduced by estimation can be seen in this example:\n\n```erlang\n3\u003e rand:seed(exsss, {1, 2, 3}).\n{#{bits =\u003e 58,jump =\u003e #Fun\u003crand.3.47293030\u003e,\n   next =\u003e #Fun\u003crand.0.47293030\u003e,type =\u003e exsss,\n   uniform =\u003e #Fun\u003crand.1.47293030\u003e,\n   uniform_n =\u003e #Fun\u003crand.2.47293030\u003e},\n [117085240290607817|199386643319833935]}\n4\u003e Run = fun (P, Card) -\u003e hyper:card(lists:foldl(fun (_, H) -\u003e Int = rand:uniform(10000000000000), hyper:insert(\u003c\u003cInt:64/integer\u003e\u003e, H) end, hyper:new(P), lists:seq(1, Card))) end.\n#Fun\u003cerl_eval.12.80484245\u003e\n5\u003e Run(12, 10_000).\n10038.192365345985\n6\u003e Run(14, 10_000).\n9967.916262642864\n7\u003e Run(16, 10_000).\n9972.832893293473\n```\n\nA filter can be 
persisted and read later. The serialized struct is formatted for usage with jiffy:\n\n```erlang\n8\u003e Filter = hyper:insert(\u003c\u003c\"foo\"\u003e\u003e, hyper:new(4)).\n{hyper,4,\n       {hyper_binary,{dense,\u003c\u003c4,0,0,0,0,0,0,0,0,0,0,0\u003e\u003e,[],0,16}}}\n9\u003e Filter =:= hyper:from_json(hyper:to_json(Filter)).\ntrue\n```\n\n**As of today, we only support the binary backend. More to come.**\n\nYou can select a different backend. See below for a description of why\nyou might want to do so. They serialize in exactly the same way, but\ncan't be mixed in memory.\n\n## Is it any good?\n\nNo idea! I do not know anyone who uses it extensively, but it is relatively\nwell tested. As far as I can tell, it is the only FOSS implementation that does\nprecision reduction properly!\n\n## Hacking\n\n### Documentation\n\nWe use ex_doc for documentation. To generate the docs, you first need to install it:\n\n```bash\nmix escript.install hex ex_doc\nex_doc --version\n```\n\nThen generate the docs, after targeting the correct version in `docs.sh`:\n\n```bash\ndocs.sh\n```\n\n## Backends\n\nEffort has been spent on implementing different backends in the\npursuit of finding the right performance trade-off. Fill rate refers to how many\nregisters have a value other than 0.\n\n- `hyper_binary`: Fixed memory usage (6 bits * 2^P), fastest on insert,\n  union, cardinality and serialization. Best default choice.\n\nYou can also implement your own backend. In `test` there is a\nbunch of tests that run for all backends, including some PropEr tests. The\ntest suite will ensure your backend gives correct estimates and\ncorrectly encodes/decodes the serialized filters.\n\n\n## Fork\n\nThis is a fork of the original Hyper library by GameAnalytics. 
It was no longer\nmaintained.\n\nThe main differences are a move to the `rand` module for tests and to `rebar3`\nas a build tool, in order to support OTP 23+.\n\nThe `carray` backend was dropped, as it never moved out of experimental\nstatus and could not be serialised for distributed use. Some backends using\nNIFs may come back in the future.\n\nThe bisect implementation was dropped too. Its use case was limited and it\nforced a dependency on a library that was not maintained either.\n\nThe gb backend was dropped for the time being too.\n\nThe Array backend was dropped for the time being too.\n\nThe estimator was rebuilt following this [paper by Otmar Ertl][], as it was\nbroken for any precision other than 14. This should also provide better estimation\nacross the board for cardinality.\n\nThe `reduce_precision` function has been rebuilt properly, as it was quite\nsimply wrong. This fixed a lot of bugs for unions.\n\n[paper by Otmar Ertl]: https://arxiv.org/abs/1706.07290\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flivewareproblems%2Fhyper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flivewareproblems%2Fhyper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flivewareproblems%2Fhyper/lists"}