{"id":13411623,"url":"https://github.com/axiomhq/hyperloglog","last_synced_at":"2025-12-15T00:40:37.758Z","repository":{"id":43820008,"uuid":"94682429","full_name":"axiomhq/hyperloglog","owner":"axiomhq","description":"HyperLogLog with lots of sugar (Sparse, LogLog-Beta bias correction and TailCut space reduction) brought to you by Axiom","archived":false,"fork":false,"pushed_at":"2025-03-13T11:32:17.000Z","size":270,"stargazers_count":976,"open_issues_count":5,"forks_count":74,"subscribers_count":21,"default_branch":"main","last_synced_at":"2025-05-06T18:43:20.099Z","etag":null,"topics":["axiom","data-structures","go","golang","hyperloglog"],"latest_commit_sha":null,"homepage":"https://axiom.co","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/axiomhq.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"Contributing.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2017-06-18T11:18:12.000Z","updated_at":"2025-05-04T15:25:39.000Z","dependencies_parsed_at":"2024-03-19T11:28:16.044Z","dependency_job_id":"f4d4ba3a-5838-453d-abda-1251fcf92340","html_url":"https://github.com/axiomhq/hyperloglog","commit_stats":{"total_commits":84,"total_committers":20,"mean_commits":4.2,"dds":0.3571428571428571,"last_synced_commit":"af9851f82b2788cec351526717ceaf661b1d796a"},"previous_names":[],"tags_count":7,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/axiomhq%2Fhyperloglog","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/axiomhq%2Fhyperloglog/tags","releases_url":"https://repos.ec
osyste.ms/api/v1/hosts/GitHub/repositories/axiomhq%2Fhyperloglog/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/axiomhq%2Fhyperloglog/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/axiomhq","download_url":"https://codeload.github.com/axiomhq/hyperloglog/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254010791,"owners_count":21998993,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["axiom","data-structures","go","golang","hyperloglog"],"created_at":"2024-07-30T20:01:15.098Z","updated_at":"2025-12-15T00:40:37.723Z","avatar_url":"https://github.com/axiomhq.png","language":"Go","funding_links":[],"categories":["Data Structures and Algorithms","Go","数据结构与算法","Data Structures","数据结构","数据结构`go语言实现的数据结构与算法`","\u003cspan id=\"数据结构-data-structures\"\u003e数据结构 Data Structures\u003c/span\u003e","數據結構","Generators","Data Integration Frameworks","Uncategorized"],"sub_categories":["Miscellaneous Data Structures and Algorithms","杂项数据结构和算法","Advanced Console UIs","标准 CLI","高级控制台界面","\u003cspan id=\"高级控制台用户界面-advanced-console-uis\"\u003e高级控制台用户界面 Advanced Console UIs\u003c/span\u003e","高級控制台界面"],"readme":"# HyperLogLog - an algorithm for approximating the number of distinct elements\n\n[![GoDoc](https://godoc.org/github.com/axiomhq/hyperloglog?status.svg)](https://godoc.org/github.com/axiomhq/hyperloglog) [![Go Report Card](https://goreportcard.com/badge/github.com/axiomhq/hyperloglog)](https://goreportcard.com/report/github.com/axiomhq/hyperloglog) 
[![CircleCI](https://circleci.com/gh/axiomhq/hyperloglog/tree/master.svg?style=svg)](https://circleci.com/gh/axiomhq/hyperloglog/tree/master)\n\nAn improved version of [HyperLogLog](https://en.wikipedia.org/wiki/HyperLogLog) for the count-distinct problem, approximating the number of distinct elements in a multiset. This implementation offers enhanced performance, flexibility, and simplicity while maintaining accuracy.\n\n## Note on Implementation History\n\nThe initial version of this work (tagged as v0.1.0) was based on [\"Better with fewer bits: Improving the performance of cardinality estimation of large data streams - Qingjun Xiao, You Zhou, Shigang Chen\"](http://cse.seu.edu.cn/PersonalPage/csqjxiao/csqjxiao_files/papers/INFOCOM17.pdf). However, the current implementation has evolved significantly from this original basis, notably moving away from the tailcut method.\n\n## Current Implementation\n\nThe current implementation is based on the LogLog-Beta algorithm, as described in:\n\n[\"LogLog-Beta and More: A New Algorithm for Cardinality Estimation Based on LogLog Counting\"](https://arxiv.org/pdf/1612.02284) by Jason Qin, Denys Kim, and Yumei Tung (2016).\n\nKey features of the current implementation:\n* **Metro hash** used instead of xxhash\n* **Sparse representation** for lower cardinalities (like HyperLogLog++)\n* **LogLog-Beta** for dynamic bias correction across all cardinalities\n* **8-bit registers** for convenience and simplified implementation\n* **Order-independent insertions and merging** for consistent results regardless of data input order\n* **Removal of tailcut method** for a more straightforward approach\n* **Flexible precision** allowing for 2^4 to 2^18 registers\n\nThis implementation is now more straightforward, efficient, and flexible, while remaining backwards compatible with previous versions. 
It provides a balance between precision, memory usage, speed, and ease of use.\n\n## Precision and Memory Usage\n\nThis implementation allows for creating HyperLogLog sketches with any precision from 2^4 to 2^18 registers. Because each register is 8 bits wide, memory usage scales linearly with the number of registers:\n\n* Minimum (2^4 registers): 16 bytes\n* Default (2^14 registers): 16 KB\n* Maximum (2^18 registers): 256 KB\n\nUsers can choose the precision that best fits their use case, balancing memory usage against estimation accuracy.\n\n## Note\n\nA big thank you to Prof. Shigang Chen and his team at the University of Florida, who are actively conducting research around \"Big Network Data\".\n\n## Contributing\n\nKindly check our [contributing guide](https://github.com/axiomhq/hyperloglog/blob/main/Contributing.md) for how to propose bug fixes and improvements and how to submit pull requests to the project.\n\n## License\n\n\u0026copy; Axiom, Inc., 2024\n\nDistributed under MIT License (`The MIT License`).\n\nSee [LICENSE](LICENSE) for more information.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faxiomhq%2Fhyperloglog","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Faxiomhq%2Fhyperloglog","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faxiomhq%2Fhyperloglog/lists"}