{"id":18546436,"url":"https://github.com/adroll/python-hll","last_synced_at":"2025-04-09T20:31:00.904Z","repository":{"id":37612220,"uuid":"207899601","full_name":"AdRoll/python-hll","owner":"AdRoll","description":"python-hll","archived":false,"fork":false,"pushed_at":"2022-12-26T20:47:53.000Z","size":2075,"stargazers_count":18,"open_issues_count":4,"forks_count":6,"subscribers_count":7,"default_branch":"master","last_synced_at":"2025-04-08T10:22:43.015Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AdRoll.png","metadata":{"files":{"readme":"README.rst","changelog":"HISTORY.rst","contributing":"CONTRIBUTING.rst","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-09-11T20:19:42.000Z","updated_at":"2022-10-15T13:36:47.000Z","dependencies_parsed_at":"2023-01-31T01:31:12.034Z","dependency_job_id":null,"html_url":"https://github.com/AdRoll/python-hll","commit_stats":null,"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AdRoll%2Fpython-hll","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AdRoll%2Fpython-hll/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AdRoll%2Fpython-hll/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AdRoll%2Fpython-hll/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/AdRoll","download_url":"https://codeload.github.com/AdRoll/python-hll/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248107189,"owners_count":21048
876,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-06T20:24:58.777Z","updated_at":"2025-04-09T20:30:57.554Z","avatar_url":"https://github.com/AdRoll.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"==========\npython-hll\n==========\n\n\n.. image:: https://img.shields.io/pypi/v/python_hll.svg\n        :target: https://pypi.python.org/pypi/python_hll\n\n.. image:: https://readthedocs.org/projects/python-hll/badge/?version=latest\n        :target: https://python-hll.readthedocs.io/en/latest/?badge=latest\n        :alt: Documentation Status\n\n.. image:: https://img.shields.io/badge/github-python--hll-yellow\n        :target: https://github.com/AdRoll/python-hll\n\nA Python implementation of `HyperLogLog \u003chttp://algo.inria.fr/flajolet/Publications/FlFuGaMe07.pdf\u003e`_\nwhose goal is to be `storage compatible \u003chttps://github.com/aggregateknowledge/hll-storage-spec\u003e`_\nwith `java-hll \u003chttps://github.com/aggregateknowledge/java-hll\u003e`_, `js-hll \u003chttps://github.com/aggregateknowledge/js-hll\u003e`_\nand `postgresql-hll \u003chttps://github.com/citusdata/postgresql-hll\u003e`_.\n\n**NOTE:** This is a fairly literal translation/port of `java-hll \u003chttps://github.com/aggregateknowledge/java-hll\u003e`_\nto Python. 
Internally, bytes are represented as Java-style bytes (-128 to 127) rather than Python-style bytes (0 to 255).\nAlso this implementation is quite slow: for example, in Java ``HLLSerializationTest`` takes 12 seconds to run\nwhile in Python ``test_hll_serialization`` takes 1.5 hours to run (about 400x slower).\n\n* Runs on: Python 2.7 and 3\n* Free software: MIT license\n* Documentation: https://python-hll.readthedocs.io\n* GitHub: https://github.com/AdRoll/python-hll\n\nOverview\n---------------\nSee `java-hll \u003chttps://github.com/aggregateknowledge/java-hll\u003e`_ for an overview of what HLLs are and how they work.\n\nUsage\n---------------\n\nHashing and adding a value to a new HLL::\n\n    from python_hll.hll import HLL\n    import mmh3\n    value_to_hash = 'foo'\n    hashed_value = mmh3.hash(value_to_hash)\n\n    hll = HLL(13, 5) # log2m=13, regwidth=5\n    hll.add_raw(hashed_value)\n\nRetrieving the cardinality of an HLL::\n\n    cardinality = hll.cardinality()\n\nUnioning two HLLs together (and retrieving the resulting cardinality)::\n\n    hll1 = HLL(13, 5) # log2m=13, regwidth=5\n    hll2 = HLL(13, 5) # log2m=13, regwidth=5\n\n    # ... 
(add values to both sets) ...\n\n    hll1.union(hll2) # modifies hll1 to contain the union\n    cardinality_union = hll1.cardinality()\n\nReading an HLL from the hex representation defined by the\n`storage specification, v1.0.0 \u003chttps://github.com/aggregateknowledge/hll-storage-spec/blob/v1.0.0/STORAGE.md\u003e`_\n(for example, retrieved from a `PostgreSQL database \u003chttps://github.com/aggregateknowledge/postgresql-hll\u003e`_)::\n\n    from python_hll.hll import HLL\n    from python_hll.util import NumberUtil\n    hex_input = '\\\\x128D7FFFFFFFFFF6A5C420'\n    hex_string = hex_input[2:] # drop the two-character hex prefix\n    hll = HLL.from_bytes(NumberUtil.from_hex(hex_string, 0, len(hex_string)))\n\nWriting an HLL to the hex representation defined by the\n`storage specification, v1.0.0 \u003chttps://github.com/aggregateknowledge/hll-storage-spec/blob/v1.0.0/STORAGE.md\u003e`_\n(for example, to be inserted into a `PostgreSQL database \u003chttps://github.com/aggregateknowledge/postgresql-hll\u003e`_)::\n\n    hll_bytes = hll.to_bytes()\n    output = \"\\\\x\" + NumberUtil.to_hex(hll_bytes, 0, len(hll_bytes))\n\nAlso see the `API documentation \u003chttps://python-hll.readthedocs.io/en/latest/docs/python_hll.html\u003e`_.\n\nDevelopment\n---------------\nSee `Contributing \u003chttps://python-hll.readthedocs.io/en/latest/contributing.html\u003e`_ for how to get started building, testing, and deploying the code.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fadroll%2Fpython-hll","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fadroll%2Fpython-hll","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fadroll%2Fpython-hll/lists"}