{"id":17715632,"url":"https://github.com/borgbackup/borghash","last_synced_at":"2025-08-21T12:13:52.320Z","repository":{"id":258803564,"uuid":"869026944","full_name":"borgbackup/borghash","owner":"borgbackup","description":"A memory-efficient hashtable with serialization.","archived":false,"fork":false,"pushed_at":"2025-03-23T14:33:47.000Z","size":118,"stargazers_count":3,"open_issues_count":2,"forks_count":2,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-08-09T22:37:31.693Z","etag":null,"topics":["cython","hashtable"],"latest_commit_sha":null,"homepage":"","language":"Cython","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/borgbackup.png","metadata":{"files":{"readme":"README.rst","changelog":"CHANGES.rst","contributing":null,"funding":null,"license":"LICENSE.rst","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":"AUTHORS","dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-10-07T15:38:51.000Z","updated_at":"2025-03-23T14:33:47.000Z","dependencies_parsed_at":"2024-10-27T20:12:34.095Z","dependency_job_id":"29a8b637-dcae-4fe7-8460-c0ad8dfb02a3","html_url":"https://github.com/borgbackup/borghash","commit_stats":{"total_commits":51,"total_committers":1,"mean_commits":51.0,"dds":0.0,"last_synced_commit":"37c708a5496648fe5136de3a3208fc4b27e693ad"},"previous_names":["borgbackup/borghash"],"tags_count":3,"template":false,"template_full_name":null,"purl":"pkg:github/borgbackup/borghash","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/borgbackup%2Fborghash","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/borgbackup%2Fborghash/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/borgbackup%2Fborghash/releas
es","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/borgbackup%2Fborghash/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/borgbackup","download_url":"https://codeload.github.com/borgbackup/borghash/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/borgbackup%2Fborghash/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":271478008,"owners_count":24766424,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-21T02:00:08.990Z","response_time":74,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cython","hashtable"],"created_at":"2024-10-25T12:06:33.855Z","updated_at":"2025-08-21T12:13:52.288Z","avatar_url":"https://github.com/borgbackup.png","language":"Cython","funding_links":[],"categories":[],"sub_categories":[],"readme":"BorgHash\n=========\n\nMemory-efficient hashtable implementations as a Python library,\nimplemented in Cython.\n\nHashTable\n---------\n\n``HashTable`` is a rather low-level implementation, usually one rather wants to\nuse the ``HashTableNT`` wrapper. 
But read on to get the basics...\n\nKeys and Values\n~~~~~~~~~~~~~~~\n\nThe keys MUST be perfectly random ``bytes`` of arbitrary, but constant length,\nlike from a cryptographic hash (sha256, hmac-sha256, ...).\nThe implementation relies on this \"perfectly random\" property and does not\nimplement its own hash function; it just takes 32 bits from the given key.\n\nThe values are binary ``bytes`` of arbitrary, but constant length.\n\nThe length of the keys and values is defined when creating a ``HashTable``\ninstance (after that, the length must always match that defined length).\n\nImplementation details\n~~~~~~~~~~~~~~~~~~~~~~\n\nTo have little memory overhead overall, the hashtable only stores uint32_t\nindexes into separate keys and values arrays (short: kv arrays).\n\nA new key just gets appended to the keys array. The corresponding value gets\nappended to the values array. After that, the key and value do not change their\nindex as long as they exist in the hashtable and the ht and kv arrays are in\nmemory. Even when kv pairs are deleted from ``HashTable``, the kv arrays never\nshrink and the indexes of other kv pairs don't change.\n\nThis is because we want to have stable array indexes for the keys/values so the\nindexes can be used outside of ``HashTable`` as memory-efficient references.\n\nMemory allocated\n~~~~~~~~~~~~~~~~\n\nFor a hashtable load factor of 0.1 - 0.5, a kv array grow factor of 1.3 and\nN kv pairs, memory usage in bytes is approximately:\n\n- Hashtable: from ``N * 4 / 0.5`` to ``N * 4 / 0.1``\n- Keys/Values: from ``N * len(key+value) * 1.0`` to ``N * len(key+value) * 1.3``\n- Overall: from ``N * (8 + len(key+value))`` to ``N * (40 + len(key+value) * 1.3)``\n\nWhen the hashtable or the kv arrays are resized, there will be short memory\nusage spikes. 
For the kv arrays, ``realloc()`` is used to avoid copying of\ndata and memory usage spikes, if possible.\n\nHashTableNT\n-----------\n\n``HashTableNT`` is a convenience wrapper around ``HashTable``:\n\n- accepts and returns ``namedtuple`` values\n- implements persistence: can read (write) the hashtable from (to) a file.\n\nKeys and Values\n~~~~~~~~~~~~~~~\n\nKeys: ``bytes``, see ``HashTable``.\n\nValues: any fixed type of ``namedtuple`` that can be serialized to ``bytes``\nby Python's ``struct`` module using a given format string.\n\nWhen setting a value, it is automatically serialized. When a value is returned,\nit will be a ``namedtuple`` of the given type.\n\nPersistence\n~~~~~~~~~~~\n\n``HashTableNT`` has ``.write()`` and ``.read()`` methods to save/load its\ncontent to/from a file, using an efficient binary format.\n\nWhen a ``HashTableNT`` is saved to disk, only the non-deleted entries are\npersisted and when it is loaded from disk, a new hashtable and new, dense\nkv arrays are built - thus, kv indexes will be different!\n\nAPI\n---\n\nHashTable / HashTableNT have an API similar to a dict:\n\n- ``__setitem__`` / ``__getitem__`` / ``__delitem__`` / ``__contains__``\n- ``get()``, ``pop()``, ``setdefault()``\n- ``items()``, ``len()``\n- ``read()``, ``write()``, ``size()``\n\nExample code\n------------\n\n::\n\n    # HashTableNT mapping 256bit key [bytes] --\u003e Chunk value [namedtuple]\n    Chunk = namedtuple(\"Chunk\", [\"refcount\", \"size\"])\n    ChunkFormat = namedtuple(\"ChunkFormat\", [\"refcount\", \"size\"])\n    chunk_format = ChunkFormat(refcount=\"I\", size=\"I\")\n\n    # 256bit (32Byte) key, 2x 32bit (4Byte) values\n    ht = HashTableNT(key_size=32, value_type=Chunk, value_format=chunk_format)\n\n    key = b\"x\" * 32  # the key is usually from a cryptographic hash fn\n    value = Chunk(refcount=1, size=42)\n    ht[key] = value\n    assert ht[key] == value\n\n    for key, value in ht.items():\n        assert isinstance(key, bytes)\n        assert 
isinstance(value, Chunk)\n\n    file = \"dump.bin\"  # giving an fd of a file opened in binary mode also works\n    ht.write(file)\n    ht = HashTableNT.read(file)\n\nBuilding / Installing\n---------------------\n::\n\n    python setup.py build_ext --inplace\n    python -m build\n    pip install dist/borghash*.tar.gz\n\n\nWant a demo?\n------------\n\nRun ``borghash-demo`` after installing the ``borghash`` package.\n\nIt will show you the demo code, run it and print the results for your machine.\n\nResults on an Apple MacBook Pro (M3 Pro CPU) are like:\n\n::\n\n    HashTableNT in-memory ops (count=50000): insert: 0.062s, lookup: 0.066s, pop: 0.061s.\n    HashTableNT serialization (count=50000): write: 0.020s, read: 0.021s.\n\n\nState of this project\n---------------------\n\n**API is still unstable and expected to change as development goes on.**\n\n**As long as the API is unstable, there will be no data migration tools,\nlike e.g. for reading an existing serialized hashtable.**\n\nThere might be missing features or optimization potential, feedback welcome!\n\nBorg?\n-----\n\nPlease note that this code is currently **not** used by the stable release of\nBorgBackup (aka \"borg\"), but might be used by borg master branch in the future.\n\nLicense\n-------\n\nBSD license.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fborgbackup%2Fborghash","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fborgbackup%2Fborghash","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fborgbackup%2Fborghash/lists"}