{"id":13741704,"url":"https://github.com/judofyr/zini","last_synced_at":"2025-10-28T15:07:49.654Z","repository":{"id":54341373,"uuid":"521350027","full_name":"judofyr/zini","owner":"judofyr","description":"Succinct data structures for Zig","archived":false,"fork":false,"pushed_at":"2025-03-14T06:58:54.000Z","size":182,"stargazers_count":64,"open_issues_count":2,"forks_count":2,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-03-14T07:33:24.247Z","etag":null,"topics":["zig","zig-package"],"latest_commit_sha":null,"homepage":"","language":"Zig","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"0bsd","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/judofyr.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-08-04T17:05:58.000Z","updated_at":"2025-03-14T06:58:58.000Z","dependencies_parsed_at":"2024-01-25T05:09:19.496Z","dependency_job_id":"29b7ff11-ffe5-4fad-82d0-149ea129e1ff","html_url":"https://github.com/judofyr/zini","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/judofyr%2Fzini","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/judofyr%2Fzini/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/judofyr%2Fzini/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/judofyr%2Fzini/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/judofyr","download_url":"https://codeload.github.com/judofyr/zini/tar.gz/refs/heads/main","host":{"name":"
GitHub","url":"https://github.com","kind":"github","repositories_count":243858151,"owners_count":20359253,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["zig","zig-package"],"created_at":"2024-08-03T04:01:01.900Z","updated_at":"2025-10-28T15:07:44.617Z","avatar_url":"https://github.com/judofyr.png","language":"Zig","funding_links":[],"categories":["Libraries"],"sub_categories":[],"readme":"# Zini\n\nZini (Zig + Mini) is a [Zig](https://ziglang.org/) library providing some succinct data structures:\n\n- `zini.pthash`, a [**minimal perfect hash function**](https://en.wikipedia.org/wiki/Perfect_hash_function) construction algorithm, using less than 4 bits per element.\n- `zini.ribbon`, a **retrieval data structure** (sometimes called a \"static function\") construction algorithm, having less than 1% overhead.\n- `zini.CompactArray` stores n-bit numbers tightly packed, leaving no bits unused.\n  If the largest value in an array is `m` then you actually only need `n = floor(log2(m)) + 1` bits per element.\n  E.g. 
if the largest value is 270, you will get 7x compression using CompactArray over `[]u64` as it stores each element using only 9 bits (and 64 divided by 9 is roughly 7).\n- `zini.DictArray` finds all distinct elements in the array, stores each once into a CompactArray (the dictionary), and creates a new CompactArray containing indexes into the dictionary.\n  This will give excellent compression if there's a lot of repetition in the original array.\n- `zini.EliasFano` stores increasing 64-bit numbers in a compact manner.\n- `zini.darray` provides constant-time support for the `select1(i)` operation which returns the _i_-th set bit in a `std.DynamicBitSetUnmanaged`.\n\n## Overview\n\n### PTHash, minimal perfect hash function\n\n`zini.pthash` contains an implementation of [PTHash][pthash], a [minimal perfect hash function](https://en.wikipedia.org/wiki/Perfect_hash_function) construction algorithm.\nGiven a set of `n` elements, with the only requirement being that you can hash them, it generates a hash function which maps each element to a distinct number between `0` and `n - 1`.\nThe generated hash function is extremely small, typically consuming less than **4 _bits_ per element**, regardless of the size of the input type.\nThe algorithm provides multiple parameters to tune making it possible to optimize for (small) size, (short) construction time, or (short) lookup time.\n\nTo give a practical example:\nIn ~0.6 seconds Zini was able to create a hash function for /usr/share/dict/words containing 235886 words.\nThe resulting hash function required in total 865682 bits in memory.\nThis corresponds to 108.2 kB in total or 3.67 bits per word.\nIn comparison, the original file was 2.49 MB and compressing it with `gzip -9` only gets it down to 754 kB (which you can't use directly in memory without decompressing it).\nIt should of course be noted that they don't store the equivalent data as you can't use the generated hash function to determine if a word is present or not in 
the list.\nThe comparison is mainly useful to get a feeling of the magnitudes.\n\n### Bumped Ribbon Retrieval, a retrieval data structure\n\n`zini.ribbon` contains an implementation of [Bumped Ribbon Retrieval][burr] (_BuRR_), a retrieval data structure.\nGiven `n` keys (with the only requirement being that you can hash them) which each have an `r`-bit value, we'll build a data structure which will return the value for all of the `n` keys.\nHowever, the keys are actually not stored (we're only using the hash) so if you ask for the value for an _unknown_ key you will get a seemingly random answer; there's no way of knowing whether the key was present in the original dataset or not.\n\nThe theoretically minimal amount of space needed to store the _values_ is `n * r` (we have `n` `r`-bit values after all).\nWe use the term \"overhead\" to refer to how much _extra_ amount of data we need.\nThe Bumped Ribbon Retrieval will often have **less than 1% overhead**.\n\n## Usage\n\nZini is intended to be used as a library, but also ships the command-line tools `zini-pthash` and `zini-ribbon`.\nAs the documentation is a bit lacking it might be useful to look through `tools/zini-{pthash,ribbon}/main.zig` to understand how it's used.\n\n```\nUSAGE\n  ./zig-out/bin/zini-pthash [build | lookup] \u003coptions\u003e\n\nCOMMAND: build\n  Builds hash function for plain text file.\n\n  -i, --input \u003cfile\u003e\n  -o, --output \u003cfile\u003e\n  -c \u003cint\u003e\n  -a, --alpha \u003cfloat\u003e\n  -s, --seed \u003cint\u003e\n\nCOMMAND: lookup\n\n  -i, --input \u003cfile\u003e\n  -k, --key \u003ckey\u003e\n  -b, --benchmark\n```\n\nAnd here's an example run of using `zini-pthash`.\n\n```\n# Build zini-pthash:\n$ zig build -Drelease-safe\n\n# Build a hash function:\n$ ./zig-out/bin/zini-pthash build -i /usr/share/dict/words -o words.pth\nReading /usr/share/dict/words...\n\nBuilding hash function...\n\nSuccessfully built hash function:\n  seed: 12323441790160983030\n  bits: 865554\n  
bits/n: 3.6693741892269993\n\nWriting to words.pth\n\n# Look up an index in the hash function:\n$ ./zig-out/bin/zini-pthash lookup -i words.pth --key hello\nReading words.pth...\n\nSuccessfully loaded hash function:\n  seed: 12323441790160983030\n  bits: 865554\n  bits/n: 3.6693741892269993\n\nLooking up key=hello:\n112576\n```\n\n## Acknowledgments\n\nZini is merely an implementation of existing algorithms and techniques already described in the literature:\n\n- The [PTHash][pthash] algorithm is described by Giulio Ermanno Pibiri and Roberto Trani in arXiv:2104.10402.\n- They also implemented PTHash as a C++ library in \u003chttps://github.com/jermp/pthash\u003e under the MIT license.\n  Zini uses no code directly from that repository, but it has been an invaluable resource for understanding how to implement PTHash in practice.\n- The [BuRR][burr] data structure is described by Peter C. Dillinger, Lorenz Hübschle-Schneider, Peter Sanders and Stefan Walzer in arXiv:2109.01892.\n\n[pthash]: https://arxiv.org/abs/2104.10402\n[burr]: https://arxiv.org/abs/2109.01892\n\n## License\n\nZini is licensed under the [0BSD license](https://spdx.org/licenses/0BSD.html).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjudofyr%2Fzini","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjudofyr%2Fzini","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjudofyr%2Fzini/lists"}