{"id":21848911,"url":"https://github.com/ozgrakkurt/filterz","last_synced_at":"2025-04-14T14:41:23.029Z","repository":{"id":259954704,"uuid":"879904278","full_name":"ozgrakkurt/filterz","owner":"ozgrakkurt","description":"Probabilistic filter implementations. Ribbon, bloom, xor filters.","archived":false,"fork":false,"pushed_at":"2024-12-02T22:28:08.000Z","size":81,"stargazers_count":2,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-03-28T03:32:42.196Z","etag":null,"topics":["bloom-filter","probabilistic-data-structures","ribbon","xor","zig","ziglang"],"latest_commit_sha":null,"homepage":"","language":"Zig","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ozgrakkurt.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE-APACHE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-10-28T18:49:48.000Z","updated_at":"2024-12-02T22:28:12.000Z","dependencies_parsed_at":"2024-11-17T10:17:58.904Z","dependency_job_id":"ffd3d24f-62b2-41d6-a415-d034532cff3e","html_url":"https://github.com/ozgrakkurt/filterz","commit_stats":null,"previous_names":["ozgrakkurt/filterz"],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ozgrakkurt%2Ffilterz","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ozgrakkurt%2Ffilterz/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ozgrakkurt%2Ffilterz/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ozgrakkurt%2Ffilterz/manifests","owne
r_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ozgrakkurt","download_url":"https://codeload.github.com/ozgrakkurt/filterz/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248898459,"owners_count":21179781,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bloom-filter","probabilistic-data-structures","ribbon","xor","zig","ziglang"],"created_at":"2024-11-28T00:09:23.881Z","updated_at":"2025-04-14T14:41:23.023Z","avatar_url":"https://github.com/ozgrakkurt.png","language":"Zig","funding_links":[],"categories":[],"sub_categories":[],"readme":"# filterz\n\nImplementations of some probabilistic filter structures. 
Implemented with the `build once, use many times` use case in mind.\nAll filters export a `Filter` interface that can be used like this:\n\n```zig\nconst Filter = filterz.ribbon.Filter(u10);\n\nvar my_filter = try Filter.init(alloc, hashes);\ndefer my_filter.deinit();\n\nfor (hashes) |h| {\n  try std.testing.expect(my_filter.check(h));\n}\n```\n\nEach filter also exports a lower-level API that can be used to implement more advanced use cases like:\n- Reducing bits-per-key while the program is running to meet some memory usage criteria.\n- Loading only a part of a filter from disk and using it to query.\n\nRequires a Zig 0.14.0-dev release.\n\n## Filters\n\n### Split-Block-Bloom-Filter\n\nSpeed-optimized version of a bloom filter.\n\nAs described in https://github.com/apache/parquet-format/blob/master/BloomFilter.md\n\n### Xor (BinaryFuse) filter\n\nAs described in https://arxiv.org/abs/2201.01174\nConstruction is a bit janky but constructed filters reach slightly higher space efficiency. This trade-off suits cases where the filter is constructed once and then used for a much longer time.\n\n### Ribbon filter\n\nAs described in https://arxiv.org/abs/2103.02515\n\nThe implementation corresponds to the standard ribbon filter with \"smash\" as described in the paper.\n\n## Benchmarks\n\n1. Download Benchmark Data - instructions [here](bench-data/README.md)\n2. Run benchmarks with:\n```bash\nmake benchmark\n```\n\n[Example results](./bench_result_low_hit.txt)\n\nNOTE: The cost estimate stat in the benchmark output is calculated by assuming every hit generates a disk read, which is priced at 50 microseconds.\n\nRibbon filter seems to be the best option when on a memory budget. 
Bloom filter is the king when there is no memory budget.\n\n### TODO\n\n- Implement [frayed ribbon based filter](https://github.com/bitwiseshiftleft/compressed_map)\n- Implement [bumped ribbon filter](https://github.com/lorenzhs/BuRR)\n- Implement interleaved columnar storage for ribbon filter as described in the paper.\n- Implement mixed column sizes in ribbon filter storage to support fractional bits-per-key configurations, e.g. 6.6 bits per key instead of 6 or 7.\n- Improve Xor (Binary Fuse) filter construction speed by pre-sorting the hashes before filter construction as described in the paper.\n- Improve Xor filter parameter selection; currently it is done by trial and error at construction time.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fozgrakkurt%2Ffilterz","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fozgrakkurt%2Ffilterz","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fozgrakkurt%2Ffilterz/lists"}