{"id":19937093,"url":"https://github.com/farhadi/cuckoo_filter","last_synced_at":"2025-04-06T06:13:19.218Z","repository":{"id":41263571,"uuid":"327733830","full_name":"farhadi/cuckoo_filter","owner":"farhadi","description":"High-performance, concurrent, and mutable Cuckoo Filter for Erlang and Elixir","archived":false,"fork":false,"pushed_at":"2024-11-19T14:37:34.000Z","size":46,"stargazers_count":46,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-03-30T05:09:27.954Z","etag":null,"topics":["atomics","bloom-filter","cuckoo-filter","elixir","erlang","probablistic-data-structures"],"latest_commit_sha":null,"homepage":"","language":"Erlang","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/farhadi.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-01-07T21:57:27.000Z","updated_at":"2025-03-26T17:01:34.000Z","dependencies_parsed_at":"2025-01-13T16:11:42.875Z","dependency_job_id":"025bfb23-5514-4624-a20d-c4a5a5b90225","html_url":"https://github.com/farhadi/cuckoo_filter","commit_stats":{"total_commits":38,"total_committers":1,"mean_commits":38.0,"dds":0.0,"last_synced_commit":"d2d2af626a2c3b180042e5258c79f5a65d82b1d0"},"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/farhadi%2Fcuckoo_filter","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/farhadi%2Fcuckoo_filter/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/farh
adi%2Fcuckoo_filter/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/farhadi%2Fcuckoo_filter/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/farhadi","download_url":"https://codeload.github.com/farhadi/cuckoo_filter/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247441059,"owners_count":20939239,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["atomics","bloom-filter","cuckoo-filter","elixir","erlang","probablistic-data-structures"],"created_at":"2024-11-12T23:30:50.065Z","updated_at":"2025-04-06T06:13:19.195Z","avatar_url":"https://github.com/farhadi.png","language":"Erlang","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Cuckoo Filter\n\n[![CI build status](https://github.com/farhadi/cuckoo_filter/workflows/CI/badge.svg)](https://github.com/farhadi/cuckoo_filter/actions?query=workflow%3ACI)\n[![codecov](https://codecov.io/gh/farhadi/cuckoo_filter/branch/main/graph/badge.svg)](https://codecov.io/gh/farhadi/cuckoo_filter)\n[![Hex docs](http://img.shields.io/badge/hex.pm-docs-green.svg?style=flat)](https://hexdocs.pm/cuckoo_filter)\n[![Hex Version](http://img.shields.io/hexpm/v/cuckoo_filter.svg?style=flat)](https://hex.pm/packages/cuckoo_filter)\n[![License](http://img.shields.io/hexpm/l/cuckoo_filter.svg?style=flat)](https://github.com/farhadi/cuckoo_filter/blob/master/LICENSE)\n\nA high-performance, concurrent, and mutable [Cuckoo Filter](https://www.cs.cmu.edu/~dga/papers/cuckoo-conext2014.pdf)\nimplemented using 
[atomics](https://erlang.org/doc/man/atomics.html) for Erlang and Elixir.\n\n## Introduction\n\nA **Cuckoo Filter** is a space-efficient probabilistic data structure for approximate\nset membership queries. It enables constant-time checks to determine if an element\nis in a set, using only a few bits per element. This efficiency comes with a trade-off:\na low rate of false positives may occur, but false negatives are guaranteed not to happen.\n\nCompared to a **Bloom Filter**, a Cuckoo Filter offers a more efficient use of space and\nsupports the deletion of inserted elements. However, as the filter's load factor increases,\ninsertion operations may become slower and could fail once the filter is nearly full.\n\n## Implementation Details\n\nIn this implementation, filter data is stored in an `atomics` array, a fixed-size, mutable\narray of 64-bit integers. Using atomics enables fast, concurrent access to the filter\nfor both reading and writing operations.\n\nThe first two integers in the atomics array act as a lock to prevent race conditions\nduring relocation of elements.\n\nThe third integer acts as an atomic counter to track the number of elements in the filter.\n\n### Fingerprint Constraints\n\nTo ensure atomic updates for fingerprints, this implementation only supports fingerprint\nsizes of **4, 8, 16, 32, and 64 bits**—allowing multiple fingerprints to fit within a single\n64-bit atomic integer.\n\n### Insertion\n\nEach element in a Cuckoo Filter can be placed in one of two possible buckets. If an empty\nslot is available in either bucket, it is updated atomically. However, if both buckets\nare full, elements need to be relocated to make room for the new one. 
In such cases,\natomic updates for all entries are not feasible, and insertions cannot be concurrent.\nTo manage this, a [spin lock](https://github.com/farhadi/spinlock) (using the first two\nintegers in the atomics array) prevents race conditions during these operations.\n\nTo maintain availability for lookups during element relocation, this implementation uses\nan eviction cache. When relocating elements, the filter temporarily stores a sequence of\nevictions. Once an empty slot is identified, changes are applied in reverse order,\nensuring that elements remain accessible for lookups throughout the relocation process.\nUnlike many traditional Cuckoo Filter implementations, where inserting a new element\nwhen the filter is full may lead to the removal of a random existing element,\nthis eviction cache technique helps avoid such unintended removals.\n\nThe eviction cache also provides early detection of loops, helping prevent excessive\nevictions.\n\n### Deletion\n\nDeletion operations also utilize a lock to avoid race conditions when elements are\nbeing relocated.\n\n## Configurations\n\nYou can customize the fingerprint size, bucket size, hash function, and the maximum\nnumber of evictions when creating a new filter.\n\nBy default, this implementation uses `erlang:phash2` for hashing. 
If a 32-bit hash is\ninsufficient, [XXH3](https://github.com/farhadi/xxh3) hash functions are applied,\nwhich require adding `xxh3` to your project dependencies manually.\n\nWhen using a custom hash function, ensure that the hash output length meets or exceeds\nthe sum of the fingerprint and bucket sizes.\n\n## Usage\n\nIn Erlang\n\n```erlang\nFilter = cuckoo_filter:new(1000),\nok = cuckoo_filter:add(Filter, 1),\ntrue = cuckoo_filter:contains(Filter, 1),\nfalse = cuckoo_filter:contains(Filter, 2),\nok = cuckoo_filter:delete(Filter, 1),\n{error, not_found} = cuckoo_filter:delete(Filter, 1).\n```\n\nIn Elixir\n\n```elixir\nfilter = :cuckoo_filter.new(1000)\n:ok = :cuckoo_filter.add(filter, 1)\ntrue = :cuckoo_filter.contains(filter, 1)\nfalse = :cuckoo_filter.contains(filter, 2)\n:ok = :cuckoo_filter.delete(filter, 1)\n{:error, :not_found} = :cuckoo_filter.delete(filter, 1)\n```\n\nFor more details, see the [Hex documentation](https://hexdocs.pm/cuckoo_filter).\n\n## License\n\nCopyright 2021, Ali Farhadi \u003ca.farhadi@gmail.com\u003e.\n\nLicensed under the Apache License, Version 2.0 (the \"License\");\nyou may not use this file except in compliance with the License.\nYou may obtain a copy of the License at\n\n    http://www.apache.org/licenses/LICENSE-2.0\n\nUnless required by applicable law or agreed to in writing, software\ndistributed under the License is distributed on an \"AS IS\" BASIS,\nWITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\nSee the License for the specific language governing permissions and\nlimitations under the License.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffarhadi%2Fcuckoo_filter","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffarhadi%2Fcuckoo_filter","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffarhadi%2Fcuckoo_filter/lists"}