{"id":13411451,"url":"https://github.com/linvon/cuckoo-filter","last_synced_at":"2025-03-14T17:30:55.053Z","repository":{"id":46676004,"uuid":"340362521","full_name":"linvon/cuckoo-filter","owner":"linvon","description":"Cuckoo Filter go implement, better than Bloom Filter, configurable and space optimized  布谷鸟过滤器的Go实现，优于布隆过滤器，可以定制化过滤器参数，并进行了空间优化","archived":false,"fork":false,"pushed_at":"2023-08-16T16:55:07.000Z","size":132,"stargazers_count":289,"open_issues_count":2,"forks_count":27,"subscribers_count":8,"default_branch":"main","last_synced_at":"2024-07-31T20:45:56.528Z","etag":null,"topics":["bloom","bloom-filter","bloomfilter","configurable","cuckoo","cuckoo-filter","cuckoofilter","go"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/linvon.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-02-19T12:27:43.000Z","updated_at":"2024-07-15T12:17:14.000Z","dependencies_parsed_at":"2024-06-18T13:57:41.175Z","dependency_job_id":"8d9f3bc2-da25-4ff6-a798-c09aa8873b30","html_url":"https://github.com/linvon/cuckoo-filter","commit_stats":null,"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/linvon%2Fcuckoo-filter","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/linvon%2Fcuckoo-filter/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/linvon%2Fcuckoo-filter/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitH
ub/repositories/linvon%2Fcuckoo-filter/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/linvon","download_url":"https://codeload.github.com/linvon/cuckoo-filter/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243618634,"owners_count":20320269,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bloom","bloom-filter","bloomfilter","configurable","cuckoo","cuckoo-filter","cuckoofilter","go"],"created_at":"2024-07-30T20:01:13.794Z","updated_at":"2025-03-14T17:30:54.763Z","avatar_url":"https://github.com/linvon.png","language":"Go","funding_links":[],"categories":["数据结构与算法","Data Structures and Algorithms","Go","Data Structures","Uncategorized","Generators","Data Integration Frameworks"],"sub_categories":["布隆和布谷鸟过滤器","Bloom and Cuckoo Filters","Advanced Console UIs","Standard CLI"],"readme":"# cuckoo-filter\n[![Mentioned in Awesome Go](https://awesome.re/mentioned-badge.svg)](https://github.com/avelino/awesome-go)  \n\nA Go implementation of the cuckoo filter, with parameters you can configure.\n\nPorted from [efficient/cuckoofilter](https://github.com/efficient/cuckoofilter)\n\n[Chinese documentation](./README_ZH.md)\n\nOverview\n--------\nA cuckoo filter is a Bloom filter replacement for approximate set-membership queries. While Bloom filters are well-known space-efficient data structures for answering queries like \"is item x in the set?\", they do not support deletion. 
Variants that enable deletion (like counting Bloom filters) usually require much more space.\n\nCuckoo filters provide the flexibility to add and remove items dynamically. A cuckoo filter is based on cuckoo hashing (hence the name). It is essentially a cuckoo hash table storing each key's fingerprint. Cuckoo hash tables can be highly compact, so a cuckoo filter can use less space than a conventional Bloom filter for applications that require low false positive rates (\u003c 3%).\n\nFor details about the algorithm, and for citations, please use:\n\n[\"Cuckoo Filter: Practically Better Than Bloom\"](http://www.cs.cmu.edu/~binfan/papers/conext14_cuckoofilter.pdf) in proceedings of ACM CoNEXT 2014 by Bin Fan, Dave Andersen and Michael Kaminsky\n\n## Implementation details\n\nThe paper cited above leaves several parameters to choose:\n\n1. Bucket size (b): number of fingerprints per bucket\n2. Fingerprint size (f): number of bits in each fingerprint (hash tag)\n\nIn other implementations:\n\n- [seiflotfy/cuckoofilter](https://github.com/seiflotfy/cuckoofilter) uses b=4, f=8 bits, which corresponds to a false positive rate of `r ~= 0.03`.\n- [panmari/cuckoofilter](https://github.com/panmari/cuckoofilter) uses b=4, f=16 bits, which corresponds to a false positive rate of `r ~= 0.0001`.\n- [irfansharif/cfilter](https://github.com/irfansharif/cfilter) lets you adjust b and f, but f must be a multiple of 8, that is, a whole number of bytes.\n\nIn this implementation, the `TableTypeSingle` table type lets you set b and f to any value you want.\n\nIn addition, the semi-sorting buckets optimization described in the paper, which saves 1 bit per item, is available in the `TableTypePacked` table type.\nNote that this type fixes b=4; only f is adjustable.\n\n##### Why is customization important?\n\nAccording to the paper:\n\n- Different bucket sizes result in different filter load factors, that is, different occupancy rates of the filter\n- Different bucket sizes suit different target false positive rates\n- To maintain a given false positive 
rate, a larger bucket size requires a larger fingerprint size\n\nGiven a target false positive rate of `r`:\n\n\u003e when r \u003e 0.002, having two entries per bucket yields slightly better results than using four entries per bucket; when r decreases to 0.00001 \u003c r ≤ 0.002, four entries per bucket minimizes space.\n\nWith a bucket size `b`, the authors suggest choosing the fingerprint size `f` using\n\n    f \u003e= log2(2b/r) bits\n\nAt the same time, note that the load factor is 84%, 95% or 98% when using bucket size b = 2, 4 or 8, respectively.\n\n##### To learn more about choosing parameters, refer to section 5 of the paper\n\nNote: b = 8 is generally enough. Without further supporting data, we suggest choosing b from 2, 4 or 8. The maximum value of f is 32 bits.\n\n## Example usage:\n\n``` go\npackage main\n\nimport (\n\t\"fmt\"\n\t\"github.com/linvon/cuckoo-filter\"\n)\n\nfunc main() {\n\t// Packed table with b=4 and f=9 bits; 3900 is the expected number of items\n\tcf := cuckoo.NewFilter(4, 9, 3900, cuckoo.TableTypePacked)\n\tfmt.Println(cf.Info())\n\tfmt.Println(cf.FalsePositiveRate())\n\n\t// Add an item, then query it\n\ta := []byte(\"A\")\n\tcf.Add(a)\n\tfmt.Println(cf.Contain(a))\n\tfmt.Println(cf.Size())\n\n\t// Serialize the filter to bytes and restore it\n\tb := cf.Encode()\n\tncf, _ := cuckoo.Decode(b)\n\tfmt.Println(ncf.Contain(a))\n\n\t// Delete the item from the original filter\n\tcf.Delete(a)\n\tfmt.Println(cf.Size())\n}\n```\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flinvon%2Fcuckoo-filter","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flinvon%2Fcuckoo-filter","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flinvon%2Fcuckoo-filter/lists"}