https://github.com/linvon/cuckoo-filter
A cuckoo filter implementation in Go: better than a Bloom filter, configurable, and space optimized.
bloom bloom-filter bloomfilter configurable cuckoo cuckoo-filter cuckoofilter go
- Host: GitHub
- URL: https://github.com/linvon/cuckoo-filter
- Owner: linvon
- License: mit
- Created: 2021-02-19T12:27:43.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2023-08-16T16:55:07.000Z (about 1 year ago)
- Last Synced: 2024-07-31T20:45:56.528Z (3 months ago)
- Topics: bloom, bloom-filter, bloomfilter, configurable, cuckoo, cuckoo-filter, cuckoofilter, go
- Language: Go
- Homepage:
- Size: 129 KB
- Stars: 289
- Watchers: 8
- Forks: 27
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-go - cuckoo-filter - Cuckoo filter: a comprehensive cuckoo filter that is configurable and space optimized compared with other implementations, with all features mentioned in the original paper available. (Data Structures and Algorithms / Bloom and Cuckoo Filters)
- awesome-go-extra - cuckoo-filter - 02-19T12:27:43Z|2022-03-22T21:14:17Z| (Generators / Bloom and Cuckoo Filters)
README
# cuckoo-filter
[![Mentioned in Awesome Go](https://awesome.re/mentioned-badge.svg)](https://github.com/avelino/awesome-go)

A cuckoo filter implementation in Go with user-configurable parameters.

Ported from [efficient/cuckoofilter](https://github.com/efficient/cuckoofilter).

[Chinese documentation](./README_ZH.md)
Overview
--------
A cuckoo filter is a Bloom filter replacement for approximate set-membership queries. While Bloom filters are well-known space-efficient data structures for answering queries like "is item x in the set?", they do not support deletion. Variants that enable deletion (such as counting Bloom filters) usually require much more space.

Cuckoo filters provide the flexibility to add and remove items dynamically. A cuckoo filter is based on cuckoo hashing (hence the name). It is essentially a cuckoo hash table that stores each key's fingerprint. Cuckoo hash tables can be highly compact, so a cuckoo filter can use less space than a conventional Bloom filter for applications that require low false positive rates (< 3%).
For details about the algorithm and citations please use:
["Cuckoo Filter: Practically Better Than Bloom"](http://www.cs.cmu.edu/~binfan/papers/conext14_cuckoofilter.pdf) in proceedings of ACM CoNEXT 2014 by Bin Fan, Dave Andersen and Michael Kaminsky
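
To make the idea of "a cuckoo hash table storing each key's fingerprint" concrete, here is a minimal sketch of partial-key cuckoo hashing as the paper describes it. It is not this library's code: `toyFilter`, `locate`, `altIndex`, the FNV hash, the fixed 8-bit fingerprint, and the table size are all assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math/rand"
)

const (
	bucketSize = 4   // b: fingerprint slots per bucket
	numBuckets = 64  // power of two, so XOR-ed indices stay in range after masking
	maxKicks   = 500 // relocation attempts before giving up
)

// toyFilter is a hypothetical teaching type. A slot value of 0 means "empty".
type toyFilter struct {
	buckets [numBuckets][bucketSize]uint8
}

func hash32(data []byte) uint32 {
	h := fnv.New32a()
	h.Write(data)
	return h.Sum32()
}

// altIndex maps a bucket index to its partner using only the fingerprint.
func altIndex(i uint32, fp uint8) uint32 {
	return (i ^ hash32([]byte{fp})) & (numBuckets - 1)
}

// locate returns the non-zero 8-bit fingerprint and both candidate buckets.
func locate(item []byte) (fp uint8, i1, i2 uint32) {
	h := hash32(item)
	fp = uint8((h>>24)%255) + 1
	i1 = h & (numBuckets - 1)
	return fp, i1, altIndex(i1, fp)
}

func (t *toyFilter) put(i uint32, fp uint8) bool {
	for s := range t.buckets[i] {
		if t.buckets[i][s] == 0 {
			t.buckets[i][s] = fp
			return true
		}
	}
	return false
}

// Add inserts item; on failure the last evicted fingerprint is dropped (sketch only).
func (t *toyFilter) Add(item []byte) bool {
	fp, i1, i2 := locate(item)
	if t.put(i1, fp) || t.put(i2, fp) {
		return true
	}
	for k := 0; k < maxKicks; k++ {
		s := rand.Intn(bucketSize)
		fp, t.buckets[i1][s] = t.buckets[i1][s], fp // swap in, carry the evictee
		i1 = altIndex(i1, fp)
		if t.put(i1, fp) {
			return true
		}
	}
	return false
}

// find returns the bucket and slot holding the item's fingerprint, if present.
func (t *toyFilter) find(item []byte) (uint32, int, bool) {
	fp, i1, i2 := locate(item)
	for _, i := range []uint32{i1, i2} {
		for s, v := range t.buckets[i] {
			if v == fp {
				return i, s, true
			}
		}
	}
	return 0, 0, false
}

func (t *toyFilter) Contain(item []byte) bool {
	_, _, ok := t.find(item)
	return ok
}

func (t *toyFilter) Delete(item []byte) bool {
	i, s, ok := t.find(item)
	if ok {
		t.buckets[i][s] = 0
	}
	return ok
}

func main() {
	var t toyFilter
	item := []byte("A")
	fmt.Println(t.Add(item), t.Contain(item))    // true true
	fmt.Println(t.Delete(item), t.Contain(item)) // true false
}
```

Because the partner bucket is derived from the current index and the fingerprint alone, entries can be relocated or deleted without the original key, which is what lets a cuckoo filter support deletion.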
## Implementation details
The paper cited above leaves several parameters to choose:
2. Bucket size (b): the number of fingerprints per bucket
3. Fingerprint size (f): the size, in bits, of each fingerprint (hash tag)

In other implementations:
- [seiflotfy/cuckoofilter](https://github.com/seiflotfy/cuckoofilter) uses b=4, f=8 bits, which corresponds to a false positive rate of `r ~= 0.03`.
- [panmari/cuckoofilter](https://github.com/panmari/cuckoofilter) uses b=4, f=16 bits, which corresponds to a false positive rate of `r ~= 0.0001`.
- [irfansharif/cfilter](https://github.com/irfansharif/cfilter) lets you adjust b and f, but f can only be a multiple of 8 bits, that is, a whole number of bytes.

In this implementation, the `TableTypeSingle` table type lets you set b and f to any value you want.
In addition, the semi-sorting buckets optimization described in the paper, which saves 1 bit per item, is available through the `TableTypePacked` table type.
Note that with `TableTypePacked`, b is fixed at 4 and only f is adjustable.

##### Why is customization important?
According to the paper:
- Different bucket sizes result in different filter load factors, that is, different occupancy rates of the filter
- Different bucket sizes suit different target false positive rates
- To maintain a given false positive rate, a bigger bucket size requires a bigger fingerprint size

Given a target false positive rate of `r`:
> when r > 0.002, having two entries per bucket yields slightly better results than using four entries per bucket; when r decreases to 0.00001 < r ≤ 0.002, four entries per bucket minimizes space.
With a bucket size `b`, the authors suggest choosing the fingerprint size `f` using
f >= log2(2b/r) bits
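
The sizing rule above is easy to evaluate directly. The sketch below uses only the Go standard library; `minFingerprintBits` and `falsePositiveBound` are hypothetical helper names, not part of this package. The second helper is the same inequality solved for `r`, giving the approximate bound `r <= 2b / 2^f`.

```go
package main

import (
	"fmt"
	"math"
)

// minFingerprintBits returns the smallest whole number of bits f that
// satisfies f >= log2(2b/r) for bucket size b and target false positive rate r.
func minFingerprintBits(b int, r float64) int {
	return int(math.Ceil(math.Log2(2 * float64(b) / r)))
}

// falsePositiveBound returns the approximate bound 2b / 2^f on the false
// positive rate for bucket size b and fingerprint size f (in bits).
func falsePositiveBound(b, f int) float64 {
	return 2 * float64(b) / math.Exp2(float64(f))
}

func main() {
	fmt.Println(minFingerprintBits(4, 0.03))   // 9
	fmt.Println(minFingerprintBits(4, 0.0001)) // 17
	fmt.Println(falsePositiveBound(4, 8))
	fmt.Println(falsePositiveBound(4, 16))
}
```

With b = 4, the bound evaluates to roughly 0.03 for f = 8 and roughly 0.0001 for f = 16, in line with the rates quoted for the other implementations.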
At the same time, note that the load factor is 84%, 95% or 98% when using bucket size b = 2, 4 or 8.
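
As a rough illustration of how load factor translates into memory, the sketch below estimates the table size for a given item count. The helper names are hypothetical, the load factors are the figures quoted above, and the `f / load factor` cost per item follows the paper's space analysis. Real implementations may round the bucket count (for example, to a power of two), which this sketch ignores.

```go
package main

import (
	"fmt"
	"math"
)

// loadFactor holds the occupancy figures quoted above for b = 2, 4 and 8.
var loadFactor = map[int]float64{2: 0.84, 4: 0.95, 8: 0.98}

// bucketsNeeded returns how many buckets of size b are required to hold n
// items at load factor alpha. No power-of-two rounding is applied.
func bucketsNeeded(n, b int, alpha float64) int {
	return int(math.Ceil(float64(n) / (float64(b) * alpha)))
}

// bitsPerItem returns the amortized table cost per stored item, f / alpha.
func bitsPerItem(f int, alpha float64) float64 {
	return float64(f) / alpha
}

func main() {
	const n, f = 3900, 9 // item count and fingerprint bits, chosen for illustration
	for _, b := range []int{2, 4, 8} {
		alpha := loadFactor[b]
		buckets := bucketsNeeded(n, b, alpha)
		fmt.Printf("b=%d buckets=%d table=%d bits (%.2f bits/item)\n",
			b, buckets, buckets*b*f, bitsPerItem(f, alpha))
	}
}
```

Keeping `f` fixed while `b` grows lowers the cost per item but also raises the false positive bound, so in practice `f` and `b` are chosen together.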
##### To learn more about choosing parameters, refer to Section 5 of the paper
Note: b = 8 is generally enough. Without more supporting data, we suggest choosing b from 2, 4 or 8. The maximum value of f is 32 bits.
## Example usage:
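``` go
package main

import (
	"fmt"

	"github.com/linvon/cuckoo-filter"
)

func main() {
	// b = 4, f = 9 bits, capacity of 3900 items (assumed argument order), packed table.
	cf := cuckoo.NewFilter(4, 9, 3900, cuckoo.TableTypePacked)
	fmt.Println(cf.Info())
	fmt.Println(cf.FalsePositiveRate())

	// Add an item, then query it and print the filter size.
	a := []byte("A")
	cf.Add(a)
	fmt.Println(cf.Contain(a))
	fmt.Println(cf.Size())

	// Serialize the filter and restore it into a new one.
	b := cf.Encode()
	ncf, _ := cuckoo.Decode(b)
	fmt.Println(ncf.Contain(a))

	// Delete the item and print the size again.
	cf.Delete(a)
	fmt.Println(cf.Size())
}
```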