{"id":13616482,"url":"https://github.com/FastFilter/xorfilter","last_synced_at":"2025-04-14T00:32:20.240Z","repository":{"id":36606338,"uuid":"228477259","full_name":"FastFilter/xorfilter","owner":"FastFilter","description":"Go library implementing binary fuse and xor filters","archived":false,"fork":false,"pushed_at":"2024-01-09T00:57:59.000Z","size":401,"stargazers_count":660,"open_issues_count":1,"forks_count":50,"subscribers_count":20,"default_branch":"master","last_synced_at":"2024-06-19T13:57:39.102Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/FastFilter.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2019-12-16T21:15:56.000Z","updated_at":"2024-06-17T12:35:06.000Z","dependencies_parsed_at":"2023-01-17T03:03:44.178Z","dependency_job_id":"be70258d-fc98-46a3-a0b5-53ec674ebb5e","html_url":"https://github.com/FastFilter/xorfilter","commit_stats":{"total_commits":87,"total_committers":12,"mean_commits":7.25,"dds":"0.33333333333333337","last_synced_commit":"6e1eda073dd4927a1c807e2cc47e4a66b56ad7a6"},"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FastFilter%2Fxorfilter","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FastFilter%2Fxorfilter/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FastFilter%2Fxorfilter/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/FastFilter%2Fxorfilter/manifests"
,"owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/FastFilter","download_url":"https://codeload.github.com/FastFilter/xorfilter/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":213578664,"owners_count":15608019,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-08-01T20:01:29.126Z","updated_at":"2024-11-08T00:31:49.909Z","avatar_url":"https://github.com/FastFilter.png","language":"Go","funding_links":[],"categories":["Misc","开源类库","Open source library","Go"],"sub_categories":["算法","Algorithm"],"readme":"# xorfilter: Go library implementing xor and binary fuse filters\n[![GoDoc](https://godoc.org/github.com/FastFilter/xorfilter?status.svg)](https://godoc.org/github.com/FastFilter/xorfilter)\n[![Test](https://github.com/FastFilter/xorfilter/actions/workflows/test.yml/badge.svg)](https://github.com/FastFilter/xorfilter/actions/workflows/test.yml)\n\nBloom filters are used to quickly check whether an element is part of a set.\nXor and binary fuse filters are a faster and more concise alternative to Bloom filters.\nFurthermore, unlike Bloom filters, xor and binary fuse filters are naturally compressible using standard techniques (gzip, zstd, etc.).\nThey are also smaller than cuckoo filters. They are used in [production systems](https://github.com/datafuselabs/databend).\n\n* Thomas Mueller Graf, Daniel Lemire, [Binary Fuse Filters: Fast and Smaller Than Xor Filters](http://arxiv.org/abs/2201.01174), Journal of Experimental Algorithmics (to appear). 
DOI: 10.1145/3510449   \n* Thomas Mueller Graf,  Daniel Lemire, [Xor Filters: Faster and Smaller Than Bloom and Cuckoo Filters](https://arxiv.org/abs/1912.08258), Journal of Experimental Algorithmics 25 (1), 2020. DOI: 10.1145/3376122\n\nThis Go library is used by \n\n\n* [coherence-go-client](https://github.com/oracle/coherence-go-client): the Oracle Coherence client\n* [Matrixone](https://github.com/matrixorigin/matrixone): a Hyperconverged cloud-edge native database\n\n\n\u003cimg src=\"figures/comparison.png\" width=\"50%\"/\u003e\n\n\nWe are assuming that your set is made of 64-bit integers. If you have strings\nor other data structures, you need to hash them first to a 64-bit integer. It\nis not important to have a good hash function, but collisions should be unlikely\n(~1/2^64). A few collisions are acceptable, but we expect that your initial set \nshould have no duplicated entries. \n\nThe current implementation has a false positive rate of about 0.4% and a memory usage\nof less than 9 bits per entry for sizeable sets.\n\nYou construct the filter as follows, starting from a slice of 64-bit integers:\n\n```Go\nfilter, _ := xorfilter.PopulateBinaryFuse8(keys) // keys is of type []uint64\n```\nIt returns an object of type `BinaryFuse8`. The 64-bit integers would typically be hash values of your objects.\n\nYou can then query it as follows:\n\n\n```Go\nfilter.Contains(v) // v is of type uint64\n```\n\nIt will *always* return true if v was part of the initial construction (`Populate`) and almost always return false otherwise.\n\nAn xor filter is immutable, so it is safe for concurrent use. 
The expectation is that you build it once and use it many times.\n\nThough the filter itself does not use much memory, the construction of the filter needs many bytes of memory per set entry.\n\nFor persistence, you only need to serialize the following data structure:\n\n```Go\ntype BinaryFuse8 struct {\n\tSeed               uint64\n\tSegmentLength      uint32\n\tSegmentLengthMask  uint32\n\tSegmentCount       uint32\n\tSegmentCountLength uint32\n\tFingerprints []uint8\n}\n```\n\nFor best results, ensure that the key set does not contain too many duplicates when constructing the filter.\n\n# Generic (8-bit, 16-bit, 32-bit)\n\nBy default, we use 8-bit fingerprints, which provide a 0.4% false positive rate. Some users might want to reduce\nthis false positive rate at the expense of more memory usage. For this purpose, we provide a generic type\n(`NewBinaryFuse[T]`). \n\n```Go\nfilter8, _ := xorfilter.NewBinaryFuse[uint8](keys) // 0.39% false positive rate, uses about 9 bits per key\nfilter16, _ := xorfilter.NewBinaryFuse[uint16](keys) // 0.0015% false positive rate, uses about 18 bits per key\nfilter32, _ := xorfilter.NewBinaryFuse[uint32](keys) // 2e-08% false positive rate, uses about 36 bits per key\n```\nThe 32-bit fingerprints are provided but not recommended. Most users will want to use either the 8-bit or 16-bit fingerprints.\n\nBinary fuse filters use about 9 bits per key in the 8-bit case and 18 bits per key in the 16-bit case,\nfor sufficiently large sets (hundreds of thousands of keys). 
There is more per-key memory usage when the set is smaller.\n\n\n# Implementations of xor filters in other programming languages\n\n* [Erlang](https://github.com/mpope9/exor_filter)\n* Rust: [1](https://github.com/bnclabs/xorfilter), [2](https://github.com/codri/xorfilter-rs), [3](https://github.com/Polochon-street/rustxorfilter), [4](https://github.com/ayazhafiz/xorf)\n* [C++](https://github.com/FastFilter/fastfilter_cpp)\n* [Java](https://github.com/FastFilter/fastfilter_java)\n* [C](https://github.com/FastFilter/xor_singleheader)\n* [C99](https://github.com/skeeto/xf8)\n* [Python](https://github.com/GreyDireWolf/pyxorfilter)\n* [C#](https://github.com/jonmat/FastIndex)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FFastFilter%2Fxorfilter","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FFastFilter%2Fxorfilter","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FFastFilter%2Fxorfilter/lists"}