{"id":13411417,"url":"https://github.com/bits-and-blooms/bloom","last_synced_at":"2025-12-23T00:03:20.367Z","repository":{"id":1521552,"uuid":"1780633","full_name":"bits-and-blooms/bloom","owner":"bits-and-blooms","description":"Go package implementing Bloom filters, used by Milvus and Beego.","archived":false,"fork":false,"pushed_at":"2024-12-10T01:33:59.000Z","size":129,"stargazers_count":2570,"open_issues_count":18,"forks_count":244,"subscribers_count":41,"default_branch":"master","last_synced_at":"2025-05-05T22:56:21.041Z","etag":null,"topics":["bloom","bloom-filters","go"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-2-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bits-and-blooms.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"patreon":null,"open_collective":null,"ko_fi":null,"custom":"https://donate.mcc.org/"}},"created_at":"2011-05-21T14:18:41.000Z","updated_at":"2025-05-05T14:32:28.000Z","dependencies_parsed_at":"2024-12-17T03:00:29.576Z","dependency_job_id":"c7b25113-9323-4c10-a895-18eee6b4deae","html_url":"https://github.com/bits-and-blooms/bloom","commit_stats":{"total_commits":126,"total_committers":30,"mean_commits":4.2,"dds":0.8015873015873016,"last_synced_commit":"1b8b697ca6b5ad9d5599ab8926fb997ccca0c23c"},"previous_names":["willf/bloom"],"tags_count":15,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bits-and-blooms%2Fbloom","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bits-and-
blooms%2Fbloom/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bits-and-blooms%2Fbloom/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bits-and-blooms%2Fbloom/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bits-and-blooms","download_url":"https://codeload.github.com/bits-and-blooms/bloom/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252590552,"owners_count":21772936,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bloom","bloom-filters","go"],"created_at":"2024-07-30T20:01:13.525Z","updated_at":"2025-12-23T00:03:20.360Z","avatar_url":"https://github.com/bits-and-blooms.png","language":"Go","funding_links":["https://donate.mcc.org/"],"categories":["Data Structures and Algorithms","开源类库","数据结构与算法","Go","Uncategorized","Generators","Data Integration Frameworks"],"sub_categories":["Bloom and Cuckoo Filters","数据结构","布隆和布谷鸟过滤器"],"readme":"Bloom filters\n-------------\n[![Test](https://github.com/bits-and-blooms/bloom/actions/workflows/test.yml/badge.svg)](https://github.com/bits-and-blooms/bloom/actions/workflows/test.yml)\n[![Go Report Card](https://goreportcard.com/badge/github.com/bits-and-blooms/bloom)](https://goreportcard.com/report/github.com/bits-and-blooms/bloom)\n[![Go Reference](https://pkg.go.dev/badge/github.com/bits-and-blooms/bloom.svg)](https://pkg.go.dev/github.com/bits-and-blooms/bloom/v3)\n\nThis library is used by popular systems such as [Milvus](https://github.com/milvus-io/milvus) and 
[beego](https://github.com/beego/Beego).\n\nA Bloom filter is a concise/compressed representation of a set, where the main\nrequirement is to make membership queries; _i.e._, whether an item is a\nmember of a set. A Bloom filter will always correctly report the presence\nof an element in the set when the element is indeed present. A Bloom filter \ncan use much less storage than the original set, but it allows for some 'false positives':\nit may sometimes report that an element is in the set whereas it is not.\n\nWhen you construct a filter, you need to know how many elements you expect to add (the desired capacity) and the false-positive rate you are willing to tolerate. A common false-positive rate is 1%. The\nlower the false-positive rate, the more memory you are going to require. Similarly, the higher the\ncapacity, the more memory you will use.\nYou may construct a Bloom filter capable of receiving 1 million elements with a false-positive\nrate of 1% in the following manner. \n\n```Go\n    filter := bloom.NewWithEstimates(1000000, 0.01) \n```\n\nYou should call `NewWithEstimates` conservatively: if you specify a number of elements that is\ntoo small, the false-positive bound might be exceeded. A Bloom filter is not a dynamic data structure:\nyou must know ahead of time what your desired capacity is.\n\nOur implementation accepts keys for setting and testing as `[]byte`. Thus, to\nadd a string item, `\"Love\"`:\n\n```Go\n    filter.Add([]byte(\"Love\"))\n```\n\nSimilarly, to test if `\"Love\"` is in the filter:\n\n```Go\n    if filter.Test([]byte(\"Love\")) {\n        // \"Love\" is probably in the set (false positives are possible)\n    }\n```\n\nFor numerical data, we recommend that you look into the `encoding/binary` package. 
But, for example, to add a `uint32` to the filter:\n\n```Go\n    i := uint32(100)\n    n1 := make([]byte, 4)\n    binary.BigEndian.PutUint32(n1, i)\n    filter.Add(n1)\n```\n\nGodoc documentation:  https://pkg.go.dev/github.com/bits-and-blooms/bloom/v3 \n\n\n## Installation\n\n```bash\ngo get -u github.com/bits-and-blooms/bloom/v3\n```\n\n## Verifying the False Positive Rate\n\n\nSometimes, the actual false positive rate may differ (slightly) from the\ntheoretical false positive rate. We have a function to estimate the false positive rate of a\nBloom filter with _m_ bits and _k_ hashing functions for a set of size _n_:\n\n```Go\n    if bloom.EstimateFalsePositiveRate(20*n, 5, n) \u003e 0.001 ...\n```\n\nYou can use it to validate the computed m, k parameters:\n\n```Go\n    m, k := bloom.EstimateParameters(n, fp)\n    ActualfpRate := bloom.EstimateFalsePositiveRate(m, k, n)\n```\n\nor\n\n```Go\n    f := bloom.NewWithEstimates(n, fp)\n    ActualfpRate := bloom.EstimateFalsePositiveRate(f.m, f.k, n)\n```\n\nYou would expect `ActualfpRate` to be close to the desired false-positive rate `fp` in these cases.\n\nThe `EstimateFalsePositiveRate` function creates a temporary Bloom filter. 
It is\nalso relatively expensive and only meant for validation.\n\n## Serialization\n\nYou can read and write the Bloom filters as follows:\n\n\n```Go\n\tf := New(1000, 4)\n\tvar buf bytes.Buffer\n\tbytesWritten, err := f.WriteTo(\u0026buf)\n\tif err != nil {\n\t\tt.Fatal(err.Error())\n\t}\n\tvar g BloomFilter\n\tbytesRead, err := g.ReadFrom(\u0026buf)\n\tif err != nil {\n\t\tt.Fatal(err.Error())\n\t}\n\tif bytesRead != bytesWritten {\n\t\tt.Errorf(\"read unexpected number of bytes %d != %d\", bytesRead, bytesWritten)\n\t}\n```\n\n*Performance tip*: \nWhen reading and writing to a file or a network connection, you may get better performance by \nwrapping your streams with `bufio` instances.\n\nE.g., \n```Go\n\tf, err := os.Create(\"myfile\")\n\tw := bufio.NewWriter(f)\n```\n```Go\n\tf, err := os.Open(\"myfile\")\n\tr := bufio.NewReader(f)\n```\n\n## Contributing\n\nIf you wish to contribute to this project, please branch and issue a pull request against master (\"[GitHub Flow](https://guides.github.com/introduction/flow/)\")\n\nThis project includes a Makefile that allows you to test and build the project with simple commands.\nTo see all available options:\n```bash\nmake help\n```\n\n## Running all tests\n\nBefore committing the code, please check if it passes all tests using (note: this will install some dependencies):\n```bash\nmake deps\nmake qa\n```\n\n## Design\n\nA Bloom filter has two parameters: _m_, the number of bits used in storage, and _k_, the number of hashing functions on elements of the set. (The actual hashing functions are important, too, but this is not a parameter for this implementation). A Bloom filter is backed by a [BitSet](https://github.com/bits-and-blooms/bitset); a key is represented in the filter by setting the bits at each value of the  hashing functions (modulo _m_). Set membership is done by _testing_ whether the bits at each value of the hashing functions (again, modulo _m_) are set. If so, the item is in the set. 
If the item is actually in the set, a Bloom filter will never fail (the true positive rate is 1.0); but it is susceptible to false positives. The art is to choose _k_ and _m_ correctly.\n\nIn this implementation, the hashing function used is [murmurhash](https://github.com/twmb/murmur3), a non-cryptographic hashing function.\n\n\nGiven the particular hashing scheme, it's best to be empirical about this. Note\nthat estimating the FP rate will clear the Bloom filter.\n\n\n\n\n### Goroutine safety\n\nIn general, it is not safe to access\nthe same filter using different goroutines--they are\nunsynchronized for performance. Should you want to access\na filter from more than one goroutine, you should\nprovide synchronization. Typically this is done by using channels (in Go style; so there is only ever one owner),\nor by using `sync.Mutex` to serialize operations. Exceptionally, you may access the same filter from different\ngoroutines if you never modify the content of the filter.\n\n## Stars\n\n\n[![Star History Chart](https://api.star-history.com/svg?repos=bits-and-blooms/bloom\u0026type=Date)](https://www.star-history.com/#bits-and-blooms/bloom\u0026Date)\n\n## Further reading\n\n\u003cp\u003eMastering Programming: From Testing to Performance in Go\u003c/p\u003e\n\u003cdiv\u003e\u003ca href=\"https://www.amazon.com/dp/B0FMPGSWR5\"\u003e\u003cimg style=\"margin-left: auto; margin-right: auto;\" src=\"https://m.media-amazon.com/images/I/61feneHS7kL._SL1499_.jpg\" alt=\"\" width=\"250px\" /\u003e\u003c/a\u003e\u003c/div\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbits-and-blooms%2Fbloom","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbits-and-blooms%2Fbloom","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbits-and-blooms%2Fbloom/lists"}