{"id":46688514,"url":"https://github.com/phrozen/bloom","last_synced_at":"2026-04-06T18:04:11.348Z","repository":{"id":342222251,"uuid":"1007326882","full_name":"phrozen/bloom","owner":"phrozen","description":"An ultra fast, lightweight, concurrent-safe Bloom filter for Go","archived":false,"fork":false,"pushed_at":"2026-03-11T02:22:18.000Z","size":44,"stargazers_count":34,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-03-23T20:15:13.720Z","etag":null,"topics":["bloom-filter","data-structures","go","production"],"latest_commit_sha":null,"homepage":"https://gestrada.dev/posts/bloom-filter/","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/phrozen.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-06-23T20:25:13.000Z","updated_at":"2026-03-13T14:03:39.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/phrozen/bloom","commit_stats":null,"previous_names":["phrozen/bloom"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/phrozen/bloom","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/phrozen%2Fbloom","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/phrozen%2Fbloom/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/phrozen%2Fbloom/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/phrozen%2Fbloom/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/phrozen","download_url":"https://codeload.github.com/phrozen/bloom/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/phrozen%2Fbloom/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31483387,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-06T17:22:55.647Z","status":"ssl_error","status_checked_at":"2026-04-06T17:22:54.741Z","response_time":112,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bloom-filter","data-structures","go","production"],"created_at":"2026-03-09T03:00:36.872Z","updated_at":"2026-04-06T18:04:11.336Z","avatar_url":"https://github.com/phrozen.png","language":"Go","readme":"# bloom\n\n[![Go Reference](https://pkg.go.dev/badge/github.com/phrozen/bloom.svg)](https://pkg.go.dev/github.com/phrozen/bloom)\n[![Go Report Card](https://goreportcard.com/badge/github.com/phrozen/bloom)](https://goreportcard.com/report/github.com/phrozen/bloom)\n[![Build Status](https://github.com/phrozen/bloom/actions/workflows/go.yml/badge.svg)](https://github.com/phrozen/bloom/actions)\n[![Go Coverage](https://github.com/phrozen/bloom/wiki/coverage.svg)](https://raw.githack.com/wiki/phrozen/bloom/coverage.html)\n[![LICENSE](https://img.shields.io/badge/license-Apache_2.0-green)](https://github.com/phrozen/bloom/blob/main/LICENSE)\n\nAn ultra fast, lightweight, concurrent-safe [Bloom filter](https://en.wikipedia.org/wiki/Bloom_filter) for Go.\n\n**Zero dependencies** — only the Go standard library. Zero allocations per operation. Lock-free concurrent reads and writes via `atomic.Uint64`. Built for production use, small enough to understand (just over 100 LOC).\n\n\u003e This library is based on my blog post: [The Magic of Bloom Filters](https://gestrada.dev/posts/bloom-filter/), which covers the theory, math, benchmarks, and design decisions in depth.\n\n## Features\n\n- **Zero dependencies** — uses only the Go standard library (`hash/fnv`, `sync/atomic`, `sync`, `math`)\n- **Zero allocations** — hashers are recycled via `sync.Pool` (0 B/op, 0 allocs/op)\n- **Concurrent-safe with no locks** — `atomic.Uint64` bitset for lock-free `Add` and `Contains`\n- **Kirsch-Mitzenmacher optimization** — one hash call simulates `k` independent hashes\n- **Configurable** — create from explicit `(m, k)` or from `(n, p)` probability parameters\n- **Pluggable hash function** — default FNV-1a, swappable via `WithHashFunc` option\n- **Binary serialization** — `MarshalBinary` / `UnmarshalBinary` for persistence and network transfer\n\n## Install\n\n```sh\ngo get github.com/phrozen/bloom\n```\n\n## Usage\n\nBloom filters answer one question: *\"Is this item in the set?\"* The answer is either a definitive **No** or a probabilistic **Probably**. There are no false negatives — if the filter says no, the item was never added. This makes them ideal for guarding expensive lookups: check the filter first, skip the database/disk/network call when the answer is no.\n\n### Create from probability parameters\n\nMost of the time, you know how many items you expect and what false positive rate you can tolerate. The constructor handles the math for you — it computes the optimal number of bits and hash functions automatically:\n\n```go\n// Filter for 1M items with a 1% false positive rate.\n// Internally this allocates ~1.14 MB (9.58M bits) and uses 7 hash functions.\nf := bloom.NewFilterFromProbability(1_000_000, 0.01)\n```\n\n### Create with explicit parameters\n\nIf you already know the exact bit size and hash count you want (e.g. reproducing a filter from known parameters), you can specify them directly:\n\n```go\n// 10 million bits, 7 hash functions\nf := bloom.NewFilter(10_000_000, 7)\n```\n\n### Add and query\n\nBoth functions are concurrent-safe and cannot fail (no error). Passing an empty `[]byte` is valid — it simply hashes like any other input.\n\n```go\nid := \"550e8400-e29b-41d4-a716-446655440000\"\nf.Add([]byte(id))\n\nif !f.Contains([]byte(id)) {\n    // Definitively NOT in the set — skip the database entirely\n    return ErrNotFound\n}\n// Probably in the set — check the database to confirm\n```\n\n### Custom hash function\n\nThe default hasher is FNV-1a from the standard library. If you need a different algorithm, swap it via the `WithHashFunc` option. Any `hash.Hash64` implementation works:\n\n```go\nimport \"github.com/cespare/xxhash/v2\"\n\nf := bloom.NewFilterFromProbability(1_000_000, 0.01,\n    bloom.WithHashFunc(func() hash.Hash64 { return xxhash.New() }),\n)\n```\n\n### Serialize / Deserialize\n\nFilters implement `encoding.BinaryMarshaler` and `encoding.BinaryUnmarshaler`, so you can persist them to disk, send them over the network, or embed them in any encoding that supports those interfaces:\n\n```go\n// Save\ndata, err := f.MarshalBinary()\n\n// Restore\nvar restored bloom.Filter\nerr = restored.UnmarshalBinary(data)\n```\n\nThe binary format is compact: 16 bytes of header (`m` + `k`) followed by the raw bitset. A 1M-item filter at 1% FPR serializes to ~1.14 MB.\n\n## Benchmarks\n\nAll benchmarks on AMD Ryzen 7 5800X (16 threads), 16-byte UUID keys, `go test -bench=. -benchmem`:\n\n### Per-operation\n\n| Operation | ns/op | B/op | allocs/op |\n|---|---|---|---|\n| `Add` | ~36 ns | 0 | 0 |\n| `Contains` | ~25 ns | 0 | 0 |\n\n### At scale (1M items, 1% FPR, 16 goroutines)\n\n| Benchmark | ns/op | B/op | allocs/op |\n|---|---|---|---|\n| Concurrent Add | ~12 ns | 0 | 0 |\n| Concurrent Contains | ~9 ns | 0 | 0 |\n| Concurrent Add+Contains (50/50) | ~14 ns | 0 | 0 |\n| **Filter memory footprint** | **1.14 MB** | | |\n| **Actual false positive rate** | **~1.00%** | | |\n\n### Hash function comparison\n\nThe `sync.Pool` optimization eliminates the interface allocation penalty entirely. All hashers now perform within the same ballpark, despite wildly different internal struct sizes:\n\n| Hasher | Add ns/op | Contains ns/op | allocs/op |\n|---|---|---|---|\n| FNV-1a (default, std lib) | ~36 ns | ~25 ns | 0 |\n| [xxHash](https://github.com/cespare/xxhash) | ~40 ns | ~26 ns | 0 |\n| [Murmur3](https://github.com/twmb/murmur3) | ~36 ns | ~29 ns | 0 |\n| [XXH3](https://github.com/zeebo/xxh3) | ~36 ns | ~29 ns | 0 |\n\nFNV-1a is the default because it ships with Go — zero dependencies — and with Kirsch-Mitzenmacher splitting a 64-bit hash into two 32-bit halves, all four hashers produce identical false positive rates (~1% for a 1% target). Hash \"quality\" is a non-factor for this use case.\n\n## How it works\n\n1. **Bit array** — a `[]atomic.Uint64` slice aligned to CPU word size for single-instruction bit operations\n2. **Kirsch-Mitzenmacher** — hash once, split into two 32-bit halves `h1` and `h2`, simulate `k` hashes via `h1 + i*h2` ([paper](https://www.eecs.harvard.edu/~michaelm/postscripts/rsa2008.pdf))\n3. **sync.Pool** — recycles hasher instances so `Add`/`Contains` do zero heap allocations\n4. **Atomic operations** — `Or` to set bits, `Load` to read them — fully concurrent, no mutex\n\n## API\n\n```go\n// Constructors\nfunc NewFilter(m int, k int, opts ...Option) *Filter\nfunc NewFilterFromProbability(n int, p float64, opts ...Option) *Filter\n\n// Options\nfunc WithHashFunc(h HashFunc) Option\n\n// Operations\nfunc (f *Filter) Add(data []byte)\nfunc (f *Filter) Contains(data []byte) bool\n\n// Serialization (encoding.BinaryMarshaler / encoding.BinaryUnmarshaler)\nfunc (f *Filter) MarshalBinary() ([]byte, error)\nfunc (f *Filter) UnmarshalBinary(data []byte) error\n```\n\n## License\n\n[Apache 2.0](LICENSE)","funding_links":[],"categories":["Go"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fphrozen%2Fbloom","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fphrozen%2Fbloom","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fphrozen%2Fbloom/lists"}