{"id":33937320,"url":"https://github.com/yankun1992/fastbloom","last_synced_at":"2026-04-08T13:31:14.259Z","repository":{"id":44376337,"uuid":"495430973","full_name":"yankun1992/fastbloom","owner":"yankun1992","description":"A fast bloom filter implemented  by Rust for Python! 10x faster than pybloom!","archived":false,"fork":false,"pushed_at":"2025-09-01T12:05:34.000Z","size":536,"stargazers_count":111,"open_issues_count":5,"forks_count":20,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-03-18T09:45:02.184Z","etag":null,"topics":["bloom-filter","bloomfilter","counting-bloom-filter","counting-bloom-filters","murmur3","pyo3","python","rust"],"latest_commit_sha":null,"homepage":"","language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/yankun1992.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2022-05-23T13:50:19.000Z","updated_at":"2026-03-10T08:29:38.000Z","dependencies_parsed_at":"2025-09-01T14:25:52.674Z","dependency_job_id":null,"html_url":"https://github.com/yankun1992/fastbloom","commit_stats":null,"previous_names":[],"tags_count":34,"template":false,"template_full_name":null,"purl":"pkg:github/yankun1992/fastbloom","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yankun1992%2Ffastbloom","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yankun1992%2Ffastbloom/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yank
un1992%2Ffastbloom/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yankun1992%2Ffastbloom/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/yankun1992","download_url":"https://codeload.github.com/yankun1992/fastbloom/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/yankun1992%2Ffastbloom/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31558380,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-08T10:21:54.569Z","status":"ssl_error","status_checked_at":"2026-04-08T10:21:38.171Z","response_time":54,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bloom-filter","bloomfilter","counting-bloom-filter","counting-bloom-filters","murmur3","pyo3","python","rust"],"created_at":"2025-12-12T14:57:43.853Z","updated_at":"2026-04-08T13:31:14.250Z","avatar_url":"https://github.com/yankun1992.png","language":"Rust","funding_links":[],"categories":["Performance \u0026 Caching","Data Structures"],"sub_categories":[],"readme":"\u003ch1\u003efastbloom\u003c/h1\u003e\n\n[![OSCS 
Status](https://www.oscs1024.com/platform/badge/yankun1992/fastbloom.svg?size=small)](https://www.oscs1024.com/project/yankun1992/fastbloom?ref=badge_small)\n[![docs.rs](https://img.shields.io/docsrs/fastbloom-rs/latest)](https://docs.rs/fastbloom-rs)\n[![Test Rust](https://github.com/yankun1992/fastbloom/actions/workflows/test_rust.yml/badge.svg)](https://github.com/yankun1992/fastbloom/actions/workflows/test_rust.yml)\n[![Test Python](https://github.com/yankun1992/fastbloom/actions/workflows/test_python.yml/badge.svg)](https://github.com/yankun1992/fastbloom/actions/workflows/test_python.yml)\n[![Benchmark](https://github.com/yankun1992/fastbloom/actions/workflows/benchmark.yml/badge.svg)](https://github.com/yankun1992/fastbloom/actions/workflows/benchmark.yml)\n[![Crates Latest Release](https://img.shields.io/crates/v/fastbloom-rs)](https://crates.io/crates/fastbloom-rs)\n[![PyPI Latest Release](https://img.shields.io/pypi/v/fastbloom-rs)](https://pypi.org/project/fastbloom-rs/)\n![Sonatype Nexus (Snapshots)](https://img.shields.io/nexus/s/io.github.yankun1992/fastbloom?server=https%3A%2F%2Fs01.oss.sonatype.org)\n\nA fast [bloom filter](#BloomFilter) | [counting bloom filter](#countingbloomfilter) implemented by Rust for Rust and\nPython!\n\nLanguage: [简体中文](./docs/README.zh_cn.md)\n\n- [setup](#setup)\n    - [Python](#python)\n        - [requirements](#requirements)\n        - [install](#install)\n    - [Rust](#rust)\n    - [Java](#java)\n- [Examples](#examples)\n    - [BloomFilter](#bloomfilter)\n        - [Python](#python-1)\n        - [Rust](#rust-1)\n    - [CountingBloomFilter](#countingbloomfilter)\n        - [Python](#python-2)\n        - [Rust](#rust-2)\n- [benchmark](#benchmark)\n    - [computer info](#computer-info)\n    - [add one str to bloom filter](#add-one-str-to-bloom-filter)\n    - [add one million to bloom filter](#add-one-million-to-bloom-filter)\n    - [check one contains in bloom filter](#check-one-contains-in-bloom-filter)\n    - [check one 
not contains in bloom filter](#check-one-not-contains-in-bloom-filter)\n    - [add one str to counting bloom filter](#add-one-str-to-counting-bloom-filter)\n    - [add one million to counting bloom filter](#add-one-million-to-counting-bloom-filter)\n\n# setup\n\n## Python\n\n### requirements\n\n```\nPython \u003e= 3.7\n```\n\n### install\n\nInstall the latest fastbloom version with:\n\n```bash\npip install fastbloom-rs\n```\n\n## Rust\n\n```toml\nfastbloom-rs = \"{latest}\"\n```\n\n## Java\nmaven\n```xml\n\u003cdependency\u003e\n    \u003cgroupId\u003eio.github.yankun1992\u003c/groupId\u003e\n    \u003cartifactId\u003efastbloom\u003c/artifactId\u003e\n    \u003cversion\u003e{latest-version}\u003c/version\u003e\n\u003c/dependency\u003e\n```\n\n# Examples\n\n## BloomFilter\n\nA Bloom filter is a space-efficient probabilistic data structure, conceived by Burton Howard\nBloom in 1970, that is used to test whether an element is a member of a set. False positive\nmatches are possible, but false negatives are not.\n\n**Reference**: Bloom, B. H. (1970). 
Space/time trade-offs in hash coding with allowable errors.\nCommunications of the ACM, 13(7), 422-426.\n[Full text article](http://crystal.uta.edu/~mcguigan/cse6350/papers/Bloom.pdf)\n\n### Python\n\nBasic usage:\n\n```python\nfrom fastbloom_rs import BloomFilter\n\nbloom = BloomFilter(100_000_000, 0.01)\n\nbloom.add_str('hello')\nbloom.add_bytes(b'world')\nbloom.add_int(9527)\n\nassert bloom.contains('hello')\nassert bloom.contains(b'world')\nassert bloom.contains(9527)\n\nassert not bloom.contains('hello world')\n```\n\nBuild a bloom filter from bytes or a list:\n\n```python\nfrom fastbloom_rs import BloomFilter\n\nbloom = BloomFilter(100_000_000, 0.01)\nbloom.add_str('hello')\nassert bloom.contains('hello')\n\nbloom2 = BloomFilter.from_bytes(bloom.get_bytes(), bloom.hashes())\nassert bloom2.contains('hello')\n\nbloom3 = BloomFilter.from_int_array(bloom.get_int_array(), bloom.hashes())\nassert bloom3.contains('hello')\n\n```\n\nThe Python bindings also provide bulk APIs that reduce the FFI overhead between Python and Rust:\n\n```python\nfrom fastbloom_rs import BloomFilter\n\nbloom = BloomFilter(100_000_000, 0.01)\ninserts = [1, 2, 3, 4, 5, 6, 7, 9, 18, 68, 90, 100]\nchecks = [1, 2, 3, 4, 5, 6, 7, 9, 18, 68, 90, 100, 190, 290, 390]\nresults = [True, True, True, True, True, True, True, True, True, True, True, True, False, False, False]\n\nbloom.add_int_batch(inserts)\ncontains = bloom.contains_int_batch(checks)\nassert contains == results\n\nbloom.add_str_batch(list(map(lambda x: str(x), inserts)))\nassert bloom.contains_str_batch(list(map(lambda x: str(x), checks))) == results\n\nbloom.add_bytes_batch(list(map(lambda x: bytes(x), inserts)))\nassert bloom.contains_bytes_batch(list(map(lambda x: bytes(x), checks))) == results\n```\n\nMore examples are in [py_tests](py_tests/test_bloom.py).\n\n### Rust\n\n```rust\nuse fastbloom_rs::{BloomFilter, FilterBuilder};\n\nlet mut bloom = FilterBuilder::new(100_000_000, 0.01).build_bloom_filter();\n\nbloom.add(b\"helloworld\");\nassert_eq!(bloom.contains(b\"helloworld\"), 
true);\nassert_eq!(bloom.contains(b\"helloworld!\"), false);\n```\n\nMore examples are at [docs.rs](https://docs.rs/fastbloom-rs).\n\n## CountingBloomFilter\n\nA Counting Bloom filter works in a similar manner as a regular Bloom filter; however, it is\nable to keep track of insertions and deletions. In a counting Bloom filter, each entry in the\nBloom filter is a small counter associated with a basic Bloom filter bit.\n\n**Reference**: F. Bonomi, M. Mitzenmacher, R. Panigrahy, S. Singh, and G. Varghese, “An Improved\nConstruction for Counting Bloom Filters,” in 14th Annual European Symposium on\nAlgorithms, LNCS 4168, 2006\n\n### Python\n\n```python\nfrom fastbloom_rs import CountingBloomFilter\n\ncbf = CountingBloomFilter(1000_000, 0.01)\ncbf.add('hello')\ncbf.add('hello')\nassert 'hello' in cbf\ncbf.remove('hello')\nassert 'hello' in cbf  # still present because 'hello' was added twice.\n# If the same element is added more than 15 times, removing it 15 times leaves the filter without the element.\ncbf.remove('hello')\nassert 'hello' not in cbf\n```\n\nA CountingBloomFilter stores a four-bit counter for each hash index, so inserting the\nsame element repeatedly makes the counter overflow quickly. To avoid this, set\n`enable_repeat_insert` to `False`: the filter then checks whether the element has already\nbeen added and, if so, does not add it again. `enable_repeat_insert` defaults to `True`.\n\n```python\nfrom fastbloom_rs import CountingBloomFilter\n\ncbf = CountingBloomFilter(1000_000, 0.01, False)\ncbf.add('hello')\ncbf.add('hello')  # because enable_repeat_insert=False, this addition will not take effect. 
\nassert 'hello' in cbf\ncbf.remove('hello')\nassert 'hello' not in cbf \n```\n\nmore examples at [py_tests](py_tests/test_counting_bloom_filter.py).\n\n### Rust\n\n```rust\nuse fastbloom_rs::{CountingBloomFilter, FilterBuilder};\n\nlet mut builder = FilterBuilder::new(100_000, 0.01);\nlet mut cbf = builder.build_counting_bloom_filter();\ncbf.add(b\"helloworld\");\nassert_eq!(cbf.contains(b\"helloworld\"), true);\n```\n\n# benchmark\n\nFor detailed performance comparisons between fastbloom-rs and other Python bloom filter libraries, see the [library comparison benchmark](benches/lib_comparison/). This benchmark compares fastbloom-rs against pyprobables and pybloomfilter3 across various configurations and provides comprehensive performance metrics.\n\n## computer info\n\n| CPU                                    | Memory | OS         |\n|----------------------------------------|--------|------------|\n| AMD Ryzen 7 5800U with Radeon Graphics | 16G    | Windows 10 |\n\n## add one str to bloom filter\n\nBenchmark insert one str to bloom filter:\n\n```text\nbloom_add_test          time:   [41.168 ns 41.199 ns 41.233 ns]\n                        change: [-0.4891% -0.0259% +0.3417%] (p = 0.91 \u003e 0.05)\n                        No change in performance detected.\nFound 13 outliers among 100 measurements (13.00%)\n  1 (1.00%) high mild\n  12 (12.00%) high severe\n```\n\n## add one million to bloom filter\n\nBenchmark loop insert `(1..1_000_000).map(|n| { n.to_string() })` to bloom filter:\n\n```text\nbloom_add_all_test      time:   [236.24 ms 236.86 ms 237.55 ms]\n                        change: [-3.4346% -2.9050% -2.3524%] (p = 0.00 \u003c 0.05)\n                        Performance has improved.\nFound 5 outliers among 100 measurements (5.00%)\n  4 (4.00%) high mild\n  1 (1.00%) high severe\n```\n\n## check one contains in bloom filter\n\n```text\nbloom_contains_test     time:   [42.065 ns 42.102 ns 42.156 ns]\n                        change: [-0.7830% -0.5901% 
-0.4029%] (p = 0.00 \u003c 0.05)\n                        Change within noise threshold.\nFound 15 outliers among 100 measurements (15.00%)\n  1 (1.00%) low mild\n  5 (5.00%) high mild\n  9 (9.00%) high severe\n```\n\n## check one not contains in bloom filter\n\n```text\nbloom_not_contains_test time:   [22.695 ns 22.727 ns 22.773 ns]\n                        change: [-3.1948% -2.9695% -2.7268%] (p = 0.00 \u003c 0.05)\n                        Performance has improved.\nFound 12 outliers among 100 measurements (12.00%)\n  4 (4.00%) high mild\n  8 (8.00%) high severe\n```\n\n## add one str to counting bloom filter\n\n```text\ncounting_bloom_add_test time:   [60.822 ns 60.861 ns 60.912 ns]\n                        change: [+0.2427% +0.3772% +0.5579%] (p = 0.00 \u003c 0.05)\n                        Change within noise threshold.\nFound 10 outliers among 100 measurements (10.00%)\n  1 (1.00%) low severe\n  4 (4.00%) low mild\n  1 (1.00%) high mild\n  4 (4.00%) high severe\n```\n\n## add one million to counting bloom filter\n\nBenchmark loop insert `(1..1_000_000).map(|n| { n.to_string() })` to counting bloom filter:\n\n```text\ncounting_bloom_add_million_test\n                        time:   [272.48 ms 272.58 ms 272.68 ms]\nFound 2 outliers among 100 measurements (2.00%)\n  1 (1.00%) low mild\n  1 (1.00%) high mild\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyankun1992%2Ffastbloom","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fyankun1992%2Ffastbloom","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fyankun1992%2Ffastbloom/lists"}