{"id":13484531,"url":"https://github.com/igrigorik/bloomfilter-rb","last_synced_at":"2025-05-15T09:09:17.279Z","repository":{"id":472199,"uuid":"97241","full_name":"igrigorik/bloomfilter-rb","owner":"igrigorik","description":"BloomFilter(s) in Ruby: Native counting filter + Redis counting/non-counting filters ","archived":false,"fork":false,"pushed_at":"2024-03-26T22:22:14.000Z","size":102,"stargazers_count":474,"open_issues_count":7,"forks_count":59,"subscribers_count":15,"default_branch":"master","last_synced_at":"2025-04-15T01:59:18.600Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"http://www.igvita.com/2008/12/27/scalable-datasets-bloom-filters-in-ruby/","language":"C","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/igrigorik.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2008-12-27T18:07:07.000Z","updated_at":"2025-02-21T23:58:27.000Z","dependencies_parsed_at":"2024-05-01T13:19:56.487Z","dependency_job_id":"993eaba0-2635-4172-8636-3abe2f2ddb9d","html_url":"https://github.com/igrigorik/bloomfilter-rb","commit_stats":{"total_commits":97,"total_committers":20,"mean_commits":4.85,"dds":0.4948453608247423,"last_synced_commit":"2775674d78d8f2af358982464a849c499b1475dc"},"previous_names":[],"tags_count":4,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/igrigorik%2Fbloomfilter-rb","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/igrigorik%2Fbloomfilter-rb/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposito
ries/igrigorik%2Fbloomfilter-rb/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/igrigorik%2Fbloomfilter-rb/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/igrigorik","download_url":"https://codeload.github.com/igrigorik/bloomfilter-rb/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254283314,"owners_count":22045141,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-31T17:01:25.728Z","updated_at":"2025-05-15T09:09:12.224Z","avatar_url":"https://github.com/igrigorik.png","language":"C","readme":"# BloomFilter(s) in Ruby\n\n- Native (MRI/C) counting bloom filter\n- Redis-backed getbit/setbit non-counting bloom filter\n- Redis-backed set-based counting (+TTL) bloom filter\n\nBloom filter is a space-efficient probabilistic data structure that is used to test whether an element is a member of a set. False positives are possible, but false negatives are not. For more detail, check the [wikipedia article](http://en.wikipedia.org/wiki/Bloom_filter). Instead of using k different hash functions, this implementation seeds the CRC32 hash with k different initial values (0, 1, ..., k-1). 
This may or may not give you a good distribution; it depends on the data.\n\nPerformance of the Bloom filter depends on a number of variables:\n\n- size of the bit array\n- size of the counter bucket\n- number of hash functions\n\n## Resources\n\n- Determining parameters: [Scalable Datasets: Bloom Filters in Ruby](http://www.igvita.com/2008/12/27/scalable-datasets-bloom-filters-in-ruby/)\n- Applications \u0026 reasons behind bloom filters: [Flow analysis: Time based bloom filter](http://www.igvita.com/2010/01/06/flow-analysis-time-based-bloom-filters/)\n\n***\n\n## MRI/C API Example\n\nMRI/C implementation that creates an in-memory filter, which can be saved to and reloaded from disk.\n\n```ruby\nrequire 'bloomfilter-rb'\n\nbf = BloomFilter::Native.new(:size =\u003e 100, :hashes =\u003e 2, :seed =\u003e 1, :bucket =\u003e 3, :raise =\u003e false)\nbf.insert(\"test\")\nbf.include?(\"test\")     # =\u003e true\nbf.include?(\"blah\")     # =\u003e false\n\nbf.delete(\"test\")\nbf.include?(\"test\")     # =\u003e false\n\n# Hash with a bloom filter!\nbf[\"test2\"] = \"bar\"\nbf[\"test2\"]             # =\u003e true\nbf[\"test3\"]             # =\u003e false\n\nbf.stats\n# Sample output (these figures are from a filter with m = 10, not the :size =\u003e 100 configured above):\n# =\u003e Number of filter bits (m): 10\n# =\u003e Number of filter elements (n): 2\n# =\u003e Number of filter hashes (k) : 2\n# =\u003e Predicted false positive rate = 10.87%\n```\n\n***\n\n## Redis-backed setbit/getbit bloom filter\n\nUses [getbit](http://redis.io/commands/getbit)/[setbit](http://redis.io/commands/setbit) on Redis strings: efficient, fast, and shareable across multiple concurrent processes.\n\n```ruby\nbf = BloomFilter::Redis.new\n\nbf.insert('test')\nbf.include?('test')     # =\u003e true\nbf.include?('blah')     # =\u003e false\n\nbf.delete('test')\nbf.include?('test')     # =\u003e false\n```\n\n### Memory footprint\n\n- 1.0% error rate for 1M items, 10 bits per item: *2.5 MB*\n- 1.0% error rate for 150M items, 10 bits per item: *358.52 MB*\n- 0.1% error rate for 150M items, 15 bits per item: *537.33 MB*\n\n***\n\n## Redis-backed counting bloom filter with TTLs\n\nUses regular Redis get/set counters to implement a counting filter with optional TTL expiry. Because each \"bit\" requires its own key in Redis, this incurs a much larger memory overhead.\n\n```ruby\nbf = BloomFilter::CountingRedis.new(:ttl =\u003e 2)\n\nbf.insert('test')\nbf.include?('test')     # =\u003e true\n\nsleep(2)\nbf.include?('test')     # =\u003e false\n```\n\n## Credits\n\nTatsuya Mori \u003cvaldzone@gmail.com\u003e (Original C implementation: http://vald.x0.com/sb/)\n\n## License\n\nMIT License - Copyright (c) 2011 Ilya Grigorik\n","funding_links":[],"categories":["Scientific"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Figrigorik%2Fbloomfilter-rb","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Figrigorik%2Fbloomfilter-rb","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Figrigorik%2Fbloomfilter-rb/lists"}