{"id":24182904,"url":"https://github.com/codesteak/flower","last_synced_at":"2025-09-21T04:32:31.046Z","repository":{"id":62429636,"uuid":"127530944","full_name":"CodeSteak/Flower","owner":"CodeSteak","description":"Bloom Filters in Elixir using Rust NIFs","archived":false,"fork":false,"pushed_at":"2018-04-07T18:47:37.000Z","size":751,"stargazers_count":11,"open_issues_count":0,"forks_count":1,"subscribers_count":3,"default_branch":"master","last_synced_at":"2024-12-29T11:40:23.548Z","etag":null,"topics":["bloom-filter","elixir","rustler"],"latest_commit_sha":null,"homepage":null,"language":"Elixir","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/CodeSteak.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-03-31T12:41:12.000Z","updated_at":"2023-09-01T10:45:38.000Z","dependencies_parsed_at":"2022-11-01T20:03:19.524Z","dependency_job_id":null,"html_url":"https://github.com/CodeSteak/Flower","commit_stats":null,"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CodeSteak%2FFlower","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CodeSteak%2FFlower/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CodeSteak%2FFlower/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/CodeSteak%2FFlower/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/CodeSteak","download_url":"https://codeload.github.com/CodeSteak/Flower/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","re
positories_count":233713194,"owners_count":18718373,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bloom-filter","elixir","rustler"],"created_at":"2025-01-13T08:29:13.779Z","updated_at":"2025-09-21T04:32:25.749Z","avatar_url":"https://github.com/CodeSteak.png","language":"Elixir","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Flower\n[![Build Status](https://travis-ci.org/CodeSteak/Flower.svg?branch=master)](https://travis-ci.org/CodeSteak/Flower)\n[![Hex.pm](https://img.shields.io/hexpm/v/flower.svg)](https://hex.pm/packages/flower)\n\n\nThis is an implementation of __Bloom Filters for Elixir__. 
It uses __NIFs__ written __in Rust__ for better performance, since Bloom Filters rely heavily on mutability.\n\n#### What are Bloom Filters?\n__TL;DR__: *Huge amount of data __➜__ small Bloom Filter __:__ Was X not in the huge amount of data?*\n\nFor more on this topic, consider checking out:\n* [Bloom Filters by Jason Davies](https://www.jasondavies.com/bloomfilter/)\n* [Bloom Filters, Mining of Massive Datasets, Stanford University on YouTube](https://www.youtube.com/watch?v=qBTdukbzc78)\n\n## Installation\n\nThe package can be installed by adding `flower` to your list of dependencies in `mix.exs`:\n\n```elixir\ndef deps do\n  [\n    {:flower, \"~\u003e 0.1.4\"},\n  ]\nend\n```\n\nYou also need to have __Rust__ installed to compile the NIFs.\n\nDocs can be found at [https://hexdocs.pm/flower](https://hexdocs.pm/flower).\n\nDocumentation can be generated with [ExDoc](https://github.com/elixir-lang/ex_doc).\n\n## Usage\nThere is also a short walkthrough in the [Example Section](#example).\n```elixir\nalias Flower.Bloom, as: Bloom\n```\n##### Create a New Bloom Filter\n```elixir\nexpected_number_of_unique_elements = 1000\n# With a size atom:\niex\u003e filter = Bloom.new(:\"8 KB\", expected_number_of_unique_elements)\n...\n# Or by a maximum byte size:\niex\u003e filter = Bloom.new_by_byte_size(8 * 1024, expected_number_of_unique_elements)\n...\n# Or by a bit address length:\niex\u003e filter = Bloom.new(16, expected_number_of_unique_elements)\n...\n```\n| Bit Address Length|Size Atom| Bit Address Length|Size Atom| Bit Address Length|Size Atom|\n|-------:|---------:|-------:|---------:|-------:|---------:|\n|        |          | __13__ | 1 KB     | __23__ | 1 MB     |\n|        |          | __14__ | 2 KB     | __24__ | 2 MB     |\n|        |          | __15__ | 4 KB     | __25__ | 4 MB     |\n|  __6__ | 8 Byte   | __16__ | 8 KB     | __26__ | 8 MB     |\n|  __7__ | 16 Byte  | __17__ | 16 KB    | __27__ | 16 MB    |\n|  __8__ | 32 Byte  | __18__ | 32 KB    
| __28__ | 32 MB    |\n|  __9__ | 64 Byte  | __19__ | 64 KB    | __29__ | 64 MB    |\n| __10__ | 128 Byte | __20__ | 128 KB   | __30__ | 128 MB   |\n| __11__ | 256 Byte | __21__ | 256 KB   | __31__ | 256 MB   |\n| __12__ | 512 Byte | __22__ | 512 KB   | __32__ | 512 MB   |\n\n##### Insert Elements\n```elixir\niex\u003e Bloom.insert(filter, 42)\n:ok\niex\u003e Bloom.insert(filter, [1,2,{:atom, \u003c\u003c1,2,3,4\u003e\u003e}])\n:ok\niex\u003e Bloom.insert(filter, \"Hello!\")\n:ok\n```\n\n##### Check for Elements\n```elixir\niex\u003e Bloom.has_not?(filter, \"Hello!\")\nfalse\niex\u003e Bloom.has?(filter, [1,2,{:atom, \u003c\u003c1,2,3,4\u003e\u003e}])\ntrue\niex\u003e Bloom.has?(filter, :atom)\nfalse\n# `has?` is always the opposite of `has_not?`.\niex\u003e Bloom.has?(filter, 42) != Bloom.has_not?(filter, 42)\ntrue\n```\n\n|Was actually inserted?|                  has?  |                    has_not?  |\n|:--------------------:|:----------------------:|:----------------------------:|\n| yes                  | yes                    |                       no     |\n| no                   | most of the time: no   |   most of the time: yes      |\n\n\n##### Write To Disk\n\n```elixir\nfilename = \"prime_filter.bloom\"\nfile = File.stream!(filename, [:delayed_write, :binary], 8096)\n\n(Bloom.stream(filter) |\u003e Stream.into(file) |\u003e Stream.run)\n```\n\n##### Read From Disk\n\n```elixir\nfilename = \"prime_filter.bloom\"\nfile = File.stream!(filename, [:read_ahead, :binary], 8096)\n\n{:ok, new_filter} = Bloom.from_stream(file)\n```\n\n##### Funky Stuff\n```elixir\niex\u003e Bloom.estimate_count(filter)\n3\niex\u003e Bloom.false_positive_probability(filter)\n3.2348227494719115e-28\n# This number is only this small because we have inserted just 3 elements into 8 KB.\n```\n## \u003ca name=\"example\"\u003e\u003c/a\u003eExample\n### Prime Numbers\nChecking whether a number below `100_000` is not a prime.\n```elixir\n# helper module\ndefmodule Step do\n    def throught(from, 
to, step, func) when from \u003c to do\n        func.(from)\n        throught(from+step, to, step, func)\n    end\n    def throught(_,_,_,_), do: :ok\nend \u0026\u0026 :ok\n\n# Alias, so we need to type less.\nalias Flower.Bloom, as: Bloom\n\n\n# Select an appropriate size.\n# We will put about 100_000 elements in.\n# 64 KB = 512 KBit\n# 512 KBit / 100_000 ~= 5.24 Bits per element.\n# Meeh. Let's see...\nnon_primes = Bloom.new(:\"64 KB\", 100_000)\n# We can also write\nnon_primes = Bloom.new(19, 100_000)\n# or (it will choose the next smaller size)\nnon_primes = Bloom.new_by_byte_size(64 * 1024, 100_000)\n\n# Let's put some non-primes in:\nBloom.insert(non_primes, 42)\nBloom.insert(non_primes, \"one hundred\")\nBloom.insert(non_primes, [:ok, {Anything, 1.0e9}])\n\n# Let's double-check:\nBloom.has?(non_primes, 42)\nBloom.has?(non_primes, \"one hundred\")\nBloom.has?(non_primes, 7)\nBloom.has?(non_primes, [:ok, {Anything, 1.0e9}])\nBloom.has?(non_primes, 52)\n# Works!\n\n\n# Now we can get an estimate of\n# the number of items we\n# put in.\nBloom.estimate_count(non_primes)\n\n\n# Apply the Sieve of Eratosthenes.\n# This may take a few seconds.\n# At least we can do it with\n# constant memory.\nfor x \u003c- 2..50_000 do\n  # Skip numbers already marked as multiples of smaller numbers.\n  # This is actually not safe to do\n  # with a bloom filter since they are\n  # approximations. 
But let's not worry about that for now.\n  if(Bloom.has_not?(non_primes, x)) do\n    Step.throught(x*2, 100_000, x, fn non_prime -\u003e\n        # Most of the time is spent\n        # calculating hashes.\n        Bloom.insert(non_primes, non_prime)\n    end)\n  end\nend \u0026\u0026 :ok\n\nnon_primes |\u003e Bloom.has_not?(12) # not a prime\nnon_primes |\u003e Bloom.has_not?(11) # prime\nnon_primes |\u003e Bloom.has_not?(6719) # prime\nnon_primes |\u003e Bloom.has_not?(4245) # not a prime\nnon_primes |\u003e Bloom.has_not?(9973) # prime\nnon_primes |\u003e Bloom.has_not?(3549) # not a prime\nnon_primes |\u003e Bloom.has_not?(89591) # prime\nnon_primes |\u003e Bloom.has_not?(84949) # not a prime\n\n# Let's write it to disk:\nfilename = \"prime_filter.bloom\"\nfile = File.stream!(filename, [:delayed_write, :read_ahead, :binary], 8096)\n(Bloom.stream(non_primes)\n|\u003e Stream.into(file)\n|\u003e Stream.run)\n# Let's inspect the size of the filter:\nFile.lstat!(filename).size\n\n# We can also read from disk:\n{:ok, new_non_primes} = Bloom.from_stream(file)\nnew_non_primes |\u003e Bloom.has_not?(12) # not a prime\nnew_non_primes |\u003e Bloom.has_not?(11) # prime\n# Works!\n\n# The false positive probability may be a bit high:\n# 64 KB for 100_000 numbers is not that much.\nBloom.false_positive_probability(non_primes)\n\n# Let's re-estimate.\n# There are 9_592 primes below 100_000, so this\n# should yield about 100_000 - 9_592 = 90_408.\nBloom.estimate_count(non_primes)\n```\nSide note: If you want to check for primes, ~~google~~\nsearch for *Miller–Rabin primality test*.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcodesteak%2Fflower","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcodesteak%2Fflower","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcodesteak%2Fflower/lists"}