Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/yosriady/bloom_filter
:sunflower: Bloom Filter implementation in pure Elixir
https://github.com/yosriady/bloom_filter
Last synced: about 1 month ago
JSON representation
:sunflower: Bloom Filter implementation in pure Elixir
- Host: GitHub
- URL: https://github.com/yosriady/bloom_filter
- Owner: yosriady
- Created: 2016-05-18T15:49:55.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2021-01-08T15:42:13.000Z (almost 4 years ago)
- Last Synced: 2024-10-29T08:40:44.878Z (about 2 months ago)
- Language: Elixir
- Homepage: http://hex.pm/packages/bloom_filter
- Size: 186 KB
- Stars: 13
- Watchers: 3
- Forks: 2
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# BloomFilter
Bloom Filter implementation in Elixir. Bloom filters are probabilistic data structures designed
to efficiently tell you whether an element is present in a set.[![Travis](https://travis-ci.org/yosriady/bloom_filter.svg)](https://travis-ci.org/yosriady/bloom_filter)
[![Coverage Status](https://coveralls.io/repos/github/yosriady/bloom_filter/badge.svg?branch=master)](https://coveralls.io/github/yosriady/bloom_filter?branch=master)
[![Hex.pm](https://img.shields.io/hexpm/v/bloom_filter.svg?maxAge=2592000)](https://hex.pm/packages/bloom_filter)### [Hex](http://hex.pm/packages/bloom_filter)
### [API Documentation](https://hexdocs.pm/bloom_filter/)## Installation
Add bloom_filter to your list of dependencies in `mix.exs`:
def deps do
[{:bloom_filter, "~> 1.0.0"}]
end## Usage
iex> f = BloomFilter.new 100, 0.001 # Create a bloom filter with an expected capacity 100 and desired false positive rate < 0.001
iex> f = BloomFilter.add(f, 42)
iex> BloomFilter.has?(f, 42)
true## Running Tests
```
mix test
```## Background
A [Bloom filter](https://en.wikipedia.org/wiki/Bloom_filter) is a space-efficient data structure designed to efficiently tell you whether an element is present in a set. Both insertion and membership operations theoretically cost a constant time `O(k)`, where `k` is the number of hash functions used in the filter.
The price paid for this efficiency is that a Bloom filter is a probabilistic data structure: it tells us that the element either is *definitely NOT in* the set or *MAYBE in* the set.
![](https://upload.wikimedia.org/wikipedia/commons/thumb/a/ac/Bloom_filter.svg/360px-Bloom_filter.svg.png)
The filter is essentially a vector of bits (`0` or `1`) of some size `m`. When we `add` a new `item` to the filter we map `item` to `k` number of hash functions, which gives us `k` indices. We then set the bits on these indices to `1`.
When we want to check if our filter `has?` a particular `item`, we feed it to our hash functions again, and check if `any?` of the bits are `0` or `all?` the bits are `1`. If there are bits that are not set, `item` is **definitely** not in the set. If all the bits are set, `item` is **probably** in the set, since false positives can happen due to [collision](https://en.wikipedia.org/wiki/Collision_(computer_science)).
Bloom filters are best suited for applications where the amount of source data would require an impractically large amount of memory if "conventional" error-free hashing techniques were applied.
## Implementation Details
`bloom_filter` uses two hash functions [`:erlang.phash2`](http://erlang.org/doc/man/erlang.html#phash2-2), [`Fowler–Noll–Vo`](https://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function), and the [Double Hashing](https://en.wikipedia.org/wiki/Double_hashing) technique to generate an arbitrary number of independent hash functions.
`bloom_filter` also automatically optimizes the optimal size of the bit vector and the number of hash functions required to attain the user's desired error rate.
## Running Type Checker
> You need to have [dialyxir](https://github.com/jeremyjh/dialyxir) installed.
```
mix dialyzer
```## Contributing
1. Fork it ( http://github.com/Leventhan/bloom_filter/fork )
2. Create your feature branch (`git checkout -b feature/my-new-feature`)
3. Commit your changes (`git commit -am 'Add some feature'`)
4. Push to the branch (`git push origin feature/my-new-feature`)
5. Create new Pull Request (Remember to squash your commits!)> Report any found bugs or errors using [the issue tracker](https://github.com/Leventhan/bloom_filter/issues).
## Thanks
**bloom_filter** © 2016+, Yos Riady. Released under the [MIT] License.
Authored and maintained by Yos Riady with help from contributors ([list][contributors]).> [yos.io](http://yos.io) ·
> GitHub [@yosriady](https://github.com/yosriady)[MIT]: http://mit-license.org/
[contributors]: http://github.com/yosriady/bloom_filter/contributors