https://github.com/crystal-community/bloom_filter

Bloom filter implementation in Crystal lang
https://github.com/crystal-community/bloom_filter

bloom-filter crystal

Last synced: 6 months ago
JSON representation

Bloom filter implementation in Crystal lang

Host: GitHub
URL: https://github.com/crystal-community/bloom_filter
Owner: crystal-community
License: mit
Created: 2016-02-06T11:02:48.000Z (over 9 years ago)
Default Branch: master
Last Pushed: 2023-08-28T12:42:43.000Z (almost 2 years ago)
Last Synced: 2024-10-30T16:40:44.252Z (8 months ago)
Topics: bloom-filter, crystal
Language: Crystal
Size: 28.3 KB
Stars: 34
Watchers: 5
Forks: 6
Open Issues: 2
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE

Awesome Lists containing this project

awesome-crystal - bloom_filter - Implementation of Bloom filter (Caching)
awesome-crystal - bloom_filter - Implementation of Bloom filter (Caching)

README

        # Bloom Filter [![Build Status](https://travis-ci.org/crystal-community/bloom_filter.svg?branch=master)](https://travis-ci.org/crystal-community/bloom_filter)

Implementation of [Bloom Filter](https://en.wikipedia.org/wiki/Bloom_filter) in [Crystal lang](http://crystal-lang.org/).

* [Installation](#installation)

* [Usage](#usage)

  * [Basic](#basic)

  * [Creating a filter with optimal parameters](#creating-a-filter-with-optimal-parameters)

  * [Dumping to file and loading](#dumping-into-a-file-and-loading)

  * [Union and intersection](#union-and-intersection)

  * [Visualization](#visualization)

* [Benchmark](#benchmark)

* [Contributors](#contributors)

## Installation

Add this to your application's `shard.yml`:

```yaml

dependencies:

  bloom_filter:

    github: crystal-community/bloom_filter

```

## Usage

### Basic

```crystal

require "bloom_filter"

# Create filter with bitmap size of 32 bytes and 3 hash functions.

filter = BloomFilter.new(bytesize = 32, hash_num = 3)

# Insert elements

filter.insert("Esperanto")

filter.insert("Toki Pona")

# Check elements presence

filter.has?("Esperanto")  # => true

filter.has?("Toki Pona")  # => true

filter.has?("Englsh")     # => false

```

### Creating a filter with optimal parameters

Based on your needs(expected number of items and desired probability of false positives),

your can create an optimal bloom filter:

```crystal

# Create a filter, that with one million inserted items, gives 2% of false positives for #has? method

filter = BloomFilter.new_optimal(1_000_000, 0.02)

filter.bytesize # => 1017796 (993Kb)

filter.hash_num # => 6

```

### Dumping into a file and loading

It's possible to save existing bloom filter as a binary file and then load it back.

```crystal

filter = BloomFilter.new_optimal(2, 0.01)

filter.insert("Esperanto")

filter.dump_file("/tmp/bloom_languages")

loaded_filter = BloomFilter.load_file("/tmp/bloom_languages")

loaded_filter.has?("Esperanto") # => true

loaded_filter.has?("English")   # => false

```

### Union and intersection

Having two filters of the same size and number of hash functions, it's possible

to perform union and intersection operations:

```crystal

f1 = BloomFilter.new(32, 3)

f1.insert("Esperanto")

f1.insert("Spanish")

f2 = BloomFilter.new(32, 3)

f2.insert("Esperanto")

f2.insert("English")

# Union

f3 = f1 | f2

f3.has?("Esperanto") # => true

f3.has?("Spanish")   # => true

f3.has?("English")   # => true

# Intersection

f4 = f1 & f2

f4.has?("Esperanto") # => true

f4.has?("Spanish")   # => false

f4.has?("English")   # => false

```

### Visualization

If you want to see how your filter looks like, you can visualize it:

```crystal

f1 = BloomFilter.new(16, 2)

f1.insert("Esperanto")

puts "f1 = (Esperanto)"

puts f1.visualize

f2 = BloomFilter.new(16, 2)

f2.insert("Spanish")

puts "f2 = (Spanish)"

puts f2.visualize

f3 = f1 | f2

puts "f3 = f1 | f2 = (Esperanto, Spanish)"

puts f3.visualize

```

Output:

```

f1 = (Esperanto)

░░░░░░░░ ░░░░░░█░ ░░░░░░░░ ░░░░░░░░ ░░░░░░░░ ░░░░░░░░ ░░░░░░░░ ░░░░░░░░

░░░░░░░░ ░░░░░░░░ ░░░░░░░░ ░░░░░░░░ ░░░░░░░█ ░░░░░░░░ ░░░░░░░░ ░░░░░░░░

f2 = (Spanish)

░░░░░░░░ ░░░░░░░░ ░░░░░░░░ ░░░░░░░░ ░░░░░░░░ ░░░░░░░░ ░░░░░░░░ ░░░░░░░░

░░░░░░░░ ░░░░░░░░ ░░░░░░░░ ░░░░░░░░ ░░░░░░░░ ░░░░░░░░ ░░░░░░█░ ░█░░░░░░

f3 = f1 | f2 = (Esperanto, Spanish)

░░░░░░░░ ░░░░░░█░ ░░░░░░░░ ░░░░░░░░ ░░░░░░░░ ░░░░░░░░ ░░░░░░░░ ░░░░░░░░

░░░░░░░░ ░░░░░░░░ ░░░░░░░░ ░░░░░░░░ ░░░░░░░█ ░░░░░░░░ ░░░░░░█░ ░█░░░░░░

```

In this way, you can actually see which bits are set:)

## Benchmark

Performance of Bloom filter depends on the following parameters:

* Size of the filter

* Number of hash functions

* Length of the input string

To run benchmark from `./samples/benchmark.cr`, simply run make task:

```

$ make benchmark

Number of items: 100000000

Filter size: 117005Kb

Hash functions: 7

String size: 13

                     user     system      total        real

insert           0.004227   0.000000   0.004227 (  2.769349)

has? (present)   0.007980   0.000000   0.007980 (  5.223778)

has? (missing)   0.004318   0.000000   0.004318 (  2.829521)

```

## Contributors

- [greyblake](https://github.com/greyblake) Potapov Sergey - creator, maintainer

- [funny-falcon](https://github.com/funny-falcon) Sokolov Yura - better hash algorithms

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/crystal-community/bloom_filter

Awesome Lists containing this project

README