https://github.com/jaybaird/python-bloomfilter

Scalable Bloom Filter implemented in Python

Host: GitHub
URL: https://github.com/jaybaird/python-bloomfilter
Owner: jaybaird
License: mit
Archived: true
Created: 2008-12-12T00:46:27.000Z (over 17 years ago)
Default Branch: master
Last Pushed: 2021-07-01T08:40:04.000Z (almost 5 years ago)
Last Synced: 2025-03-01T01:47:28.072Z (over 1 year ago)
Language: Python
Homepage:
Size: 333 KB
Stars: 1,620
Watchers: 50
Forks: 330
Open Issues: 25
Metadata Files:
- Readme: README.rst
- Changelog: CHANGES.txt
- License: LICENSE.txt

Awesome Lists containing this project

awesome-algorithms - python-bloomfilter - Scalable Bloom Filter implemented in Python (Awesome Algorithms / bloom - Bloom Filter (布隆过滤器))
awesome-python-learning - jaybaird/python-bloomfilter: Scalable Bloom Filter implemented in Python

README

pybloom
=======

.. image:: https://travis-ci.org/jaybaird/python-bloomfilter.svg?branch=master
    :target: https://travis-ci.org/jaybaird/python-bloomfilter

``pybloom`` is a module that includes a Bloom Filter data structure along with
an implementation of Scalable Bloom Filters as discussed in:

P. Almeida, C. Baquero, N. Preguiça, D. Hutchison, Scalable Bloom Filters,
(GLOBECOM 2007), IEEE, 2007.

Bloom filters work well when you know up front how many bits to set aside to
store your entire set. Scalable Bloom Filters allow the number of filter bits
to grow as a function of false positive probability and size.
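
To make the up-front sizing concrete, the sketch below applies the standard
Bloom filter sizing formulas: ``m = -n * ln(p) / (ln 2)^2`` bits and
``k = (m / n) * ln 2`` hash functions, for ``n`` expected items and a target
false positive probability ``p``. The helper name ``optimal_bloom_parameters``
is hypothetical and is not part of ``pybloom``, which may round or partition
its bit array differently.

.. code-block:: python

    import math

    def optimal_bloom_parameters(n, p):
        """Return (bits, hash_count) for n expected items at false positive rate p."""
        if n <= 0:
            raise ValueError("n must be positive")
        if not 0.0 < p < 1.0:
            raise ValueError("p must be strictly between 0 and 1")
        # m = -n * ln(p) / (ln 2)^2, rounded up to a whole number of bits.
        bits = int(math.ceil(-n * math.log(p) / (math.log(2) ** 2)))
        # k = (m / n) * ln 2, rounded to the nearest whole hash function.
        hash_count = max(1, int(round(bits / float(n) * math.log(2))))
        return bits, hash_count

    bits, hash_count = optimal_bloom_parameters(1000, 0.001)
    print(bits, hash_count)

For 1000 items at a 0.1% error rate this comes to roughly 14.4 kilobits and
10 hash functions, which is the budget a fixed-size filter has to commit to
before any item is added.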

A filter is "full" when it reaches capacity: ``M * ((ln 2)^2 / abs(ln p))``,
where M is the number of bits and p is the false positive probability. When
capacity is reached, a new filter is created that is exponentially larger than
the last, with a tighter probability of false positives and a larger number of
hash functions.
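
The capacity formula and the growth rule can be sketched in a few lines. The
block below evaluates ``M * (ln 2)^2 / abs(ln p)`` for a given bit count and
then lists a hypothetical series of filter stages. The growth factor
``scale``, the tightening ratio ``ratio``, and the hash count
``ceil(log2(1 / p))`` are illustrative assumptions rather than values read
from ``pybloom``, and both helper names are made up for this example.

.. code-block:: python

    import math

    def capacity_for_bits(m_bits, p):
        """Items a filter of m_bits bits can hold at false positive rate p."""
        return m_bits * (math.log(2) ** 2) / abs(math.log(p))

    def growth_schedule(initial_capacity, error_rate, stages, scale=2, ratio=0.9):
        """Yield (capacity, error_rate, hash_count) for each successive filter."""
        capacity, p = initial_capacity, error_rate
        for _ in range(stages):
            # Assumed hash count: one hash per halving of the error rate.
            hash_count = int(math.ceil(math.log(1.0 / p, 2)))
            yield capacity, p, hash_count
            capacity *= scale   # each new filter is exponentially larger
            p *= ratio          # and has a tighter false positive rate

    print(round(capacity_for_bits(14378, 0.001)))
    for capacity, p, hash_count in growth_schedule(100, 0.001, stages=4):
        print(capacity, p, hash_count)

Tightening the error rate geometrically keeps the compound false positive
probability across all stages bounded, since the per-stage rates form a
convergent series.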

.. code-block:: python

    >>> from pybloom import BloomFilter
    >>> f = BloomFilter(capacity=1000, error_rate=0.001)
    >>> [f.add(x) for x in range(10)]
    [False, False, False, False, False, False, False, False, False, False]
    >>> all([(x in f) for x in range(10)])
    True
    >>> 10 in f
    False
    >>> 5 in f
    True
    >>> f = BloomFilter(capacity=1000, error_rate=0.001)
    >>> for i in range(0, f.capacity):
    ...     _ = f.add(i)
    >>> (1.0 - (len(f) / float(f.capacity))) <= f.error_rate + 2e-18
    True

    >>> from pybloom import ScalableBloomFilter
    >>> sbf = ScalableBloomFilter(mode=ScalableBloomFilter.SMALL_SET_GROWTH)
    >>> count = 10000
    >>> for i in range(0, count):
    ...     _ = sbf.add(i)
    ...
    >>> (1.0 - (len(sbf) / float(count))) <= sbf.error_rate + 2e-18
    True

    # len(sbf) may not equal the entire input length. 0.01% error is well
    # below the default 0.1% error threshold. As the capacity goes up, the
    # error will approach 0.1%.
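
The session above relies on ``add`` returning ``False`` for a new item and on
``in`` never missing an item that was added. A minimal sketch of the mechanism
behind that behaviour follows, written for Python 3. ``TinyBloomFilter`` is a
hypothetical class for illustration only: it is not the ``pybloom``
implementation, and its SHA-256 double hashing is an assumed scheme chosen for
brevity.

.. code-block:: python

    import hashlib
    import math

    class TinyBloomFilter(object):
        """Illustrative Bloom filter; not the pybloom implementation."""

        def __init__(self, capacity, error_rate=0.001):
            # Standard sizing: m = -n ln p / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
            self.num_bits = int(math.ceil(
                -capacity * math.log(error_rate) / (math.log(2) ** 2)))
            self.num_hashes = max(1, int(round(
                self.num_bits / float(capacity) * math.log(2))))
            self.bits = bytearray((self.num_bits + 7) // 8)

        def _positions(self, key):
            # Derive k bit positions from one SHA-256 digest (double hashing).
            digest = hashlib.sha256(repr(key).encode("utf-8")).digest()
            h1 = int.from_bytes(digest[:8], "big")
            h2 = int.from_bytes(digest[8:16], "big") | 1
            for i in range(self.num_hashes):
                yield (h1 + i * h2) % self.num_bits

        def add(self, key):
            """Set the key's bits; return True if every bit was already set."""
            already_present = True
            for pos in self._positions(key):
                byte, mask = pos // 8, 1 << (pos % 8)
                if not self.bits[byte] & mask:
                    already_present = False
                    self.bits[byte] |= mask
            return already_present

        def __contains__(self, key):
            # True only if all k bits are set: false positives are possible,
            # false negatives are not.
            return all(self.bits[pos // 8] & (1 << (pos % 8))
                       for pos in self._positions(key))

    tiny = TinyBloomFilter(capacity=1000)
    first = tiny.add("apple")    # False: at least one bit was newly set
    second = tiny.add("apple")   # True: every bit was already set
    print(first, second, "apple" in tiny)

Because membership only checks bits, an item that was never added can still
test positive when other items happen to have set all of its positions, which
is the false positive probability the ``error_rate`` parameter bounds.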
