https://github.com/daedalus/fastbloomfilter
A pure, simple and fast pythonic bloom filter
- Host: GitHub
- URL: https://github.com/daedalus/fastbloomfilter
- Owner: daedalus
- License: gpl-3.0
- Created: 2017-03-22T02:42:51.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2025-03-11T16:05:49.000Z (7 months ago)
- Last Synced: 2025-04-14T10:36:55.441Z (6 months ago)
- Topics: blake2b, bloom-filter, bloomfilter, bz2, cryptography, fast, hash, logging, lz4, lzma, probabilistic, sha256, sha3, zlib
- Language: Python
- Homepage:
- Size: 126 KB
- Stars: 10
- Watchers: 2
- Forks: 8
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Simple and fast pythonic bloomfilter
From Wikipedia: "A Bloom filter is a space-efficient probabilistic data structure, conceived by Burton Howard Bloom in 1970, that is used to test whether an element is a member of a set. False positive matches are possible, but false negatives are not – in other words, a query returns either "possibly in set" or "definitely not in set". Elements can be added to the set, but not removed (though this can be addressed with a "counting" filter); the more elements that are added to the set, the larger the probability of false positives."
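
To make the definition concrete, here is a minimal sketch of the general idea. It is not fastBloomFilter's implementation: the `TinyBloom` class, its parameters, and the choice of salted SHA-256 for hashing are assumptions made purely for illustration.

```python
import hashlib


class TinyBloom:
    """Illustrative Bloom filter; NOT how fastBloomFilter is implemented."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One bit per slot, packed into bytes.
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        data = item.encode("utf-8")
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        # Set every bit position derived from the item.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def query(self, item):
        # True means "possibly in set"; False means "definitely not in set".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


tiny = TinyBloom()
tiny.add("1")
print(tiny.query("1"))  # an added item is always reported as present
```

Because bits are only ever set and never cleared, an added item can never be reported as absent. As more items are added, more bits are set, so unrelated items become more likely to be reported as present.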
### This filter supports: ###
```
- Saving, reloading with pickle.
- Stats
- Entropy analysis
- Internal and external hashing of data.
- Raw filter merging
```

### Installing: ###
```
sudo pip install fastbloomfilter
```

### External creation of the bloom filter file: ###
```
python mkbloom.py /tmp/filter.blf
```

### Importing: ###
```
>>> from fastBloomFilter import bloom
>>> bf = bloom.BloomFilter(array_size=1024**3)
```
Or:
```
>>> from fastBloomFilter import bloom
>>> bf = bloom.BloomFilter(filename='/tmp/filter.blf')
```

### Adding data to it: ###
```
>>> bf.add('30000')
>>> bf.add('1230213')
>>> bf.add('1')
```

### Printing stats: ###
```
>>> bf.stat()
```
Or:
```
>>> bf.info()
```

### Querying data: ###
```
>>> print(bf.query('1'))
True
>>> print(bf.query('1230213'))
True
>>> print(bf.query('12'))
False
>>> print(bf['1'])
True
```

### Querying data and at the same time adding it: ###
```
>>> print(bf.update('1'))
False
# False means the object was not present and has now been added.
>>> print(bf.update('1'))
True
# True means the object was already present and nothing new was added.
>>> print(bf.update('2'))
False
>>> print(bf.update('2'))
True
```

### Merging two filters: ###
Create first filter:
```
>>> from fastBloomFilter import bloom
>>> bf1 = bloom.BloomFilter(array_size=1024**3)
>>> bf1.add("1")
```
Create second filter:
```
>>> from fastBloomFilter import bloom
>>> bf2 = bloom.BloomFilter(array_size=1024**3)
>>> bf2.add("2")
```
Merge the two filters into a third filter:
```
>>> bf3 = bf1 + bf2
```
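
In general, merging Bloom filters only makes sense when both filters share the same array size and hash functions, as `bf1` and `bf2` do here. A common way to merge is a bitwise OR of the underlying bit arrays. The sketch below illustrates that idea on plain `bytearray` objects. `merge_bits` is a hypothetical helper, and whether fastBloomFilter's `+` operator works exactly this way is an assumption.

```python
def merge_bits(a: bytearray, b: bytearray) -> bytearray:
    """Return the bitwise OR of two equally sized bit arrays (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("filters must have the same size to be merged")
    # A bit is set in the result if it is set in either input.
    return bytearray(x | y for x, y in zip(a, b))


left = bytearray([0b00000101, 0b00000000])
right = bytearray([0b00000001, 0b10000000])
merged = merge_bits(left, right)
print([bin(byte) for byte in merged])  # → ['0b101', '0b10000000']
```

Every bit set in either input stays set in the result, so any item present in one of the source filters is still reported as present after the merge.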
Check the elements in the third filter:
```
>>> print(bf3["1"])
True
>>> print(bf3["2"])
True
```

### Contributing ###
```
Contributions are welcome!
Criteria:
- They should not include hidden folders or files of any IDE environment.
- They should not delete big portions of the project.
- They should not include files that have nothing to do with the project.
- They should not change the API. (API changes should be proposed with Issues as enhancements.)
- They should not include any obfuscated code.
- They should not include binaries.
- They should be in small PRs for a faster review process.
- They should include a small test case.
- Any contribution not honoring these criteria will be rejected until it does.
```