Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/jberryman/unagi-bloomfilter
A fast, cache-efficient, concurrent bloom filter in Haskell
- Host: GitHub
- URL: https://github.com/jberryman/unagi-bloomfilter
- Owner: jberryman
- License: bsd-3-clause
- Created: 2016-01-10T00:58:14.000Z (almost 9 years ago)
- Default Branch: master
- Last Pushed: 2018-04-11T02:42:26.000Z (over 6 years ago)
- Last Synced: 2024-03-15T13:47:21.796Z (10 months ago)
- Language: Haskell
- Size: 271 KB
- Stars: 19
- Watchers: 3
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# unagi-bloomfilter [![Build Status](https://travis-ci.org/jberryman/unagi-bloomfilter.svg)](https://travis-ci.org/jberryman/unagi-bloomfilter)
This library implements a fast concurrent bloom filter, based on bloom-1 from
"Fast Bloom Filters and Their Generalization" by Y. Qiao et al.

It's [on hackage](https://hackage.haskell.org/package/unagi-bloomfilter) and
can be installed with

    $ cabal install unagi-bloomfilter

A bloom filter is a probabilistic, constant-space, set-like data structure
supporting insertion and membership queries. This implementation is backed by
SipHash, so it can safely consume untrusted inputs.

The implementation here compares favorably with traditional set implementations
in a single-threaded context, e.g. here are 10 inserts or lookups compared
across some sets of different sizes:

![single-threaded](http://i.imgur.com/gei1LW4.png)

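To make the bloom-1 idea concrete, here is a minimal, pure sketch that is *not*
this library's API or implementation. It assumes `Word64` keys, a placeholder
splitmix64-style mixer (`mix64`) instead of SipHash, a plain list of words
instead of an unboxed mutable array, at least one word in the filter, and
`1 <= k <= 10`. All names (`Filter`, `emptyFilter`, `insert`, `member`) are
hypothetical. The point it illustrates is that every key touches exactly one
64-bit word, which is what makes the scheme cache-friendly.

```haskell
import Data.Bits (setBit, shiftR, xor, (.&.), (.|.))
import Data.List (foldl')
import Data.Word (Word64)

-- | Toy filter: a list of 64-bit words (hypothetical; a real filter
-- would use an unboxed mutable array).
newtype Filter = Filter [Word64]

-- | A filter of n zeroed words; n must be at least 1.
emptyFilter :: Int -> Filter
emptyFilter n = Filter (replicate n 0)

-- | Placeholder mixer (splitmix64-style finalizer). This is NOT SipHash
-- and gives no protection against adversarial inputs.
mix64 :: Word64 -> Word64
mix64 z0 =
  let z1 = (z0 `xor` (z0 `shiftR` 30)) * 0xbf58476d1ce4e5b9
      z2 = (z1 `xor` (z1 `shiftR` 27)) * 0x94d049bb133111eb
  in  z2 `xor` (z2 `shiftR` 31)

-- | The single word a key maps to.
wordIndex :: Int -> Word64 -> Int
wordIndex n key = fromIntegral (mix64 key `mod` fromIntegral n)

-- | Up to k bits within that word, each position taken from 6 bits of a
-- second hash (positions may coincide).
mask :: Int -> Word64 -> Word64
mask k key =
  foldl' setBit 0
    [ fromIntegral ((h `shiftR` (6 * i)) .&. 63) | i <- [0 .. k - 1] ]
  where
    h = mix64 (key `xor` 0x9e3779b97f4a7c15)

-- | Set the key's bits in its one word.
insert :: Int -> Word64 -> Filter -> Filter
insert k key (Filter ws) =
  Filter [ if i == j then w .|. m else w | (i, w) <- zip [0 ..] ws ]
  where
    j = wordIndex (length ws) key
    m = mask k key

-- | True if all of the key's bits are set (false positives are possible,
-- false negatives are not).
member :: Int -> Word64 -> Filter -> Bool
member k key (Filter ws) = (ws !! j) .&. m == m
  where
    j = wordIndex (length ws) key
    m = mask k key
```

For example, `member 3 7 (insert 3 7 (emptyFilter 64))` evaluates to `True`,
while a lookup against `emptyFilter 64` is always `False`.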
With the LLVM backend, benchmarks take around 75-85% of the runtime of the
native code gen.

Unfortunately writes in particular don't seem to scale currently; i.e.
distributing writes across multiple threads may be _slower_ than in a
single-threaded context, because of memory effects. We plan to export
functionality that would support using the filter here in a concurrent context
with better memory behavior (e.g. a server that shards to a thread-pool which
handles only a portion of the bloom array).

![concurrent](http://i.imgur.com/RaUSmZB.png)
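
The sharding idea mentioned above could look roughly like the following
routing step. This is only a sketch of one possible design, not functionality
the library exports: `shardOf`, `routeBatch`, and the placeholder mixer
`mixKey` are hypothetical names, keys are assumed to be `Word64`, and the shard
count is assumed to be at least 1. Each worker thread would then own one shard's
portion of the bloom array and be the only writer to that memory.

```haskell
import Data.Bits (shiftR, xor)
import Data.Word (Word64)

-- | Placeholder mixer (splitmix64-style finalizer); NOT SipHash.
mixKey :: Word64 -> Word64
mixKey z0 =
  let z1 = (z0 `xor` (z0 `shiftR` 30)) * 0xbf58476d1ce4e5b9
      z2 = (z1 `xor` (z1 `shiftR` 27)) * 0x94d049bb133111eb
  in  z2 `xor` (z2 `shiftR` 31)

-- | Which of n shards owns this key (n must be at least 1).
shardOf :: Int -> Word64 -> Int
shardOf n key = fromIntegral (mixKey key `mod` fromIntegral n)

-- | Group a batch of keys by owning shard; element i of the result
-- holds the keys destined for shard i, in their original order.
routeBatch :: Int -> [Word64] -> [[Word64]]
routeBatch n keys =
  [ [ k | k <- keys, shardOf n k == i ] | i <- [0 .. n - 1] ]
```

Because `shardOf` is deterministic, inserts and lookups for the same key always
reach the same worker, so no two threads need to write to the same region.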