https://github.com/lorenzhs/burr
Bumped Ribbon Retrieval and Approximate Membership Query
- Host: GitHub
- URL: https://github.com/lorenzhs/burr
- Owner: lorenzhs
- License: apache-2.0
- Created: 2021-07-19T09:10:32.000Z (almost 4 years ago)
- Default Branch: main
- Last Pushed: 2024-12-03T08:30:50.000Z (6 months ago)
- Last Synced: 2024-12-11T04:10:55.565Z (6 months ago)
- Topics: amq, approximate-membership, approximate-membership-query, bloom-filter, bloom-filter-alternative, retrieval
- Language: C++
- Homepage: https://arxiv.org/abs/2109.01892
- Size: 146 KB
- Stars: 39
- Watchers: 4
- Forks: 4
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
## BuRR: Bumped Ribbon Retrieval (and Filters)
BuRR is a static retrieval and approximate membership query data structure with extremely low overhead and fast queries. Our paper introducing BuRR, ["Fast Succinct Retrieval and Approximate Membership Using Ribbon"](https://drops.dagstuhl.de/opus/volltexte/2022/16538/), won the best paper award at the 20th International Symposium on Experimental Algorithms 2022 and is available in full as an open-access publication. For additional details and measurements, please [refer to the preprint on arxiv.org](https://arxiv.org/abs/2109.01892).
"Retrieval" means that you have a set of key-value pairs that you want to represent very compactly. The difference to a hash table is that the data structure may return garbage values when queried with keys not in the set. It's also typically far more compact: a BuRR representation of the data is typically not more than 0.1-0.5% larger than the *values* it represents (no keys are stored).
"Approximate Membership Query" (AMQ) means that you have a set and want to check whether an element is likely in the set. Some well-known examples are Bloom filters, Xor filters, Cuckoo filters, and Quotient filters. A query for an item in the set will always return `true`, while a query for an item *not* in the set *usually* returns `false`, but may return `true` with some small probability known as the *false-positive probability f*. The space required to represent this set depends on *f*, and there is a lower bound of *log2(1/f)* bits per item of the set. Classic Bloom filters use *1.44 log2(1/f)* bits per key, meaning they use 44% more space than needed. With Xor filters, overhead on the order of 20% can be achieved, and Xor+ filters reduce this to approximately 10% for very small values of *f*. BuRR can achieve overheads as low as 0.1%, and even configurations that trade overhead for speed achieve overheads of below 0.5%.
## Building and running
Make sure to fetch all submodules with `git submodule update --init --recursive`, then type `make bench` to compile a benchmark runner that includes a wide range of configurations, or `make tests` to compile the test suite. The scripts used in the evaluation are located in the `scripts` folder. You may also want to refer to [the fastfilter_cpp repository](https://github.com/lorenzhs/fastfilter_cpp) for a comparison to other filter data structures and more benchmarks used in our paper.
## Parallel implementation
You can find a parallel implementation on the [`parallel` branch](https://github.com/lorenzhs/BuRR/tree/parallel) and its brief announcement paper on [arXiv](https://arxiv.org/abs/2411.12365).
## Citation
If you use BuRR in the context of an academic publication, we ask that you please cite our paper:
```bibtex
@inproceedings{BuRR2022,
author={Peter C. Dillinger and Lorenz Hübschle-Schneider and Peter Sanders and Stefan Walzer},
title={Fast Succinct Retrieval and Approximate Membership using Ribbon},
booktitle={20th International Symposium on Experimental Algorithms (SEA 2022)},
pages={4:1--4:20},
year={2022},
doi={10.4230/LIPIcs.SEA.2022.4}
}
```
## License
BuRR is licensed under the Apache 2.0 license. Copyright is held by Lorenz Hübschle-Schneider and Facebook, Inc. It is based on [Peter C. Dillinger's implementation of Standard Ribbon](https://github.com/pdillinger/fastfilter_cpp/tree/dev/src/ribbon), which is copyright Facebook, Inc. and also licensed under the Apache 2.0 license.