Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/bytehamster/pachash
Packed and Compressed Hash Tables
https://github.com/bytehamster/pachash
Last synced: 2 months ago
JSON representation
Packed and Compressed Hash Tables
- Host: GitHub
- URL: https://github.com/bytehamster/pachash
- Owner: ByteHamster
- License: gpl-3.0
- Created: 2021-12-15T14:10:26.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2023-11-21T16:20:50.000Z (about 1 year ago)
- Last Synced: 2024-06-12T17:57:35.150Z (6 months ago)
- Language: C++
- Homepage:
- Size: 933 KB
- Stars: 14
- Watchers: 5
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
- Citation: CITATION.cff
Awesome Lists containing this project
README
# PaCHash
An object store for variable-sized objects, which has small internal-memory space usage
and still guarantees a limited number of external IO operations.
For a given parameter *a*, the internal memory space usage is *2 + log(a)* bits per block.
Queries for objects of size *|x|* take constant time and fetch *1 + |x|/B + 1/a* blocks from external memory.[](https://arxiv.org/pdf/2205.04745)
### Library usage
Add the following to your `CMakeLists.txt`.
```
add_subdirectory(path/to/PaCHash)
target_link_libraries(YourTarget PRIVATE PaCHash)
```### Building the examples
```
git clone --recursive [email protected]:ByteHamster/PaCHash.git
mkdir PaCHash/build
cd PaCHash/build
cmake -DCMAKE_BUILD_TYPE=Release ..
make -j
```### Benchmarks
Alternative methods are implemented for benchmarking.
You can find the full benchmark scripts in an independent repository: https://github.com/ByteHamster/PaCHash-Experiments| Method | External memory utilization | Internal memory usage | IOs per query |
|-------------------|-----------------------------|-----------------------|-------------------|
| PaCHash | 100% | ~6 bits/page | 1 (variable size) |
| Separator Hashing | Up to 98%¹ | ~6 bits/page | 1 |
| Cuckoo Hashing | Up to 98%¹ | Constant | 2 parallel |¹ Depending on the input distribution. The methods are originally designed for fixed size objects.
Adversarial input can therefore bring the utilization down to 51% or even make construction impossible.### License
This code is licensed under the [GPLv3](/LICENSE).
If you use the project in an academic context or publication, please cite [our paper](https://doi.org/10.1137/1.9781611977561.ch14):```
@inproceedings{kurpicz2023pachash,
author = {Florian Kurpicz and
Hans{-}Peter Lehmann and
Peter Sanders},
title = {PaCHash: Packed and Compressed Hash Tables},
booktitle = {{ALENEX}},
pages = {162--175},
publisher = {{SIAM}},
year = {2023},
doi = {10.1137/1.9781611977561.CH14}
}
```