https://github.com/spiraldb/onpair
The Rust implementation of the OnPair string compression encoding
Last synced: 18 days ago
- Host: GitHub
- URL: https://github.com/spiraldb/onpair
- Owner: spiraldb
- License: apache-2.0
- Created: 2026-05-28T11:04:40.000Z (about 1 month ago)
- Default Branch: develop
- Last Pushed: 2026-06-14T23:44:46.000Z (19 days ago)
- Last Synced: 2026-06-15T01:14:58.324Z (18 days ago)
- Language: Rust
- Size: 613 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 2
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
README
# onpair
OnPair is a dictionary-based string compression algorithm designed for on-disk and in-memory database workloads that need both strong compression ratios and fast random access to individual values.
It builds its dictionary in a single sequential pass by incrementally merging frequent adjacent substrings, achieving compression comparable to BPE (byte-pair encoding) while being substantially faster and more memory-efficient.
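To make the single-pass training idea concrete, here is a simplified Rust sketch. It is not this crate's code and not necessarily the exact procedure from the paper. It assumes a dictionary seeded with all 256 single bytes, greedy longest-prefix parsing against the current dictionary, and a fixed `threshold` at which an adjacent token pair is merged into a new token. `train_dictionary`, `threshold`, and `max_tokens` are hypothetical names chosen for illustration.

```rust
use std::collections::HashMap;

// Hypothetical, simplified trainer (not the crate's API): one pass of greedy
// longest-prefix parsing, merging an adjacent token pair once it has been
// seen `threshold` times.
fn train_dictionary(samples: &[&[u8]], threshold: u32, max_tokens: usize) -> Vec<Vec<u8>> {
    // Seed with all 256 single bytes so every input can be parsed.
    let mut tokens: Vec<Vec<u8>> = (0u8..=255).map(|b| vec![b]).collect();
    // Reverse lookup from token bytes to token id.
    let mut index: HashMap<Vec<u8>, usize> =
        tokens.iter().enumerate().map(|(i, t)| (t.clone(), i)).collect();
    let mut max_len = 1;
    let mut pair_counts: HashMap<(usize, usize), u32> = HashMap::new();

    for s in samples {
        let mut pos = 0;
        let mut prev: Option<usize> = None;
        while pos < s.len() {
            // Longest-prefix match: try the longest candidate length first.
            let longest = max_len.min(s.len() - pos);
            let (id, len) = (1..=longest)
                .rev()
                .find_map(|l| index.get(&s[pos..pos + l]).map(|&id| (id, l)))
                .expect("single-byte tokens always match");
            if let Some(p) = prev {
                // Count this adjacent pair; merge it once it is frequent enough.
                let count = pair_counts.entry((p, id)).or_insert(0);
                *count += 1;
                if *count >= threshold && tokens.len() < max_tokens {
                    let mut merged = tokens[p].clone();
                    merged.extend_from_slice(&tokens[id]);
                    if !index.contains_key(&merged) {
                        max_len = max_len.max(merged.len());
                        index.insert(merged.clone(), tokens.len());
                        tokens.push(merged);
                    }
                    // Reset so the same pair does not trigger again immediately.
                    pair_counts.remove(&(p, id));
                }
            }
            prev = Some(id);
            pos += len;
        }
    }
    tokens
}
```

Seeding with every single byte guarantees that the longest-prefix match always succeeds, so any string can be encoded even if it shares nothing with the training sample. On the input `abcabc` repeated three times with a threshold of 2, this sketch learns `ab`, then `bc`, then `abc`.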
## Interchange format
OnPair defines a shared in-memory representation — the *plain interchange form*
that independent implementations exchange so a column produced by one is
readable by another. It fixes the buffers (dictionary bytes, dictionary
offsets, codes, and row offsets) and their invariants; denser internal
encodings and on-disk serialization are out of scope. See
[docs/interchange-format.md](docs/interchange-format.md).
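The following Rust sketch mirrors the four buffers to show why this layout supports random access: decoding one row reads only that row's slice of codes and the dictionary tokens those codes name. The struct name `PlainColumn`, its field names, the `u16`/`u32` widths, and the invariants checked in `validate` are assumptions for illustration. The normative definitions are in docs/interchange-format.md.

```rust
// Hypothetical mirror of the plain interchange form's four buffers. Names,
// integer widths, and invariants are illustrative assumptions.
struct PlainColumn {
    dict_bytes: Vec<u8>,    // all token bytes, concatenated
    dict_offsets: Vec<u32>, // token t = dict_bytes[dict_offsets[t]..dict_offsets[t + 1]]
    codes: Vec<u16>,        // token ids of every row, back to back
    row_offsets: Vec<u32>,  // row r = codes[row_offsets[r]..row_offsets[r + 1]]
}

// Assumed invariant for both offset arrays: start at 0, never decrease,
// and end at the length of the buffer they index.
fn check_offsets(offs: &[u32], len: usize, what: &str) -> Result<(), String> {
    if offs.first() != Some(&0) || offs.last().map(|&o| o as usize) != Some(len) {
        return Err(format!("{what}: offsets must start at 0 and end at {len}"));
    }
    if offs.windows(2).any(|w| w[0] > w[1]) {
        return Err(format!("{what}: offsets must be non-decreasing"));
    }
    Ok(())
}

impl PlainColumn {
    fn validate(&self) -> Result<(), String> {
        check_offsets(&self.dict_offsets, self.dict_bytes.len(), "dictionary")?;
        check_offsets(&self.row_offsets, self.codes.len(), "rows")?;
        // Every code must name an existing token.
        let num_tokens = self.dict_offsets.len() - 1;
        if self.codes.iter().any(|&c| c as usize >= num_tokens) {
            return Err("codes: token id out of range".to_string());
        }
        Ok(())
    }

    fn num_rows(&self) -> usize {
        self.row_offsets.len() - 1
    }

    // Random access: only row r's codes and the tokens they name are read.
    fn decode_row(&self, r: usize, out: &mut Vec<u8>) {
        out.clear();
        let (start, end) = (self.row_offsets[r] as usize, self.row_offsets[r + 1] as usize);
        for &code in &self.codes[start..end] {
            let t = code as usize;
            let (a, b) = (self.dict_offsets[t] as usize, self.dict_offsets[t + 1] as usize);
            out.extend_from_slice(&self.dict_bytes[a..b]);
        }
    }
}
```

For example, with tokens `ab` (id 0) and `c` (id 1), `codes = [0, 1, 0, 0]` and `row_offsets = [0, 2, 2, 4]` describe three rows: `abc`, the empty string, and `abab`. Passing a reusable `out` buffer lets a caller decode many rows without allocating per lookup.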
## References
- Paper: Francesco Gargiulo et al., *OnPair: Short Strings Compression for Fast Random Access* — [arXiv:2508.02280](https://arxiv.org/abs/2508.02280)
- Reference C++ implementation: [gargiulofrancesco/onpair_cpp](https://github.com/gargiulofrancesco/onpair_cpp)