https://github.com/searchivarius/pyfastpfor
Python bindings for the fast integer compression library FastPFor.
compression-algorithm compression-schemes simd-compression sorted-lists
- Host: GitHub
- URL: https://github.com/searchivarius/pyfastpfor
- Owner: searchivarius
- License: apache-2.0
- Created: 2018-02-16T03:59:23.000Z (about 7 years ago)
- Default Branch: master
- Last Pushed: 2023-10-23T09:23:13.000Z (over 1 year ago)
- Last Synced: 2025-02-18T04:32:54.729Z (3 months ago)
- Topics: compression-algorithm, compression-schemes, simd-compression, sorted-lists
- Language: C++
- Homepage:
- Size: 267 KB
- Stars: 58
- Watchers: 3
- Forks: 9
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
# PyFastPFor
Python bindings for the fast **light-weight** integer compression library [FastPFor](https://github.com/lemire/FastPFor): a research library with integer compression schemes. FastPFor is broadly applicable to the compression of arrays of 32-bit integers where most integers are small. The library seeks to exploit SIMD instructions (SSE) whenever possible. It can decode at least 4 billion compressed integers per second on most desktop or laptop processors; that is, it can decompress data at a rate of 15 GB/s. This is significantly faster than generic codecs such as gzip, LZO, Snappy, or LZ4.
# Authors
Daniel Lemire, Leonid Boytsov, Owen Kaser, Maxime Caron, Louis Dionne, Michel Lemay, Erik Kruus, Andrea Bedini, Matthias Petri, Robson Braga Araujo, Patrick Damme. Bindings are created by Leonid Boytsov.
# Installation
Bindings can be installed locally:
```
cd python_bindings
pip install -r requirements.txt
sudo python setup.py build install
```
or via pip:
```
pip install pyfastpfor
```
Due to some compilation quirks, this currently seems to work with GCC only. I will fix it in the not-so-distant future. You may also need to install the Python development files. On Ubuntu, for Python 3, you can do it as follows:
```
sudo apt-get install python3-dev
```
# Documentation
The library supports all the codecs implemented in the original [FastPFor](https://github.com/lemire/FastPFor) library as of July 2023. To get a list of codecs, use the function ``getCodecList``.
Typical light-weight compression does not take context into account and, consequently, works well only for small integers. When integers are large, data differencing is a common trick to make integers small. In particular, we often deal with sorted lists of integers, which can be represented by differences between neighboring numbers.
The smallest differences (**fine** deltas) are between adjacent numbers. The respective differencing and difference-inverting functions are ``delta1`` and ``prefixSum1``.
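To make the idea of fine deltas concrete, here is a minimal pure-Python sketch. It only illustrates the technique and is not the library's implementation: the helper names `fine_delta` and `fine_prefix_sum` are hypothetical, and the sketch assumes that the first element is stored unchanged.

```python
from itertools import accumulate


def fine_delta(values):
    # Hypothetical helper: keep the first value and replace every other
    # value with its gap to the immediately preceding one.
    return [v - values[i - 1] if i > 0 else v for i, v in enumerate(values)]


def fine_prefix_sum(deltas):
    # Hypothetical helper: a running sum inverts the differencing.
    return list(accumulate(deltas))


sorted_ids = [1000, 1003, 1004, 1010, 1025]
deltas = fine_delta(sorted_ids)      # [1000, 3, 1, 6, 15]
restored = fine_prefix_sum(deltas)   # equal to sorted_ids again
```

After differencing, all values but the first are small, which is the situation in which light-weight codecs work well.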
However, we can do reasonably well if we compute differences between numbers that are four positions apart (**coarse** deltas). Such differences can be computed and inverted more efficiently. The respective differencing and difference-inverting functions are ``delta4`` and ``prefixSum4``.
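The coarse variant can be sketched in pure Python as follows. Again, this is an illustration under stated assumptions rather than the library's code: `coarse_delta` and `coarse_prefix_sum` are hypothetical names, and the first four elements are assumed to be stored unchanged. Each of the four interleaved subsequences is differenced independently, which is presumably what lets a SIMD implementation process four 32-bit integers per 128-bit register.

```python
def coarse_delta(values, stride=4):
    # Hypothetical helper: subtract the element `stride` positions back;
    # the first `stride` elements are kept as-is.
    return [v - values[i - stride] if i >= stride else v
            for i, v in enumerate(values)]


def coarse_prefix_sum(deltas, stride=4):
    # Hypothetical helper: invert coarse_delta by accumulating along
    # each of the `stride` interleaved subsequences.
    out = list(deltas)
    for i in range(stride, len(out)):
        out[i] += out[i - stride]
    return out


sorted_ids = [3, 5, 8, 13, 21, 34, 55, 89]
coarse = coarse_delta(sorted_ids)           # [3, 5, 8, 13, 18, 29, 47, 76]
restored_ids = coarse_prefix_sum(coarse)    # equal to sorted_ids again
```

The coarse deltas are larger than the fine ones would be for the same input, so compression is somewhat worse; that is the price of the cheaper differencing.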
Examples of three common use scenarios (no differencing, coarse and fine deltas) are outlined in [this Python notebook](python_bindings/examples.ipynb).