https://github.com/akaliutau/timeseries-compressor
Double-Delta Time Series compressor (prototype)
https://github.com/akaliutau/timeseries-compressor
compression-algorithm delta-compression gorilla-db python3 time-series-analysis time-series-compression
Last synced: 9 months ago
JSON representation
Double-Delta Time Series compressor (prototype)
- Host: GitHub
- URL: https://github.com/akaliutau/timeseries-compressor
- Owner: akaliutau
- License: mit
- Created: 2023-02-05T22:47:12.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2024-01-07T22:04:30.000Z (over 2 years ago)
- Last Synced: 2025-02-28T10:55:22.237Z (over 1 year ago)
- Topics: compression-algorithm, delta-compression, gorilla-db, python3, time-series-analysis, time-series-compression
- Language: Python
- Homepage:
- Size: 4.3 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Time-series Data Compressor
This is a repository with prototype of time-series compressor loosely based on ideas described in the paper
[Gorilla: A Fast, Scalable, In-Memory Time Series Database](https://www.vldb.org/pvldb/vol8/p1816-teller.pdf)
(a copy included into this repository)
The main motivation was to investigate the effect of:
* double delta compression
* XOR compression for distant timestamps
Current implementation is a research _prototype_ (compresses data using double deltas and XOR binary compression for
float numbers)
# Installation
```shell
python3 -m venv venv
source ./venv/bin/activate
pip install -r requirements.txt
```
# Testing
```shell
pytest
python3 compress.py -i data/stock_data.json -o data/stock_data.bin
```
The output will show some compression statistics similar to the next one (stat on number of key and delta records,
plus memory overhead for string cache and schema):
```shell
string cache : 120.0 bytes
schema block : 3144.0 bytes
key record : 1682.125 bytes
delta record : 26319.375 bytes
total 31265.5 bytes
string cache : 7 block(s), avg size = 17.142857142857142 bytes/block
schema block : 2 block(s), avg size = 1572.0 bytes/block
key record : 30 block(s), avg size = 56.07083333333333 bytes/block
delta record : 677 block(s), avg size = 38.876477104874446 bytes/block
```
One block is roughly equivalent to one line of original data in the input file.
The average size of json line is 305 bytes.