https://github.com/akaliutau/timeseries-compressor

Double-Delta Time Series compressor (prototype)
compression-algorithm delta-compression gorilla-db python3 time-series-analysis time-series-compression

Last synced: 9 months ago

Host: GitHub
URL: https://github.com/akaliutau/timeseries-compressor
Owner: akaliutau
License: mit
Created: 2023-02-05T22:47:12.000Z (over 3 years ago)
Default Branch: main
Last Pushed: 2024-01-07T22:04:30.000Z (over 2 years ago)
Last Synced: 2025-02-28T10:55:22.237Z (over 1 year ago)
Topics: compression-algorithm, delta-compression, gorilla-db, python3, time-series-analysis, time-series-compression
Language: Python
Homepage:
Size: 4.3 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# Time-series Data Compressor

This repository contains a prototype time-series compressor loosely based on ideas described in the paper
[Gorilla: A Fast, Scalable, In-Memory Time Series Database](https://www.vldb.org/pvldb/vol8/p1816-teller.pdf)
(a copy is included in this repository).

The main motivation was to investigate the effect of:

* double delta compression
* XOR compression for distant timestamps
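As an illustration of the first idea, here is a minimal sketch of double-delta (delta-of-delta) encoding for integer timestamps. It is not taken from this repository's code: the function names `double_delta_encode` and `double_delta_decode` are hypothetical, and the sketch stops at computing the values, without the variable-length bit packing a real encoder would add.

```python
def double_delta_encode(timestamps):
    """Encode integer timestamps as [first, dd_1, dd_2, ...].

    dd_i is the change in the delta between consecutive timestamps,
    with the delta before the first step taken to be 0.
    (Hypothetical helper, not the repository's API.)
    """
    if not timestamps:
        return []
    encoded = [timestamps[0]]
    prev, prev_delta = timestamps[0], 0
    for t in timestamps[1:]:
        delta = t - prev
        encoded.append(delta - prev_delta)  # delta of delta
        prev, prev_delta = t, delta
    return encoded


def double_delta_decode(encoded):
    """Invert double_delta_encode."""
    if not encoded:
        return []
    decoded = [encoded[0]]
    delta = 0
    for dd in encoded[1:]:
        delta += dd                      # recover the plain delta
        decoded.append(decoded[-1] + delta)
    return decoded


# Samples taken roughly every 60 seconds, with one late arrival.
ts = [1000, 1060, 1120, 1180, 1241, 1301]
enc = double_delta_encode(ts)
print(enc)  # [1000, 60, 0, 0, 1, -1]
```

For regularly sampled series most encoded values are zero or close to it, which is what lets them be stored in very few bits.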

The current implementation is a research _prototype_: it compresses data using double deltas and XOR binary
compression for floating-point numbers.
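The XOR step for floating-point values can be sketched as follows. This is an illustration of the general technique from the Gorilla paper, not this repository's implementation: the helper names are hypothetical, and the sketch only computes the XOR words and their leading/trailing zero counts, leaving out the control bits and bit packing.

```python
import struct


def float_to_bits(x):
    """Reinterpret a float as its 64-bit IEEE 754 pattern (unsigned int)."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def bits_to_float(b):
    """Inverse of float_to_bits."""
    return struct.unpack(">d", struct.pack(">Q", b))[0]


def xor_encode(values):
    """XOR each value's bit pattern with the previous one (the first with 0)."""
    encoded, prev = [], 0
    for v in values:
        bits = float_to_bits(v)
        encoded.append(bits ^ prev)
        prev = bits
    return encoded


def xor_decode(encoded):
    """Invert xor_encode."""
    decoded, prev = [], 0
    for x in encoded:
        prev ^= x
        decoded.append(bits_to_float(prev))
    return decoded


def meaningful_bits(x):
    """Return (leading_zeros, trailing_zeros, length) of a 64-bit XOR word."""
    if x == 0:
        return (64, 0, 0)
    leading = 64 - x.bit_length()
    trailing = (x & -x).bit_length() - 1  # index of the lowest set bit
    return (leading, trailing, 64 - leading - trailing)


values = [12.0, 12.0, 24.0, 15.0]
enc = xor_encode(values)
print(enc[1])                   # 0: a repeated value XORs to zero
print(meaningful_bits(enc[2]))  # (11, 52, 1): only one bit differs
```

Neighbouring values in a series often share sign, exponent and high mantissa bits, so the XOR word is mostly zeros and only the short run of meaningful bits needs to be stored.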

# Installation

```shell
python3 -m venv venv
source ./venv/bin/activate
pip install -r requirements.txt
```

# Testing

```shell
pytest
python3 compress.py -i data/stock_data.json -o data/stock_data.bin
```

The output shows compression statistics similar to the following (the number and size of key and delta records,
plus the memory overhead of the string cache and schema):

```text
string cache : 120.0 bytes
schema block : 3144.0 bytes
key record : 1682.125 bytes
delta record : 26319.375 bytes
total 31265.5 bytes
string cache : 7 block(s), avg size = 17.142857142857142 bytes/block
schema block : 2 block(s), avg size = 1572.0 bytes/block
key record : 30 block(s), avg size = 56.07083333333333 bytes/block
delta record : 677 block(s), avg size = 38.876477104874446 bytes/block
```

One block corresponds roughly to one line of the original input file.
The average size of a JSON line is 305 bytes.
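The figures above can be turned into a rough compression ratio. The calculation below is a back-of-the-envelope estimate, not a measurement: it assumes that only key and delta records correspond to input lines, and that the string cache and schema blocks are pure overhead.

```python
# Sizes in bytes and block counts, copied from the sample output above.
stats = {
    "string cache": (120.0, 7),
    "schema block": (3144.0, 2),
    "key record": (1682.125, 30),
    "delta record": (26319.375, 677),
}
AVG_JSON_LINE = 305  # bytes, as stated above

total_compressed = sum(size for size, _ in stats.values())

# Assumption: each key or delta record stands for one JSON input line.
lines = stats["key record"][1] + stats["delta record"][1]
estimated_original = lines * AVG_JSON_LINE

overall_ratio = estimated_original / total_compressed

delta_size, delta_count = stats["delta record"]
delta_ratio = AVG_JSON_LINE / (delta_size / delta_count)

print(f"compressed total  : {total_compressed} bytes")
print(f"estimated original: {estimated_original} bytes")
print(f"overall ratio     : {overall_ratio:.1f}x")  # about 6.9x
print(f"per delta record  : {delta_ratio:.1f}x")    # about 7.8x
```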
