Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/kootenpv/shrynk
Using Machine Learning to learn how to Compress :zap:
Last synced: about 2 months ago
- Host: GitHub
- URL: https://github.com/kootenpv/shrynk
- Owner: kootenpv
- Created: 2019-08-31T07:47:17.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2023-05-01T20:36:19.000Z (over 1 year ago)
- Last Synced: 2024-07-28T17:38:48.871Z (about 2 months ago)
- Language: Python
- Size: 3.07 MB
- Stars: 109
- Watchers: 4
- Forks: 5
- Open Issues: 1
Metadata Files:
- Readme: README.md
README
[![Build Status](https://travis-ci.org/kootenpv/shrynk.svg?branch=master)](https://travis-ci.org/kootenpv/shrynk)
[![PyPI](https://img.shields.io/pypi/pyversions/shrynk.svg?style=flat-square&logo=python)](https://pypi.python.org/pypi/shrynk/)
[![PyPI](https://img.shields.io/pypi/v/shrynk.svg?style=flat-square&logo=pypi)](https://pypi.python.org/pypi/shrynk/)
[![HitCount](http://hits.dwyl.io/kootenpv/shrynk.svg)](http://hits.dwyl.io/kootenpv/shrynk)

You can read the [introductory blog post](https://vks.ai/2019-12-05-shrynk-using-machine-learning-to-learn-how-to-compress) or try it live at https://shrynk.ai
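The premise can be illustrated with the brute-force baseline that shrynk's `benchmark` command performs and its trained model replaces with a prediction: compress the data with every available method, measure, and pick a winner. A minimal sketch using only the Python standard library (the sample data and names here are made up, not shrynk's internals):

```python
import bz2
import gzip
import lzma
import time

data = b"some,repetitive,csv,data\n" * 10_000

# Brute force: run every available compressor, record size and write time
results = {}
for name, mod in {"gz": gzip, "bz2": bz2, "xz": lzma}.items():
    start = time.perf_counter()
    compressed = mod.compress(data)
    results[name] = {"size": len(compressed),
                     "write_time": time.perf_counter() - start}

# Pick the method producing the smallest file
smallest = min(results, key=lambda k: results[k]["size"])
```

Benchmarking like this is accurate but slow for large files; shrynk instead predicts the best method from features of the data.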
### Features
- ✓ Compress your data smartly based on **Machine Learning**
- ✓ Takes **User Requirements** in the form of weights for `size`, `write_time` and `read_time`
- ✓ Trains & caches a model based on **compression methods available** in the system, using packaged data
- ✓ **CLI** for compressing and decompressing
- ✓ Works with `CSV`, `JSON` and `Bytes` in general

### CLI

```bash
shrynk compress myfile.json    # will yield e.g. myfile.json.gz or myfile.json.bz2
shrynk decompress myfile.json.gz    # will yield myfile.json
shrynk compress myfile.csv --size 0 --write 1 --read 0
shrynk benchmark myfile.csv    # shows benchmark results
shrynk benchmark --predict myfile.csv    # will also show the current prediction
shrynk benchmark --save --predict myfile.csv    # will add the result to the training data too
```

### Usage in Docker
To try shrynk out quickly, you can use the official Docker image from Docker Hub. This way it will not interfere with an existing Python installation.
You can also build the image from scratch by going to [the docker folder here](./docker/), running `docker build -t shrynk .`, and using `shrynk` instead of `kootenpv/shrynk` in the commands below.
In the following commands, replace `~/Downloads` with the folder you want to share with the container (where the file you want to compress is).
```bash
# To see help
docker run --rm -v ~/.shrynk:/root/.shrynk -v ~/Downloads:/data kootenpv/shrynk shrynk --help

# To compress a file called train.csv in your ~/Downloads folder
docker run --rm -v ~/.shrynk:/root/.shrynk -v ~/Downloads:/data kootenpv/shrynk \
  shrynk compress /data/train.csv

# To benchmark and predict the train.csv file in your ~/Downloads folder
docker run --rm -v ~/.shrynk:/root/.shrynk -v ~/Downloads:/data kootenpv/shrynk \
shrynk benchmark --predict /data/train.csv
```

### Usage in Python
Installation:
pip install shrynk
Then in Python:
```python
import pandas as pd
from shrynk import save, load

# save dataframe compressed
my_df = pd.DataFrame({"a": [1]})
file_path = save(my_df, "mypath.csv")
# e.g. mypath.csv.bz2

# load compressed file
loaded_df = load(file_path)
```

If you just want the prediction, you can also:
```python
import pandas as pd
from shrynk import infer

infer(pd.DataFrame({"a": [1]}))
# {"engine": "csv", "compression": "bz2"}
```

### Add your own data
If you want more control you can do the following:
```python
import pandas as pd
from shrynk import PandasCompressor

df = pd.DataFrame({"a": [1, 2, 3]})
pdc = PandasCompressor("default")
pdc.run_benchmarks(df)  # adds data to the default

pdc.train_model(size=3, write=1, read=1)
pdc.predict(df)
```
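The effect of the `size`/`write`/`read` weights passed to `train_model` can be sketched with a simplified scoring rule: z-score each measured column so that bytes and seconds become comparable, then rank methods by the weighted sum of z-scores (lower is better on every column). The benchmark numbers below are invented for illustration, and shrynk's real model predicts rather than re-measures, but the weighting idea is the same:

```python
import statistics

# Invented benchmark results for one file: bytes and seconds, lower is better
results = {
    "gz":  {"size": 120_000, "write": 0.9, "read": 0.3},
    "bz2": {"size":  95_000, "write": 2.5, "read": 1.1},
    "xz":  {"size":  90_000, "write": 4.0, "read": 0.7},
}

def best(results, weights):
    # z-score each column, then rank methods by the weighted sum of z-scores
    cols = {c: [r[c] for r in results.values()] for c in weights}
    stats = {c: (statistics.mean(v), statistics.stdev(v)) for c, v in cols.items()}
    def score(r):
        return sum(w * (r[c] - stats[c][0]) / stats[c][1]
                   for c, w in weights.items())
    return min(results, key=lambda k: score(results[k]))

print(best(results, {"size": 3, "write": 1, "read": 1}))  # size-heavy: xz wins here
print(best(results, {"size": 0, "write": 1, "read": 1}))  # speed-only: gz wins here
```

Changing the weights changes the winner, which is why shrynk trains and caches a separate model per weight configuration.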