Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/yanndubs/lossyless
Generic image compressor for machine learning. Pytorch code for our paper "Lossy compression for lossless prediction".
compression deep-learning machine-learning python pytorch self-supervised-learning
- Host: GitHub
- URL: https://github.com/yanndubs/lossyless
- Owner: YannDubs
- License: mit
- Created: 2020-11-19T20:15:06.000Z (almost 4 years ago)
- Default Branch: main
- Last Pushed: 2022-08-19T08:54:14.000Z (about 2 years ago)
- Last Synced: 2024-10-14T11:48:47.134Z (about 1 month ago)
- Topics: compression, deep-learning, machine-learning, python, pytorch, self-supervised-learning
- Language: Python
- Homepage:
- Size: 3.27 MB
- Stars: 114
- Watchers: 8
- Forks: 9
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Lossy Compression for Lossless Prediction [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://github.com/YannDubs/lossyless/blob/main/LICENSE) [![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/release/python-380/)
**Using:** [![Using](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/YannDubs/lossyless/blob/main/notebooks/Hub.ipynb)
**Training:** [![Training](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/YannDubs/lossyless/blob/main/notebooks/minimal_code.ipynb)
This repository contains our implementation of the paper [**Lossy Compression for Lossless Prediction**](https://arxiv.org/abs/2106.10800), which formalizes and empirically investigates unsupervised training of task-specific compressors.
## Using the compressor
[![Using](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/YannDubs/lossyless/blob/main/notebooks/Hub.ipynb)
If you want to use our compressor directly, the easiest way is to load the model from torch hub, as shown in the Google Colab (or `notebooks/Hub.ipynb`) or the example below.
Installation details
```bash
pip install torch torchvision tqdm numpy compressai scikit-learn git+https://github.com/openai/CLIP.git
```

Using pytorch `>1.7.1`: CLIP forces pytorch version `1.7.1` because it needs that version for JIT. If you don't need JIT (there is no JIT by default) you can actually use more recent versions of torch and torchvision with `pip install -U torch torchvision`. Make sure to update after having installed CLIP.
----------------------
```python
import time

import torch
from sklearn.svm import LinearSVC
from torchvision.datasets import STL10

DATA_DIR = "data/"

# List available compressors. b01 compresses the most (b01 > b005 > b001)
torch.hub.list('YannDubs/lossyless:main')
# ['clip_compressor_b001', 'clip_compressor_b005', 'clip_compressor_b01']

# Load the desired compressor and transformation to apply to images (by default on GPU if available)
compressor, transform = torch.hub.load('YannDubs/lossyless:main', 'clip_compressor_b005')

# Load some data to compress and apply transformation
stl10_train = STL10(
DATA_DIR, download=True, split="train", transform=transform
)
stl10_test = STL10(
DATA_DIR, download=True, split="test", transform=transform
)

# Compress the datasets and save them to file (this requires a GPU)
# Rate: 1506.50 bits/img | Encoding: 347.82 img/sec
compressor.compress_dataset(
stl10_train,
f"{DATA_DIR}/stl10_train_Z.bin",
label_file=f"{DATA_DIR}/stl10_train_Y.npy",
)
compressor.compress_dataset(
stl10_test,
f"{DATA_DIR}/stl10_test_Z.bin",
label_file=f"{DATA_DIR}/stl10_test_Y.npy",
)

# Load and decompress the datasets from file (does not require a GPU)
# Decoding: 1062.38 img/sec
Z_train, Y_train = compressor.decompress_dataset(
f"{DATA_DIR}/stl10_train_Z.bin", label_file=f"{DATA_DIR}/stl10_train_Y.npy"
)
Z_test, Y_test = compressor.decompress_dataset(
f"{DATA_DIR}/stl10_test_Z.bin", label_file=f"{DATA_DIR}/stl10_test_Y.npy"
)

# Downstream STL10 evaluation. Accuracy: 98.65% | Training time: 0.5 sec
clf = LinearSVC(C=7e-3)
start = time.time()
clf.fit(Z_train, Y_train)
delta_time = time.time() - start
acc = clf.score(Z_test, Y_test)
print(
f"Downstream STL10 accuracy: {acc*100:.2f}%. \t Training time: {delta_time:.1f} "
)
```

## Minimal training code
[![Training](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/YannDubs/lossyless/blob/main/notebooks/minimal_code.ipynb)
If your goal is to look at a minimal version of the code to understand what is going on, I would highly recommend starting from `notebooks/minimal_compressor.ipynb` (or the Google Colab link above). This is a notebook version of the code provided in Appendix E.7 of the paper, which quickly trains and evaluates our compressor.
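For orientation only, the sketch below shows the general shape of such an objective: a CompressAI entropy bottleneck provides the rate term, and the distortion pulls together the representations of an image and an augmented view of it. The toy encoder, the specific distortion, and the `beta` weight are assumptions for illustration, not the notebook's actual code.

```python
# Illustrative sketch only: a rate + invariance-distortion objective in the spirit
# of the paper. The toy encoder, the distortion choice, and `beta` are assumptions;
# see the notebook / Appendix E.7 for the actual implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F
from compressai.entropy_models import EntropyBottleneck


class ToyInvariantCompressor(nn.Module):
    def __init__(self, z_dim: int = 128):
        super().__init__()
        # Tiny encoder for 3x32x32 images; a real run would use a proper backbone.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 8 * 8, z_dim),
        )
        # Learned entropy model over the (noisy/quantized) representation.
        self.entropy_bottleneck = EntropyBottleneck(z_dim)

    def forward(self, x):
        z = self.encoder(x)                      # (B, z_dim)
        z = z.unsqueeze(-1).unsqueeze(-1)        # EntropyBottleneck expects (B, C, H, W)
        z_hat, likelihoods = self.entropy_bottleneck(z)
        rate_bits = -torch.log2(likelihoods).sum(dim=(1, 2, 3))  # bits per image
        return z_hat.flatten(1), rate_bits


def rate_distortion_loss(model, x, x_aug, beta=0.1):
    """Distortion pulls the representation of x towards that of an augmented
    view x_aug (invariance); the rate term keeps the code cheap to entropy-code."""
    z_hat, rate_bits = model(x)
    with torch.no_grad():
        z_target, _ = model(x_aug)
    distortion = F.mse_loss(z_hat, z_target)
    # NB: CompressAI's entropy bottleneck also has an auxiliary loss,
    # model.entropy_bottleneck.loss(), which is optimized separately in practice.
    return distortion + beta * rate_bits.mean()


if __name__ == "__main__":
    model = ToyInvariantCompressor()
    x = torch.randn(8, 3, 32, 32)            # stand-in for an image batch ...
    x_aug = x + 0.05 * torch.randn_like(x)   # ... and (weakly) augmented views
    print(rate_distortion_loss(model, x, x_aug))
```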
Installation details
1. `pip install git+https://github.com/openai/CLIP.git`
2. `pip uninstall -y torchtext` (probably not necessary, but it can cause issues if it got installed with the wrong pytorch version)
3. `pip install scikit-learn==0.24.2 lightning-bolts==0.3.4 compressai==1.1.5 pytorch-lightning==1.3.8`

Using pytorch `>1.7.1`: CLIP forces pytorch version `1.7.1`, but you should be able to use more recent versions. E.g.:
1. `pip install git+https://github.com/openai/CLIP.git`
2. `pip install -U torch torchvision scikit-learn lightning-bolts compressai pytorch-lightning`

## Results from the paper
We provide scripts to essentially replicate some results from the paper. The exact results will differ slightly because we simplified and cleaned some of the code to improve readability. All scripts can be found in `bin` and are run using the command `bin/*/.sh`.
Installation details
1. Clone repository
2. Install [PyTorch](https://pytorch.org/) >= 1.7
3. `pip install -r requirements.txt`

### Other installation
- For the bare minimum packages: use `pip install -r requirements_mini.txt` instead.
- For conda: use `conda env update --file requirements/environment.yaml`.
- For docker: we provide a dockerfile at `requirements/Dockerfile`.

### Notes
- CLIP forces pytorch version `1.7.1` because it needs that version for JIT. We don't use JIT, so you can actually use more recent versions of torch and torchvision: `pip install -U torch torchvision`.
- For better logging: `hydra` and `pytorch lightning` logging don't work great together. For a better logging experience you should comment out the following lines in `pytorch_lightning/__init__.py`:

```python
if not _root_logger.hasHandlers():
_logger.addHandler(logging.StreamHandler())
_logger.propagate = False
```

### Test installation
To test your installation and check that everything works as desired, you can run `bin/test.sh`, which runs one epoch of BICNE and VIC on MNIST.
----------------------
Scripts details
All scripts can be found in `bin` and are run using the command `bin/*/.sh`. This saves all results, checkpoints, logs, etc. The most important results (including summarized results and figures) are saved at `results/exp_`. Most important are the summarized metrics `results/exp_*/summarized_metrics_merged.csv` and any figures `results/exp_*/*.png`.
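If you want to compare runs programmatically, the summarized metrics are plain CSV files that can be collected with pandas. The snippet below is only a convenience sketch and assumes the `results/exp_*/summarized_metrics_merged.csv` layout described above.

```python
# Convenience sketch: collect the summarized metrics of all experiments into one table.
# Assumes the results/exp_*/summarized_metrics_merged.csv layout described above.
from pathlib import Path

import pandas as pd

dfs = []
for csv_path in sorted(Path("results").glob("exp_*/summarized_metrics_merged.csv")):
    df = pd.read_csv(csv_path)
    df["experiment"] = csv_path.parent.name  # remember which experiment each row came from
    dfs.append(df)

if dfs:
    metrics = pd.concat(dfs, ignore_index=True)
    print(metrics.head())
else:
    print("No summarized metrics found under results/.")
```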
The key experiments that do not require very large compute are:
- VIC/VAE on rotation invariant Banana distribution: `bin/banana/banana_viz_VIC.sh`
- VIC/VAE on augmentation invariant MNIST: `bin/mnist/augmnist_viz_VIC.sh`
- CLIP experiments: `bin/clip/main_linear.sh`

By default all scripts log results to [weights and biases](https://wandb.ai/site). If you have an account (or make one) you should set your username in `conf/user.yaml` after `wandb_entity:`; the password should be set directly in your environment variables. If you prefer not to log, you can use the command `bin/*/.sh -a logger=csv`, which changes (`-a` is for append) the default `wandb` logger to a `csv` logger.
Generally speaking, you can change any of the parameters either directly in `conf/**/.yaml` or by adding `-a` to the script. We use [Hydra](https://hydra.cc/) to manage our configurations; refer to their documentation if something is unclear.
If you are using [Slurm](https://slurm.schedmd.com/documentation.html), you can submit the script directly to servers by adding a config file under `conf/slurm/.yaml` and then running the script as `bin/*/.sh -s `. For example Slurm configuration files, see `conf/slurm/vector.yaml` or `conf/slurm/learnfair.yaml`. For more information, check the documentation of the [submitit plugin](https://hydra.cc/docs/plugins/submitit_launcher), which we use.
----------------------
### VIC/VAE on rotation invariant Banana
Command:
```bash
bin/banana/banana_viz_VIC.sh
```

The following figures are saved automatically at `results/exp_banana_viz_VIC/**/quantization.png`. On the left we see the quantization of the Banana distribution by a standard compressor (called `VAE` in the code but VC in the paper). On the right, the quantization by our (rotation) invariant compressor (`VIC`).
### VIC/VAE on augmented MNIST
Command:
```bash
bin/mnist/augmnist_viz_VIC.sh
```

The following figure is saved automatically at `results/exp_augmnist_viz_VIC/**/rec_imgs.png`. It shows source augmented MNIST images as well as the reconstructions using our invariant compressor.
![Invariant compression of augmented MNIST](/results/exp_augmnist_viz_VIC/datafeat_mnist_aug/feat_neural_rec/dist_VIC/enc_resnet18/rate_H_hyper/optfeat_AdamW_lr1.0e-03_w1.0e-05/schedfeat_expdecay100/zdim_128/zs_1/beta_1.0e-01/seed_123/addfeat_None/rec_imgs.png)

### CLIP compressor
Command:
```bash
bin/clip/main_small.sh
```

The following table comes directly from the results that are automatically saved at `results/exp_clip_bottleneck_linear_eval/**/datapred_*/**/results_predictor.csv`. It shows the results of our CLIP compressor on many datasets.
| | Cars196 | STL10 | Caltech101 | Food101 | PCam | Pets37 | CIFAR10 | CIFAR100 |
|---------------|:-------:|:-----:|:----------:|:-------:|:----:|:------:|:-------:|:--------:|
| Rate [bits] | 1471 | 1342 | 1340 | 1266 | 1491 | 1209 | 1407 | 1413 |
| Test Acc. [%] | 80.3 | 98.5 | 93.3 | 83.8 | 81.1 | 88.8 | 94.6 | 79.0 |

Note: ImageNet is too large for training an SVM using scikit-learn. You need to run the MLP evaluation with `bin/clip/clip_bottleneck_mlp_eval`. Also, you have to download ImageNet manually.
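If you still want a quick linear baseline at that scale, one workaround (outside of the provided scripts) is a stochastic linear probe: the sketch below uses scikit-learn's `SGDClassifier` with a hinge loss as a rough stand-in for `LinearSVC`. The feature/label file names are hypothetical placeholders, and the supported path remains the MLP evaluation above.

```python
# Rough workaround sketch (not the repository's MLP evaluation): a linear probe
# trained with SGD, which scales better than LinearSVC on very large feature sets.
# The file names below are hypothetical placeholders.
import numpy as np
from sklearn.linear_model import SGDClassifier

Z_train = np.load("data/imagenet_train_Z.npy")  # hypothetical decompressed features
Y_train = np.load("data/imagenet_train_Y.npy")
Z_test = np.load("data/imagenet_test_Z.npy")
Y_test = np.load("data/imagenet_test_Y.npy")

# Hinge loss makes this roughly a linear SVM trained by stochastic gradient descent.
clf = SGDClassifier(loss="hinge", alpha=1e-5)
clf.fit(Z_train, Y_train)
print(f"Linear-probe accuracy: {clf.score(Z_test, Y_test) * 100:.2f}%")
```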
## Cite
You can read the full paper [here](https://arxiv.org/abs/2106.10800). Please cite our paper if you use our model:
```bibtex
@inproceedings{
dubois2021lossy,
title={Lossy Compression for Lossless Prediction},
author={Yann Dubois and Benjamin Bloem-Reddy and Karen Ullrich and Chris J. Maddison},
booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
year={2021}
}
```