https://github.com/aimaster-dev/image-compression-by-autoencoder
- Host: GitHub
- URL: https://github.com/aimaster-dev/image-compression-by-autoencoder
- Owner: aimaster-dev
- Created: 2024-08-21T03:00:23.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-08-21T03:02:31.000Z (over 1 year ago)
- Last Synced: 2025-03-29T23:22:06.442Z (10 months ago)
- Language: Jupyter Notebook
- Size: 9.48 MB
- Stars: 6
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Image compression using neural auto-encoder and quantization
This project is a simple implementation of an auto-encoder neural network for image compression.
The auto-encoder is trained on a large image dataset; the trained model is then used to compress and
decompress images.
## Navigation:
* [Model architecture](#model-architecture)
* [Download pretrained models](#download-pretrained-models)
* [Quantization](#quantization)
* [Quick start](#quick-start)
* [Compression](#compression)
* [Decompression](#decompression)
* [Training from scratch](#training-from-scratch)
* [Results](#results)
* [Notebooks](#notebooks)
## Model architecture
The model is a variational auto-encoder with residual blocks and skip connections.
* Encoder: _ResNet-18 architecture with fully connected layers_
* Decoder: _ResNet-18 architecture with transposed convolution layers_
* Loss: _VGG loss + MSE loss_
* Optimizer: _Adam optimizer_
## Download pretrained models
Models were trained on the
[130k Images (512x512) - Universal Image Embeddings](https://www.kaggle.com/datasets/rhtsingh/130k-images-512x512-universal-image-embeddings)
dataset from Kaggle.
Here are the links to download the pretrained models:
_B = number of quantization bits (the latent vector is quantized into 2^B levels)_
* [B=2, resnet18](https://drive.google.com/drive/folders/1FaeWzeRW3BMqqZwGsHUjhf7PuAOsiY6E?usp=sharing)
* [B=8, resnet18](https://drive.google.com/drive/folders/1fYDc0e43cUR7xsIYatpz8fdJ_6KMJmSs?usp=sharing)
Put the downloaded models in the `models` directory.
## Quantization
The encoder outputs a feature map with 512 channels and 8 × 8 spatial dimensions. The feature map is flattened into
a vector of size 32768 (512 × 8 × 8), which is then quantized into `2 ** B` levels.
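As a standalone sketch (plain Python, not the repo's torch code), the quantize/dequantize round trip for a single latent value looks like:

```python
def quantize(x, B):
    """Map x in [0, 1] to an integer level in [0, 2**B]."""
    x = min(max(x, 0.0), 1.0)      # clamp, like torch.clamp(vector, 0, 1)
    return int(x * 2 ** B + 0.5)   # scale; +0.5 so truncation rounds to nearest

def dequantize(level, B):
    """Map an integer level back into [0, 1]."""
    return level / 2 ** B

B = 8
x = 0.314
level = quantize(x, B)             # -> 80
restored = dequantize(level, B)    # -> 0.3125
# the round-trip error is at most half a quantization step
assert abs(restored - x) <= 2 ** -B / 2
```

The reconstruction error is bounded by half a step (`2 ** -B / 2`), which is why larger `B` gives higher fidelity at the cost of more bits per latent value.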
### Train quantization
In the training phase, quantization is simulated by adding noise instead of hard rounding (rounding is not
differentiable, so it cannot be used during backpropagation). The noise is sampled from a normal distribution with
mean -0.5 and standard deviation 0.5, then scaled by the quantization step `2 ** -B`. The final noise vector is
```python
scale = 2 ** -B                                # quantization step for B bits
noise = (torch.randn(n) * 0.5 - 0.5) * scale   # mean -0.5, std 0.5, scaled to one step
```
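To get a feel for the scale of this perturbation, here is a standalone check using Python's `random` module in place of torch (a hypothetical illustration, not repo code):

```python
import random

random.seed(0)                       # deterministic for the example

B = 8
scale = 2 ** -B                      # quantization step for B bits
# same distribution as above: N(mean=-0.5, std=0.5), scaled by 2**-B
noise = [(random.gauss(0.0, 1.0) * 0.5 - 0.5) * scale
         for _ in range(10_000)]

mean = sum(noise) / len(noise)
# on average the perturbation is about half a quantization step below zero
assert abs(mean - (-0.5 * scale)) < 0.05 * scale
```

The perturbation is on the order of one quantization step, so the network learns to be robust to exactly the kind of error that hard quantization introduces at inference time.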
### Inference quantization
In inference mode the vector is clamped to [0, 1] with `torch.clamp`, scaled by `2 ** B`, and rounded to the nearest
integer level. So the final quantized vector is
```python
quantized = torch.clamp(vector, 0, 1) * 2 ** B + 0.5  # scale to [0, 2**B]; +0.5 so that
quantized = quantized.int()                            # truncation rounds to the nearest level
```
## Quick start
[compress_all.sh](scripts/compress_all.sh) compresses all images from the `assets/images` directory and saves them
in the `assets/compressed` directory.
`compress_all.sh` takes 3 arguments:
* `qb` - number of quantization bits
* `resnet-model` - ResNet model architecture
* `device` - torch device to evaluate on
```shell
# Compress all images from assets/images directory
bash scripts/compress_all.sh 8 resnet18 cpu
```
[decompress_all.sh](./scripts/decompress_all.sh) decompresses all images from the `assets/compressed` directory and
saves them in the `assets/decompressed` directory.
`decompress_all.sh` takes 3 arguments:
* `qb` - number of quantization bits
* `resnet-model` - ResNet model architecture
* `device` - torch device to evaluate on
```shell
# Decompress all images from assets/compressed directory
bash scripts/decompress_all.sh 8 resnet18 cpu
```
## Compression
In the compression phase the encoder encodes the image into a vector of size 32768 (the flattened 512 × 8 × 8 feature
map from the encoder's last convolutional layer).
The vector is quantized into `2 ** B` levels, and the quantized vector is then compressed with adaptive arithmetic
coding. The arithmetic encoder takes the quantized vector, with values in the range [0, 2^B], as input and outputs a
binary sequence. Encoding is performed with the `arithmetic-compressor` Python package; its `SimpleAdaptiveModel`
updates symbol probabilities, gradually forgetting old statistics via an exponential moving average.
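The package's actual API is not reproduced here; as a hypothetical sketch of the adaptive-model idea only, an exponential moving average keeps symbol probabilities tracking recent data:

```python
def ema_update(probs, symbol, alpha=0.05):
    """Decay all probabilities and move mass toward the just-seen symbol.

    probs  -- dict mapping symbol -> probability (sums to 1)
    alpha  -- forgetting rate; higher means old statistics fade faster
    """
    return {s: (1 - alpha) * p + (alpha if s == symbol else 0.0)
            for s, p in probs.items()}

probs = {0: 0.5, 1: 0.5}
for s in (1, 1, 1, 1):               # a run of 1s shifts mass toward symbol 1
    probs = ema_update(probs, s)

assert probs[1] > probs[0]                        # the model adapted
assert abs(sum(probs.values()) - 1.0) < 1e-9      # still a valid distribution
```

Because each update multiplies old probabilities by `1 - alpha`, statistics from the distant past decay geometrically, which is what lets the coder adapt to local structure in the latent vector.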
The final compressed file consists of:
* `vector` - the quantized latent vector
* `shape` - the feature-map shape
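As an illustration of that layout only (a hypothetical container format, not the actual on-disk format `compress.py` writes, which additionally arithmetic-codes the levels):

```python
import struct

def pack(levels, shape):
    """Store the feature-map shape as three uint16s, then one uint16 per level."""
    header = struct.pack("<3H", *shape)
    body = struct.pack(f"<{len(levels)}H", *levels)  # levels fit in [0, 2**B] for B <= 8
    return header + body

def unpack(blob):
    """Invert pack(): recover the level list and the feature-map shape."""
    shape = struct.unpack("<3H", blob[:6])
    n = (len(blob) - 6) // 2
    return list(struct.unpack(f"<{n}H", blob[6:])), shape

levels = [0, 17, 256, 3]
blob = pack(levels, (512, 8, 8))
restored, shape = unpack(blob)
assert restored == levels and shape == (512, 8, 8)
```

Storing the shape alongside the vector is what lets the decompressor reshape the flat vector back into the 512 × 8 × 8 feature map before decoding.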
```shell
# Compress the `baboon` image from assets/images directory
python compress.py \
--image=assets/images/baboon.png \
--output=assets/compressed/baboon.bin \
--models-dir=models \
--resnet-model=resnet18 \
--qb=8 \
--device=cuda
```
## Decompression
In the decompression phase the compressed file is first decoded with adaptive arithmetic coding. The recovered vector
is then dequantized and passed through the decoder, which outputs the reconstructed image.
The dequantized vector is `vector / (2 ** qb)`.
```shell
# Decompress the compressed image
python decompress.py \
--file=assets/compressed/baboon.bin \
--output=assets/decompressed/baboon.png \
--qb=8 \
--resnet-model=resnet18 \
--models-dir=models \
--device=cuda
```
## Training from scratch
```shell
python train.py \
--root [path to images] \
--test-root [path to test images] \
--resnet-model [resnet model architecture] \
--qb [number of quantization levels] \
--epochs [number of epochs] \
--batch-size [batch size] \
--lr [learning rate] \
--device [torch device to train on] \
--save-results-every [save results every n epochs] \
--save-models-dir [path to save models] \
--use-checkpoint [use checkpoint to resume training]
```
## Results
### Images
#### B=2
| (JPEG QF, BPP) | JPEG | Auto-Encoder |
|---------------:|:----------------------------------------:|:-----------------------------------------------:|
| 12, 0.605 | _(image in repository)_ | _(image in repository)_ |
| 35, 0.605 | _(image in repository)_ | _(image in repository)_ |
| 33, 0.605 | _(image in repository)_ | _(image in repository)_ |
#### B=8
| (JPEG QF, BPP) | JPEG | Auto-Encoder |
|---------------:|:----------------------------------------:|:-----------------------------------------------:|
| 72, 2.28 | _(image in repository)_ | _(image in repository)_ |
| 90, 2.28 | _(image in repository)_ | _(image in repository)_ |
| 89, 2.28 | _(image in repository)_ | _(image in repository)_ |
### PSNR / BPP

## Notebooks
* [Kaggle training notebook](notebooks/kaggle-cuda-training.ipynb)
* [Analysis notebook](notebooks/analysis.ipynb)