https://github.com/aimaster-dev/image-compression-by-autoencoder
- Host: GitHub
- URL: https://github.com/aimaster-dev/image-compression-by-autoencoder
- Owner: aimaster-dev
- Created: 2024-08-21T03:00:23.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-08-21T03:02:31.000Z (over 1 year ago)
- Last Synced: 2025-03-29T23:22:06.442Z (10 months ago)
- Language: Jupyter Notebook
- Size: 9.48 MB
- Stars: 6
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Image compression using neural auto-encoder and quantization
This project is a simple implementation of an auto-encoder neural network for image compression.
The auto-encoder is trained on a large image dataset; the trained model is then used to compress and
decompress images.
## Navigation:
* [Model architecture](#model-architecture)
* [Download pretrained models](#download-pretrained-models)
* [Quantization](#quantization)
* [Quick start](#quick-start)
* [Compression](#compression)
* [Decompression](#decompression)
* [Training from scratch](#training-from-scratch)
* [Results](#results)
* [Notebooks](#notebooks)
## Model architecture
The model is a variational auto-encoder with residual blocks and skip connections.
* Encoder: _ResNet-18 architecture with fully connected layers_
* Decoder: _ResNet-18 architecture with transposed convolution layers_
* Loss: _VGG loss + MSE loss_
* Optimizer: _Adam optimizer_
## Download pretrained models
Models were trained on the
[130k Images (512x512) - Universal Image Embeddings](https://www.kaggle.com/datasets/rhtsingh/130k-images-512x512-universal-image-embeddings)
dataset from Kaggle.
Here are the links to download the pretrained models:
_B = number of quantization bits (the latent vector is quantized into 2^B levels)_
* [B=2, resnet18](https://drive.google.com/drive/folders/1FaeWzeRW3BMqqZwGsHUjhf7PuAOsiY6E?usp=sharing)
* [B=8, resnet18](https://drive.google.com/drive/folders/1fYDc0e43cUR7xsIYatpz8fdJ_6KMJmSs?usp=sharing)
Put the downloaded models in the `models` directory.
## Quantization
The encoder outputs a feature map with 512 channels and 8 × 8 spatial dimensions. The feature map is flattened into
a vector of size 32768 (512 × 8 × 8), which is then quantized into `2 ** B` levels.
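As a standalone sketch (plain Python, not the repo's torch code), the quantize/dequantize round trip for a single latent value looks like:

```python
def quantize(x, B):
    """Map x in [0, 1] to an integer level in [0, 2**B]."""
    x = min(max(x, 0.0), 1.0)      # clamp, like torch.clamp(vector, 0, 1)
    return int(x * 2 ** B + 0.5)   # scale; +0.5 so truncation rounds to nearest

def dequantize(level, B):
    """Map an integer level back into [0, 1]."""
    return level / 2 ** B

B = 8
x = 0.314
level = quantize(x, B)             # -> 80
restored = dequantize(level, B)    # -> 0.3125
# the round-trip error is at most half a quantization step
assert abs(restored - x) <= 2 ** -B / 2
```

The reconstruction error is bounded by half a step (`2 ** -B / 2`), which is why larger `B` gives higher fidelity at the cost of more bits per latent value.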
### Train quantization
In the training phase, quantization is simulated by adding noise instead of hard rounding (rounding is not
differentiable, so it cannot be used during backpropagation). The noise is sampled from a normal distribution with
mean -0.5 and standard deviation 0.5, then scaled by the quantization step `2 ** -B`. The final noise vector is
```python
scale = 2 ** -B                                # quantization step for B bits
noise = (torch.randn(n) * 0.5 - 0.5) * scale   # mean -0.5, std 0.5, scaled to one step
```
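To get a feel for the scale of this perturbation, here is a standalone check using Python's `random` module in place of torch (a hypothetical illustration, not repo code):

```python
import random

random.seed(0)                       # deterministic for the example

B = 8
scale = 2 ** -B                      # quantization step for B bits
# same distribution as above: N(mean=-0.5, std=0.5), scaled by 2**-B
noise = [(random.gauss(0.0, 1.0) * 0.5 - 0.5) * scale
         for _ in range(10_000)]

mean = sum(noise) / len(noise)
# on average the perturbation is about half a quantization step below zero
assert abs(mean - (-0.5 * scale)) < 0.05 * scale
```

The perturbation is on the order of one quantization step, so the network learns to be robust to exactly the kind of error that hard quantization introduces at inference time.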
### Inference quantization
In inference mode the vector is clamped to [0, 1] with `torch.clamp`, scaled by `2 ** B`, and rounded to the nearest
integer level. So the final quantized vector is
```python
quantized = torch.clamp(vector, 0, 1) * 2 ** B + 0.5  # scale to [0, 2**B]; +0.5 so that
quantized = quantized.int()                            # truncation rounds to the nearest level
```
## Quick start
[compress_all.sh](scripts/compress_all.sh) compresses all images from the `assets/images` directory and saves them
in the `assets/compressed` directory.
`compress_all.sh` takes 3 arguments:
* `qb` - number of quantization bits
* `resnet-model` - ResNet model architecture
* `device` - torch device to evaluate on
```shell
# Compress all images from assets/images directory
bash scripts/compress_all.sh 8 resnet18 cpu
```
[decompress_all.sh](./scripts/decompress_all.sh) decompresses all images from the `assets/compressed` directory and
saves them in the `assets/decompressed` directory.
`decompress_all.sh` takes 3 arguments:
* `qb` - number of quantization bits
* `resnet-model` - ResNet model architecture
* `device` - torch device to evaluate on
```shell
# Decompress all images from assets/compressed directory
bash scripts/decompress_all.sh 8 resnet18 cpu
```
## Compression
In the compression phase the encoder encodes the image into a vector of size 32768 (the flattened 512 × 8 × 8 feature
map from the encoder's last convolutional layer).
The vector is quantized into `2 ** B` levels, and the quantized vector is then compressed with adaptive arithmetic
coding. The arithmetic encoder takes the quantized vector, with values in the range [0, 2^B], as input and outputs a
binary sequence. Encoding is performed with the `arithmetic-compressor` Python package; its `SimpleAdaptiveModel`
updates symbol probabilities, gradually forgetting old statistics via an exponential moving average.
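The package's actual API is not reproduced here; as a hypothetical sketch of the adaptive-model idea only, an exponential moving average keeps symbol probabilities tracking recent data:

```python
def ema_update(probs, symbol, alpha=0.05):
    """Decay all probabilities and move mass toward the just-seen symbol.

    probs  -- dict mapping symbol -> probability (sums to 1)
    alpha  -- forgetting rate; higher means old statistics fade faster
    """
    return {s: (1 - alpha) * p + (alpha if s == symbol else 0.0)
            for s, p in probs.items()}

probs = {0: 0.5, 1: 0.5}
for s in (1, 1, 1, 1):               # a run of 1s shifts mass toward symbol 1
    probs = ema_update(probs, s)

assert probs[1] > probs[0]                        # the model adapted
assert abs(sum(probs.values()) - 1.0) < 1e-9      # still a valid distribution
```

Because each update multiplies old probabilities by `1 - alpha`, statistics from the distant past decay geometrically, which is what lets the coder adapt to local structure in the latent vector.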
The final compressed file consists of:
* `vector` - the quantized latent vector
* `shape` - the feature-map shape
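As an illustration of that layout only (a hypothetical container format, not the actual on-disk format `compress.py` writes, which additionally arithmetic-codes the levels):

```python
import struct

def pack(levels, shape):
    """Store the feature-map shape as three uint16s, then one uint16 per level."""
    header = struct.pack("<3H", *shape)
    body = struct.pack(f"<{len(levels)}H", *levels)  # levels fit in [0, 2**B] for B <= 8
    return header + body

def unpack(blob):
    """Invert pack(): recover the level list and the feature-map shape."""
    shape = struct.unpack("<3H", blob[:6])
    n = (len(blob) - 6) // 2
    return list(struct.unpack(f"<{n}H", blob[6:])), shape

levels = [0, 17, 256, 3]
blob = pack(levels, (512, 8, 8))
restored, shape = unpack(blob)
assert restored == levels and shape == (512, 8, 8)
```

Storing the shape alongside the vector is what lets the decompressor reshape the flat vector back into the 512 × 8 × 8 feature map before decoding.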
```shell
# Compress the `baboon` image from assets/images directory
python compress.py \
--image=assets/images/baboon.png \
--output=assets/compressed/baboon.bin \
--models-dir=models \
--resnet-model=resnet18 \
--qb=8 \
--device=cuda
```
## Decompression
In the decompression phase the compressed file is first decoded with adaptive arithmetic coding. The recovered vector
is then dequantized and passed through the decoder, which outputs the reconstructed image.
The dequantized vector is `vector / (2 ** qb)`.
```shell
# Decompress the compressed image
python decompress.py \
--file=assets/compressed/baboon.bin \
--output=assets/decompressed/baboon.png \
--qb=8 \
--resnet-model=resnet18 \
--models-dir=models \
--device=cuda
```
## Training from scratch
```shell
python train.py \
--root [path to images] \
--test-root [path to test images] \
--resnet-model [resnet model architecture] \
--qb [number of quantization levels] \
--epochs [number of epochs] \
--batch-size [batch size] \
--lr [learning rate] \
--device [torch device to train on] \
--save-results-every [save results every n epochs] \
--save-models-dir [path to save models] \
--use-checkpoint [use checkpoint to resume training]
```
## Results
### Images
#### B=2
| (JPEG QF, BPP) | JPEG | Auto-Encoder |
|---------------:|:----------------------------------------:|:-----------------------------------------------:|
| 12, 0.605 | _(image in repository)_ | _(image in repository)_ |
| 35, 0.605 | _(image in repository)_ | _(image in repository)_ |
| 33, 0.605 | _(image in repository)_ | _(image in repository)_ |
#### B=8
| (JPEG QF, BPP) | JPEG | Auto-Encoder |
|---------------:|:----------------------------------------:|:-----------------------------------------------:|
| 72, 2.28 | _(image in repository)_ | _(image in repository)_ |
| 90, 2.28 | _(image in repository)_ | _(image in repository)_ |
| 89, 2.28 | _(image in repository)_ | _(image in repository)_ |
### PSNR / BPP

## Notebooks
* [Kaggle training notebook](notebooks/kaggle-cuda-training.ipynb)
* [Analysis notebook](notebooks/analysis.ipynb)