{"id":21490023,"url":"https://github.com/aimaster-dev/image-compression-by-autoencoder","last_synced_at":"2025-04-23T10:25:43.195Z","repository":{"id":254045420,"uuid":"845322721","full_name":"aimaster-dev/image-compression-by-autoencoder","owner":"aimaster-dev","description":null,"archived":false,"fork":false,"pushed_at":"2024-08-21T03:02:31.000Z","size":9945,"stargazers_count":6,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-29T23:22:06.442Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/aimaster-dev.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-08-21T03:00:23.000Z","updated_at":"2024-12-16T13:50:55.000Z","dependencies_parsed_at":"2024-08-21T06:15:48.510Z","dependency_job_id":null,"html_url":"https://github.com/aimaster-dev/image-compression-by-autoencoder","commit_stats":null,"previous_names":["aimaster-dev/image-compression-by-autoencoder"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aimaster-dev%2Fimage-compression-by-autoencoder","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aimaster-dev%2Fimage-compression-by-autoencoder/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aimaster-dev%2Fimage-compression-by-autoencoder/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aimaster-dev%2Fimage-
compression-by-autoencoder/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/aimaster-dev","download_url":"https://codeload.github.com/aimaster-dev/image-compression-by-autoencoder/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250414268,"owners_count":21426559,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-23T14:30:05.569Z","updated_at":"2025-04-23T10:25:43.170Z","avatar_url":"https://github.com/aimaster-dev.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Image compression using neural auto-encoder and quantization\n\nThis project is a simple implementation of auto-encoder neural network for image compression.\nThe auto-encoder neural network is trained on the ImageNet dataset. 
The trained model is then used to compress and\ndecompress the images.\n\n## Navigation:\n\n* [Model architecture](#model-architecture)\n* [Download pretrained models](#download-pretrained-models)\n* [Quantization](#quantization)\n* [Quick start](#quick-start)\n* [Compression](#compression)\n* [Decompression](#decompression)\n* [Training from scratch](#training-from-scratch)\n* [Results](#results)\n* [Notebooks](#notebooks)\n\n## Model architecture\n\nThe model is a variational auto-encoder with residual blocks and skip connections.\n\n* Encoder: _ResNet-18 architecture with fully connected layers_\n* Decoder: _ResNet-18 architecture with transposed convolution layers_\n* Loss: _VGG loss + MSE loss_\n* Optimizer: _Adam optimizer_\n\n## Download pretrained models\n\nModels were trained\non the [130k Images (512x512) - Universal Image Embeddings](https://www.kaggle.com/datasets/rhtsingh/130k-images-512x512-universal-image-embeddings)\ndataset from Kaggle.\n\nHere are the links to download the pretrained models:\n_B = number of quantization levels_\n\n* [B=2, resnet18](https://drive.google.com/drive/folders/1FaeWzeRW3BMqqZwGsHUjhf7PuAOsiY6E?usp=sharing)\n* [B=8, resnet18](https://drive.google.com/drive/folders/1fYDc0e43cUR7xsIYatpz8fdJ_6KMJmSs?usp=sharing)\n\nPut the downloaded models in the `models` directory.\n\n## Quantization\n\nThe model outputs a feature map with 512 channels and 8 x 8 spatial dimensions. The feature map is then flattened into\na vector of size 32768 (512 x 8 x 8). The vector is then quantized into `B` quantization levels.\n\n### Train quantization\n\nIn the training phase, `noise` is appended to the input image. The `noise` is sampled from N(-0.5, 0.5) and then scaled\nby\n`B` quantization levels. 
So the final noise vector is\n\n```python\nscale = 2 ** -B\nnoise = (torch.randn(n) * 0.5 - 0.5) * scale\n```\n\n### Inference quantization\n\nIn inference mode the vector is clamped to the range [0, 1] with `torch.clamp` and then scaled by `2 ** B`.\nSo the final quantized vector is\n\n```python\n# adding 0.5 before the integer cast rounds to the nearest level\nquantized = torch.clamp(vector, 0, 1) * 2 ** B + 0.5\nquantized = quantized.int()\n```\n\n## Quick start\n\n[compress_all.sh](scripts/compress_all.sh) compresses all images from the `assets/images` directory and saves them\nin the `assets/compressed` directory.\n\n`compress_all.sh` takes 3 arguments:\n\n* `qb` - number of quantization levels\n* `resnet-model` - resnet model architecture\n* `device` - torch device to evaluate on\n\n```shell\n# Compress all images from assets/images directory\nbash scripts/compress_all.sh 8 resnet18 cpu\n```\n\n[decompress_all.sh](./scripts/decompress_all.sh) decompresses all images from the `assets/compressed` directory and saves\nthem in the `assets/decompressed` directory.\n\n`decompress_all.sh` takes 3 arguments:\n\n* `qb` - number of quantization levels\n* `resnet-model` - resnet model architecture\n* `device` - torch device to evaluate on\n\n```shell\n# Decompress all images from assets/compressed directory\nbash scripts/decompress_all.sh 8 resnet18 cpu\n```\n\n## Compression\n\nIn the compression phase the encoder encodes the image into a vector of size 32768 (the flattened feature map from the last\nconvolutional layer of the encoder, of size 512 x 8 x 8).\nThen the vector is quantized into `B` quantization levels. Finally, the quantized vector is compressed\nusing `Adaptive Arithmetic Coding`. The arithmetic encoder takes the quantized vector with values in the range _[0; 2^B]_ as input and outputs a binary sequence. Encoding is performed using the arithmetic-compressor Python package. SimpleAdaptiveModel is used to update the probabilities. 
This model gradually forgets old statistics using an exponential moving average.\n\nThe final compressed file consists of:\n\n* `vector` - quantized vector\n* `shape` - feature map shape\n\n```shell\n# Compress the `baboon` image from assets/images directory\npython compress.py \\\n  --image=assets/images/baboon.png \\\n  --output=assets/compressed/baboon.bin \\\n  --models-dir=models \\\n  --resnet-model=resnet18 \\\n  --qb=8 \\\n  --device=cuda\n```\n\n## Decompression\n\nIn the decompression phase the compressed file is decoded using `Adaptive Arithmetic Coding`. Then the decoded\nvector is dequantized and passed to the decoder, which outputs the decompressed image.\n\ndequantized vector = `vector / (2 ** qb)`\n\n```shell\n# Decompress the compressed image\npython decompress.py \\\n  --file=assets/compressed/baboon.bin \\\n  --output=assets/decompressed/baboon.png \\\n  --qb=8 \\\n  --resnet-model=resnet18 \\\n  --models-dir=models \\\n  --device=cuda\n```\n\n## Training from scratch\n\n```shell\npython train.py \\\n  --root [path to images] \\\n  --test-root [path to test images] \\\n  --resnet-model [resnet model architecture] \\\n  --qb [number of quantization levels] \\\n  --epochs [number of epochs] \\\n  --batch-size [batch size] \\\n  --lr [learning rate] \\\n  --device [torch device to train on] \\\n  --save-results-every [save results every n epochs] \\\n  --save-models-dir [path to save models] \\\n  --use-checkpoint [use checkpoint to resume training]\n```\n\n## Results\n\n### Images\n\n#### B=2\n\n| (Jpeg QF, BPP) |                   Jpeg                   |                  Auto-Encoder                   |\n|---------------:|:----------------------------------------:|:-----------------------------------------------:|\n|      12, 0.605 |  ![baboon](assets/jpegs/B=2/baboon.jpg)  |  ![baboon](assets/decompressed/B=2/baboon.png)  |\n|      35, 0.605 |    ![lena](assets/jpegs/B=2/lena.jpg)    |    ![lena](assets/decompressed/B=2/lena.png)    |\n|    
  33, 0.605 | ![peppers](assets/jpegs/B=2/peppers.jpg) | ![peppers](assets/decompressed/B=2/peppers.png) |\n\n#### B=8\n\n| (Jpeg QF, BPP) |                   Jpeg                   |                  Auto-Encoder                   |\n|---------------:|:----------------------------------------:|:-----------------------------------------------:|\n|       72, 2.28 |  ![baboon](assets/jpegs/B=8/baboon.jpg)  |  ![baboon](assets/decompressed/B=8/baboon.png)  |\n|       90, 2.28 |    ![lena](assets/jpegs/B=8/lena.jpg)    |    ![lena](assets/decompressed/B=8/lena.png)    |\n|       89, 2.28 | ![peppers](assets/jpegs/B=8/peppers.jpg) | ![peppers](assets/decompressed/B=8/peppers.png) |\n\n### PSNR / BPP\n\n![psnr-bpp](assets/graphs/psnr-bpp.png)\n\n## Notebooks\n\n* [Kaggle training notebook](notebooks/kaggle-cuda-training.ipynb)\n* [Analysis notebook](notebooks/analysis.ipynb)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faimaster-dev%2Fimage-compression-by-autoencoder","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Faimaster-dev%2Fimage-compression-by-autoencoder","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faimaster-dev%2Fimage-compression-by-autoencoder/lists"}