# McQuic

*a.k.a. Multi-codebook Quantizers for neural image compression*

🥳 Our paper will be presented at CVPR 2022! 🥳

**Unified Multivariate Gaussian Mixture for Efficient Neural Image Compression**

CVF Open Access | arXiv | BibTex | Demo


**McQuic** is a deep image compressor.

**Features**:
* Solid performance and super-fast coding speed (See [Reference Models](#reference-models)).
* Cross-platform support (Linux-64, Windows-64 and macOS-64, macOS-arm64).
* You could try the interactive demo in the [HuggingFace Space](https://huggingface.co/spaces/xiaosu-zhu/McQuic)!

**Techs**:

**McQuic** holds rich multi-codebooks to quantize visual features and restores images from these quantized features. Similar ideas appear in SHA [[1](#SHA)], VQ-VAE [[2](#VQ-VAE)], VQ-GAN [[3](#VQ-GAN)], *etc*. We summarize these as vectorized priors, and our method extends them to a ***unified multivariate Gaussian mixture*** that performs high-quality, low-latency image compression.
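To make the idea concrete, here is a tiny sketch of plain multi-codebook quantization with a hard nearest-codeword assignment. It is only an illustration with made-up shapes, not McQuic's actual code; the trainable, Gumbel-softmax version is given in [Implement MCQ by yourself](#implement-mcq-by-yourself).

```python
import torch

# Toy multi-codebook quantization: split an n-dim feature into m groups of
# d = n // m channels and snap each group to its nearest codeword.
# Shapes and names here are illustrative, not McQuic's real API.
m, k, n = 4, 16, 32
codebooks = torch.randn(m, k, n // m)          # m codebooks, k codewords of dim d
feature = torch.randn(n)                       # one n-dim feature vector

groups = feature.reshape(m, 1, n // m)         # [m, 1, d]
dist = ((groups - codebooks) ** 2).sum(-1)     # [m, k], L2 distance to every codeword
codes = dist.argmin(-1)                        # [m], one index per sub-codebook
quantized = codebooks[torch.arange(m), codes]  # [m, d], selected codewords
restored = quantized.reshape(n)                # back to an n-dim vector

print(codes.tolist())  # m small integers -- this is what gets entropy-coded
```

Only the `m` integer indices per spatial position need to be stored, which is why multi-codebook quantization yields compact codes.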


*Figure 1. Operational diagrams of different methods.*


*Figure 2. Comparison with traditional codecs on an image (kodim24.png) from the Kodak dataset.*

* [Quick Start](#quick-start)
  * [Requirements](#requirements)
  * [Conda (Recommended)](#conda-recommended)
  * [Docker](#docker)
  * [Install Manually (for dev)](#install-manually-for-dev)
  * [(Optional) Install NVIDIA/Apex](#optional-install-nvidiaapex)
* [Reference Models](#reference-models)
* [Train a New Model](#train-a-new-model)
  * [Requirements](#requirements-1)
  * [Configs](#configs)
  * [Prepare a Dataset](#prepare-a-dataset)
  * [Training](#training)
  * [Test](#test)
* [Implement MCQ by yourself](#implement-mcq-by-yourself)
* [Contribute to this Repository](#contribute-to-this-repository)
* [To-do List](#to-do-list)
* [Detailed framework](#detailed-framework)
* [References and License](#references-and-license)
  * [References](#references)
  * [Citation](#citation)
  * [Copyright](#copyright)

# Quick Start
It is easy to try our model (with a GPU, or on CPU if you prefer). Here is a quick guide to help you compress an image and restore it.

## Requirements
To run the model, your device needs to meet the following requirements.

* Hardware
  * A CUDA-enabled GPU (`≥ 8GiB VRAM`, driver version `≥ 450.80.02`)
  * If you don't have a GPU, running models on CPU may be slower.
  * `≥ 8GiB RAM`
* OS
  * I've tested all features on `Ubuntu`; other platforms should also work. If not, please [file bugs](#contribute-to-this-repository).

## Conda (Recommended)
Installing this package is easy once a `conda` environment is available, *e.g.* [Miniconda](https://docs.conda.io/en/latest/miniconda.html). I recommend installing it into a fresh virtual environment directly by:
```bash
# Install a clean pytorch with CUDA support
conda create -n [ENV_NAME] python=3.9 "pytorch>=1.11,<2" "torchvision>=0.12,<1" cudatoolkit -c pytorch
# Install mcquic and other dependencies
conda install -n [ENV_NAME] mcquic -c xiaosu-zhu -c conda-forge
conda activate [ENV_NAME]
```



> The above command installs packages with `CUDA` support. If you just want to run on CPU, please use `cpuonly` instead of `cudatoolkit` in the first command.



> Since there is currently no suitable `torchvision` build for Apple M1, you need to change the channel from `pytorch` to `conda-forge` in the first command.
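After installation, you can quickly check that PyTorch was installed with working CUDA support (skip this if you installed the `cpuonly` variant):

```python
import torch

print(torch.__version__)          # should be a 1.11+ (and <2) build per the command above
print(torch.cuda.is_available())  # True if the CUDA runtime and driver are usable
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```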

* Compress images
```bash
mcquic
```
```console
Usage: mcquic [OPTIONS] COMMAND [ARGS]...

Options:
  -v, --version  Print version info.
  -h, --help     Show this message and exit.

Commands:
  -*        Compress/restore a file.
  dataset   Create training set from `images` dir to `output` dir.
  train     Train a model.
  validate  Validate a trained model from `path` by images from `images`...
```
```bash
mcquic --help
```
```console
Usage: mcquic - [OPTIONS] INPUT [OUTPUT]

  Compress/restore a file.

  Args:

    input (str): Input file path. If input is an image, compress it. If
    input is a `.mcq` file, restore it.

    output (optional, str): Output file path or dir. If not provided, this
    program will only print compressor information of input file.

Options:
  -D, --debug        Set logging level to DEBUG to print verbose messages.
  -q, --quiet        Silence all messages, this option has higher priority to
                     `-D/--debug`.
  -qp INTEGER RANGE  Quantization parameter. Higher means better image quality
                     and larger size.  [default: 2; 1<=x<=13]
  --local FILE       Use a local model path instead of download by `qp`.
  --disable-gpu      Use pure CPU to perform compression. This will be slow.
  --mse              Use model optimized for PSNR other than MsSSIM.
  --crop             Crop the image to align feature patches. Edges of image
                     are cutted though, compressed binary will be smaller.
  -h, --help         Show this message and exit.
```
```bash
mcquic -qp 2 path/to/an/image path/to/output.mcq
```
* Decompress images
```bash
# `-qp` is not needed here, since this info is already stored in `output.mcq`.
mcquic path/to/output.mcq path/to/restored.png
```
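If you want to compress a whole folder, a small Python wrapper around the same CLI calls works. This is just a convenience sketch with placeholder folder names, not a built-in feature of `mcquic`:

```python
import subprocess
from pathlib import Path

src = Path("path/to/images")   # placeholder: folder of input images
dst = Path("path/to/outputs")  # placeholder: folder for .mcq files
dst.mkdir(parents=True, exist_ok=True)

for image in sorted(src.glob("*.png")):
    target = dst / (image.stem + ".mcq")
    # Same invocation as above: `mcquic -qp 2 input output`
    subprocess.run(["mcquic", "-qp", "2", str(image), str(target)], check=True)
    print(f"{image.name} -> {target.name}")
```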

## Docker
I also provide [`docker` images](https://github.com/xiaosu-zhu/McQuic/pkgs/container/mcquic) so you can avoid environment issues.

Try with the latest docker image:
```bash
docker pull ghcr.io/xiaosu-zhu/mcquic:latest
# or nightly build
# docker pull ghcr.io/xiaosu-zhu/mcquic:nightly
```

The entrypoint of this container is set to `mcquic` itself, so you can use the container directly as the `mcquic` main program.
```bash
docker run ghcr.io/xiaosu-zhu/mcquic:latest --help
```

To compress/restore images, you need to mount local files into the container. A working example looks like this:
```bash
# `someimage.png` is located in `path/to/some/folder`. And this folder will be mounted at `/workspace/workdir`.
docker run -v path/to/some/folder:/workspace/workdir ghcr.io/xiaosu-zhu/mcquic:latest /workspace/workdir/someimage.png /workspace/workdir/output.mcq
docker run -v path/to/some/folder:/workspace/workdir ghcr.io/xiaosu-zhu/mcquic:latest /workspace/workdir/output.mcq /workspace/workdir/restored.png
```

## Install Manually (for dev)
This way gives you full access to the repository for modification. A `conda` environment is still needed, *e.g.* [Miniconda](https://docs.conda.io/en/latest/miniconda.html).

* Clone this repository
```bash
git clone https://github.com/xiaosu-zhu/McQuic.git && cd McQuic
```
* Create a virtual env `mcquic` and install all packages by
```powershell
./install.sh # for POSIX with bash
.\install.ps1 # for Windows with Anaconda PowerShell
```

Now you should be in the `mcquic` virtual environment. If not, please activate it with `conda activate mcquic`.

* Compress images
```bash
mcquic --help
mcquic -qp 2 assets/sample.png assets/compressed.mcq
```
* Decompress images
```bash
# `-qp` is not needed here, since this info is already stored in `compressed.mcq`.
mcquic assets/compressed.mcq assets/restored.png
```
And check outputs: [`assets/compressed.mcq`](https://raw.githubusercontent.com/xiaosu-zhu/McQuic/main/assets/compressed.mcq) and [`assets/restored.png`](https://raw.githubusercontent.com/xiaosu-zhu/McQuic/main/assets/restored.png).

## (***Optional***) Install `NVIDIA/Apex`

[`NVIDIA/Apex`](https://github.com/NVIDIA/apex) is an additional package **required** for training. If you want to [**develop, contribute**](#contribute-to-this-repository), or [**train a new model**](#train-a-new-model), please make sure you've installed `NVIDIA/Apex` with the following snippet.
```bash
git clone https://github.com/NVIDIA/apex && cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
```



> If you are using [Docker images](#docker), this step is not necessary.



> Please make sure you've installed it in the correct virtual environment.



> For more information such as building toolchains, please refer to [their repository](https://github.com/NVIDIA/apex).
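A quick way to check that Apex and its compiled extensions are importable in the current environment is sketched below. The extension module name `amp_C` is what Apex builds with `--cpp_ext --cuda_ext`; treat this as a rough sanity check, not an official verification step.

```python
# Rough sanity check for the Apex installation.
import apex           # pure-Python package
from apex import amp  # mixed-precision utilities

try:
    import amp_C      # compiled extension built with --cpp_ext/--cuda_ext
    print("Apex C++/CUDA extensions are available.")
except ImportError:
    print("Apex is installed, but the compiled extensions are missing;")
    print("re-run the pip command above with the --cpp_ext/--cuda_ext options.")
```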

# Reference Models
I've released one pretrained model (sorry, I currently don't have many free GPUs to train more). You can fetch it by specifying `-qp [Model_NO]`. The pretrained model list follows (others ***TBA***):

| Model No. | Channel | M | K | Throughput (Encode/Decode) | Avg.BPP |
|:---------: |:-------: |:-: |:---------------: |:--------------------------: |:-------: |
| - | - | - | - | - | - |
| 2 | 128 | 2 | [8192,2048,512] | 25.45 Mpps / 22.03 Mpps | 0.1277 |
| - | - | - | - | - | - |
| 12 | 192 | 12 | [8192,2048,512] | 11.07 Mpps / 10.21 Mpps | - |

The coding throughput is tested on an NVIDIA RTX 3090. Image file I/O, model loading, *etc.* are not included in the measurement. Throughput would increase by a further `5%~15%` if you converted the models to `TorchScript`. However, this is not trivial since the conversion involves the entropy coder, which is a C++ extension, so I'm not going to implement it.

The main slow-down from small to large models is caused by the channel increase `128 -> 192`.
- **`Mpps = Mega-pixels per second`**
- **`BPP = Bits per pixel`**
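As a worked example of these units (the numbers below are made up for illustration, not a benchmark):

```python
# Worked example for the units above (numbers are made up for illustration).
width, height = 1920, 1080         # a Full-HD image
compressed_bytes = 33_000          # hypothetical size of the .mcq file

bpp = compressed_bytes * 8 / (width * height)
print(f"BPP = {bpp:.4f}")          # ~0.1273 bits per pixel

throughput_mpps = 25.45            # Mega-pixels per second (model 2, encode)
seconds = (width * height) / (throughput_mpps * 1e6)
print(f"Encoding takes ~{seconds * 1000:.1f} ms")  # ~81.5 ms for this image
```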

# Train a New Model
Please ensure you've installed [`NVIDIA/Apex`](#optional-install-nvidiaapex). Here are the minimal and recommended system requirements for training models.

## Requirements
* Minimal
  * `RAM ≥ 64GiB`
  * `VRAM ≥ 12GiB`
* Recommended
  * `VRAM ≥ 24GiB`
  * Better if you have `≥ 4-way` NVIDIA RTX 3090s or faster GPUs.

## Configs
The [configs](configs) folder provides an example config, `example.yaml`, for training models. Please refer to [configs/README.md](configs/README.md) for more info.

## Prepare a Dataset
Before training models, you need to prepare an image dataset. You are free to pick any images to form the dataset, as long as each image is at least `512x512`.
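If you are unsure whether your images meet the size requirement, a small scan like the one below can flag images smaller than 512x512 before building the dataset. It uses Pillow and is just a helper sketch (the folder name is a placeholder), not part of `mcquic`:

```python
from pathlib import Path

from PIL import Image

root = Path("train_images")  # placeholder: your raw image folder
too_small = []
for path in root.rglob("*"):
    if path.suffix.lower() not in {".png", ".jpg", ".jpeg"}:
        continue
    with Image.open(path) as img:
        w, h = img.size
    if w < 512 or h < 512:
        too_small.append((path, w, h))

print(f"{len(too_small)} images are smaller than 512x512.")
```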

* To build a training dataset, please put all images in a folder (sub-folders are allowed), then run
```bash
mcquic dataset --help
```
```console
Usage: mcquic dataset [OPTIONS] IMAGES OUTPUT

  Create training set from `images` dir to `output` dir.

  Args:

    images (str): All training images folder, allow sub-folders.

    output (str): Output dir to create training set.

Options:
  -D, --debug  Set logging level to DEBUG to print verbose messages.
  -q, --quiet  Silence all messages, this option has higher priority to
               `-D/--debug`.
  -h, --help   Show this message and exit.
```
```bash
mcquic dataset train_images mcquic_dataset
```
to build a `lmdb` dataset for `mcquic` to read.

* Then, prepare a training config, *e.g.* `configs/train.yaml`, and don't forget to specify the dataset path.
```yaml
# `configs/train.yaml`
...
trainSet: mcquic_dataset # path to the training dataset.
valSet: val_images # path to a folder of validation images.
savePath: saved # path to a folder to save checkpoints.
...
```
where `trainSet` and `valSet` can be any relative or absolute paths, and `savePath` is a folder for saving checkpoints and logs.
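Before launching a long run, it can be handy to sanity-check the config and the generated dataset. The snippet below only assumes the three keys shown above and the `lmdb` files produced by `mcquic dataset`; it is a convenience sketch, not part of `mcquic`:

```python
from pathlib import Path

import lmdb
import yaml

config = yaml.safe_load(Path("configs/train.yaml").read_text())
for key in ("trainSet", "valSet", "savePath"):
    print(key, "->", config.get(key))

# Peek at the generated lmdb dataset to confirm it is readable.
env = lmdb.open(config["trainSet"], readonly=True, lock=False)
print("lmdb entries:", env.stat()["entries"])
env.close()
```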

In this example, the final folder structure is shown below:

```yaml
. # A nice folder
├── 📂configs
│   ...
│   └── 📄train.yaml
├── 📄README.md # this readme
├── 📂saved # saved models appear here
├── 📂train_images # a lot of training images
│   ├── 📂ImageNet
│   │   ├── 📂folder1 # a lot of images
│   │   ├── 🖼️image1.png
│   │   ...
│   ├── 📂COCO
│   │   ├── 🖼️image1.png
│   │   ├── 🖼️image2.png
│   │   ...
│   ...
├── 📂mcquic_dataset # generated training dataset
│   ├── 📀data.mdb
│   ├── 📀lock.mdb
│   └── 📄metadata.json
└── 📂val_images # a lot of validation images
    ├── 🖼️image1.png
    ├── 🖼️image2.png
    ...
```

## Training
* To train a new model, run
```bash
mcquic train --help
```
```console
Usage: mcquic train [OPTIONS] [CONFIG]

  Train a model.

  Args:

    config (str): Config file (yaml) path. If `-r/--resume` is present but
    config is still given, then this config will be used to update the
    resumed training.

Options:
  -D, --debug        Set logging level to DEBUG to print verbose messages.
  -q, --quiet        Silence all messages, this option has higher priority to
                     `-D/--debug`.
  -r, --resume FILE  `.ckpt` file path to resume training.
  -h, --help         Show this message and exit.
```
```bash
mcquic train configs/train.yaml
```
and the saved model will be located at `saved/mcquic_dataset/latest`.
* To resume an interrupted training, run
```bash
mcquic train -r
```
, or
```bash
mcquic train -r configs/train.yaml
```
if you want to use an updated config (e.g. tuned learning rate, modified hyper-parameters) to resume training.

## Test
You can use any saved checkpoint (usually located in the `savePath` above) to validate performance. For example:
```bash
mcquic validate --help
```
```console
Usage: python -m mcquic.validate [OPTIONS] PATH IMAGES [OUTPUT]

  Validate a trained model from `path` by images from `images` dir, and
  publish a final state_dict to `output` path.

  Args:

    path (str): Saved checkpoint path.

    images (str): Validation images folder.

    output (str): Dir to save all restored images.

Options:
  -D, --debug        Set logging level to DEBUG to print verbose messages.
  -q, --quiet        Silence all messages, this option has higher priority to
                     `-D/--debug`.
  -e, --export PATH  Path to export the final model that is compatible with
                     main program.
  -h, --help         Show this message and exit.
```
```bash
mcquic validate -e path/to/final/model path/to/a/checkpoint path/to/images/folder path/to/output/folder
```

The exported "final/model" is compatible with the main `mcquic` program, so you can use this local model directly to perform compression. Try:
```bash
mcquic --local path/to/final/model assets/sample.png assets/compressed.mcq
# `--local` is not needed here, since the model info is already stored in `compressed.mcq`.
mcquic assets/compressed.mcq assets/restored.png
```
If you think your model is awesome, please don't hesitate to [contribute to this repository](#contribute-to-this-repository)!

# Implement MCQ by yourself
A minimal implementation of the multi-codebook quantizer is given below (please refer to [quantizer.py](./mcquic/modules/quantizer.py#L61) for notes):

```python
import math
from typing import Tuple

import torch
import torch.nn.functional as F
from torch import Tensor, nn


class Quantizer(nn.Module):
    """
    Quantizer with `m` sub-codebooks,
        `k` codewords for each, and
        `n` total channels.
    Args:
        m (int): Number of sub-codebooks.
        k (int): Number of codewords for each sub-codebook.
        n (int): Number of channels of latent variables.
    """
    def __init__(self, m: int, k: int, n: int):
        super().__init__()
        # A codebook, feature dim `d = n // m`.
        self._codebook = nn.Parameter(torch.empty(m, k, n // m))
        self._initParameters()

    def _initParameters(self):
        # std = sqrt(2 / (5 * d)) with d = n // m.
        d = self._codebook.shape[-1]
        nn.init.normal_(self._codebook, std=math.sqrt(2 / (5 * d)))

    def forward(self, x: Tensor, t: float = 1.0) -> Tuple[Tensor, Tensor]:
        """
        Module forward.
        Args:
            x (Tensor): Latent variable with shape [b, n, h, w].
            t (float, 1.0): Temperature for Gumbel softmax.
        Return:
            Tensor: Quantized latent with shape [b, n, h, w].
            Tensor: Binary codes with shape [b, m, h, w].
        """
        b, _, h, w = x.shape
        # [b, m, d, h, w]
        x = x.reshape(b, len(self._codebook), -1, h, w)
        # [b, m, 1, h, w], square of x
        x2 = (x ** 2).sum(2, keepdim=True)
        # [m, k, 1, 1], square of codebook
        c2 = (self._codebook ** 2).sum(-1, keepdim=True)[..., None]
        # [b, m, d, h, w] * [m, k, d] -sum-> [b, m, k, h, w], dot product between x and codebook
        inter = torch.einsum("bmdhw,mkd->bmkhw", x, self._codebook)
        # [b, m, k, h, w], pairwise L2-distance
        distance = x2 + c2 - 2 * inter
        # [b, m, k, h, w], negated distance as logits to sample
        sample = F.gumbel_softmax(-distance, t, hard=True, dim=2)
        # [b, m, d, h, w], use sample to pick codewords
        quantized = torch.einsum("bmkhw,mkd->bmdhw", sample, self._codebook)
        # back to [b, n, h, w]
        quantized = quantized.reshape(b, -1, h, w)
        # [b, n, h, w] quantized latents, [b, m, h, w] binary codes
        return quantized, sample.argmax(2)
```
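A quick smoke test of the class above (the shapes follow the docstring; the values are random and the hyper-parameters are arbitrary):

```python
# Smoke test for the Quantizer sketch above.
quantizer = Quantizer(m=4, k=256, n=256)
latent = torch.randn(2, 256, 16, 16)  # [b, n, h, w]

quantized, codes = quantizer(latent, t=1.0)
print(quantized.shape)  # torch.Size([2, 256, 16, 16])
print(codes.shape)      # torch.Size([2, 4, 16, 16])
print(codes.dtype)      # torch.int64 -- indices into each sub-codebook
```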

# Contribute to this Repository
It would be very nice if you want to test new ideas or add new functions 😊. You will need to install `mcquic` via [**Docker**](#docker) or [**manually (with the optional step)**](#install-manually-for-dev). As with other git repos, before raising issues or pull requests, please take a thorough look at the [issue templates](https://github.com/xiaosu-zhu/McQuic/issues/new/choose).

# To-do List
* `mcquic service`
* More pretrained models

# Detailed framework
Thanks for your attention! ❤️ Here are details from the paper.

Following previous works, we build the compression model as an auto-encoder. The encoder bottleneck (analysis transform) outputs a small feature map, which is quantized by *multi-codebook vector quantization* rather than scalar quantization. Quantizers are cascaded to effectively estimate the latent distribution.
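As a rough illustration of the "cascaded" idea, here is a generic residual-quantization toy built on the `Quantizer` sketch from [Implement MCQ by yourself](#implement-mcq-by-yourself). It is not McQuic's exact multi-level architecture; please see Figure 3 and the paper for the real design.

```python
import torch
from torch import nn


class CascadedQuantizer(nn.Module):
    # Toy cascade: each level quantizes what the previous level left behind.
    def __init__(self, levels: int, m: int, k: int, n: int):
        super().__init__()
        self._levels = nn.ModuleList(Quantizer(m, k, n) for _ in range(levels))

    def forward(self, x: torch.Tensor):
        quantized, residual, all_codes = torch.zeros_like(x), x, []
        for level in self._levels:
            q, codes = level(residual)   # quantize the current residual
            quantized = quantized + q    # accumulate the reconstruction
            residual = residual - q      # pass the leftover to the next level
            all_codes.append(codes)      # one [b, m, h, w] code map per level
        return quantized, all_codes
```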


*Figure 3. Left: Overall framework. Right: Structure of a quantizer.*

The right part of the figure above shows the detailed structure of our proposed quantizer.

# References and License
## References
[1] Agustsson, Eirikur, et al. "Soft-to-hard vector quantization for end-to-end learning compressible representations." NeurIPS 2017.

[2] Van Den Oord, Aaron, and Oriol Vinyals. "Neural discrete representation learning." NeurIPS 2017.

[3] Esser, Patrick, Robin Rombach, and Bjorn Ommer. "Taming transformers for high-resolution image synthesis." CVPR 2021.

## Citation
To cite our paper, please use the following BibTeX:
```plain
@inproceedings{McQuic,
  author    = {Xiaosu Zhu and
               Jingkuan Song and
               Lianli Gao and
               Feng Zheng and
               Heng Tao Shen},
  title     = {Unified Multivariate Gaussian Mixture for Efficient Neural Image Compression},
  booktitle = {CVPR},
  pages     = {17612--17621},
  year      = {2022}
}
```

## Copyright

**Fonts**:
* [**Source Sans Pro**](https://fonts.adobe.com/fonts/source-sans). © 2010, 2012 Adobe Systems Incorporated, SIL Open Font License.
* [**Flash Rogers 3D**](https://www.iconian.com/index.html). © 2007 Iconian Fonts, donationware.
* [**Cambria Math**](https://docs.microsoft.com/en-us/typography/font-list/cambria-math). © 2017 Microsoft Corporation. All rights reserved.
* [**Times New Roman**](https://docs.microsoft.com/en-us/typography/font-list/times-new-roman). © 2017 The Monotype Corporation. All Rights Reserved.
* [**Caramel and Vanilla**](http://www.foundmyfont.com/). © 2017 FOUND MY FONT LTD. All Rights Reserved.

**Pictures**:
* [**kodim24.png**](http://r0k.us/graphics/kodak/kodim24.html) by Alfons Rudolph, Kodak Image Dataset.
* [**assets/sample.png**](https://unsplash.com/photos/hLxqYJspAkE) by Ales Krivec, CLIC Professional valid set.

**Third-party repos**:

| Repos | License |
|-------------------------------------------------------------------------------:|---------|
| [PyTorch](https://pytorch.org/) | [BSD-style](https://github.com/pytorch/pytorch/blob/master/LICENSE) |
| [Torchvision](https://pytorch.org/vision/stable/index.html) | [BSD-3-Clause](https://github.com/pytorch/vision/blob/main/LICENSE) |
| [Apex](https://nvidia.github.io/apex/) | [BSD-3-Clause](https://github.com/NVIDIA/apex/blob/master/LICENSE) |
| [Tensorboard](https://www.tensorflow.org/tensorboard) | [Apache-2.0](https://github.com/tensorflow/tensorboard/blob/master/LICENSE) |
| [Kornia](https://kornia.github.io/) | [Apache-2.0](https://github.com/kornia/kornia/blob/master/LICENSE) |
| [rich](https://rich.readthedocs.io/en/latest/) | [MIT](https://github.com/Textualize/rich/blob/master/LICENSE) |
| [python-lmdb](https://lmdb.readthedocs.io/en/release/) | [OpenLDAP Version 2.8](https://github.com/jnwatson/py-lmdb/blob/master/LICENSE) |
| [PyYAML](https://pyyaml.org/) | [MIT](https://github.com/yaml/pyyaml/blob/master/LICENSE) |
| [marshmallow](https://marshmallow.readthedocs.io/en/stable/) | [MIT](https://github.com/marshmallow-code/marshmallow/blob/dev/LICENSE) |
| [click](https://click.palletsprojects.com/) | [BSD-3-Clause](https://github.com/pallets/click/blob/main/LICENSE.rst) |
| [vlutils](https://github.com/VL-Group/vlutils) | [Apache-2.0](https://github.com/VL-Group/vlutils/blob/main/LICENSE) |
| [MessagePack](https://msgpack.org/) | [Apache-2.0](https://github.com/msgpack/msgpack-python/blob/main/COPYING) |
| [pybind11](https://pybind11.readthedocs.io/en/stable/) | [BSD-style](https://github.com/pybind/pybind11/blob/master/LICENSE) |
| [CompressAI](https://interdigitalinc.github.io/CompressAI/) | [BSD 3-Clause Clear](https://github.com/InterDigitalInc/CompressAI/blob/master/LICENSE) |
| [Taming-transformer](https://compvis.github.io/taming-transformers/) | [MIT](https://github.com/CompVis/taming-transformers/blob/master/License.txt) |
| [marshmallow-jsonschema](https://github.com/fuhrysteve/marshmallow-jsonschema) | [MIT](https://github.com/fuhrysteve/marshmallow-jsonschema/blob/master/LICENSE) |
| [json-schema-for-humans](https://coveooss.github.io/json-schema-for-humans/#/) | [Apache-2.0](https://github.com/coveooss/json-schema-for-humans/blob/main/LICENSE.md) |
| [CyclicLR](https://github.com/bckenstler/CLR) | [MIT](https://github.com/bckenstler/CLR/blob/master/LICENSE) |
| [batch-transforms](https://github.com/pratogab/batch-transforms) | [MIT](https://github.com/pratogab/batch-transforms/blob/master/LICENSE) |
| [pytorch-msssim](https://github.com/VainF/pytorch-msssim) | [MIT](https://github.com/VainF/pytorch-msssim/blob/master/LICENSE) |
| [Streamlit](https://streamlit.io/) | [Apache-2.0](https://github.com/streamlit/streamlit/blob/develop/LICENSE) |
| [conda](https://docs.conda.io/projects/conda/en/latest/) | [BSD 3-Clause](https://docs.conda.io/en/latest/license.html) |







This repo is licensed under the [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0).