Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/ai-forever/movqgan
MoVQGAN - a model for image encoding and reconstruction
- Host: GitHub
- URL: https://github.com/ai-forever/movqgan
- Owner: ai-forever
- Created: 2023-05-15T15:19:10.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2023-10-31T06:21:47.000Z (over 1 year ago)
- Last Synced: 2025-01-02T23:11:06.766Z (about 1 month ago)
- Topics: gan, generative-adversarial-network, image-compression, image-encoding, image-reconstruction
- Language: Jupyter Notebook
- Homepage:
- Size: 27.5 MB
- Stars: 211
- Watchers: 4
- Forks: 14
- Open Issues: 9
Metadata Files:
- Readme: README.md
README
# SBER-MoVQGAN
[![Framework: PyTorch](https://img.shields.io/badge/Framework-PyTorch-orange.svg)](https://pytorch.org/) [![Huggingface space](https://img.shields.io/badge/🤗-Huggingface-yellow.svg)](https://huggingface.co/ai-forever/MoVQGAN) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1EVKDFsa17VgdyiaPdbKShBIm4N_18Xlj?usp=sharing)

[Habr post](https://habr.com/ru/companies/sberbank/articles/740624/)
![](./pics/example.png)
SBER-MoVQGAN (Modulated Vector Quantized GAN) is a new SOTA model for the image reconstruction task. The model is based on the code of the [VQGAN](https://github.com/CompVis/taming-transformers) repository with the modifications proposed in the original [MoVQGAN](https://arxiv.org/pdf/2209.09002.pdf) paper. The architecture of SBER-MoVQGAN is shown in the [figure](https://arxiv.org/pdf/2209.09002.pdf) below.
![](./pics/architecture.png)
SBER-MoVQGAN was successfully integrated into [Kandinsky 2.1](https://github.com/ai-forever/Kandinsky-2), where it became one of the architectural blocks that significantly improved the quality of text-to-image generation.
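The "modulated" part of the name refers to spatially conditional normalization in the decoder: the quantized code map predicts per-location scale and shift for the decoder's normalization layers, so the decoder adapts to the quantized content at each spatial position. Below is a minimal illustrative PyTorch sketch of that idea; it is not the repository's exact implementation, and the module and argument names are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatiallyModulatedNorm(nn.Module):
    """Illustrative sketch: group-normalize decoder features, then modulate them
    with scale/shift maps predicted from the quantized code map
    (the core "modulated" idea; names here are hypothetical)."""

    def __init__(self, num_channels: int, code_channels: int, num_groups: int = 32):
        super().__init__()
        self.norm = nn.GroupNorm(num_groups, num_channels, affine=False)
        self.to_scale = nn.Conv2d(code_channels, num_channels, kernel_size=1)
        self.to_shift = nn.Conv2d(code_channels, num_channels, kernel_size=1)

    def forward(self, x: torch.Tensor, code_map: torch.Tensor) -> torch.Tensor:
        # Resize the quantized code map to this decoder stage's spatial resolution,
        # then apply the predicted per-location scale and shift.
        code_map = F.interpolate(code_map, size=x.shape[-2:], mode="nearest")
        return self.norm(x) * (1.0 + self.to_scale(code_map)) + self.to_shift(code_map)

# Example: modulate 256-channel decoder features with a 4-channel quantized latent.
norm = SpatiallyModulatedNorm(num_channels=256, code_channels=4)
out = norm(torch.randn(1, 256, 64, 64), torch.randn(1, 4, 32, 32))  # -> (1, 256, 64, 64)
```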
## Models
+ [67M SBER-MoVQGAN](https://huggingface.co/ai-forever/MoVQGAN/resolve/main/movqgan_67M.ckpt)
+ [102M SBER-MoVQGAN](https://huggingface.co/ai-forever/MoVQGAN/resolve/main/movqgan_102M.ckpt)
+ [270M SBER-MoVQGAN](https://huggingface.co/ai-forever/MoVQGAN/resolve/main/movqgan_270M.ckpt)

The following table compares the models on the ImageNet dataset in terms of the FID, SSIM, PSNR, and L1 metrics. A more detailed description of the experiments and a comparison with other models can be found in the [Habr post](https://habr.com/ru/companies/sberbank/articles/740624/).
|Model|Latent size|Codebook size|Train steps|FID|SSIM|PSNR|L1|
|:----|:----|:----|:----|:----|:----|:----|:----|
|[ViT-VQGAN\*](https://arxiv.org/pdf/2110.04627.pdf)|32x32|8192|500000|1.28|\-|\-|\-|
|[RQ-VAE\*](https://arxiv.org/pdf/2203.01941.pdf)|8x8x16|16384|10 epochs|1.83|\-|\-|\-|
|[Mo-VQGAN\*](https://arxiv.org/pdf/2209.09002.pdf)|16x16x4|1024|40 epochs|1.12|0.673|22.42|\-|
|[VQ CompVis](https://github.com/CompVis/latent-diffusion)|32x32|16384|971043|1.34|0.65|23.847|0.053|
|[KL CompVis](https://github.com/CompVis/latent-diffusion)|32x32|\-|246803|0.968|0.692|25.112|0.047|
|[SBER-VQGAN (from pretrain)](https://habr.com/ru/companies/sberbank/articles/581738/)|32x32|8192|1 epoch|1.439|0.682|24.314|0.05|
|[SBER-MoVQGAN 67M](https://huggingface.co/ai-forever/MoVQGAN/resolve/main/movqgan_67M.ckpt)|32x32|16384|2M|0.965|0.725|26.449|0.042|
|[SBER-MoVQGAN 102M](https://huggingface.co/ai-forever/MoVQGAN/resolve/main/movqgan_102M.ckpt)|32x32|16384|2360k|0.776|0.737|26.889|0.04|
|[SBER-MoVQGAN 270M](https://huggingface.co/ai-forever/MoVQGAN/resolve/main/movqgan_270M.ckpt)|32x32|16384|1330k|**0.686**💥|**0.741**💥|**27.037**💥|**0.039**💥|
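For reference, PSNR, SSIM, and L1 in the table measure per-image reconstruction fidelity, while FID is a distribution-level metric computed over the whole validation set. A quick way to reproduce the per-image metrics for a single original/reconstruction pair, assuming `numpy` and `scikit-image` >= 0.19 are installed:

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def reconstruction_metrics(original: np.ndarray, reconstructed: np.ndarray) -> dict:
    """Per-image PSNR / SSIM / L1 for uint8 HxWxC images in [0, 255]."""
    psnr = peak_signal_noise_ratio(original, reconstructed, data_range=255)
    ssim = structural_similarity(original, reconstructed, channel_axis=-1, data_range=255)
    # L1 reported on images rescaled to [0, 1].
    l1 = np.abs(original.astype(np.float64) - reconstructed.astype(np.float64)).mean() / 255.0
    return {"PSNR": psnr, "SSIM": ssim, "L1": l1}
```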
## How to use

### Install
```
pip install "git+https://github.com/ai-forever/MoVQGAN.git"
```
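The pretrained checkpoints listed in the Models section can also be fetched programmatically, for example with `huggingface_hub` (an extra dependency, not necessarily installed by the command above):

```python
from huggingface_hub import hf_hub_download

# Download the 270M checkpoint from the ai-forever/MoVQGAN repo on Hugging Face.
ckpt_path = hf_hub_download(repo_id="ai-forever/MoVQGAN", filename="movqgan_270M.ckpt")
print(ckpt_path)  # local path to the cached checkpoint file
```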
### Train
```
python main.py --config configs/movqgan_270M.yaml
```
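The YAML config defines the model size and training hyperparameters for the run; the exact keys are in the `configs/` directory of the repository. If you want to check what a run will use before launching it, the file can be inspected with any YAML loader (PyYAML assumed here):

```python
import yaml

# Print the hyperparameters of a training run before launching main.py.
# (The config layout is defined by the repository; this snippet only reads it.)
with open("configs/movqgan_270M.yaml") as f:
    cfg = yaml.safe_load(f)
print(yaml.safe_dump(cfg, sort_keys=False))
```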
### Inference
Check the Jupyter notebook with an example in the `./notebooks` folder, or open it in Colab: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1EVKDFsa17VgdyiaPdbKShBIm4N_18Xlj?usp=sharing)
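If you prefer a script over the notebook, a reconstruction roundtrip looks roughly like the sketch below. The loader name `get_movqgan_model`, its arguments, and the `encode`/`decode` return values are assumptions here (they follow the VQGAN convention); check the notebook in `./notebooks` for the exact API.

```python
import numpy as np
import torch
from PIL import Image

# NOTE: `get_movqgan_model` and its signature are assumptions for illustration;
# see the notebook in ./notebooks for the exact loading API.
from movqgan import get_movqgan_model

model = get_movqgan_model("270M", pretrained=True, device="cpu")
model.eval()

# Load an image and map it to a [-1, 1] NCHW float tensor.
img = Image.open("input.jpg").convert("RGB").resize((256, 256))
x = torch.from_numpy(np.array(img)).permute(2, 0, 1).float() / 127.5 - 1.0
x = x.unsqueeze(0)

with torch.no_grad():
    # Encode to the quantized latent and decode back
    # (encode assumed to return (quant, emb_loss, info) as in VQGAN).
    quant, _, _ = model.encode(x)
    recon = model.decode(quant)

# Back to uint8 HWC for saving.
out = ((recon.squeeze(0).permute(1, 2, 0).clamp(-1, 1) + 1) * 127.5).to(torch.uint8).numpy()
Image.fromarray(out).save("reconstruction.png")
```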
## Examples
This section provides examples of image reconstruction for all versions of SBER-MoVQGAN on hard-to-reconstruct domains such as faces, text, and other complex scenes.
![](./pics/examples.png)

## Authors
+ Anastasia Maltseva: [Github](https://github.com/NastyaMittseva)
+ Arseniy Shakhmatov: [Github](https://github.com/cene555), [Blog](https://t.me/gradientdip)
+ Andrey Kuznetsov: [Github](https://github.com/kuznetsoffandrey), [Blog](https://t.me/complete_ai)
+ Denis Dimitrov: [Github](https://github.com/denndimitrov), [Blog](https://t.me/dendi_math_ai)