https://github.com/vvvm23/vqvae-2

PyTorch implementation of VQ-VAE-2 from "Generating Diverse High-Fidelity Images with VQ-VAE-2"
https://github.com/vvvm23/vqvae-2

generative-model pytorch vae vq-vae vq-vae-2

Last synced: 9 months ago
JSON representation

PyTorch implementation of VQ-VAE-2 from "Generating Diverse High-Fidelity Images with VQ-VAE-2"

Host: GitHub
URL: https://github.com/vvvm23/vqvae-2
Owner: vvvm23
License: mit
Created: 2021-03-23T20:18:37.000Z (almost 5 years ago)
Default Branch: main
Last Pushed: 2023-02-15T08:50:24.000Z (about 3 years ago)
Last Synced: 2025-04-15T18:08:50.848Z (11 months ago)
Topics: generative-model, pytorch, vae, vq-vae, vq-vae-2
Language: Python
Homepage:
Size: 79.6 MB
Stars: 152
Watchers: 3
Forks: 20
Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

# Generating Diverse High-Fidelity Images with VQ-VAE-2 [Work in Progress]
PyTorch implementation of Hierarchical, Vector Quantized, Variational Autoencoders (VQ-VAE-2)
from the paper "Generating Diverse High-Fidelity Images with VQ-VAE-2"

Original paper can be found [here](https://arxiv.org/abs/1906.00446)

Vector Quantizing layer based off implementation by [@rosinality](https://github.com/rosinality)
found [here](https://github.com/rosinality/vq-vae-2-pytorch).

Aiming for a focus on supporting an arbitrary number of VQ-VAE "levels". Most implementations in
PyTorch typically only use 2 which is limiting at higher resolutions. This repository contains
checkpoints for a 3-level and 5-level VQ-VAE-2, trained on FFHQ1024.

This project will not only contain the VQ-VAE-2 architecture, but also an example autoregressive
prior and latent dataset extraction.

> This project is very much Work-in-Progress.
> VQ-VAE-2 model is mostly complete. PixelSnail prior models are still experimental
> and most definitely do not work.

## Usage
### VQ-VAE-2 Usage
Run VQ-VAE-2 training using the config `task_name` found in `hps.py`. Defaults to `cifar10`:
```
python main-vqvae.py --task task_name
```

Evaluate VQ-VAE-2 from parameters `state_dict_path` on task `task_name`. Defaults to `cifar10`:
```
python main-vqvae.py --task task_name --load-path state_dict_path --evaluate
```

Other useful flags:
```
--no-save # disables saving of files during training
--cpu # do not use GPU
--batch-size # overrides batch size in cfg.py, useful for evaluating on larger batch size
--no-tqdm # disable tqdm status bars
--no-save # disables saving of files
--no-amp # disables using native AMP (Automatic Mixed Precision) operations
--save-jpg # save all images as jpg instead of png, useful for extreme resolutions
```

### Latent Dataset Generation
Run latent dataset generation using VQ-VAE-2 saved at `path` that was trained on task `task_name`. Defaults to `cifar10`:
```
python main-latents.py path --task task_name
```
Result is saved in `latent-data` directory.

Other useful flags:
```
--cpu # do not use GPU
--batch-size # overrides batch size in cfg.py, useful for evaluating on larger batch size
--no-tqdm # disable tqdm status bars
--no-save # disables saving of files
--no-amp # disables using native AMP (Automatic Mixed Precision) operations
```

### Discrete Prior Usage
Run level `level` PixelSnail discrete prior training using the config `task_name` found in `hps.py` using latent dataset saved at path `latent_dataset.pt` and VQ-VAE `vqvae_path` to dequantize conditioning variables. Defaults to `cifar10`:
```
python main-pixelsnail.py latent_dataset.pt vqvae_path.pt level --task task_name
```

Other useful flags:
```
--cpu # do not use GPU
--load-path # resume from saved state on disk
--batch-size # overrides batch size in cfg.py, useful for evaluating on larger batch size
--save-jpg # save all images as jpg instead of png, useful for extreme resolutions
--no-tqdm # disable tqdm status bars
--no-save # disables saving of files
```

### Sample Generation
Run sampling script on trained VQ-VAE-2 and PixelSnail priors using the config `task_name` (default `cifar10`) found in `hps.py`.
The first positional argument is the path to the VQ-VAE-2 checkpoint. The remaining `L` positional arguments are the PixelSnail
prior checkpoints from level `0` to `L`.
```
python main-sample.py vq_vae_path.pt pixelsnail_0_path.pt pixel_snail_1_path.pt ... --task task_name
```

Other useful flags:
```
--cpu # do not use GPU
--batch-size # overrides batch size in cfg.py, useful for evaluating on larger batch size
--nb-samples # number of samples to generate. defaults to 1.
--no-tqdm # disable tqdm status bars
--no-save # disables saving of files
--no-amp # disables using native AMP (Automatic Mixed Precision) operations
--save-jpg # save all images as jpg instead of png, useful for extreme resolutions
--temperature # controls softmax temperature during sampling
```

## Modifications
- Replacing residual layers with ReZero layers.

## Samples
*Reconstructions from FFHQ1024 using a 3-level VQ-VAE-2*
![Reconstructions from FFHQ1024 using a 3-level VQ-VAE-2](recon-example.png)

## Checkpoints
[FFHQ1024 - 3-level VQ-VAE-2](ffhq1024-state-dict-0017.pt)

[FFHQ1024 - 5-level VQ-VAE-2](ffhq1024-large-state-dict-0010.pt)

### Roadmap
- [X] Server mode (no fancy printing)
- [X] Experiment directories (containing logs / checkpoints / etc)
- [X] Accumulated gradient training (for larger batch sizes on limited resources)
- [X] Samples and checkpoints on FFHQ1024
- [X] Latent dataset generation
- [ ] Autoregressive prior models / training scripts
- [X] Full system sampling
- [ ] Prettier outputs
- [ ] Output logging

### Citations
**Generating Diverse High-Fidelity Images with VQ-VAE-2**
```
@misc{razavi2019generating,
title={Generating Diverse High-Fidelity Images with VQ-VAE-2},
author={Ali Razavi and Aaron van den Oord and Oriol Vinyals},
year={2019},
eprint={1906.00446},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
```

**PixelSNAIL: An Improved Autoregressive Generative Model**
```
@misc{chen2017pixelsnail,
title={PixelSNAIL: An Improved Autoregressive Generative Model},
author={Xi Chen and Nikhil Mishra and Mostafa Rohaninejad and Pieter Abbeel},
year={2017},
eprint={1712.09763},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
```

**ReZero is All You Need: Fast Convergence at Large Depth**
```
@misc{bachlechner2020rezero,
title={ReZero is All You Need: Fast Convergence at Large Depth},
author={Thomas Bachlechner and Bodhisattwa Prasad Majumder and Huanru Henry Mao and Garrison W. Cottrell and Julian McAuley},
year={2020},
eprint={2003.04887},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
```

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/vvvm23/vqvae-2

Awesome Lists containing this project

README