https://github.com/vishal-v/stackgan

TensorFlow implementation of "Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks" by Han Zhang, et al.

conditioning-augmentation cub-200 gans generative-adversarial-network keras stack-gan tensorflow-2


# StackGAN
### Text to Photo-Realistic Image Synthesis
---
#### Dependencies
```
tensorflow==2.1.0
numpy==1.16.4
absl_py==0.7.0
matplotlib==2.2.3
pandas==0.23.4
Pillow==6.1.0
```
#### Downloads
- To install all the dependencies, run:
```
pip install -r requirements.txt
```
- To download the CUB-200 dataset, run the `data_download.py` script:
```
python data_download.py
```
- Download the char-CNN-RNN embeddings ([**download link**](https://drive.google.com/file/d/0B3y_msrWZaXLT1BZdVdycDY5TEE)) and unzip the archive in place:
```
unzip birds.zip
```
#### Training
- The `model.py` file contains the minimum code needed to run the Stage 1 and Stage 2 architectures. It saves the weights automatically once the specified (or default) number of epochs has completed. The weights are stored in the same directory as `model.py`.
```
python model.py
```
#### Architecture

- Stage 1
    - Text Encoder Network
        - Text description to a 1024 dimensional text embedding
        - Learning Deep Representations of Fine-Grained Visual Descriptions [Arxiv Link](https://arxiv.org/abs/1605.05395)
    - Conditioning Augmentation Network
        - Adds randomness to the network
        - Produces more image-text pairs
    - Generator Network
    - Discriminator Network
    - Embedding Compressor Network
    - Outputs a 64x64 image

- Stage 2
    - Text Encoder Network
    - Conditioning Augmentation Network
    - Generator Network
    - Discriminator Network
    - Embedding Compressor Network
    - Outputs a 256x256 image
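
The Conditioning Augmentation step listed above can be sketched in a few lines. In the StackGAN paper, the conditioning vector is sampled from a Gaussian whose mean and standard deviation are predicted from the text embedding, so a single caption yields many slightly different conditioning vectors, and a KL term keeps that Gaussian close to a standard normal. The example below is a dependency-free illustration of that arithmetic only, not the code in `model.py`: the function names, the 4-dimensional vectors, and the fixed `mu` and `log_sigma` values are assumptions made for illustration. In a real model both would be produced by a learned layer and handled as tensors.

```python
import math
import random


def conditioning_augmentation(mu, log_sigma, rng):
    """Sample c = mu + sigma * eps, with eps ~ N(0, 1) drawn per dimension.

    This is the reparameterization trick: the randomness lives in eps,
    so mu and log_sigma stay differentiable in a real framework.
    """
    return [m + math.exp(s) * rng.gauss(0.0, 1.0) for m, s in zip(mu, log_sigma)]


def kl_to_standard_normal(mu, log_sigma):
    """KL(N(mu, diag(sigma^2)) || N(0, I)), summed over dimensions."""
    return sum(
        0.5 * (math.exp(2.0 * s) + m * m - 1.0 - 2.0 * s)
        for m, s in zip(mu, log_sigma)
    )


# Hypothetical values standing in for the output of a learned layer.
rng = random.Random(0)
mu = [0.5, -1.0, 0.0, 2.0]
log_sigma = [-1.0, -1.0, -1.0, -1.0]

# Two draws for the same caption give two different conditioning vectors.
first = conditioning_augmentation(mu, log_sigma, rng)
second = conditioning_augmentation(mu, log_sigma, rng)
print(first != second)

# The regularizer vanishes when the Gaussian is already a standard normal.
print(kl_to_standard_normal([0.0] * 4, [0.0] * 4))
```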
---
#### Reference Papers
1. **StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks** [[Arxiv Link](https://arxiv.org/pdf/1612.03242.pdf)]
2. **Improved Techniques for Training GANs** [[Arxiv Link](https://arxiv.org/pdf/1606.03498.pdf)]
3. **Generative Adversarial Text to Image Synthesis** [[Arxiv Link](https://arxiv.org/pdf/1605.05396.pdf)]
4. **Learning Deep Representations of Fine-Grained Visual Descriptions** [[Arxiv Link](https://arxiv.org/abs/1605.05395)]
---
#### Note
This is the code I submitted to TensorFlow for Google Summer of Code. Hence the attributions and the license are for "TensorFlow Authors" and not "Vishal V". This code is under the MIT License.