https://github.com/vishal-v/stackgan

TensorFlow implementation of "Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks" by Han Zhang, et al.

conditioning-augmentation cub-200 gans generative-adversarial-network keras stack-gan tensorflow-2

Host: GitHub
URL: https://github.com/vishal-v/stackgan
Owner: Vishal-V
License: mit
Created: 2019-08-22T18:25:20.000Z (almost 6 years ago)
Default Branch: master
Last Pushed: 2020-04-01T08:55:26.000Z (about 5 years ago)
Last Synced: 2025-03-30T20:33:53.727Z (2 months ago)
Topics: conditioning-augmentation, cub-200, gans, generative-adversarial-network, keras, stack-gan, tensorflow-2
Language: Python
Homepage: https://arxiv.org/abs/1612.03242
Size: 202 KB
Stars: 34
Watchers: 2
Forks: 9
Open Issues: 5
Metadata Files:
- Readme: README.md
- License: LICENSE


# StackGAN
### Text to Photo-Realistic Image Synthesis
---
#### Dependencies
```
tensorflow==2.1.0
numpy==1.16.4
absl_py==0.7.0
matplotlib==2.2.3
pandas==0.23.4
Pillow==6.1.0
```
#### Downloads
- To install all the dependencies, run:
```
pip install -r requirements.txt
```
- To download the CUB 200 dataset, simply execute the `data_download.py` file
```
python data_download.py
```
- Download the char-CNN-RNN text embeddings from this link: [**download link**](https://drive.google.com/file/d/0B3y_msrWZaXLT1BZdVdycDY5TEE) and unzip the archive in place.
```
unzip birds.zip
```
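Once the archive is unpacked, the embeddings need to be read into Python. The following is a minimal sketch, assuming the archive contains pickle files written by Python 2 (as in the original StackGAN release); the helper name `load_embeddings` and the example path are hypothetical and not taken from this repository.

```python
import pickle


def load_embeddings(path):
    """Load a pickled embeddings object from `path`.

    encoding="latin1" lets Python 3 read pickles written by Python 2.
    That the file is such a pickle is an assumption, not a documented fact.
    """
    with open(path, "rb") as f:
        return pickle.load(f, encoding="latin1")


# Hypothetical usage; this path is an assumption about the archive layout:
# embeddings = load_embeddings("birds/train/char-CNN-RNN-embeddings.pickle")
```

If the files turn out to be in a different format, only the body of `load_embeddings` needs to change.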
#### Training
- The `model.py` file contains the minimal code needed to run the Stage 1 and Stage 2 architectures. It automatically saves the weights once the specified (or default) number of epochs has completed. Note that the weights are saved in the same directory as `model.py`.
```
python model.py
```
#### Architecture

- Stage 1
    - Text Encoder Network
        - Text description to a 1024 dimensional text embedding
        - Learning Deep Representations of Fine-Grained Visual Descriptions [Arxiv Link](https://arxiv.org/abs/1605.05395)
    - Conditioning Augmentation Network
        - Adds randomness to the network
        - Produces more image-text pairs
    - Generator Network
    - Discriminator Network
    - Embedding Compressor Network
    - Outputs a 64x64 image
#
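The Conditioning Augmentation step can be illustrated with a small sketch. In the StackGAN paper (reference 1), the conditioning vector is sampled as `c = mu + sigma * eps` with `eps ~ N(0, I)`, and a KL term pulls the distribution towards a standard normal. The code below uses only the Python standard library so it runs without TensorFlow; the function names are illustrative, it takes `mu` and `log_var` as already computed from the text embedding, and it is not the code in `model.py`.

```python
import math
import random


def conditioning_augmentation(mu, log_var, rng=random):
    """Sample c = mu + sigma * eps, with eps drawn from N(0, 1) per dimension.

    mu and log_var are equal-length sequences of floats; sigma is
    recovered as exp(0.5 * log_var).
    """
    if len(mu) != len(log_var):
        raise ValueError("mu and log_var must have the same length")
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL divergence between N(mu, diag(sigma^2)) and N(0, I), summed over dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0
                     for m, lv in zip(mu, log_var))
```

Because a fresh `eps` is drawn on every call, the same text embedding yields a different conditioning vector each time, which is how this step adds randomness and produces more image-text pairs.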
- Stage 2
    - Text Encoder Network
    - Conditioning Augmentation Network
    - Generator Network
    - Discriminator Network
    - Embedding Compressor Network
    - Outputs a 256x256 image
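
The two output resolutions are consistent with generators that repeatedly double the spatial size of a feature map. The sketch below counts the 2x upsampling steps needed, assuming (as described in the StackGAN paper, not verified against `model.py`) that Stage 1 starts from a 4x4 feature map and that Stage 2 upsamples from a 16x16 feature map obtained by downsampling the Stage 1 image. The function name is illustrative.

```python
def upsampling_steps(start, target):
    """Return how many 2x upsampling steps take a side length from start to target."""
    if start <= 0 or target < start:
        raise ValueError("need 0 < start <= target")
    steps = 0
    size = start
    while size < target:
        size *= 2
        steps += 1
    if size != target:
        raise ValueError("target must be start times a power of two")
    return steps


# Assumed starting sizes (4 for Stage 1, 16 for Stage 2) come from the paper.
stage1_steps = upsampling_steps(4, 64)
stage2_steps = upsampling_steps(16, 256)
```

Under these assumptions both stages need the same number of doubling steps, since 64 / 4 and 256 / 16 are both 16.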
---
#### Reference Papers
1. **StackGAN: Text to photo-realistic image synthesis** [[Arxiv Link](https://arxiv.org/pdf/1612.03242.pdf)]
2. **Improved Techniques for Training GANs** [[Arxiv Link](https://arxiv.org/pdf/1606.03498.pdf)]
3. **Generative Adversarial Text to Image Synthesis** [[Arxiv Link](https://arxiv.org/pdf/1605.05396.pdf)]
4. **Learning Deep Representations of Fine-Grained Visual Descriptions** [[Arxiv Link](https://arxiv.org/abs/1605.05395)]
---
#### Note
This is the code I submitted to TensorFlow for Google Summer of Code. Hence the attributions and the license are for "TensorFlow Authors" and not "Vishal V". This code is under the MIT License.
