https://github.com/chrispapa2000/maskgst
Official Implementation of Masked Generative Story Transformer with Character Guidance and Caption Augmentation, for Story Visualization
- Host: GitHub
- URL: https://github.com/chrispapa2000/maskgst
- Owner: chrispapa2000
- License: mit
- Created: 2024-03-10T17:00:46.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2024-04-08T07:47:32.000Z (10 months ago)
- Last Synced: 2024-08-01T18:31:47.958Z (6 months ago)
- Language: Python
- Homepage:
- Size: 9.71 MB
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-diffusion-categorized
# Official PyTorch Implementation of MaskGST (Masked Generative Story Transformer with Character Guidance and Caption Augmentation)
![example generations for our model](/assets/example.png "example generations for our model")
## Setup
This project was developed in `Python 3.8` using PyTorch `v1.8.0`.

Start by setting up a virtual environment:
```
virtualenv -p /usr/bin/python3.8 venv
source venv/bin/activate
```

Install PyTorch:
```
pip install torch==1.8.0+cu111 torchvision==0.9.0+cu111 torchaudio==0.8.0 -f https://download.pytorch.org/whl/torch_stable.html
```

Install the remaining dependencies:
```
pip install -r requirements.txt
```

## Prepare Data
Download the Pororo-SV dataset from [StoryDALL-E](https://github.com/adymaharana/storydalle/tree/main?tab=readme-ov-file) and extract it under `../data/`.

## Training for Pororo-SV
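Before starting either training stage, it may be worth confirming that the dataset was actually extracted under `../data/`. The following is a minimal sketch of such a check; the helper name `check_data_dir` is hypothetical and not part of this repository, and it makes no assumption about which files the dataset contains.

```python
from pathlib import Path


def check_data_dir(data_dir="../data"):
    """Return the sorted entry names under data_dir.

    Raises FileNotFoundError if the directory is missing and
    RuntimeError if it exists but is empty.
    """
    root = Path(data_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"dataset directory not found: {root}")
    entries = sorted(p.name for p in root.iterdir())
    if not entries:
        raise RuntimeError(f"dataset directory is empty: {root}")
    return entries


# Example usage (run from the repository root):
#     print(check_data_dir("../data"))
```

The check only looks at the top level of the directory, so it catches a missing or empty extraction but not a partial one.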
### Training VQ-GAN
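Each flag in the training commands expects a value. As a rough sketch, the invocation can also be assembled from Python so that a missing value is caught before the script starts. The helper `build_command` is hypothetical, and the output directory and epoch count are placeholder values chosen purely for illustration; only the config path comes from this README.

```python
import shlex
import sys


def build_command(script, options):
    """Turn a script name and a flag -> value mapping into an argv list."""
    argv = [sys.executable, script]
    for flag, value in options.items():
        if value is None or str(value) == "":
            raise ValueError(f"--{flag} needs a value")
        argv += [f"--{flag}", str(value)]
    return argv


# Hypothetical values for illustration only; substitute your own.
vqgan_options = {
    "default_root_dir": "checkpoints/vqgan",  # hypothetical output directory
    "max_epochs": 100,  # hypothetical epoch count
    "config_path": "taming_transformers/configs/custom/f8_128.yaml",  # from this README
}

command = build_command("train_vqgan.py", vqgan_options)
print(shlex.join(command))  # shell-quoted form of the command, for inspection
```

To launch training from the same script, the list could be passed to `subprocess.run(command, check=True)`.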
```
python train_vqgan.py --default_root_dir \
--max_epochs \
--config_path taming_transformers/configs/custom/f8_128.yaml
```

### Training MaskGST
```
python train_pororo.py --num_nodes \
--num_workers \
--num_gpus \
--default_root_dir \
--vq_vae_config taming_transformers/configs/custom/f8_128.yaml \
--vq_vae_path \
--batch_size \
```

## Inference
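The inference command takes a `--timesteps` option that this README does not explain. In MaskGIT-style models such as MUSE, a parameter of this name usually sets the number of iterative unmasking steps, with a cosine schedule deciding how many image tokens remain masked after each step. The sketch below illustrates that general idea only; it is an assumption about what the flag controls, not code from this repository, and the function name and the token count of 256 are hypothetical.

```python
import math


def masked_token_counts(num_tokens, timesteps):
    """Tokens still masked after each decoding step under a cosine schedule."""
    counts = []
    for step in range(1, timesteps + 1):
        progress = step / timesteps  # fraction of steps completed, in (0, 1]
        mask_ratio = math.cos(0.5 * math.pi * progress)  # falls from ~1 to 0
        counts.append(max(0, math.floor(num_tokens * mask_ratio)))
    return counts


# 256 is a hypothetical token count (for instance a 16 x 16 token grid).
schedule = masked_token_counts(num_tokens=256, timesteps=20)
print(schedule)
```

Under this schedule few tokens are committed in the early steps and many in the late ones, and the final step leaves nothing masked. More steps generally mean a finer-grained schedule at the cost of more forward passes.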
```
python infer.py --num_workers \
--timesteps 20 \
--outfolder \
--model_path
```

## Acknowledgements
- The VQ-GAN implementation is taken from [Taming Transformers](https://github.com/CompVis/taming-transformers).
- The code for the Masked Generative Transformer is adapted from this open-source implementation of [MUSE](https://github.com/lucidrains/muse-maskgit-pytorch).

## Citation
```
@misc{papadimitriou2024masked,
  title={Masked Generative Story Transformer with Character Guidance and Caption Augmentation},
  author={Christos Papadimitriou and Giorgos Filandrianos and Maria Lymperaiou and Giorgos Stamou},
  year={2024},
  eprint={2403.08502},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```