{"id":13744748,"url":"https://github.com/aelnouby/Text-to-Image-Synthesis","last_synced_at":"2025-05-09T03:33:12.671Z","repository":{"id":218143110,"uuid":"108480383","full_name":"aelnouby/Text-to-Image-Synthesis","owner":"aelnouby","description":"Pytorch implementation of Generative Adversarial Text-to-Image Synthesis paper","archived":false,"fork":false,"pushed_at":"2020-07-24T18:17:03.000Z","size":465,"stargazers_count":410,"open_issues_count":16,"forks_count":90,"subscribers_count":15,"default_branch":"master","last_synced_at":"2025-04-06T19:12:27.700Z","etag":null,"topics":["gans","image-generation","pytorch","text-to-image","zero-shot-learning"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/aelnouby.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-10-27T00:34:31.000Z","updated_at":"2025-04-02T02:06:12.000Z","dependencies_parsed_at":null,"dependency_job_id":"aa051c23-f81c-4bea-a695-b074f0a4b9f5","html_url":"https://github.com/aelnouby/Text-to-Image-Synthesis","commit_stats":null,"previous_names":["aelnouby/text-to-image-synthesis"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aelnouby%2FText-to-Image-Synthesis","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aelnouby%2FText-to-Image-Synthesis/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/aelnouby%2FText-to-Image-Synthesis/releases","manifests_url":"https://repos
.ecosyste.ms/api/v1/hosts/GitHub/repositories/aelnouby%2FText-to-Image-Synthesis/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/aelnouby","download_url":"https://codeload.github.com/aelnouby/Text-to-Image-Synthesis/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253183271,"owners_count":21867393,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["gans","image-generation","pytorch","text-to-image","zero-shot-learning"],"created_at":"2024-08-03T05:01:15.313Z","updated_at":"2025-05-09T03:33:12.225Z","avatar_url":"https://github.com/aelnouby.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# Text-to-Image-Synthesis \n\n## Introduction\n\nThis is a PyTorch implementation of the [Generative Adversarial Text-to-Image Synthesis paper](https://arxiv.org/abs/1605.05396). We train a conditional generative adversarial network, conditioned on text descriptions, to generate images that correspond to the description. The network architecture is shown below (Image from [1]). 
This architecture is based on DCGAN.\n\n\u003cfigure\u003e\u003cimg src='images/pipeline.png'\u003e\u003c/figure\u003e\nImage credits [1]\n\n## Requirements\n\n- pytorch \n- visdom\n- h5py\n- PIL\n- numpy\n\nThis implementation currently only supports running on GPUs.\n\n## Implementation details\n\nThis implementation follows the Generative Adversarial Text-to-Image Synthesis paper [1]; however, it puts more effort into stabilizing training and preventing mode collapse by implementing:\n- Feature matching [2]\n- One-sided label smoothing [2]\n- Minibatch discrimination [2] (implemented but not used)\n- WGAN [3]\n- WGAN-GP [4] (implemented but not used)\n\n## Datasets\n\nWe used the [Caltech-UCSD Birds 200](http://www.vision.caltech.edu/visipedia/CUB-200.html) and [Flowers](http://www.robots.ox.ac.uk/~vgg/data/flowers/102/) datasets. We converted each dataset (images, text embeddings) to hd5 format. \n\nWe used the [text embeddings](https://github.com/reedscot/icml2016) provided by the paper authors.\n\n**To use this code you can either:**\n\n- Use the converted hd5 datasets: [birds](https://drive.google.com/open?id=1mNhn6MYpBb-JwE86GC1kk0VJsYj-Pn5j), [flowers](https://drive.google.com/open?id=1EgnaTrlHGaqK5CCgHKLclZMT_AMSTyh8)\n- Convert the data yourself\n  1. Download the dataset as described [here](https://github.com/reedscot/cvpr2016).\n  2. Add the dataset paths to the `config.yaml` file.\n  3. Use the [convert_cub_to_hd5_script](convert_cub_to_hd5_script.py) or [convert_flowers_to_hd5_script](convert_flowers_to_hd5_script.py) script to convert the dataset.\n  \n**Hd5 file taxonomy**\n\n - split (train | valid | test)\n    - example_name\n      - 'name'\n      - 'img'\n      - 'embeddings'\n      - 'class'\n      - 'txt'\n\n## Usage\n### Training\n\n`python runtime.py`\n\n**Arguments:**\n- `type` : GAN architecture to use `(gan | wgan | vanilla_gan | vanilla_wgan)`. default = `gan`. Vanilla means not conditional.\n- `dataset`: Dataset to use `(birds | flowers)`. 
default = `flowers`\n- `split` : An integer indicating which split to use `(0 : train | 1: valid | 2: test)`. default = `0`\n- `lr` : The learning rate. default = `0.0002`\n- `diter` : Only for WGAN, the number of discriminator iterations for each iteration of the generator. default = `5`\n- `vis_screen` : The visdom env name for visualization. default = `gan`\n- `save_path` : Path for saving the models.\n- `l1_coef` : L1 loss coefficient in the generator loss function for gan and vanilla_gan. default=`50`\n- `l2_coef` : Feature matching coefficient in the generator loss function for gan and vanilla_gan. default=`100`\n- `pre_trained_disc` : Path to a pre-trained discriminator model used for initializing training.\n- `pre_trained_gen` : Path to a pre-trained generator model used for initializing training.\n- `batch_size`: Batch size. default= `64`\n- `num_workers`: Number of dataloader workers used for fetching data. default = `8`\n- `epochs` : Number of training epochs. default=`200`\n- `cls`: Boolean flag indicating whether to train with the cls algorithm. 
default=`False`\n\n\n## Results\n\n### Generated Images\n\n\u003cp align='center'\u003e\n\u003cimg src='images/64_flowers.jpeg'\u003e\n\u003c/p\u003e\n\n## Text to image synthesis\n| Text        | Generated Images  |\n| ------------- | -----:|\n| A blood colored pistil collects together with a group of long yellow stamens around the outside        | \u003cimg src='images/examples/a blood colored pistil collects together with a group of long yellow stamens around the outside whic.jpg'\u003e  |\n| The petals of the flower are narrow and extremely pointy, and consist of shades of yellow, blue      | \u003cimg src='images/examples/the petals of the flower are narrow and extremely pointy, and consist of shades of yellow, blue and .jpg'\u003e  |\n| This pale peach flower has a double row of long thin petals with a large brown center and coarse loo | \u003cimg src='images/examples/this pale peach flower has a double row of long thin petals with a large brown center and coarse loo.jpg'\u003e |\n| The flower is pink with petals that are soft, and separately arranged around the stamens that has pi | \u003cimg src='images/examples/the flower is pink with petals that are soft, and separately arranged around the stamens that has pi.jpg'\u003e |\n| A one petal flower that is white with a cluster of yellow anther filaments in the center | \u003cimg src='images/examples/a one petal flower that is white with a cluster of yellow anther filaments in the center.jpg'\u003e |\n\n\n## References\n[1] Generative Adversarial Text-to-Image Synthesis https://arxiv.org/abs/1605.05396\n\n[2] Improved Techniques for Training GANs https://arxiv.org/abs/1606.03498\n\n[3] Wasserstein GAN https://arxiv.org/abs/1701.07875\n\n[4] Improved Training of Wasserstein GANs https://arxiv.org/pdf/1704.00028.pdf\n\n\n## Other Implementations\n\n1. https://github.com/reedscot/icml2016 (the authors' version)\n2. 
https://github.com/paarthneekhara/text-to-image (tensorflow)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faelnouby%2FText-to-Image-Synthesis","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Faelnouby%2FText-to-Image-Synthesis","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faelnouby%2FText-to-Image-Synthesis/lists"}