{"id":19066255,"url":"https://github.com/epfml/text_to_image_generation","last_synced_at":"2025-06-30T20:04:29.824Z","repository":{"id":43816865,"uuid":"472704124","full_name":"epfml/text_to_image_generation","owner":"epfml","description":null,"archived":false,"fork":false,"pushed_at":"2022-10-24T09:39:40.000Z","size":6354,"stargazers_count":5,"open_issues_count":0,"forks_count":5,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-06-30T20:03:24.600Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/epfml.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-03-22T09:53:39.000Z","updated_at":"2023-06-22T15:50:32.000Z","dependencies_parsed_at":"2023-01-20T08:16:05.680Z","dependency_job_id":null,"html_url":"https://github.com/epfml/text_to_image_generation","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/epfml/text_to_image_generation","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/epfml%2Ftext_to_image_generation","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/epfml%2Ftext_to_image_generation/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/epfml%2Ftext_to_image_generation/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/epfml%2Ftext_to_image_generation/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/epfml","download_url":"https://codeload.github.com/epfml/text_to_image_generation/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/epfml%2Ftext_to_image_generation/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":262842896,"owners_count":23373164,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-09T00:55:38.278Z","updated_at":"2025-06-30T20:04:29.754Z","avatar_url":"https://github.com/epfml.png","language":"Python","readme":"# Master Thesis on Text-to-Image Generative Models\n## Implementing and Experimenting with Diffusion Models for Text-to-Image Generation\n\n*By **Robin Zbinden** under the supervision of **Luis Barba** and **Martin Jaggi***.\n\nIn this project, we **implement a text-to-image generative model** based on DALL-E 2 and conduct some experiments to understand the possibilities of this type of model. We also propose a new guidance method for diffusion models called *image guidance*. All the model specifications and results can be found in the `master_thesis_report.pdf`.\n\n### How to generate images from text?\n\n1. Download the checkpoints of the image decoder, CLIP translator, and upsampler, as well as the means and standard deviations of the embeddings [here](https://drive.google.com/drive/folders/1NEYwdgRLBx-nRvw66Td8cxr4nS5yTkgq?usp=sharing). Then put all these files into the folder named *models*.\n\n2. Write a textual description of the images you want to generate in `captions.txt`. One caption per line.\n\n3. Run the shell script to generate the images, i.e., `sh sample_from_text.sh`. Feel free to modify the number of samples generated per caption.\n\n4. (Optional) Increase the resolution with the upsampler using the shell script `sample_upsampler.sh`. You need to specify the name of the `npz` file containing the 64x64 images with the argument `base_samples` in the script.\n\n### Code\n\nThe code is divided into three folders: *guided_diffusion*, *scripts*, and *evaluations*. The other folder named *figures* contains the figures created for the master thesis report. The same seed (42) is used in all the experiments.\n\nThe code is based on [openai/guided-difusion](https://github.com/openai/guided-diffusion).\n\n#### guided_diffusion\n\nThis folder contains all the methods to build our model, as well as helper functions to handle the datasets and to train. It is based on [openai/guided-diffusion](https://github.com/openai/guided-diffusion). In particular, it consists of the following files (sorted by relevance):\n\n- `gaussian_diffusion.py`: all the methods used to create and run diffusion processes.\n- `unet.py`: the architecture definition of the U-Net diffusion model.\n- `train_util.py`: helper functions to train the different models.\n- `script_util.py`: helper functions for the scripts.\n- `mlp.py`: the architecture definition of the CLIP translator.\n- `losses.py`: the definitions of the different losses used to train the diffusion model.\n- `dataset_helpers.py`: helper functions to handle the datasets.\n- `nn.py`: basic neural network functions.\n- `logger.py`: functions to log the different steps of training and sampling.\n- `dist_util.py`: functions to distribute the training.\n- `fp16_util.py`: functions to train in a 16 float precision (not used by our model).\n- `resamples.py`: functions to change the distribution over the timesteps during training (not used by our model).\n- `respace.py`: functions to respace the timesteps (not used by our model).\n\n#### scripts\n\nThis folder contains the different scripts to train and sample from our method. A shell file is associated with each python script which requires many arguments. In particular, it consists of the following files (sorted by relevance):\n\n- `sample_from_text.py`: generate images from a set of textual captions.\n- `sample_upsampler.py`: increase the resolution of the images from 64x64 to 256x256.\n- `sample_from_image.py`: generate images from an image embedding.\n- `train_decoder.py`: train the image decoder.\n- `train_translator.py`: train the CLIP translator.\n- `clip_embeddings.py`: create the CLIP embeddings for a dataset.\n- `handling_images.py`: create a figure from a set of images.\n\n#### evaluations\n\nThis folder contains the methods to evaluate our method. Another `README.md` explaining the procedure to replicate the evaluations is available in this folder. \n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fepfml%2Ftext_to_image_generation","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fepfml%2Ftext_to_image_generation","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fepfml%2Ftext_to_image_generation/lists"}