{"id":13907616,"url":"https://github.com/AssemblyAI-Community/MinImagen","last_synced_at":"2025-07-18T06:30:34.632Z","repository":{"id":45802850,"uuid":"514644612","full_name":"AssemblyAI-Community/MinImagen","owner":"AssemblyAI-Community","description":"MinImagen: A minimal implementation of the Imagen text-to-image model","archived":false,"fork":false,"pushed_at":"2023-05-08T20:04:30.000Z","size":6842,"stargazers_count":304,"open_issues_count":13,"forks_count":57,"subscribers_count":6,"default_branch":"main","last_synced_at":"2025-05-20T00:02:37.170Z","etag":null,"topics":["deep-learning","diffusion-models","imagen","pytorch","super-resolution","text-to-image"],"latest_commit_sha":null,"homepage":"https://assemblyai-examples.github.io/MinImagen/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/AssemblyAI-Community.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2022-07-16T17:34:04.000Z","updated_at":"2025-05-11T20:52:25.000Z","dependencies_parsed_at":"2024-04-09T01:59:19.031Z","dependency_job_id":null,"html_url":"https://github.com/AssemblyAI-Community/MinImagen","commit_stats":null,"previous_names":["assemblyai-community/minimagen","assemblyai-examples/minimagen"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/AssemblyAI-Community/MinImagen","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AssemblyAI-Community%2FMinImagen","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AssemblyAI-Community%2FMinImagen/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/
GitHub/repositories/AssemblyAI-Community%2FMinImagen/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AssemblyAI-Community%2FMinImagen/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/AssemblyAI-Community","download_url":"https://codeload.github.com/AssemblyAI-Community/MinImagen/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/AssemblyAI-Community%2FMinImagen/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":265710530,"owners_count":23815373,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["deep-learning","diffusion-models","imagen","pytorch","super-resolution","text-to-image"],"created_at":"2024-08-06T23:02:02.074Z","updated_at":"2025-07-18T06:30:33.894Z","avatar_url":"https://github.com/AssemblyAI-Community.png","language":"Python","funding_links":[],"categories":["HarmonyOS"],"sub_categories":["Windows Manager"],"readme":"# MinImagen\n### A Minimal implementation of the [Imagen](https://imagen.research.google/) text-to-image model.\n\n\u003cbr/\u003e\n\n\u003cp align=\"center\"\u003e\u003cimg src=\"./images/model_structure.png?raw=True\" width=\"700\"/\u003e\u003c/p\u003e\n\n\u003cbr/\u003e\n\n### See [Build Your Own Imagen Text-to-Image Model](https://www.assemblyai.com/blog/build-your-own-imagen-text-to-image-model/) for a tutorial on how to build MinImagen.\n\n### See [How Imagen Actually Works](https://www.assemblyai.com/blog/how-imagen-actually-works/) for a detailed explanation of Imagen's operating 
principles.\n\n\u003cbr/\u003e\n\nGiven a caption of an image, the text-to-image model **Imagen** will generate an image that reflects the scene described by the caption. The model is a [cascading diffusion model](https://arxiv.org/abs/2106.15282), using a [T5 text encoder](https://arxiv.org/abs/1910.10683) to generate a caption encoding which conditions a base image generator and then a sequence of super-resolution models through which the output of the base image generator is passed.\n\nIn particular, two notable contributions are the developments of:\n1. [**Noise Conditioning Augmentation**](https://www.assemblyai.com/blog/how-imagen-actually-works/#robust-cascaded-diffusion-models), which noises low-resolution conditioning images in the super-resolution models, and\n2. [**Dynamic Thresholding**](https://www.assemblyai.com/blog/how-imagen-actually-works/#dynamic-thresholding) which helps prevent image saturation at high [classifier-free guidance](https://www.assemblyai.com/blog/how-imagen-actually-works/#classifier-free-guidance) weights.\n\n\u003cbr/\u003e\n\n**N.B. - This project is intended only for educational purposes to demonstrate how Diffusion Models are implemented and incorporated into text-to-image models. Many components of the network that are not essential for these educational purposes have been stripped off for simplicity. 
For a full-fledged implementation, check out Phil Wang's repo (see the Attribution Note below).**\n\n\u003cbr/\u003e\n\n## Table of Contents\n- [Attribution Note](#attribution-note)\n- [Installation](#installation)\n- [Documentation](#documentation)\n- [Usage - Command Line](#usage---command-line)\n    - [`main.py`](#mainpy) - training and image generation in sequence\n    - [`train.py`](#trainpy) - training a MinImagen instance\n    - [`inference.py`](#inferencepy) - generating images using a MinImagen instance\n- [Usage - Package](#usage---package)\n    - [Training](#training)\n    - [Image Generation](#image-generation)\n- [Modifying the Source Code](#modifying-the-source-code)\n- [Additional Resources](#additional-resources)\n- [Socials](#socials)\n\n\u003cbr/\u003e\n\n## Attribution Note\nThis implementation is largely based on Phil Wang's [Imagen implementation](https://github.com/lucidrains/imagen-pytorch).\n\n\u003cbr/\u003e\n\n## Installation\nTo install MinImagen, run the following command in the terminal:\n```bash\n$ pip install minimagen\n```\n**Note that MinImagen requires Python 3.9 or higher.**\n\n\u003cbr/\u003e\n\n## Documentation\nSee the [MinImagen Documentation](https://assemblyai-examples.github.io/MinImagen/) to learn more about the package.\n\n\u003cbr/\u003e\n\n## Usage - Command Line\nIf you have cloned this repo (as opposed to just installing the `minimagen` package), you can use the provided scripts to get started with MinImagen. 
This repo can be cloned by running the following command in the terminal:\n\n```bash\n$ git clone https://github.com/AssemblyAI-Examples/MinImagen.git\n```\n\n\u003cbr/\u003e\n\n### `main.py`\nFor the most basic usage, simply enter the MinImagen directory and run the following in the terminal:\n```bash\n$ python main.py\n```\nThis will create a small MinImagen instance, train it on a tiny amount of data, and then use this MinImagen instance to generate an image.\n\n\u003ca id=\"training-directory\"\u003eAfter\u003c/a\u003e running the script, you will see a directory called `training_\u003cTIMESTAMP\u003e`. \n1. This directory is called a *Training Directory* and is generated when training a MinImagen instance. \n2. It contains the configuration (`parameters` subdirectory) and the model checkpoints (`state_dicts` and `tmp` directories). \n3. It also contains a `training_progress.txt` file that records training progress.\n\nYou will also see a directory called `generated_images_\u003cTIMESTAMP\u003e`.\n1. This directory contains a folder of images generated by the model (`generated_images`).\n2. It also contains a `captions.txt` file, which documents the captions that were input to get the images (where the line index of a given caption corresponds to the image number in the `generated_images` folder).\n3. Finally, this directory also contains `imagen_training_directory.txt`, which specifies the name of the Training Directory used to load the MinImagen instance and generate the images. \n\n\u003cbr/\u003e\n    \n### `train.py`\n\n`main.py` simply runs `train.py` and `inference.py` in series, the former to train the model and the latter to generate the image.\n\nTo train a model, simply run `train.py` and specify relevant command line arguments. 
The [possible arguments](https://github.com/AssemblyAI-Examples/MinImagen/blob/d7de8350db17713fb630e127c010020820953872/minimagen/training.py#L178) are:\n\n- `--PARAMETERS` or `-p`, which specifies a directory that defines the MinImagen configuration to use. It should be structured like a `parameters` subdirectory within a Training Directory (example in [`parameters`](https://github.com/AssemblyAI-Examples/MinImagen/tree/main/parameters)).\n- `--NUM_WORKERS` or `-n`, which specifies the number of workers to use for the DataLoaders.\n- `--BATCH_SIZE` or `-b`, which specifies the batch size to use during training.\n- `--MAX_NUM_WORDS` or `-mw`, which specifies the maximum number of words allowed in a caption.\n- `--IMG_SIDE_LEN` or `-s`, which specifies the final side length of the square images that MinImagen will output.\n- `--EPOCHS` or `-e`, which specifies the number of training epochs.\n- `--T5_NAME` or `-t5`, which specifies the name of the T5 encoder to use.\n- `--TRAIN_VALID_FRAC` or `-f`, which specifies the fraction of the dataset to use for training (vs. validation).\n- `--TIMESTEPS` or `-t`, which specifies the number of timesteps in the diffusion process.\n- `--OPTIM_LR` or `-lr`, which specifies the learning rate for the Adam optimizer.\n- `--ACCUM_ITER` or `-ai`, which specifies the number of batches to accumulate for gradient accumulation.\n- `--CHCKPT_NUM` or `-cn`, which specifies the interval, in batches, at which to create a temporary model checkpoint during training.\n- `--VALID_NUM` or `-vn`, which specifies the number of validation images to use. If `None`, the full validation portion of the train/valid split is used. This option exists because, even with a `--TRAIN_VALID_FRAC` of e.g. 0.99, a very large dataset could still leave a prohibitively large number of images for validation.\n- `--RESTART_DIRECTORY` or `-rd`, which specifies the Training Directory to load the MinImagen instance from when resuming training. 
A new Training Directory will be created for the training, leaving the previous Training Directory from which the checkpoint is loaded unperturbed.\n- `--TESTING` or `-test`, which is used to run the script with a small MinImagen instance and small dataset for testing.\n\nFor example, to run a small training using the provided example [`parameters`](https://github.com/AssemblyAI-Examples/MinImagen/tree/main/parameters) folder, run the following in the terminal:\n\n```bash\npython train.py --PARAMETERS ./parameters --BATCH_SIZE 2 --TIMESTEPS 25 --TESTING\n```\nAfter execution, you will see a new `training_\u003cTIMESTAMP\u003e` [Training Directory](#training-directory) that contains the files [listed above](#training-directory) from the training.\n    \n\u003cbr/\u003e\n    \n### `inference.py`\n    \nTo generate images using a model from a [Training Directory](#training-directory), we can use `inference.py`. Simply run `inference.py` and specify relevant command line arguments. The possible arguments are:\n    \n- `--TRAINING_DIRECTORY` or `-d`, which specifies the training directory from which to load the MinImagen instance for inference.\n- `--CAPTIONS` or `-c`, which specifies either (a) a single caption to generate an image for, or (b) a filepath to a `.txt` file that contains a list of captions to generate images for, where each caption is on a new line.\n    \nFor example, to generate images for the example captions provided in [`captions.txt`](https://github.com/AssemblyAI-Examples/MinImagen/blob/main/captions.txt) using the model generated from the above training line, simply run\n    \n```bash\npython inference.py --CAPTIONS captions.txt --TRAINING_DIRECTORY training_\u003cTIMESTAMP\u003e    \n```\n\nwhere `TIMESTAMP` is replaced with the appropriate value from your training.\n    \n\u003cbr/\u003e\n\n## Usage - Package\n\n### Training\n    \nA minimal training script using the `minimagen` package is shown below. 
See [`train.py`](https://github.com/AssemblyAI-Examples/MinImagen/blob/main/train.py) for a more built-up version of the below code.\n    \n```python\nimport os\nfrom datetime import datetime\n\nimport torch.utils.data\nfrom torch import optim\n\nfrom minimagen.Imagen import Imagen\nfrom minimagen.Unet import Unet, Base, Super, BaseTest, SuperTest\nfrom minimagen.generate import load_minimagen, load_params\nfrom minimagen.t5 import get_encoded_dim\nfrom minimagen.training import get_minimagen_parser, ConceptualCaptions, get_minimagen_dl_opts, \\\n    create_directory, get_model_size, save_training_info, get_default_args, MinimagenTrain, \\\n    load_testing_parameters\n\n# Get device\ndevice = torch.device(\"cuda:0\" if torch.cuda.is_available() else \"cpu\")\n\n# Command line argument parser\nparser = get_minimagen_parser()\nargs = parser.parse_args()\n\n# Create training directory\ntimestamp = datetime.now().strftime(\"%Y%m%d_%H%M%S\")\ndir_path = f\"./training_{timestamp}\"\ntraining_dir = create_directory(dir_path)\n\n# Replace some cmd line args to lower computational load.\nargs = load_testing_parameters(args)\n\n# Load subset of Conceptual Captions dataset.\ntrain_dataset, valid_dataset = ConceptualCaptions(args, smalldata=True)\n\n# Create dataloaders\ndl_opts = {**get_minimagen_dl_opts(device), 'batch_size': args.BATCH_SIZE, 'num_workers': args.NUM_WORKERS}\ntrain_dataloader = torch.utils.data.DataLoader(train_dataset, **dl_opts)\nvalid_dataloader = torch.utils.data.DataLoader(valid_dataset, **dl_opts)\n\n# Use small U-Nets to lower computational load.\nunets_params = [get_default_args(BaseTest), get_default_args(SuperTest)]\nunets = [Unet(**unet_params).to(device) for unet_params in unets_params]\n\n# Specify MinImagen parameters\nimagen_params = dict(\n    image_sizes=(int(args.IMG_SIDE_LEN / 2), args.IMG_SIDE_LEN),\n    timesteps=args.TIMESTEPS,\n    cond_drop_prob=0.15,\n    text_encoder_name=args.T5_NAME\n)\n\n# Create MinImagen from UNets with 
specified imagen parameters\nimagen = Imagen(unets=unets, **imagen_params).to(device)\n\n# Fill in unspecified arguments with defaults to record complete config (parameters) file\nunets_params = [{**get_default_args(Unet), **i} for i in unets_params]\nimagen_params = {**get_default_args(Imagen), **imagen_params}\n\n# Get the size of the Imagen model in megabytes\nmodel_size_MB = get_model_size(imagen)\n\n# Save all training info (config files, model size, etc.)\nsave_training_info(args, timestamp, unets_params, imagen_params, model_size_MB, training_dir)\n\n# Create optimizer\noptimizer = optim.Adam(imagen.parameters(), lr=args.OPTIM_LR)\n\n# Train the MinImagen instance\nMinimagenTrain(timestamp, args, unets, imagen, train_dataloader, valid_dataloader, training_dir, optimizer, timeout=30)\n```\n\n### Image Generation\n    \nA minimal inference script using the `minimagen` package is shown below. See [`inference.py`](https://github.com/AssemblyAI-Examples/MinImagen/blob/main/inference.py) for a more built-up version of the below code.\n\n```python\nfrom argparse import ArgumentParser\nfrom minimagen.generate import load_minimagen, sample_and_save\n\n# Command line argument parser\nparser = ArgumentParser()\nparser.add_argument(\"-d\", \"--TRAINING_DIRECTORY\", dest=\"TRAINING_DIRECTORY\", help=\"Training directory to use for inference\", type=str)\nargs = parser.parse_args()\n\n# Specify the caption(s) to generate images for\ncaptions = ['a happy dog']\n\n# Use `sample_and_save` to generate and save the images\nsample_and_save(captions, training_directory=args.TRAINING_DIRECTORY)\n\n\n\n# Alternatively, rather than specifying a Training Directory, you can input just a MinImagen instance to use for image generation.\n# In this case, information about the MinImagen instance used to generate the images will not be saved.\nminimagen = load_minimagen(args.TRAINING_DIRECTORY)\nsample_and_save(captions, minimagen=minimagen)    \n```\n    \nTo see more of what MinImagen 
has to offer, or to get additional details on the scripts above, check out the [MinImagen Documentation](https://assemblyai-examples.github.io/MinImagen/).\n\n\u003cbr/\u003e\n\n## Modifying the Source Code\nIf you want to make modifications to the source code (rather than use the `minimagen` package), first clone this repository and navigate into it:\n\n```bash\n$ git clone https://github.com/AssemblyAI-Examples/MinImagen.git\n$ cd MinImagen\n```\n\nAfter that, create a virtual environment:\n```bash\n$ pip install virtualenv\n$ virtualenv venv\n```\n\nThen activate the virtual environment and install all dependencies:\n```bash\n$ .\\venv\\Scripts\\activate.bat  # Windows\n$ source venv/bin/activate  # MacOS/Linux\n$ pip install -r requirements.txt\n```\n\nNow you can modify the source code and the changes will be reflected when running any of the [included scripts](#usage---command-line) (as long as the virtual environment created above is active).\n\n\n\u003cbr/\u003e\n\n## Additional Resources\n\n- For a step-by-step guide on how to build the version of Imagen in this repository, see [Build Your Own Imagen Text-to-Image Model](https://www.assemblyai.com/blog/build-your-own-imagen-text-to-image-model/).\n- For a deep-dive into how Imagen works, see [How Imagen Actually Works](https://www.assemblyai.com/blog/how-imagen-actually-works/).\n- For a deep-dive into Diffusion Models, see our [Introduction to Diffusion Models for Machine Learning](https://www.assemblyai.com/blog/diffusion-models-for-machine-learning-introduction/) guide.\n- For additional learning resources on Machine Learning and Deep Learning, check out our [Blog](https://www.assemblyai.com/blog/) and [YouTube channel](https://www.youtube.com/c/AssemblyAI).\n- Read the original Imagen paper [here](https://arxiv.org/abs/2205.11487).\n    \n## Socials\n- Follow us on [Twitter](https://twitter.com/AssemblyAI) for more Deep Learning content.\n- [Follow our 
newsletter](https://assemblyai.us17.list-manage.com/subscribe?u=cb9db7b18b274c2d402a56c5f\u0026id=2116bf7c68) to stay up to date on our recent content.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FAssemblyAI-Community%2FMinImagen","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FAssemblyAI-Community%2FMinImagen","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FAssemblyAI-Community%2FMinImagen/lists"}