{"id":13563605,"url":"https://github.com/kuprel/min-dalle","last_synced_at":"2025-05-14T08:08:25.877Z","repository":{"id":40779444,"uuid":"507972385","full_name":"kuprel/min-dalle","owner":"kuprel","description":"min(DALL·E) is a fast, minimal port of DALL·E Mini to PyTorch","archived":false,"fork":false,"pushed_at":"2025-04-28T01:38:43.000Z","size":48720,"stargazers_count":3487,"open_issues_count":25,"forks_count":253,"subscribers_count":26,"default_branch":"main","last_synced_at":"2025-04-28T02:32:43.946Z","etag":null,"topics":["artificial-intelligence","deep-learning","pytorch","text-to-image"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kuprel.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null},"funding":{"github":["kuprel"]}},"created_at":"2022-06-27T15:53:59.000Z","updated_at":"2025-04-28T01:38:47.000Z","dependencies_parsed_at":"2023-01-21T08:45:26.965Z","dependency_job_id":null,"html_url":"https://github.com/kuprel/min-dalle","commit_stats":null,"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kuprel%2Fmin-dalle","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kuprel%2Fmin-dalle/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kuprel%2Fmin-dalle/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kuprel%2Fmin-dalle/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kuprel","download_url":"https://codeload.github.com/kuprel/min-dalle/tar.gz
/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254101558,"owners_count":22014908,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["artificial-intelligence","deep-learning","pytorch","text-to-image"],"created_at":"2024-08-01T13:01:21.272Z","updated_at":"2025-05-14T08:08:20.863Z","avatar_url":"https://github.com/kuprel.png","language":"Python","funding_links":["https://github.com/sponsors/kuprel"],"categories":["Python","其他_机器视觉"],"sub_categories":["网络服务_其他"],"readme":"# min(DALL·E)\n\n[![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/kuprel/min-dalle/blob/main/min_dalle.ipynb)\n\u0026nbsp;\n[![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces%20Demo-blue)](https://huggingface.co/spaces/kuprel/min-dalle)\n\u0026nbsp;\n[![Replicate](https://replicate.com/kuprel/min-dalle/badge)](https://replicate.com/kuprel/min-dalle)\n\u0026nbsp;\n[![Discord](https://img.shields.io/discord/823813159592001537?color=5865F2\u0026logo=discord\u0026logoColor=white)](https://discord.com/channels/823813159592001537/912729332311556136)\n\n[YouTube Walk-through](https://youtu.be/x_8uHX5KngE) by The AI Epiphany\n\nThis is a fast, minimal port of Boris Dayma's [DALL·E Mini](https://github.com/borisdayma/dalle-mini) (with mega weights).  It has been stripped down for inference and converted to PyTorch.  
The only third-party dependencies are numpy, requests, pillow and torch.\n\nGenerating a 3x3 grid of DALL·E Mega images takes:\n- 55 sec with a T4 in Colab\n- 33 sec with a P100 in Colab\n- 15 sec with an A10G on Hugging Face\n\nHere's a more detailed breakdown of performance on an A100. Credit to [@technobird22](https://github.com/technobird22) and his [NeoGen](https://github.com/technobird22/NeoGen) Discord bot for the graph.\n\u003cbr /\u003e\n\u003cimg src=\"https://github.com/kuprel/min-dalle/raw/main/performance.png\" alt=\"min-dalle\" width=\"450\"/\u003e\n\u003cbr /\u003e\n\nThe flax model and code for converting it to torch can be found [here](https://github.com/kuprel/min-dalle-flax).\n\n## Install\n\n```bash\n$ pip install min-dalle\n```  \n\n## Usage\n\nLoad the model parameters once and reuse the model to generate multiple images.\n\n```python\nimport torch\nfrom min_dalle import MinDalle\n\nmodel = MinDalle(\n    models_root='./pretrained',\n    dtype=torch.float32,\n    device='cuda',\n    is_mega=True,\n    is_reusable=True\n)\n```\n\nThe required models will be downloaded to `models_root` if they are not already there.  Set the `dtype` to `torch.float16` to save GPU memory.  If you have an Ampere architecture GPU you can use `torch.bfloat16`.  Set the `device` to either \"cuda\" or \"cpu\".  Once everything has finished initializing, call `generate_image` with some text as many times as you want.  Use a positive `seed` for reproducible results.  Higher values for `supercondition_factor` result in better agreement with the text but a narrower variety of generated images.  Every image token is sampled from the `top_k` most probable tokens.  The largest logit is subtracted from the logits to avoid infs.  The logits are then divided by the `temperature`.  
If `is_seamless` is true, the image grid will be tiled in token space, not pixel space.\n\n```python\nimage = model.generate_image(\n    text='Nuclear explosion broccoli',\n    seed=-1,\n    grid_size=4,\n    is_seamless=False,\n    temperature=1,\n    top_k=256,\n    supercondition_factor=32,\n    is_verbose=False\n)\n\ndisplay(image)\n```\n\u003cimg src=\"https://github.com/kuprel/min-dalle/raw/main/examples/nuclear_broccoli.jpg\" alt=\"min-dalle\" width=\"400\"/\u003e\n\nCredit to [@hardmaru](https://twitter.com/hardmaru) for the [example](https://twitter.com/hardmaru/status/1544354119527596034)\n\n\n### Saving Individual Images\nThe images can also be generated as a `FloatTensor` in case you want to process them manually.\n\n```python\nimages = model.generate_images(\n    text='Nuclear explosion broccoli',\n    seed=-1,\n    grid_size=3,\n    is_seamless=False,\n    temperature=1,\n    top_k=256,\n    supercondition_factor=16,\n    is_verbose=False\n)\n```\n\nTo get an image into PIL format, first move the images to the CPU, convert the tensor to a numpy array, and cast it to `uint8`, since `Image.fromarray` does not accept floating-point RGB arrays.\n```python\nimages = images.to('cpu').numpy().astype('uint8')\n```\nThen image $i$ can be converted to a `PIL.Image` and saved:\n```python\nfrom PIL import Image\n\nimage = Image.fromarray(images[i])\nimage.save('image_{}.png'.format(i))\n```\n\n### Progressive Outputs\n\nIf the model is being used interactively (e.g. in a notebook), `generate_image_stream` can be used to generate a stream of images as the model is decoding.  The detokenizer adds a slight delay for each image.  Set `progressive_outputs` to `True` to enable this.  
An example is implemented in the Colab notebook.\n\n```python\nimage_stream = model.generate_image_stream(\n    text='Dali painting of WALL·E',\n    seed=-1,\n    grid_size=3,\n    progressive_outputs=True,\n    is_seamless=False,\n    temperature=1,\n    top_k=256,\n    supercondition_factor=16,\n    is_verbose=False\n)\n\nfor image in image_stream:\n    display(image)\n```\n\u003cimg src=\"https://github.com/kuprel/min-dalle/raw/main/examples/dali_walle_animated.gif\" alt=\"min-dalle\" width=\"300\"/\u003e\n\n### Command Line\n\nUse `image_from_text.py` to generate images from the command line.\n\n```bash\n$ python image_from_text.py --text='artificial intelligence' --no-mega\n```\n\u003cimg src=\"https://github.com/kuprel/min-dalle/raw/main/examples/artificial_intelligence.jpg\" alt=\"min-dalle\" width=\"200\"/\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkuprel%2Fmin-dalle","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkuprel%2Fmin-dalle","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkuprel%2Fmin-dalle/lists"}