{"id":21858562,"url":"https://github.com/nerdyrodent/clip-guided-diffusion","last_synced_at":"2025-04-06T11:10:38.187Z","repository":{"id":38829462,"uuid":"407320196","full_name":"nerdyrodent/CLIP-Guided-Diffusion","owner":"nerdyrodent","description":"Just playing with getting CLIP Guided Diffusion running locally, rather than having to use colab. ","archived":false,"fork":false,"pushed_at":"2022-08-29T10:21:11.000Z","size":2187,"stargazers_count":386,"open_issues_count":6,"forks_count":49,"subscribers_count":11,"default_branch":"main","last_synced_at":"2025-03-30T10:07:26.704Z","etag":null,"topics":["openai-clip","text-to-image","text2image"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nerdyrodent.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-09-16T21:30:13.000Z","updated_at":"2025-03-25T07:03:42.000Z","dependencies_parsed_at":"2022-07-12T17:38:53.318Z","dependency_job_id":null,"html_url":"https://github.com/nerdyrodent/CLIP-Guided-Diffusion","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nerdyrodent%2FCLIP-Guided-Diffusion","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nerdyrodent%2FCLIP-Guided-Diffusion/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nerdyrodent%2FCLIP-Guided-Diffusion/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nerdyrodent%2FCLIP-Guided-Diffusion/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts
/GitHub/owners/nerdyrodent","download_url":"https://codeload.github.com/nerdyrodent/CLIP-Guided-Diffusion/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247471521,"owners_count":20944158,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["openai-clip","text-to-image","text2image"],"created_at":"2024-11-28T02:46:23.653Z","updated_at":"2025-04-06T11:10:38.170Z","avatar_url":"https://github.com/nerdyrodent.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# CLIP-Guided-Diffusion\nJust playing with getting CLIP Guided Diffusion running locally, rather than having to use colab. 
\n\nOriginal Colab notebooks by Katherine Crowson (https://github.com/crowsonkb, https://twitter.com/RiversHaveWings):\n\n* Original 256x256 notebook: [![Open In Colab][colab-badge]][colab-notebook1]\n\n[colab-notebook1]: \u003chttps://colab.research.google.com/drive/12a_Wrfi2_gwwAuN3VvMTwVMz9TfqctNj#scrollTo=X5gODNAMEUCR\u003e\n[colab-badge]: \u003chttps://colab.research.google.com/assets/colab-badge.svg\u003e\n\nIt uses OpenAI's 256x256 unconditional ImageNet diffusion model (https://github.com/openai/guided-diffusion)\n\n* Original 512x512 notebook: [![Open In Colab][colab-badge]][colab-notebook2]\n\n[colab-notebook2]: \u003chttps://colab.research.google.com/drive/1QBsaDAZv8np29FPbvjffbE1eytoJcsgA#scrollTo=VnQjGugaDZPJ\u003e\n\nIt uses a 512x512 unconditional ImageNet diffusion model fine-tuned from OpenAI's 512x512 class-conditional ImageNet diffusion model (https://github.com/openai/guided-diffusion)\n\nTogether with CLIP (https://github.com/openai/CLIP), they connect text prompts with images.\n\nEither the 256 or 512 model can be used here by setting `--output_size` to 256 or 512.\n\nSome example images:\n\n\"A woman standing in a park\":\n\n\u003cimg src=\"./Samples/woman_collage.jpg\" width=\"640px\"\u003e\n\n\"An alien landscape\":\n\n\u003cimg src=\"./Samples/alien_collage.jpg\" width=\"640px\"\u003e\n\n\"A painting of a man\":\n\n\u003cimg src=\"./Samples/man_collage.jpg\" width=\"640px\"\u003e\n\n*images enhanced with [Real-ESRGAN](https://github.com/xinntao/Real-ESRGAN)\n\nYou may also be interested in [VQGAN-CLIP](https://github.com/nerdyrodent/VQGAN-CLIP)\n\n## Environment\n* Ubuntu 20.04 (Windows untested but should work)\n* Anaconda\n* Nvidia RTX 3090\n\nTypical VRAM requirements:\n* 256 defaults: 10 GB\n* 512 defaults: 18 GB\n\n## Set up\n\nThis example uses [Anaconda](https://www.anaconda.com/products/individual#Downloads) to manage virtual Python 
environments.\n\nCreate a new virtual Python environment for CLIP-Guided-Diffusion:\n```sh\nconda create --name cgd python=3.9\nconda activate cgd\n```\n\nDownload and change directory:\n```sh\ngit clone https://github.com/nerdyrodent/CLIP-Guided-Diffusion.git\ncd CLIP-Guided-Diffusion\n```\n\nRun the setup file:\n```sh\n./setup.sh\n```\n\nOr if you want to run the commands manually:\n```sh\n# Install dependencies\n\npip3 install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html\ngit clone https://github.com/openai/CLIP\ngit clone https://github.com/crowsonkb/guided-diffusion\npip install -e ./CLIP\npip install -e ./guided-diffusion\npip install lpips matplotlib\n\n# Download the diffusion models\n\ncurl -OL 'https://the-eye.eu/public/AI/models/512x512_diffusion_unconditional_ImageNet/512x512_diffusion_uncond_finetune_008100.pt'\ncurl -OL 'https://openaipublic.blob.core.windows.net/diffusion/jul-2021/256x256_diffusion_uncond.pt'\n```\n## Run\n\nThe simplest way to run is just to pass in your text prompt. For example:\n\n```sh\npython generate_diffuse.py -p \"A painting of an apple\"\n```\n\u003cimg src=\"./Samples/a_painting_of_an_apple.png\" width=\"256px\"\u003e\u003c/img\u003e\n\n### Multiple prompts\n\nText and image prompts can be split using the pipe symbol in order to allow multiple prompts. You can also use a colon followed by a number to set a weight for that prompt. For example:\n\n```sh\npython generate_diffuse.py -p \"A painting of an apple:1.5|a surreal painting of a weird apple:0.5\"\n```\n\u003cimg src=\"./Samples/weird_apple.png\" width=\"256px\"\u003e\u003c/img\u003e\n\n### Other options\n\nThere are a variety of other options to play with. 
Use help to display them:\n```sh\npython generate_diffuse.py -h\n```\n\n```sh\nusage: generate_diffuse.py [-h] [-p PROMPTS] [-ip IMAGE_PROMPTS] [-ii INIT_IMAGE]\n[-st SKIP_TIMESTEPS] [-is INIT_SCALE] [-m CLIP_MODEL] [-t TIMESTEPS]\n[-ds DIFFUSION_STEPS] [-se SAVE_EVERY] [-bs BATCH_SIZE] [-nb N_BATCHES] [-cuts CUTN]\n[-cutb CUTN_BATCHES] [-cutp CUT_POW] [-cgs CLIP_GUIDANCE_SCALE]\n[-tvs TV_SCALE] [-rgs RANGE_SCALE] [-os IMAGE_SIZE] [-s SEED] [-o OUTPUT] [-nfp] [-pl]\n```\n\n### init_image\n* `skip_timesteps` should be between approximately 200 and 500 when using an init image.\n* `init_scale` enhances the effect of the init image; a good value is 1000.\n\n### Timesteps\nThe number of timesteps (or the number in one of ddim25, ddim50, ddim150, ddim250, ddim500, ddim1000) must divide `diffusion_steps` exactly, with no remainder.\n\n### Image guidance\n* `clip_guidance_scale` controls how closely the image should match the prompt.\n* `tv_scale` controls the smoothness of the final output.\n* `range_scale` controls how far out of range RGB values are allowed to be.\n\nExamples using a number of options:\n```sh\npython generate_diffuse.py -p \"An amazing fractal\" -os=256 -cgs=1000 -tvs=50 -rgs=50 -cuts=16 -cutb=4 -t=200 -se=200 -m=ViT-B/32 -o=my_fractal.png\n```\n\u003cimg src=\"./Samples/my_fractal.png\" width=\"256px\"\u003e\u003c/img\u003e\n\n```sh\npython generate_diffuse.py -p \"An impressionist painting of a cat:1.75|trending on artstation:0.25\" -cgs=500 -tvs=55 -rgs=50 -cuts=16 -cutb=2 -t=100 -ds=2000 -m=ViT-B/32 -pl -o=cat_100.png\n```\n\u003cimg src=\"./Samples/cat_100.png\" width=\"256px\"\u003e\n\n(Funny looking cat, but hey!)\n\n## Videos\n\nUsing the ```-vid``` option saves the diffusion steps and makes a video. 
The steps can also be upscaled if you have the portable version of https://github.com/xinntao/Real-ESRGAN installed locally and opt to do so.\n\n## Other repos\n\nYou may also be interested in https://github.com/afiaka87/clip-guided-diffusion\n\nFor upscaling images, try https://github.com/xinntao/Real-ESRGAN\n\n## Citations\n\n```bibtex\n@misc{unpublished2021clip,\n    title  = {CLIP: Connecting Text and Images},\n    author = {Alec Radford and Ilya Sutskever and Jong Wook Kim and Gretchen Krueger and Sandhini Agarwal},\n    year   = {2021}\n}\n```\n* Guided Diffusion - https://github.com/openai/guided-diffusion\n* Katherine Crowson - \u003chttps://github.com/crowsonkb\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnerdyrodent%2Fclip-guided-diffusion","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnerdyrodent%2Fclip-guided-diffusion","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnerdyrodent%2Fclip-guided-diffusion/lists"}