{"id":13777993,"url":"https://github.com/afiaka87/clip-guided-diffusion","last_synced_at":"2025-08-05T22:48:01.257Z","repository":{"id":37759468,"uuid":"387964669","full_name":"afiaka87/clip-guided-diffusion","owner":"afiaka87","description":"A CLI tool/python module for generating images from text using guided diffusion and CLIP from OpenAI.","archived":false,"fork":false,"pushed_at":"2022-02-08T06:13:56.000Z","size":53685,"stargazers_count":452,"open_issues_count":9,"forks_count":62,"subscribers_count":12,"default_branch":"main","last_synced_at":"2024-08-03T18:12:12.692Z","etag":null,"topics":["artificial-intelligence","deep-learning","diffusion","image-generation","multimodal","multimodality","openai","openai-clip","text-to-image","text-to-image-synthesis"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/afiaka87.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-07-21T02:04:09.000Z","updated_at":"2024-07-30T08:56:31.000Z","dependencies_parsed_at":"2022-07-19T23:32:05.927Z","dependency_job_id":null,"html_url":"https://github.com/afiaka87/clip-guided-diffusion","commit_stats":null,"previous_names":[],"tags_count":8,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/afiaka87%2Fclip-guided-diffusion","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/afiaka87%2Fclip-guided-diffusion/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/afiaka87%2Fclip-guided-diffusion/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositor
ies/afiaka87%2Fclip-guided-diffusion/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/afiaka87","download_url":"https://codeload.github.com/afiaka87/clip-guided-diffusion/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":225043273,"owners_count":17411955,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["artificial-intelligence","deep-learning","diffusion","image-generation","multimodal","multimodality","openai","openai-clip","text-to-image","text-to-image-synthesis"],"created_at":"2024-08-03T18:00:50.341Z","updated_at":"2024-11-17T13:31:20.315Z","avatar_url":"https://github.com/afiaka87.png","language":"Python","funding_links":[],"categories":["Applications"],"sub_categories":["GAN"],"readme":"# CLIP Guided Diffusion\nFrom [@crowsonkb](https://github.com/crowsonkb).\n\n\u003cp\u003e\n\u003ca href=\"https://replicate.ai/afiaka87/clip-guided-diffusion\" target=\"_blank\"\u003e\u003cimg src=\"https://img.shields.io/static/v1?label=run\u0026message=clip-guided-diffusion\u0026color=blue\"\u003e\u003c/a\u003e    \u003ca href=\"https://replicate.ai/afiaka87/pyglide\" target=\"_blank\"\u003e\u003cimg src=\"https://img.shields.io/static/v1?label=run\u0026message=pyglide\u0026color=green\"\u003e\u003c/a\u003e\n\u003c/p\u003e\n\nDisclaimer: I'm redirecting efforts to [pyglide](https://replicate.ai/afiaka87/pyglide) and may be slow to address bugs here.\n\nI also recommend looking at [@crowsonkb's](https://github.com/crowsonkb) 
[v-diffusion-pytorch](https://github.com/crowsonkb/v-diffusion-pytorch).\n\nSee captions and more generations in the [Gallery](/images/README.md).\n\n## Install\n\n```sh\ngit clone https://github.com/afiaka87/clip-guided-diffusion.git\ncd clip-guided-diffusion\ngit clone https://github.com/crowsonkb/guided-diffusion.git\npip3 install -e guided-diffusion\npython3 setup.py install\n```\n\n## Run\n\n`cgd -txt \"Alien friend by Odilon Redo\"`\n\nA gif of the full run will be saved to `./outputs/caption_{j}.gif` by default.\n\n![Alien friend by Odilon Redo](images/alien_friend_by_Odilon_Redo_00.gif)\n\n- `./outputs` will contain all intermediate outputs.\n- `current.png` will contain the current generation.\n- (optional) Provide **`--wandb_project \u003cproject_name\u003e`** to enable logging intermediate outputs to wandb. Requires a free account. A URL to the run will be printed in the CLI - [example run](https://wandb.ai/dalle-pytorch-replicate/red_ball_cgd/reports/CLIP-Guided-Diffusion--VmlldzoxMDc1MjMz)\n- `~/.cache/clip-guided-diffusion/` will contain downloaded checkpoints from OpenAI/Katherine Crowson.\n\n## Usage - CLI\n\n### Text to image generation\n\n`--prompts` / `-txts`\n`--image_size` / `-size`\n\n`cgd --image_size 256 --prompts \"32K HUHD Mushroom\"`\n\n![32K HUHD Mushroom](/images/32K_HUHD_Mushroom.png?raw=true)\n\n### Text to image generation (multiple prompts with weights)\n\n- multiple prompts can be specified with the `|` character.\n- you may optionally specify a weight for each prompt using a `:` character.\n- e.g. 
`cgd --prompts \"Noun to visualize:1.0|style:0.1|location:0.1|something you dont want:-0.1\"`\n- weights must not sum to 0\n\n`cgd -txt \"32K HUHD Mushroom|Green grass:-0.1\"`\n\n\u003cimg src=\"images/32K_HUHD_Mushroom_MIN_green_grass.png\"\u003e\u003c/img\u003e\n\n### CPU\n\n- **Using a CPU will take a very long time** compared to using a GPU.\n\n`cgd --device cpu --prompt \"Some text to be generated\"`\n\n### CUDA GPU\n\n`cgd --prompt \"Theres no need to specify a device, it will be chosen automatically\"`\n\n### Iterations/Steps (Timestep Respacing)\n\n`--timestep_respacing` or `-respace` (default: `1000`)\n\n- Uses fewer timesteps over the same diffusion schedule. Sacrifices accuracy/alignment for quicker runtime.\n- options: `25`, `50`, `150`, `250`, `500`, `1000`, `ddim25`, `ddim50`, `ddim150`, `ddim250`, `ddim500`, `ddim1000`\n- prepending a number with `ddim` will use the DDIM scheduler, e.g. `ddim25` will use the 25-timestep DDIM scheduler. This method may be better at shorter timestep_respacing values.\n\n### Existing image\n\n#### `--init_image`/`-init`\n\n- Blend an image with the diffusion for a number of steps.\n\n#### `--skip_timesteps`/`-skip`\n\nThe number of timesteps to spend blending the image with the guided-diffusion samples.\nMust be less than `--timestep_respacing` and greater than 0.\nGood values using timestep_respacing of 1000 are 250 to 500.\n\n- `-respace 1000 -skip 500`\n- `-respace 500 -skip 250`\n- `-respace 250 -skip 125`\n- `-respace 125 -skip 75`\n\n#### (optional)`--init_scale`/`-is`\n\nTo enable a VGG perceptual loss after the blending, you must specify an `--init_scale` value. 
1000 seems to work well.\n\n```sh\ncgd --prompts \"A mushroom in the style of Vincent Van Gogh\" \\\n  --timestep_respacing 1000 \\\n  --init_image \"images/32K_HUHD_Mushroom.png\" \\\n  --init_scale 1000 \\\n  --skip_timesteps 350\n```\n\n\u003cimg src=\"images/a_mushroom_in_the_style_of_vangogh.png?raw=true\" width=\"200\"\u003e\u003c/img\u003e\n\n### Image size\n\n- options: `64, 128, 256, 512 pixels (square)`\n- **Note about 64x64** when using the 64x64 checkpoint, the cosine noise scheduler is used. For unclear reasons, this noise scheduler requires different values for `--clip_guidance_scale` and `--tv_scale`. I recommend starting with `-cgs 5 -tvs 0.00001` and experimenting from around there.  **`--clip_guidance_scale` and `--tv_scale` will require experimentation.**\n- For all other checkpoints, clip_guidance_scale seems to work well around 1000-2000 and tv_scale at 0, 100, 150 or 200\n\n```sh\ncgd --init_image=images/32K_HUHD_Mushroom.png \\\n    --skip_timesteps=500 \\\n    --image_size 64 \\\n    --prompt \"8K HUHD Mushroom\"\n```\n\n\u003cimg src=\"images/32K_HUHD_Mushroom_64.png?raw=true\" width=\"200\"\u003e\u003c/img\u003e\n_resized to 200 pixels for visibility_\n\n```sh\ncgd --image_size 512 --prompt \"8K HUHD Mushroom\"\n```\n\n\u003cimg src=\"images/32K_HUHD_Mushroom_512.png?raw=true\"\u003e\u003c/img\u003e\n\n**New: Non-square Generations (experimental)**\nGenerate portrait or landscape images by specifying a number to offset the width and/or height.\n\n- offset should be a multiple of 16 for image sizes 64x64, 128x128\n- offset should be a multiple of 32 for image sizes 256x256, 512x512\n- may cause NaN/Inf errors.\n- a positive offset will require more memory.\n- a _negative_ offset uses less memory and is faster.\n\n```sh\nmy_caption=\"a photo of beautiful green hills and a sunset, taken with a blackberry in 2004\"\ncgd --prompts \"$my_caption\" \\\n    --image_size 128 \\\n    --width_offset 32 \n```\n\u003cimg 
src=\"images/green-hills.png\"\u003e\n\n## Full Usage - Python\n\n```python\n# Initialize diffusion generator\nfrom pathlib import Path\n\nfrom tqdm import tqdm\n\nfrom cgd import clip_guided_diffusion\nimport cgd_util\n\n# Create the output directory up front (same path as prefix_path below)\nPath(\"store_images/\").mkdir(exist_ok=True)\n\ncgd_generator = clip_guided_diffusion(\n    prompts=[\"an image of a fox in a forest\"],\n    image_prompts=[\"image_to_compare_with_clip.png\"],\n    batch_size=1,\n    clip_guidance_scale=1500,\n    sat_scale=0,\n    tv_scale=150,\n    init_scale=1000,\n    range_scale=50,\n    image_size=256,\n    class_cond=False,\n    randomize_class=False, # only works with class conditioned checkpoints\n    cutout_power=1.0,\n    num_cutouts=16,\n    timestep_respacing=\"1000\",\n    seed=0,\n    diffusion_steps=1000, # don't change this\n    skip_timesteps=400,\n    init_image=\"image_to_blend_and_compare_with_vgg.png\",\n    clip_model_name=\"ViT-B/16\",\n    dropout=0.0,\n    device=\"cuda\",\n    prefix_path=\"store_images/\",\n    wandb_project=None,\n    wandb_entity=None,\n    progress=True,\n)\nlist(enumerate(tqdm(cgd_generator))) # iterate over generator\n```\n\n## Full Usage - CLI\n\n```sh\nusage: cgd [-h] [--prompts PROMPTS] [--image_prompts IMAGE_PROMPTS]\n           [--image_size IMAGE_SIZE] [--init_image INIT_IMAGE]\n           [--init_scale INIT_SCALE] [--skip_timesteps SKIP_TIMESTEPS]\n           [--prefix PREFIX] [--checkpoints_dir CHECKPOINTS_DIR]\n           [--batch_size BATCH_SIZE]\n           [--clip_guidance_scale CLIP_GUIDANCE_SCALE] [--tv_scale TV_SCALE]\n           [--range_scale RANGE_SCALE] [--sat_scale SAT_SCALE] [--seed SEED]\n           [--save_frequency SAVE_FREQUENCY]\n           [--diffusion_steps DIFFUSION_STEPS]\n           [--timestep_respacing TIMESTEP_RESPACING]\n           [--num_cutouts NUM_CUTOUTS] [--cutout_power CUTOUT_POWER]\n           [--clip_model CLIP_MODEL] [--uncond]\n           [--noise_schedule NOISE_SCHEDULE] [--dropout DROPOUT]\n           [--device DEVICE] [--wandb_project WANDB_PROJECT]\n           [--wandb_entity 
WANDB_ENTITY] [--height_offset HEIGHT_OFFSET]\n           [--width_offset WIDTH_OFFSET] [--use_augs] [--use_magnitude]\n           [--quiet]\n\noptional arguments:\n  -h, --help            show this help message and exit\n  --prompts PROMPTS, -txts PROMPTS\n                        the prompt/s to reward paired with weights. e.g. 'My\n                        text:0.5|Other text:-0.5' (default: )\n  --image_prompts IMAGE_PROMPTS, -imgs IMAGE_PROMPTS\n                        the image prompt/s to reward paired with weights. e.g.\n                        'img1.png:0.5,img2.png:-0.5' (default: )\n  --image_size IMAGE_SIZE, -size IMAGE_SIZE\n                        Diffusion image size. Must be one of [64, 128, 256,\n                        512]. (default: 128)\n  --init_image INIT_IMAGE, -init INIT_IMAGE\n                        Blend an image with diffusion for n steps (default: )\n  --init_scale INIT_SCALE, -is INIT_SCALE\n                        (optional) Perceptual loss scale for init image.\n                        (default: 0)\n  --skip_timesteps SKIP_TIMESTEPS, -skip SKIP_TIMESTEPS\n                        Number of timesteps to blend image for. CLIP guidance\n                        occurs after this. (default: 0)\n  --prefix PREFIX, -dir PREFIX\n                        output directory (default: outputs)\n  --checkpoints_dir CHECKPOINTS_DIR, -ckpts CHECKPOINTS_DIR\n                        Path subdirectory containing checkpoints. (default:\n                        /home/samsepiol/.cache/clip-guided-diffusion)\n  --batch_size BATCH_SIZE, -bs BATCH_SIZE\n                        the batch size (default: 1)\n  --clip_guidance_scale CLIP_GUIDANCE_SCALE, -cgs CLIP_GUIDANCE_SCALE\n                        Scale for CLIP spherical distance loss. Values will\n                        need tinkering for different settings. (default: 1000)\n  --tv_scale TV_SCALE, -tvs TV_SCALE\n                        Controls the smoothness of the final output. 
(default:\n                        150.0)\n  --range_scale RANGE_SCALE, -rs RANGE_SCALE\n                        Controls how far out of RGB range values may get.\n                        (default: 50.0)\n  --sat_scale SAT_SCALE, -sats SAT_SCALE\n                        Controls how much saturation is allowed. Used for\n                        ddim. From @nshepperd. (default: 0.0)\n  --seed SEED, -seed SEED\n                        Random number seed (default: 0)\n  --save_frequency SAVE_FREQUENCY, -freq SAVE_FREQUENCY\n                        Save frequency (default: 1)\n  --diffusion_steps DIFFUSION_STEPS, -steps DIFFUSION_STEPS\n                        Diffusion steps (default: 1000)\n  --timestep_respacing TIMESTEP_RESPACING, -respace TIMESTEP_RESPACING\n                        Timestep respacing (default: 1000)\n  --num_cutouts NUM_CUTOUTS, -cutn NUM_CUTOUTS\n                        Number of randomly cut patches to distort from\n                        diffusion. (default: 16)\n  --cutout_power CUTOUT_POWER, -cutpow CUTOUT_POWER\n                        Cutout size power (default: 1.0)\n  --clip_model CLIP_MODEL, -clip CLIP_MODEL\n                        clip model name. Should be one of: ('ViT-B/16',\n                        'ViT-B/32', 'RN50', 'RN101', 'RN50x4', 'RN50x16') or a\n                        checkpoint filename ending in `.pt` (default:\n                        ViT-B/32)\n  --uncond, -uncond     Use finetuned unconditional checkpoints from OpenAI\n                        (256px) and Katherine Crowson (512px) (default: False)\n  --noise_schedule NOISE_SCHEDULE, -sched NOISE_SCHEDULE\n                        Specify noise schedule. Either 'linear' or 'cosine'.\n                        (default: linear)\n  --dropout DROPOUT, -drop DROPOUT\n                        Amount of dropout to apply. (default: 0.0)\n  --device DEVICE, -dev DEVICE\n                        Device to use. Either cpu or cuda. 
(default: )\n  --wandb_project WANDB_PROJECT, -proj WANDB_PROJECT\n                        Name W\u0026B will use when saving results. e.g.\n                        `--wandb_project \"my_project\"` (default: None)\n  --wandb_entity WANDB_ENTITY, -ent WANDB_ENTITY\n                        (optional) Name of W\u0026B team/entity to log to.\n                        (default: None)\n  --height_offset HEIGHT_OFFSET, -ht HEIGHT_OFFSET\n                        Height offset for image (default: 0)\n  --width_offset WIDTH_OFFSET, -wd WIDTH_OFFSET\n                        Width offset for image (default: 0)\n  --use_augs, -augs     Uses augmentations from the `quick` clip guided\n                        diffusion notebook (default: False)\n  --use_magnitude, -mag\n                        Uses magnitude of the gradient (default: False)\n  --quiet, -q           Suppress output. (default: False)\n\n```\n## Development\n\n```sh\ngit clone https://github.com/afiaka87/clip-guided-diffusion.git\ncd clip-guided-diffusion\ngit clone https://github.com/afiaka87/guided-diffusion.git\npython3 -m venv cgd_venv\nsource cgd_venv/bin/activate\npip install -r requirements.txt\npip install -e guided-diffusion\n```\n\n### Run integration tests\n\n- Some tests require a GPU; you may ignore them if you don't have one.\n\n```sh\npython -m unittest discover\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fafiaka87%2Fclip-guided-diffusion","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fafiaka87%2Fclip-guided-diffusion","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fafiaka87%2Fclip-guided-diffusion/lists"}