{"id":17714286,"url":"https://github.com/fast-codi/CoDi","last_synced_at":"2025-03-13T22:32:20.579Z","repository":{"id":222751719,"uuid":"757459488","full_name":"fast-codi/CoDi","owner":"fast-codi","description":"[CVPR24] CoDi: Conditional Diffusion Distillation for Higher-Fidelity and Faster Image Generation","archived":false,"fork":false,"pushed_at":"2024-03-02T01:28:31.000Z","size":2373,"stargazers_count":35,"open_issues_count":0,"forks_count":1,"subscribers_count":3,"default_branch":"master","last_synced_at":"2024-03-02T02:30:56.809Z","etag":null,"topics":["cvpr2024"],"latest_commit_sha":null,"homepage":"https://fast-codi.github.io/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/fast-codi.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2024-02-14T14:40:19.000Z","updated_at":"2024-03-01T16:16:17.000Z","dependencies_parsed_at":"2024-02-16T02:31:42.703Z","dependency_job_id":"88395b30-d5ba-4cfd-b67c-450a49711bb6","html_url":"https://github.com/fast-codi/CoDi","commit_stats":null,"previous_names":["fast-codi/codi"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fast-codi%2FCoDi","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fast-codi%2FCoDi/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fast-codi%2FCoDi/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fast-codi%2FCoDi/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/fast-codi","download_url":
"https://codeload.github.com/fast-codi/CoDi/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243494508,"owners_count":20299824,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cvpr2024"],"created_at":"2024-10-25T11:02:20.415Z","updated_at":"2025-03-13T22:32:20.573Z","avatar_url":"https://github.com/fast-codi.png","language":"Python","funding_links":[],"categories":["Accelerate"],"sub_categories":[],"readme":"# \u003cimg src=\"https://www.gstatic.com/android/keyboard/emojikitchen/20201001/u1f430/u1f430_u1f422.png\" width=32px /\u003e CoDi: Conditional Diffusion Distillation (CVPR24)\n\n\u003e CoDi: Conditional Diffusion Distillation for Higher-Fidelity and Faster Image Generation \u003cbr\u003e\n\u003e Kangfu Mei \u003csup\u003e1, 2\u003c/sup\u003e, Mauricio Delbracio \u003csup\u003e2\u003c/sup\u003e, Hossein Talebi \u003csup\u003e2\u003c/sup\u003e, Zhengzhong Tu \u003csup\u003e2\u003c/sup\u003e, Vishal M. Patel \u003csup\u003e1\u003c/sup\u003e, Peyman Milanfar \u003csup\u003e2\u003c/sup\u003e \u003cbr\u003e\n\u003e \u003csup\u003e1\u003c/sup\u003eJohns Hopkins University \u003cbr\u003e\n\u003e \u003csup\u003e2\u003c/sup\u003eGoogle Research \u003cbr\u003e\n\n[\\[Paper\\]](https://arxiv.org/abs/2310.01407) [\\[Project Page\\]](https://fast-codi.github.io)\n\n\n*Disclaimer: This is not an official Google product. This repository contains an unofficial implementation. 
Please refer to the official implementation at https://github.com/google-research/google-research/tree/master/CoDi.*\n\n*Disclaimer: All models in this repository were trained using publicly available data.*\n\n## Introduction\n\nCoDi can efficiently distill the sampling steps of a conditional diffusion model\nfrom an unconditional one (e.g. Stable Diffusion), enabling rapid generation of\nhigh-quality images (i.e. in 1-4 steps) under various conditional settings (e.g.\ninpainting, InstructPix2Pix, etc.).\n\n![teaser](figs/teaser.jpg)\n\nOn the standard real-world image super-resolution benchmark, we show that CoDi\nis capable of achieving 50-step sampling performance in terms of FID and LPIPS\nwith only 4 steps. It largely outperforms previous guided-distillation and\nconsistency-model methods. On less challenging tasks such as text-guided inpainting, we\nshow that a new parameter-efficient distillation, first proposed by us, can even\nbeat the original 50-step sampling in the FID and LPIPS metrics.\n\n![performance](figs/performance.png)\n\n\n## News\n-   Feb-27-2024 We release the canny-image-to-image checkpoint and demo with parameter-efficient CoDi. 🏁\n\n-   Feb-26-2024 CoDi is accepted to CVPR24 🏁\n\n-   Feb-22-2024 We release the checkpoint and demo for parameter-efficient CoDi. 🏁\n\n-   Dec-02-2023 We release the training script of CoDi. 🏁\n\n## Detailed Contents\n\n1.  [Training CoDi on HuggingFace Data](#training-codi-on-huggingface-data)\n2.  [Training CoDi on Your Own Data](#training-codi-on-your-own-data)\n3.  [Testing CoDi on Canny Images](#testing-codi-on-canny-images)\n4.  [Citations](#citations)\n5.  
[Acknowledgement](#acknowledgement)\n\n\u003e Note: The following instructions are adapted from\n\u003e https://github.com/huggingface/community-events/blob/main/jax-controlnet-sprint/README.md\n\n\n## Training CoDi on HuggingFace Data\n\nAll you need to do is set `DATASET_NAME` to the HuggingFace Hub dataset you want to\ntrain on (it could be your own, possibly private, dataset). A good starting point is\nthe list of datasets under\nhttps://huggingface.co/spaces/jax-diffusers-event/leaderboard.\n\n```bash\nexport HF_HOME=\"/data/kmei1/huggingface/\"\nexport DISK_DIR=\"/data/kmei1/huggingface/cache\"\nexport MODEL_DIR=\"stabilityai/stable-diffusion-2-1\"\nexport OUTPUT_DIR=\"canny_model\"\nexport DATASET_NAME=\"jax-diffusers-event/canny_diffusiondb\"\nexport NCCL_P2P_DISABLE=1\nexport CUDA_VISIBLE_DEVICES=5\n# export XLA_FLAGS=\"--xla_force_host_platform_device_count=4 --xla_dump_to=/tmp/foo\"\n\npython3 train_codi_flax.py \\\n --pretrained_model_name_or_path $MODEL_DIR \\\n --output_dir $OUTPUT_DIR \\\n --dataset_name $DATASET_NAME \\\n --load_from_disk \\\n --cache_dir $DISK_DIR \\\n --resolution 512 \\\n --learning_rate 8e-6 \\\n --train_batch_size 2 \\\n --gradient_accumulation_steps 2 \\\n --revision main \\\n --from_pt \\\n --mixed_precision bf16 \\\n --max_train_steps 200_000 \\\n --checkpointing_steps 10_000 \\\n --validation_steps 100 \\\n --dataloader_num_workers 8 \\\n --distill_learning_steps 20 \\\n --ema_decay 0.99995 \\\n --onestepode uncontrol \\\n --onestepode_control_params target \\\n --onestepode_sample_eps vprediction \\\n --cfg_aware_distill \\\n --distill_loss consistency_x \\\n --distill_type conditional \\\n --image_column original_image \\\n --caption_column prompt \\\n --conditioning_image transformed_image \\\n --report_to wandb \\\n --validation_image \"figs/control_bird_canny.png\" \\\n --validation_prompt \"birds\"\n```\n\nNote that you may need to change the `--image_column`, `--caption_column`, and\n`--conditioning_image` options according to your 
selected dataset. For example, the\n`jax-diffusers-event/canny_diffusiondb` dataset needs the options shown above,\nwhich follow its dataset card at https://huggingface.co/datasets/jax-diffusers-event/canny_diffusiondb.\n\n## Training CoDi on Your Own Data\n\n### Data preprocessing\n\nHere we demonstrate how to prepare a large dataset to train a ControlNet model\nthat generates images conditioned on an edge-only image representation (obtained with Canny edge detection).\n\nMore specifically, we use an example script defined in https://github.com/huggingface/community-events/blob/main/jax-controlnet-sprint/dataset_tools/coyo_1m_dataset_preprocess.py:\n\n-   The script selects 1 million image-text pairs from the existing COYO-700M dataset,\n    downloads each image, and uses the Canny edge detector to generate the\n    conditioning image. It then creates a metafile that links all the images and\n    processed images to their text captions.\n\n-   Use the following command to run the example data preprocessing script. 
If\n    you've mounted a disk to your TPU, you should place your `train_data_dir` and\n    `cache_dir` on the mounted disk:\n\n```bash\npython3 coyo_1m_dataset_preprocess.py \\\n --train_data_dir=\"/data/dataset\" \\\n --cache_dir=\"/data\" \\\n --max_train_samples=1000000 \\\n --num_proc=32\n```\n\nOnce the script finishes running, you can find a data folder at the specified\n`train_data_dir` with the following folder structure:\n\n```\ndata\n├── images\n│   ├── image_1.png\n│   ├── .......\n│   └── image_1000000.jpeg\n├── processed_images\n│   ├── image_1.png\n│   ├── .......\n│   └── image_1000000.jpeg\n└── meta.jsonl\n```\n\n### Training\n\nAll you need to do is set `DATASET_DIR` to the correct path of your\ndata folder.\n\nHere is an example that runs the training script and loads the dataset from\ndisk:\n\n```bash\nexport HF_HOME=\"/data/huggingface/\"\nexport DISK_DIR=\"/data/huggingface/cache\"\nexport MODEL_DIR=\"runwayml/stable-diffusion-v1-5\"\nexport OUTPUT_DIR=\"/data/canny_model\"\nexport DATASET_DIR=\"/data/dataset\"\n\npython3 train_codi_flax.py \\\n --pretrained_model_name_or_path=$MODEL_DIR \\\n --output_dir=$OUTPUT_DIR \\\n --train_data_dir=$DATASET_DIR \\\n --load_from_disk \\\n --cache_dir=$DISK_DIR \\\n --resolution=512 \\\n --learning_rate=1e-5 \\\n --train_batch_size=2 \\\n --revision=\"non-ema\" \\\n --from_pt \\\n --max_train_steps=500000 \\\n --checkpointing_steps=10000 \\\n --dataloader_num_workers=16 \\\n --distill_learning_steps 50 \\\n --distill_timestep_scaling 10 \\\n --onestepode control \\\n --onestepode_control_params target \\\n --onestepode_sample_eps v_prediction \\\n --distill_loss consistency_x\n```\n\n## Testing CoDi on Canny Images\n\n\u003ctable\u003e\n  \u003ctr\u003e\n    \u003ctd\u003e\u003cimg src='figs/control_bird_canny.png' width=\"240px\" /\u003e\u003c/td\u003e\n    \u003ctd\u003e\u003cimg src='figs/codi_4step_canny.png' width=\"240px\" /\u003e\u003c/td\u003e \n  \u003c/tr\u003e\n  \u003ctr\u003e\n  
\u003ctd\u003eCanny Image\u003c/td\u003e\n  \u003ctd\u003e\u003cb\u003eOurs with 4-step sampling\u003c/b\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\u003ctd colspan=\"2\"\u003ePrompt: birds\u003c/td\u003e\u003c/tr\u003e\n\u003c/table\u003e\n\nWe provide a pretrained *canny-edge-to-image* model that follows the ControlNet experiments at https://huggingface.co/lllyasviel/sd-controlnet-canny.\nNote that we train on open-source data, i.e., jax-diffusers-event/canny_diffusiondb, so the style of our results differs from ControlNet's.\n```bash\nexport HF_HOME=\"/data/kmei1/huggingface/\"\nexport DISK_DIR=\"/data/kmei1/huggingface/cache\"\nexport MODEL_DIR=\"stabilityai/stable-diffusion-2-1\"\nexport NCCL_P2P_DISABLE=1\nexport CUDA_VISIBLE_DEVICES=5\n\n# download the pretrained checkpoint and extract it into experiments/\nwget https://www.cis.jhu.edu/~kmei1/publics/codi/canny_99000.tar.fz \u0026\u0026 tar -xzvf canny_99000.tar.fz -C experiments\n\npython test_canny.py\n\n# or launch the Gradio user interface\npython gradio_canny_to_image.py\n```\nThe user interface looks like this 👇\n![demo](figs/gradio_demo.jpg)\n\n\n## Citations\n\nYou may want to cite:\n\n```\n@article{mei2023conditional,\n  title={CoDi: Conditional Diffusion Distillation for Higher-Fidelity and Faster Image Generation},\n  author={Mei, Kangfu and Delbracio, Mauricio and Talebi, Hossein and Tu, Zhengzhong and Patel, Vishal M and Milanfar, Peyman},\n  journal={arXiv preprint arXiv:2310.01407},\n  year={2023}\n}\n```\n\n## Acknowledgement\n\nThe code is based on [Diffusers](https://github.com/huggingface/diffusers) and\n[HuggingFace](https://github.com/huggingface/community-events/tree/main/jax-controlnet-sprint).\nPlease also follow their licenses. 
Thanks for their awesome work.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffast-codi%2FCoDi","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffast-codi%2FCoDi","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffast-codi%2FCoDi/lists"}