{"id":13487700,"url":"https://github.com/tellurion-kanata/colorizeDiffusion","last_synced_at":"2025-03-27T23:31:30.665Z","repository":{"id":215371922,"uuid":"589702332","full_name":"tellurion-kanata/colorizeDiffusion","owner":"tellurion-kanata","description":"Implementation of ColorizeDiffusion","archived":false,"fork":false,"pushed_at":"2025-03-23T02:39:33.000Z","size":13940,"stargazers_count":54,"open_issues_count":0,"forks_count":3,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-03-23T03:25:25.929Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tellurion-kanata.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-01-16T18:29:02.000Z","updated_at":"2025-03-23T02:39:36.000Z","dependencies_parsed_at":null,"dependency_job_id":"bfb9edf3-b2cc-4212-b5d7-7b64676847d2","html_url":"https://github.com/tellurion-kanata/colorizeDiffusion","commit_stats":null,"previous_names":["ydk-tellurion/colorizediffusion","tellurion-kanata/colorizediffusion"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tellurion-kanata%2FcolorizeDiffusion","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tellurion-kanata%2FcolorizeDiffusion/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tellurion-kanata%2FcolorizeDiffusion/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tellurion-kanata%2F
colorizeDiffusion/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tellurion-kanata","download_url":"https://codeload.github.com/tellurion-kanata/colorizeDiffusion/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245944019,"owners_count":20697944,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-31T18:01:02.347Z","updated_at":"2025-03-27T23:31:30.654Z","avatar_url":"https://github.com/tellurion-kanata.png","language":"Python","funding_links":[],"categories":["Colorization"],"sub_categories":[],"readme":"# ColorizeDiffusion: Adjustable Sketch Colorization with Reference Image and Text\r\n\r\n![img](assets/teaser.png)\r\n\r\n(March 2025)\r\nFundamental paper for this repository: [ColorizeDiffusion (e-print)](https://arxiv.org/abs/2401.01456).  \r\nVersion 1 - trained with 512px (WACV 2025): [ColorizeDiffusion](https://openaccess.thecvf.com/content/WACV2025/html/Yan_ColorizeDiffusion_Improving_Reference-Based_Sketch_Colorization_with_Latent_Diffusion_Model_WACV_2025_paper.html) Basic reference-based training. Released.  \r\nVersion 1.5 - trained with 512px (CVPR 2025): [ColorizeDiffusion 1.5 (e-print)](https://arxiv.org/html/2502.19937v1) Solving spatial entanglement. Released.  \r\nVersion 2 - trained with 768px, paper and code: Enhancing background and style transfer. Available soon.  \r\nVersion XL - trained with 1024px: Enhancing embedding guidance for character colorization, geometry disentanglement. Ongoing.  
\r\n\r\nModel weights are available at https://huggingface.co/tellurion/colorizer.\r\n\r\n## Implementation Details\r\nThe repository offers the implementation of ColorizeDiffusion.  \r\nCurrently, only the noisy model introduced in the paper, which utilizes the local tokens, is provided.\r\n\r\n## Getting Started\r\nTo use the code in this repository, install the required dependencies specified in `environment.yaml`.\r\n\r\n### To install and run:\r\n```shell\r\nconda env create -f environment.yaml\r\nconda activate hf\r\n```\r\n\r\n## User Interface:\r\nWe also provide a Web UI based on Gradio. To run it:\r\n```shell\r\npython -u app.py\r\n```\r\nThen you can open the UI at http://localhost:7860/.\r\n\r\n### Inference:\r\n-------------------------------------------------------------------------------------------\r\n#### Important inference options:\r\n| Options                   | Description                                                                       |\r\n|:--------------------------|:----------------------------------------------------------------------------------|\r\n| Mask guide mode           | Activate mask-guided attention and corresponding LoRA weights for colorization.   |\r\n| Crossattn scale           | Used to diminish all kinds of artifacts caused by the distribution problem.       |\r\n| Pad reference with margin | Diminishes spatial entanglement; pads reference to T times its current width.     |\r\n| Reference guidance scale  | Classifier-free guidance scale of the reference image, suggested 5.               |\r\n| Sketch guidance scale     | Classifier-free guidance scale of the sketch image, suggested 1.                  |\r\n| Attention injection       | Strengthen similarity with the reference.                                         |\r\n| Visualize                 | Used for local manipulation. Visualize the regions selected by each threshold.    
|\r\n\r\nIf artifacts such as spatial entanglement (the distribution problem discussed in the paper) appear, as in this example:\r\n![img](assets/entanglement.png)  \r\nplease activate background enhance (optionally with foreground enhance).\r\n\r\n### Manipulation:\r\nThe colorization results can be manipulated using text prompts.\r\n\r\nFor local manipulations, a visualization is provided to show the correlation between each prompt and tokens in the reference image.\r\n\r\n\r\nThe manipulation result and correlation visualization for the following settings:\r\n    \r\n    Target prompt: the girl's blonde hair\r\n    Anchor prompt: the girl's brown hair\r\n    Control prompt: the girl's brown hair\r\n    Target scale: 8\r\n    Enhanced: false\r\n    Thresholds: 0.5, 0.55, 0.65, 0.95\r\n\r\n![img](assets/preview1.png)\r\n![img](assets/preview2.png)\r\nAs you can see, the manipulation unavoidably changes some unrelated regions because it is applied to the reference embeddings.\r\n\r\n#### Manipulation options:\r\n| Options                   | Description                                                                                                                                                                                                       |\r\n| :-----                    |:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\r\n| Group index               | The index of the selected manipulation sequence's parameter group.                                                                                                                                                |\r\n| Target prompt             | The prompt used to specify the desired visual attribute for the image after manipulation.                                                                                                                         
|\r\n| Anchor prompt             | The prompt to specify the anchored visual attribute for the image before manipulation.                                                                                                                            |\r\n| Control prompt            | Used for local manipulation (crossattn-based models). The prompt to specify the target regions.                                                                                                                   |\r\n| Enhance                   | Specify whether this manipulation should be enhanced or not (enhancing is more likely to influence unrelated attributes).                                                                                         |\r\n| Target scale              | The scale used to progressively control the manipulation.                                                                                                                                                         |\r\n| Thresholds                | Used for local manipulation (crossattn-based models). Four hyperparameters used to reduce the influence on irrelevant visual attributes, where 0.0 \u003c threshold 0 \u003c threshold 1 \u003c threshold 2 \u003c threshold 3 \u003c 1.0. |\r\n| \\\u003cThreshold0 \t\t\t\t| Select regions most related to the control prompt. Indicated by deep blue.                                                                                                                                        |\r\n| Threshold0-Threshold1     | Select regions related to the control prompt. Indicated by blue.                                                                                                                                                  |\r\n| Threshold1-Threshold2\t\t| Select neighbouring but unrelated regions. Indicated by green.                                                                                                                                                    
|\r\n| Threshold2-Threshold3\t\t| Select unrelated regions. Indicated by orange.                                                                                                                                                                    |\r\n| \\\u003eThreshold3\t\t\t\t| Select most unrelated regions. Indicated by brown.                                                                                                                                                                |\r\n|Add| Click add to save current manipulation in the sequence.        |  \r\n\r\n## Code reference\r\n1. [Stable Diffusion v2](https://github.com/Stability-AI/stablediffusion)\r\n2. [Stable Diffusion XL](https://github.com/Stability-AI/generative-models)\r\n3. [SD-webui-ControlNet](https://github.com/Mikubill/sd-webui-controlnet)\r\n4. [Stable-Diffusion-webui](https://github.com/AUTOMATIC1111/stable-diffusion-webui)\r\n5. [K-diffusion](https://github.com/crowsonkb/k-diffusion)\r\n6. [Deepspeed](https://github.com/microsoft/DeepSpeed)\r\n7. 
[sketchKeras-PyTorch](https://github.com/higumax/sketchKeras-pytorch)\r\n\r\n## Citation\r\n```\r\n@article{2024arXiv240101456Y,\r\n       author = {{Yan}, Dingkun and {Yuan}, Liang and {Wu}, Erwin and {Nishioka}, Yuma and {Fujishiro}, Issei and {Saito}, Suguru},\r\n        title = \"{ColorizeDiffusion: Adjustable Sketch Colorization with Reference Image and Text}\",\r\n      journal = {arXiv e-prints},\r\n         year = {2024},\r\n          doi = {10.48550/arXiv.2401.01456},\r\n}\r\n\r\n@InProceedings{Yan_2025_WACV,\r\n    author    = {Yan, Dingkun and Yuan, Liang and Wu, Erwin and Nishioka, Yuma and Fujishiro, Issei and Saito, Suguru},\r\n    title     = {ColorizeDiffusion: Improving Reference-Based Sketch Colorization with Latent Diffusion Model},\r\n    booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},\r\n    year      = {2025},\r\n    pages     = {5092-5102}\r\n}\r\n\r\n@article{2025arXiv250219937Y,\r\n    author = {{Yan}, Dingkun and {Wang}, Xinrui and {Li}, Zhuoru and {Saito}, Suguru and {Iwasawa}, Yusuke and {Matsuo}, Yutaka and {Guo}, Jiaxian},\r\n    title = \"{Image Referenced Sketch Colorization Based on Animation Creation Workflow}\",\r\n    journal = {arXiv e-prints},\r\n    year = {2025},\r\n    doi = {10.48550/arXiv.2502.19937},\r\n}\r\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftellurion-kanata%2FcolorizeDiffusion","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftellurion-kanata%2FcolorizeDiffusion","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftellurion-kanata%2FcolorizeDiffusion/lists"}