{"id":22676929,"url":"https://github.com/nick8592/text-guided-image-colorization","last_synced_at":"2025-04-09T07:10:24.759Z","repository":{"id":255482878,"uuid":"849832087","full_name":"nick8592/text-guided-image-colorization","owner":"nick8592","description":"This repository provides an interactive image colorization tool that leverages Stable Diffusion (SDXL) and BLIP for user-controlled color generation. With a retrained model using the ControlNet approach, users can upload images and specify colors for different objects, enhancing the colorization process through a user-friendly Gradio interface.","archived":false,"fork":false,"pushed_at":"2024-11-23T15:29:04.000Z","size":6583,"stargazers_count":87,"open_issues_count":5,"forks_count":8,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-04-02T06:10:26.865Z","etag":null,"topics":["blip","controlnet","gradio","image-colorization","stable-diffusion"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nick8592.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-08-30T10:38:14.000Z","updated_at":"2025-03-30T02:42:53.000Z","dependencies_parsed_at":null,"dependency_job_id":"5b34ab3d-beb7-42ab-9345-486670f1563c","html_url":"https://github.com/nick8592/text-guided-image-colorization","commit_stats":null,"previous_names":["nick8592/text-guided-image-colorization"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nick8592%2Ftext-guided-
image-colorization","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nick8592%2Ftext-guided-image-colorization/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nick8592%2Ftext-guided-image-colorization/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nick8592%2Ftext-guided-image-colorization/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/nick8592","download_url":"https://codeload.github.com/nick8592/text-guided-image-colorization/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247994122,"owners_count":21030050,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["blip","controlnet","gradio","image-colorization","stable-diffusion"],"created_at":"2024-12-09T17:59:03.853Z","updated_at":"2025-04-09T07:10:24.730Z","avatar_url":"https://github.com/nick8592.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Text-Guided-Image-Colorization\n\nThis project utilizes the power of **Stable Diffusion (SDXL/SDXL-Light)** and the **BLIP (Bootstrapping Language-Image Pre-training)** captioning model to provide an interactive image colorization experience. 
Users can influence the generated colors of objects within images, making the colorization process more personalized and creative.\n\n![framework.jpg](images/framework.jpg)\n\n## Table of Contents\n - [Features](#features)\n - [Installation](#installation)\n - [Quick Start](#quick-start)\n - [Dataset Usage](#dataset-usage)\n - [Training](#training)\n - [Evaluation](#evaluation)\n - [Results](#results)\n - [License](#license)\n\n## News  \n- **(2024/11/23)** The project is now available on [Hugging Face Spaces](https://huggingface.co/spaces/fffiloni/text-guided-image-colorization) 🎉 Big thanks to @fffiloni!\n\n  \n## Features\n\n- **Interactive Colorization**: Users can specify desired colors for different objects in the image.\n- **ControlNet Approach**: Enhanced colorization capabilities through retraining with ControlNet, allowing SDXL to better adapt to the image colorization task.\n- **High-Quality Outputs**: Leverage the latest advancements in diffusion models to generate vibrant and realistic colorizations.\n\n## Installation\n\nTo set up the project locally, follow these steps:\n\n1. **Clone the Repository**:\n\n   ```bash\n   git clone https://github.com/nick8592/text-guided-image-colorization.git\n   cd text-guided-image-colorization\n   ```\n\n2. **Install Dependencies**:\n   Make sure you have Python 3.7 or higher installed. Then, install the required packages:\n\n   ```bash\n   pip install -r requirements.txt\n   ```\n   Install `torch` and `torchvision` matching your CUDA version:\n   ```bash\n   pip install torch torchvision --index-url https://download.pytorch.org/whl/cuXXX\n   ```\n   Replace `XXX` with your CUDA version (e.g., `118` for CUDA 11.8). For more info, see [PyTorch Get Started](https://pytorch.org/get-started/locally/).   \n\n\n3. 
**Download Pre-trained Models**:\n   | Models | Hugging Face |\n   |:---:|:---:|\n   |SDXL-Lightning Caption|[link](https://huggingface.co/nickpai/sdxl_light_caption_output)|\n   |SDXL-Lightning Custom Caption (Recommended)|[link](https://huggingface.co/nickpai/sdxl_light_custom_caption_output)|\n\n\n   ```bash\n   text-guided-image-colorization/sdxl_light_caption_output\n   └── checkpoint-30000\n       ├── controlnet\n       │   ├── diffusion_pytorch_model.safetensors\n       │   └── config.json\n       ├── optimizer.bin\n       ├── random_states_0.pkl\n       ├── scaler.pt\n       └── scheduler.bin\n   ```\n\n## Quick Start\n\n1. Run the `gradio_ui.py` script:\n\n```bash\npython gradio_ui.py\n```\n\n2. Open the provided URL in your web browser to access the Gradio-based user interface.\n\n3. Upload an image and use the interface to control the colors of specific objects in the image. The model can also generate images without a specific prompt.\n\n4. The model will generate a colorized version of the image based on your input (or automatically if no prompt is given). See the [demo video](https://x.com/weichenpai/status/1829513077588631987).\n![Gradio UI](images/gradio_ui.png)\n\n\n## Dataset Usage\n\nYou can find more details about the dataset usage in the [Dataset-for-Image-Colorization](https://github.com/nick8592/Dataset-for-Image-Colorization).\n\n## Training\n\nFor training, you can use one of the following scripts:\n\n- `train_controlnet.sh`: Trains a model using [Stable Diffusion v2](https://huggingface.co/stabilityai/stable-diffusion-2-1)\n- `train_controlnet_sdxl.sh`: Trains a model using [SDXL](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0)\n- `train_controlnet_sdxl_light.sh`: Trains a model using [SDXL-Lightning](https://huggingface.co/ByteDance/SDXL-Lightning)\n\nAlthough the training code for SDXL is provided, due to a lack of GPU resources, I wasn't able to train the model myself. 
Therefore, there might be some errors when you try to train the model.\n\n## Evaluation\n\nFor evaluation, you can use one of the following scripts:\n\n- `eval_controlnet.sh`: Evaluates the model using [Stable Diffusion v2](https://huggingface.co/stabilityai/stable-diffusion-2-1) for a folder of images.\n- `eval_controlnet_sdxl_light.sh`: Evaluates the model using [SDXL-Lightning](https://huggingface.co/ByteDance/SDXL-Lightning) for a folder of images.\n- `eval_controlnet_sdxl_light_single.sh`: Evaluates the model using [SDXL-Lightning](https://huggingface.co/ByteDance/SDXL-Lightning) for a single image.\n\n## Results\n### Prompt-Guided\n| Caption | Condition 1 | Condition 2 | Condition 3 |\n|:---:|:---:|:---:|:---:|\n| ![000000022935_gray.jpg](images/000000022935_gray.jpg) | ![000000022935_green_shirt_on_right_girl.jpeg](images/000000022935_green_shirt_on_right_girl.jpeg) | ![000000022935_purple_shirt_on_right_girl.jpeg](images/000000022935_purple_shirt_on_right_girl.jpeg) |![000000022935_red_shirt_on_right_girl.jpeg](images/000000022935_red_shirt_on_right_girl.jpeg) |\n| a photography of a woman in a soccer uniform kicking a soccer ball | + \"green shirt\"| + \"purple shirt\" | + \"red shirt\" |\n| ![000000041633_gray.jpg](images/000000041633_gray.jpg) | ![000000041633_bright_red_car.jpeg](images/000000041633_bright_red_car.jpeg) | ![000000041633_dark_blue_car.jpeg](images/000000041633_dark_blue_car.jpeg) |![000000041633_black_car.jpeg](images/000000041633_black_car.jpeg) |\n| a photography of a photo of a truck | + \"bright red car\"| + \"dark blue car\" | + \"black car\" |\n| ![000000286708_gray.jpg](images/000000286708_gray.jpg) | ![000000286708_orange_hat.jpeg](images/000000286708_orange_hat.jpeg) | ![000000286708_pink_hat.jpeg](images/000000286708_pink_hat.jpeg) |![000000286708_yellow_hat.jpeg](images/000000286708_yellow_hat.jpeg) |\n| a photography of a cat wearing a hat on his head | + \"orange hat\"| + \"pink hat\" | + \"yellow hat\" |\n\n### 
Prompt-Free\nGround truth images are provided solely for reference purposes in the image colorization task.\n| Grayscale Image | Colorized Result | Ground Truth |\n|:---:|:---:|:---:|\n| ![000000025560_gray.jpg](images/000000025560_gray.jpg) | ![000000025560_color.jpg](images/000000025560_color.jpg) | ![000000025560_gt.jpg](images/000000025560_gt.jpg) |\n| ![000000065736_gray.jpg](images/000000065736_gray.jpg) | ![000000065736_color.jpg](images/000000065736_color.jpg) | ![000000065736_gt.jpg](images/000000065736_gt.jpg) |\n| ![000000091779_gray.jpg](images/000000091779_gray.jpg) | ![000000091779_color.jpg](images/000000091779_color.jpg) | ![000000091779_gt.jpg](images/000000091779_gt.jpg) |\n| ![000000092177_gray.jpg](images/000000092177_gray.jpg) | ![000000092177_color.jpg](images/000000092177_color.jpg) | ![000000092177_gt.jpg](images/000000092177_gt.jpg) |\n| ![000000166426_gray.jpg](images/000000166426_gray.jpg) | ![000000166426_color.jpg](images/000000166426_color.jpg) | ![000000166426_gt.jpg](images/000000166426_gt.jpg) |\n\n## Read More  \n\nHere are some related articles you might find interesting:  \n\n- [Image Colorization: Bringing Black and White to Life](https://medium.com/generative-ai/image-colorization-bringing-black-and-white-to-life-b14d3e0db763)  \n- [Understanding RGB, YCbCr, and Lab Color Spaces](https://medium.com/@weichenpai/understanding-rgb-ycbcr-and-lab-color-spaces-f9c4a5fe485a)  \n- [Comparison Between CLIP and BLIP Models](https://medium.com/generative-ai/comparison-between-clip-and-blip-models-42f8a6ff4b1e)  \n- [A Step-by-Step Guide to Interactive Machine Learning with Gradio](https://medium.com/generative-ai/a-step-by-step-guide-to-interactive-machine-learning-with-gradio-3fde7541da52)  \n\n## License\n\nThis project is licensed under the MIT License. 
See the [LICENSE](LICENSE) file for more details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnick8592%2Ftext-guided-image-colorization","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnick8592%2Ftext-guided-image-colorization","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnick8592%2Ftext-guided-image-colorization/lists"}