https://github.com/haoming02/sd-webui-diffusion-cg
An Extension for Automatic1111 Webui that performs color grading based on the latent tensor value range
- Host: GitHub
- URL: https://github.com/haoming02/sd-webui-diffusion-cg
- Owner: Haoming02
- License: mit
- Created: 2023-11-15T07:15:51.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2024-05-06T07:48:11.000Z (almost 2 years ago)
- Last Synced: 2024-05-07T07:46:30.526Z (almost 2 years ago)
- Topics: stable-diffusion-webui, stable-diffusion-webui-plugin
- Language: Python
- Homepage:
- Size: 4.65 MB
- Stars: 46
- Watchers: 3
- Forks: 3
- Open Issues: 1
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
# SD Webui Diffusion Color Grading
This is an Extension for the [Automatic1111 Webui](https://github.com/AUTOMATIC1111/stable-diffusion-webui), which performs *Color Grading* during generation, producing colors that are more **neutral** and **balanced**, yet also **vibrant** and **contrasty**.
> This is the result of joint research between [TimothyAlexisVass](https://github.com/TimothyAlexisVass), who contributed the findings, and me, drawing on my experience developing [Vectorscope CC](https://github.com/Haoming02/sd-webui-vectorscope-cc)
**Note:** This Extension is disabled during the [ADetailer](https://github.com/Bing-su/adetailer) phase to prevent inconsistent colors
## Features
This Extension comes with two main effects, **Recenter** and **Normalization**:
#### Recenter
Abstract
TimothyAlexisVass discovered that the values of the latent noise Tensor often start off-center, and that the mean of each channel tends to drift away from `0`. Therefore, I wrote a function to guide the mean back to `0`.
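As a rough illustration of the idea (not the Extension's actual code), the sketch below subtracts the per-channel mean from a latent. It uses NumPy arrays as a stand-in for the Webui's tensors, and the `recenter` function and its `strength` parameter are hypothetical names introduced only for this example.

```python
import numpy as np


def recenter(latent: np.ndarray, strength: float = 1.0) -> np.ndarray:
    """Shift each channel of a [batch, 4, h, w] latent toward a mean of 0.

    `strength` is a hypothetical knob: 1.0 removes the whole offset,
    smaller values remove only part of it.
    """
    # Mean over the spatial axes, kept as [batch, 4, 1, 1] so it broadcasts
    channel_mean = latent.mean(axis=(2, 3), keepdims=True)
    return latent - strength * channel_mean


# Simulated off-center latent: every channel is biased by +0.5
rng = np.random.default_rng(0)
latent = rng.normal(loc=0.5, scale=1.0, size=(1, 4, 64, 64))
centered = recenter(latent)
```

In a real sampler, a correction like this would presumably be applied gradually across steps rather than all at once; the single call here only shows the arithmetic.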
Effects
When you enable the feature, the output images will not have a biased color tint, and the colors will be distributed more evenly. Additionally, the brightness will be adjusted so that bright areas are not blown out and dark areas are not clipped, producing an effect similar to HDR photos.
Samples
Off | On
#### Normalization
Abstract
By encoding images back into latent noise using the VAE, TimothyAlexisVass discovered that the resulting values usually fall within a certain range, and thus theorized that if the final latent noise has a smaller value range than normal, then some precision is essentially wasted. This gave me the idea to write a function that makes the latent noise utilize the full depth.
Effects
When you enable the feature, the latent noise will attempt to span across the full value range if possible, before getting decoded by the VAE. As a result, bright areas will get brighter and dark areas will get darker, and additional details may also be introduced in these areas.
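One way to picture this is the sketch below, which stretches each channel until its largest absolute value reaches a target limit. This is an assumption-laden illustration, not the Extension's implementation: the `normalize` function is hypothetical, the `limit` of `4.0` is an arbitrary placeholder rather than a measured range, and NumPy arrays stand in for the Webui's tensors.

```python
import numpy as np


def normalize(latent: np.ndarray, limit: float = 4.0) -> np.ndarray:
    """Stretch each channel of a [batch, 4, h, w] latent toward +/- `limit`.

    `limit` is a placeholder value for this sketch, not a measured range.
    """
    # Largest absolute value per channel, shaped [batch, 4, 1, 1]
    peak = np.abs(latent).max(axis=(2, 3), keepdims=True)
    # Scale so the peak reaches `limit`; never shrink, and avoid dividing by 0
    scale = np.maximum(limit / np.maximum(peak, 1e-8), 1.0)
    return latent * scale


rng = np.random.default_rng(0)
narrow = rng.normal(scale=0.5, size=(1, 4, 64, 64))  # uses little of the range
stretched = normalize(narrow)
```

Because the scale factor is clamped to at least `1.0`, a latent that already exceeds the limit is left unchanged, which mirrors the "if possible" wording above.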
Off | On
> Both features increase the image file size when enabled, suggesting that the images "contain more information"
#### Misc.
- You can enable both features at the same time to generate some stunning images
- This Extension supports both `SD 1.5` and `SDXL` checkpoints
Off | On
## Settings
In the `Diffusion CG` section under the Stable Diffusion category in the **Settings** tab, you can make either feature default to `enable`, as well as set the Stable Diffusion architecture to start with.
Structures of Stable Diffusion
The `Tensor` of the latent noise has a shape of `[batch, 4, height / 8, width / 8]`.
- For **SD 1.5:** From my trial and error when developing [Vectorscope CC](https://github.com/Haoming02/sd-webui-vectorscope-cc), each of the 4 channels essentially represents the `-K`, `-M`, `C`, `Y` color for the **CMYK** color model.
- For **SDXL:** According to TimothyAlexisVass's [Blogpost](https://huggingface.co/blog/TimothyAlexisVass/explaining-the-sdxl-latent-space), the first 3 channels represent the `Y'`, `-Cr`, `-Cb` color for the **Y'CbCr** color model, while the 4th channel is the pattern/structure.
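The shape and channel interpretations above can be written down as a small lookup, sketched here with NumPy as a stand-in for the Webui's tensors. The names `CHANNEL_LABELS`, `latent_shape`, and `channel_means` are hypothetical helpers for this example only; the labels themselves are taken from the two bullet points above.

```python
import numpy as np

# Channel meanings as described above, per architecture
CHANNEL_LABELS = {
    "SD 1.5": ("-K", "-M", "C", "Y"),
    "SDXL": ("Y'", "-Cr", "-Cb", "pattern/structure"),
}


def latent_shape(batch: int, height: int, width: int) -> tuple:
    """Latent shape for an image of the given pixel size."""
    return (batch, 4, height // 8, width // 8)


def channel_means(latent: np.ndarray, arch: str) -> dict:
    """Average each of the 4 channels and label it for the given architecture."""
    means = latent.mean(axis=(0, 2, 3))
    return dict(zip(CHANNEL_LABELS[arch], (float(m) for m in means)))


# A 512x768 (height x width) image maps to a 64x96 latent
shape = latent_shape(1, 512, 768)
report = channel_means(np.zeros(shape), "SDXL")
```

Reading the per-channel means this way makes a color bias visible before decoding; for example, a strongly non-zero value under `-Cr` or `-Cb` would hint at a tint in an SDXL latent.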