dino-diffusion: Bare-bones diffusion model code
https://github.com/madebyollin/dino-diffusion
- Host: GitHub
- URL: https://github.com/madebyollin/dino-diffusion
- Owner: madebyollin
- License: MIT
- Created: 2023-02-05T02:14:23.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-07-17T14:40:50.000Z (5 months ago)
- Last Synced: 2024-07-17T18:11:16.735Z (5 months ago)
- Language: Jupyter Notebook
- Homepage: https://madebyoll.in/posts/dino_diffusion/
- Size: 3.14 MB
- Stars: 134
- Watchers: 4
- Forks: 10
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# 🦕 Dino Diffusion: Bare-bones diffusion model code
Dino Diffusion is a short, stand-alone notebook of PyTorch code that [learns to generate images](https://madebyoll.in/posts/dino_diffusion/) based on a training dataset. Dino Diffusion works by training a "noisy-thing-goes-in, denoised-thing-comes-out" neural network ("diffusion model") on the dataset. Dino Diffusion was written to satisfy my own curiosity about how diffusion models work - it is not production-grade software - but if you make anything cool with it, let me know!
Here's what results look like for the [huggan Pokémon](https://huggingface.co/datasets/huggan/pokemon) dataset (after a few hours of training on one RTX A4000 - about 2x faster than Colab):
![](distribution_comparison.png)![](training_screenshot.jpg)
Here's what results look like for the [huggan anime-faces](https://huggingface.co/datasets/huggan/anime-faces) dataset:
![](training_screenshot_2.jpg)
There are many sophisticated diffusion model training codebases out there already, with lots of clever features. Dino Diffusion is not sophisticated and does not have any clever features.
Notably, Dino Diffusion does not have:
1. Any self-attention layers, or any normalization layers, or any activation functions invented [after 1975](https://link.springer.com/article/10.1007/BF00342633). Dino just uses Conv+ReLU for everything (a minimal sketch follows this list).
2. Any fancy prediction-target setups ("epsilon parameterization", "v objective"). Dino just predicts the noise-free image.
3. Any "discrete-time formulation" or "sinusoidal timestep embedding" or whatever. Dino just consumes the % noise of the image as a number in [0, 1].
4. Any "Gaussian distribution parameterization" or Gaussian anything. Dino just uses `torch.rand` (uniform on [0, 1)).
5. Any `sqrt_one_minus_alphas_cumprod` or its [whimsical cadre of associates](https://github.com/hojonathanho/diffusion/blob/master/diffusion_tf/diffusion_utils.py#L70).
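For concreteness, here's a minimal sketch of what an all-Conv+ReLU denoiser with a scalar noise-% input can look like. This is illustrative code under the conventions above, not the notebook's actual network; the class name and layer sizes are made up:

```python
import torch
import torch.nn as nn

class TinyDenoiser(nn.Module):
    """Hypothetical Conv+ReLU-only denoiser: no attention, no normalization,
    no timestep embeddings. The noise % in [0, 1] is broadcast into an extra
    input channel so the network knows how much noise to remove."""
    def __init__(self, channels=3, width=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels + 1, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, channels, 3, padding=1),
        )

    def forward(self, noisy_images, noise_level):
        # noise_level: shape (batch,), each entry the fraction of noise in [0, 1]
        level_plane = noise_level.view(-1, 1, 1, 1).expand(-1, 1, *noisy_images.shape[2:])
        return self.net(torch.cat([noisy_images, level_plane], dim=1))
```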
Here's all Dino Diffusion does:
1. During training, Dino mixes in some random % of noise and asks the network to remove that % of noise.
2. During image generation, Dino starts with 100% noise and repeatedly mixes in denoised predictions to make the noise % fall to 0.

Dino Diffusion harkens back to a simpler time, when dinosaurs ruled the earth.
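Here's what those two steps might look like in code. This is a minimal sketch under the conventions above (uniform noise, clean-image prediction); `model`, `num_steps`, and the exact blending schedule are illustrative assumptions, not the notebook's precise implementation:

```python
import torch

def train_step(model, images):
    # Step 1: mix in a random % of uniform noise, ask the net to undo it.
    noise_level = torch.rand(images.shape[0], device=images.device)
    noise = torch.rand_like(images)
    mix = noise_level.view(-1, 1, 1, 1)
    noisy_images = (1 - mix) * images + mix * noise
    prediction = model(noisy_images, noise_level)  # predicts the noise-free image
    return (prediction - images).pow(2).mean()

@torch.no_grad()
def sample(model, shape, num_steps=50, device="cpu"):
    # Step 2: start at 100% noise, repeatedly blend in denoised predictions.
    x = torch.rand(shape, device=device)
    levels = torch.linspace(1.0, 0.0, num_steps + 1)
    for t, t_next in zip(levels[:-1], levels[1:]):
        denoised = model(x, t.expand(shape[0]).to(device))
        # Blend so the implied noise fraction drops from t to t_next.
        x = (t_next / t) * x + (1 - t_next / t) * denoised
    return x
```

The blending coefficient `t_next / t` follows from writing the noisy image as `(1 - t) * clean + t * noise` and solving for the mix of `x` and the denoised prediction that lowers the implied noise fraction from `t` to `t_next`.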
# Addendum
For Serious Training Runs, I recommend admitting a _bit_ of complexity back in and using the [IADB](https://arxiv.org/pdf/2305.03486) diffusion formulation (Gaussian noise, [-1, 1] images, and prediction of `image - noise` instead of `image`), with noise levels sampled from a stratified lognorm distribution instead of `noise_level = th.rand(batch_size)`:
`stratified_noise_level = th.randperm(batch_size).add(th.rand(batch_size)).div_(batch_size)`
`noise_level = th.sigmoid(th.erfinv(stratified_noise_level.mul_(2).sub_(1)).mul_(2**0.5))`
Still no `sqrt_one_minus_alphas_cumprod` though.
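Putting the addendum together, a training step for that variant might look like the sketch below. The noise-level lines are the ones quoted above; everything else (`iadb_train_step`, `model`) is an illustrative assumption, not code from the notebook:

```python
import torch as th

def iadb_train_step(model, images):
    # `images` assumed already scaled to [-1, 1].
    batch_size = images.shape[0]
    # Stratified lognorm noise levels (the snippet quoted above):
    stratified_noise_level = th.randperm(batch_size).add(th.rand(batch_size)).div_(batch_size)
    noise_level = th.sigmoid(th.erfinv(stratified_noise_level.mul_(2).sub_(1)).mul_(2**0.5))
    noise_level = noise_level.to(images.device)
    noise = th.randn_like(images)  # Gaussian noise, not uniform
    mix = noise_level.view(-1, 1, 1, 1)
    noisy_images = (1 - mix) * images + mix * noise
    target = images - noise  # IADB prediction target
    prediction = model(noisy_images, noise_level)
    return (prediction - target).pow(2).mean()
```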