dino-diffusion: Bare-bones diffusion model code
https://github.com/madebyollin/dino-diffusion
- Host: GitHub
- URL: https://github.com/madebyollin/dino-diffusion
- Owner: madebyollin
- License: MIT
- Created: 2023-02-05T02:14:23.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-07-17T14:40:50.000Z (5 months ago)
- Last Synced: 2024-07-17T18:11:16.735Z (5 months ago)
- Language: Jupyter Notebook
- Homepage: https://madebyoll.in/posts/dino_diffusion/
- Size: 3.14 MB
- Stars: 134
- Watchers: 4
- Forks: 10
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# 🦕 Dino Diffusion: Bare-bones diffusion model code
Dino Diffusion is a short, stand-alone notebook of PyTorch code that [learns to generate images](https://madebyoll.in/posts/dino_diffusion/) based on a training dataset. Dino Diffusion works by training a "noisy-thing-goes-in, denoised-thing-comes-out" neural network ("diffusion model") on the dataset. Dino Diffusion was written to satisfy my own curiosity about how diffusion models work - it is not production-grade software - but if you make anything cool with it, let me know!
Here's what results look like for the [huggan Pokémon](https://huggingface.co/datasets/huggan/pokemon) dataset (after a few hours of training on one RTX A4000 - about 2x faster than Colab):
![](distribution_comparison.png)![](training_screenshot.jpg)
Here's what results look like for the [huggan anime-faces](https://huggingface.co/datasets/huggan/anime-faces) dataset:
![](training_screenshot_2.jpg)
There are many sophisticated diffusion model training codebases out there already, with lots of clever features. Dino Diffusion is not sophisticated and does not have any clever features.
Notably, Dino Diffusion does not have:
1. Any self-attention layers, or any normalization layers, or any activation functions invented [after 1975](https://link.springer.com/article/10.1007/BF00342633). Dino just uses Conv+ReLU for everything (a minimal sketch follows this list).
2. Any fancy prediction-target setups ("epsilon parameterization", "v objective"). Dino just predicts the noise-free image.
3. Any "discrete-time formulation" or "sinusoidal timestep embedding" or whatever. Dino just consumes the % noise of the image as a number in [0, 1].
4. Any "Gaussian distribution parameterization" or Gaussian anything. Dino just uses `torch.rand` (uniform on [0, 1)).
5. Any `sqrt_one_minus_alphas_cumprod` or its [whimsical cadre of associates](https://github.com/hojonathanho/diffusion/blob/master/diffusion_tf/diffusion_utils.py#L70).
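For concreteness, here's a minimal sketch of what an all-Conv+ReLU denoiser with a scalar noise-% input can look like. This is illustrative code under the conventions above, not the notebook's actual network; the class name and layer sizes are made up:

```python
import torch
import torch.nn as nn

class TinyDenoiser(nn.Module):
    """Hypothetical Conv+ReLU-only denoiser: no attention, no normalization,
    no timestep embeddings. The noise % in [0, 1] is broadcast into an extra
    input channel so the network knows how much noise to remove."""
    def __init__(self, channels=3, width=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels + 1, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, channels, 3, padding=1),
        )

    def forward(self, noisy_images, noise_level):
        # noise_level: shape (batch,), each entry the fraction of noise in [0, 1]
        level_plane = noise_level.view(-1, 1, 1, 1).expand(-1, 1, *noisy_images.shape[2:])
        return self.net(torch.cat([noisy_images, level_plane], dim=1))
```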
Here's all Dino Diffusion does:
1. During training, Dino mixes in some random % of noise and asks the network to remove that % of noise.
2. During image generation, Dino starts with 100% noise and repeatedly mixes in denoised predictions to make the noise % fall to 0.

Dino Diffusion harkens back to a simpler time, when dinosaurs ruled the earth.
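Here's what those two steps might look like in code. This is a minimal sketch under the conventions above (uniform noise, clean-image prediction); `model`, `num_steps`, and the exact blending schedule are illustrative assumptions, not the notebook's precise implementation:

```python
import torch

def train_step(model, images):
    # Step 1: mix in a random % of uniform noise, ask the net to undo it.
    noise_level = torch.rand(images.shape[0], device=images.device)
    noise = torch.rand_like(images)
    mix = noise_level.view(-1, 1, 1, 1)
    noisy_images = (1 - mix) * images + mix * noise
    prediction = model(noisy_images, noise_level)  # predicts the noise-free image
    return (prediction - images).pow(2).mean()

@torch.no_grad()
def sample(model, shape, num_steps=50, device="cpu"):
    # Step 2: start at 100% noise, repeatedly blend in denoised predictions.
    x = torch.rand(shape, device=device)
    levels = torch.linspace(1.0, 0.0, num_steps + 1)
    for t, t_next in zip(levels[:-1], levels[1:]):
        denoised = model(x, t.expand(shape[0]).to(device))
        # Blend so the implied noise fraction drops from t to t_next.
        x = (t_next / t) * x + (1 - t_next / t) * denoised
    return x
```

The blending coefficient `t_next / t` follows from writing the noisy image as `(1 - t) * clean + t * noise` and solving for the mix of `x` and the denoised prediction that lowers the implied noise fraction from `t` to `t_next`.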
# Addendum
For Serious Training Runs, I recommend admitting a _bit_ of complexity back in and using the [IADB](https://arxiv.org/pdf/2305.03486) diffusion formulation (Gaussian noise, [-1, 1] images, and prediction of `image - noise` instead of `image`), with noise levels sampled from a stratified lognorm distribution instead of `noise_level = th.rand(batch_size)`:
`stratified_noise_level = th.randperm(batch_size).add(th.rand(batch_size)).div_(batch_size)`
`noise_level = th.sigmoid(th.erfinv(stratified_noise_level.mul_(2).sub_(1)).mul_(2**0.5))`
Still no `sqrt_one_minus_alphas_cumprod` though.
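Putting the addendum together, a training step for that variant might look like the sketch below. The noise-level lines are the ones quoted above; everything else (`iadb_train_step`, `model`) is an illustrative assumption, not code from the notebook:

```python
import torch as th

def iadb_train_step(model, images):
    # `images` assumed already scaled to [-1, 1].
    batch_size = images.shape[0]
    # Stratified lognorm noise levels (the snippet quoted above):
    stratified_noise_level = th.randperm(batch_size).add(th.rand(batch_size)).div_(batch_size)
    noise_level = th.sigmoid(th.erfinv(stratified_noise_level.mul_(2).sub_(1)).mul_(2**0.5))
    noise_level = noise_level.to(images.device)
    noise = th.randn_like(images)  # Gaussian noise, not uniform
    mix = noise_level.view(-1, 1, 1, 1)
    noisy_images = (1 - mix) * images + mix * noise
    target = images - noise  # IADB prediction target
    prediction = model(noisy_images, noise_level)
    return (prediction - target).pow(2).mean()
```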