https://github.com/andraniksargsyan/edufusion
DIY Stable Diffusion 1.5 implementation with minimal dependencies.
- Host: GitHub
- URL: https://github.com/andraniksargsyan/edufusion
- Owner: AndranikSargsyan
- License: MIT
- Created: 2024-10-22T22:44:55.000Z (7 months ago)
- Default Branch: main
- Last Pushed: 2024-12-08T21:31:42.000Z (6 months ago)
- Last Synced: 2025-02-04T13:43:32.792Z (4 months ago)
- Topics: ddim, ddpm, diy, educational, latent-diffusion, ldm, stable-diffusion
- Language: Python
- Homepage:
- Size: 606 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Edufusion: Stable Diffusion 1.5 for Educational Purposes
Edufusion is a DIY implementation of Stable Diffusion 1.5 with minimal dependencies, designed for educational purposes.
This repository provides only the core components of Stable Diffusion 1.5 (VAE, CLIP Tokenizer, CLIP Text Encoder and UNet) along with their corresponding weights. The implementation of the sampling loop is left as an exercise for learners.
The goals of this project are:
- Foster a deep understanding of Diffusion Models by challenging users to implement sampling methods using the provided pre-trained LDM components.
- Provide a minimal implementation of LDM, enabling learners to grasp the functionality of each component without navigating complex dependency trees.
- Enable users to easily conduct hands-on experimentation with diffusion models, deepening their understanding and potentially contributing new innovations to the field.

## Sample usage
The package can be installed like this:
```bash
git clone https://github.com/AndranikSargsyan/Edufusion
cd Edufusion
pip install -e .
```

Then download the model weights by executing the following commands:
```bash
mkdir -p models
wget -ncv --show-progress -O models/text-encoder-sd-v1-5-fp16.pt https://huggingface.co/andraniksargsyan/stable-diffusion-v1-5/resolve/main/text-encoder-sd-v1-5-fp16.pt?download=true
wget -ncv --show-progress -O models/vae-sd-v1-5-fp16.pt https://huggingface.co/andraniksargsyan/stable-diffusion-v1-5/resolve/main/vae-sd-v1-5-fp16.pt?download=true
wget -ncv --show-progress -O models/unet-sd-v1-5-fp16.pt https://huggingface.co/andraniksargsyan/stable-diffusion-v1-5/resolve/main/unet-sd-v1-5-fp16.pt?download=true
```

After installation, the components can be used as demonstrated in the following sections.
### Variational AutoEncoder (VAE)
In Stable Diffusion, the VAE encodes high-dimensional images into a lower-dimensional latent space, reducing the computational cost of UNet processing.

```python
import torch
import torchvision.transforms.functional as tvF
from PIL import Image

from edufusion import AutoencoderKL
vae = AutoencoderKL().to(device="cuda").half()
vae.load_state_dict(torch.load("path/to/vae-sd-v1-5-fp16.pt", weights_only=True))
img = Image.open("path/to/image.jpg").resize((512, 512))
img_tensor = 2.0 * tvF.pil_to_tensor(img).unsqueeze(0) / 255 - 1.0
img_tensor = img_tensor.to("cuda").half()
with torch.no_grad():
    z0 = vae.encode(img_tensor).mean
    x = vae.decode(z0)
reconstructed_img = tvF.to_pil_image((x[0] / 2 + 0.5).clip(0, 1).float().cpu())  # move to CPU for PIL conversion
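# Note: Stable Diffusion multiplies VAE latents by the scale factor 0.18215
# before passing them to the UNet, and divides by it again before decoding.
# Keep this in mind when implementing the sampling loop:
#   z = 0.18215 * vae.encode(img_tensor).mean
#   x = vae.decode(z / 0.18215)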
```

### CLIP Text Encoder
The CLIP Text Encoder in Stable Diffusion converts text prompts into vector representations, which guide the image generation process.

```python
import torch

from edufusion import FrozenCLIPEmbedder
text_encoder = FrozenCLIPEmbedder()
text_encoder = text_encoder.to(device="cuda").half()
text_encoder.load_state_dict(torch.load("path/to/text-encoder-sd-v1-5-fp16.pt", weights_only=True))
encoded_text = text_encoder.encode(['The quick brown fox jumps over the lazy dog'])
print(encoded_text.shape)
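# For SD 1.5 the context is a (1, 77, 768) tensor: 77 tokens embedded by
# CLIP ViT-L/14. For classifier-free guidance you will also need the
# unconditional context, i.e. the embedding of the empty prompt:
uncond = text_encoder.encode([""])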
```

### UNet
The UNet is the denoising network from DDPM theory: given a noisy latent and a timestep, it predicts the noise to be removed, conditioned on the text embedding.
```python
import torch

from edufusion import UNetModel
unet = UNetModel().to(device="cuda").half()
unet.load_state_dict(torch.load("./models/unet-sd-v1-5-fp16.pt", weights_only=True))
print("Total number of UNet parameters:", sum(p.numel() for p in unet.parameters()))
```

### Tasks to DIY
- Verify the reconstruction ability of the provided VAE,
- Implement [DDPM](https://arxiv.org/abs/2006.11239) reverse process with classifier-free guidance,
- Implement [DDIM](https://arxiv.org/abs/2010.02502) reverse process with classifier-free guidance,
- Implement deterministic DDIM forward process and perform DDIM inversion to verify the reconstruction quality,
- Perform [SDEdit](https://arxiv.org/abs/2108.01073)-like image editing,
- Implement [Blended Latent Diffusion](https://arxiv.org/abs/2206.02779) inpainting method,
- ...
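As a starting point for the DDIM task above, here is a minimal sketch of a deterministic DDIM (eta = 0) sampling loop with classifier-free guidance. It is not part of the repository: it assumes the UNet forward signature `unet(x, timesteps, context)` of the upstream stablediffusion `UNetModel`, the standard SD 1.5 "scaled linear" beta schedule, and the 0.18215 latent scale factor; `vae`, `unet`, and `text_encoder` are the loaded components from the sections above.

```python
import torch

device = "cuda"
num_train_steps, num_sample_steps, guidance_scale = 1000, 50, 7.5

# SD 1.5 noise schedule ("scaled linear" betas from 0.00085 to 0.012).
betas = torch.linspace(0.00085 ** 0.5, 0.012 ** 0.5, num_train_steps) ** 2
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0).to(device)

# Conditional and unconditional (empty-prompt) contexts, batched together
# so each denoising step needs only one UNet forward pass.
cond = text_encoder.encode(["a photo of an astronaut riding a horse"])
uncond = text_encoder.encode([""])
context = torch.cat([uncond, cond])

# Evenly spaced DDIM timesteps, descending from T-1 to 0.
ts = torch.linspace(num_train_steps - 1, 0, num_sample_steps).long().tolist()

x = torch.randn(1, 4, 64, 64, device=device, dtype=torch.half)  # x_T

with torch.no_grad():
    for i, t in enumerate(ts):
        t_batch = torch.full((2,), t, device=device, dtype=torch.long)
        eps_uncond, eps_cond = unet(torch.cat([x, x]), t_batch, context).chunk(2)
        # Classifier-free guidance: push the prediction towards the prompt.
        eps = eps_uncond + guidance_scale * (eps_cond - eps_uncond)

        a_t = alphas_cumprod[t]
        a_prev = alphas_cumprod[ts[i + 1]] if i + 1 < len(ts) \
            else torch.tensor(1.0, device=device)

        # Deterministic DDIM update (eta = 0).
        pred_x0 = (x - (1 - a_t).sqrt() * eps) / a_t.sqrt()
        x = a_prev.sqrt() * pred_x0 + (1 - a_prev).sqrt() * eps

    img = vae.decode(x / 0.18215)  # undo the SD latent scaling, then
    # convert to PIL exactly as in the VAE section above
```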
## Acknowledgements

The code in this repository is based on
[stablediffusion](https://github.com/Stability-AI/stablediffusion), [CLIP](https://github.com/openai/CLIP) and [🤗 Transformers](https://github.com/huggingface/transformers).