https://github.com/vlvink/latent-diffusion-core
Minimal latent diffusion (mini Stable Diffusion) implementation in PyTorch — VAE, U-Net with cross-attention, fast samplers, and text-to-image generation.
diffusion-models image-generation latent-diffusion machine-learning pytorch research stable-diffusion text-to-image variational-autoencoder
Last synced: 16 days ago
- Host: GitHub
- URL: https://github.com/vlvink/latent-diffusion-core
- Owner: vlvink
- Created: 2025-09-07T10:17:50.000Z (28 days ago)
- Default Branch: main
- Last Pushed: 2025-09-13T20:34:48.000Z (21 days ago)
- Last Synced: 2025-09-13T22:21:04.100Z (21 days ago)
- Topics: diffusion-models, image-generation, latent-diffusion, machine-learning, pytorch, research, stable-diffusion, text-to-image, variational-autoencoder
- Language: Python
- Homepage:
- Size: 36.1 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Latent Diffusion Core
This repository contains a minimal PyTorch implementation of a latent diffusion model (a mini Stable Diffusion).
## Project Description
This project demonstrates:
- Training Variational Autoencoder (VAE) to compress images into a compact latent space.
- Training a U-Net denoiser in this latent space using a diffusion process.
- Using a text encoder (CLIP) and cross-attention to generate images based on text prompts.
- Support for fast samplers (DDIM/PNDM) and classifier-free guidance.

The final goal is to generate 256×256 images from a text prompt with quality approaching Stable Diffusion.
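The core idea of the pipeline above is that diffusion runs in the VAE's compact latent space rather than in pixel space. A minimal sketch of the forward (noising) process on latents, assuming a standard DDPM-style linear schedule — the schedule values, latent channel count, and shapes here are illustrative, not taken from this repository:

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)           # illustrative linear noise schedule
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

def q_sample(z0, t, noise):
    """Sample z_t ~ q(z_t | z_0) = N(sqrt(a_bar_t) * z_0, (1 - a_bar_t) * I)."""
    a_bar = alphas_cumprod[t].view(-1, 1, 1, 1)
    return a_bar.sqrt() * z0 + (1.0 - a_bar).sqrt() * noise

# A 256x256 RGB image compressed by a stride-8 VAE yields a 32x32 latent
# (the 4-channel count is an assumption for illustration).
z0 = torch.randn(2, 4, 32, 32)                  # batch of clean latents
t = torch.randint(0, T, (2,))                   # random timesteps per sample
noise = torch.randn_like(z0)
zt = q_sample(z0, t, noise)                     # noisy latents fed to the U-Net
```

During training, the U-Net is then asked to predict `noise` from `zt` and `t`; working at 32×32 instead of 256×256 is what makes latent diffusion cheap.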
## Installation
To run this project, you'll need to set up a Python environment and install the necessary dependencies.

### Prerequisites
Make sure you have Python 3.11 installed.

1. Clone the repository:
```bash
git clone https://github.com/vlvink/latent-diffusion-core.git
cd latent-diffusion-core
```

2. Install the requirements:
```bash
poetry install
```

3. Activate the Poetry environment:
```bash
poetry shell
```

### Downloading the dataset
```bash
curl -L -o ./data/coco-2017-dataset.zip \
  https://www.kaggle.com/api/v1/datasets/download/awsaf49/coco-2017-dataset
unzip -o ./data/coco-2017-dataset.zip -d ./data
mv ./data/coco2017/* ./data/
rmdir ./data/coco2017
rm ./data/coco-2017-dataset.zip
```

## Running the Code
### Training VAE
```bash
python train_vae.py \
--epochs 50 \
--batch-size 64
```

### Training Diffusion Model
```bash
python train_diffusion.py \
--epochs 200 \
--batch-size 32 \
--text-encoder clip-vit
```

### Image Generation
```bash
python sampling.py \
--prompt "A futuristic cityscape at sunset" \
--steps 50 \
--guidance-scale 7.5 \
--output out.png
```
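The `--guidance-scale` flag controls classifier-free guidance: at each sampling step the U-Net's noise prediction is evaluated with and without the text conditioning, and the two are blended. A hedged sketch of that blend — `toy_unet` is a stand-in for demonstration, not this repo's model, which conditions on CLIP embeddings via cross-attention:

```python
import torch

def guided_noise(unet, z_t, t, text_emb, null_emb, guidance_scale=7.5):
    """Classifier-free guidance: eps = eps_uncond + s * (eps_cond - eps_uncond)."""
    eps_cond = unet(z_t, t, text_emb)      # prediction with the text prompt
    eps_uncond = unet(z_t, t, null_emb)    # prediction with an empty prompt
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

# Toy stand-in "model" so the sketch runs end to end.
def toy_unet(z_t, t, emb):
    return z_t * emb.mean()

z_t = torch.randn(1, 4, 32, 32)
eps = guided_noise(toy_unet, z_t, torch.tensor([10]),
                   text_emb=torch.tensor([1.0]), null_emb=torch.tensor([0.0]),
                   guidance_scale=7.5)
```

At `guidance_scale=1.0` this reduces to the plain conditional prediction; larger values (7.5 is a common default) push samples to follow the prompt more closely at some cost in diversity.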