https://github.com/vlvink/latent-diffusion-core
Minimal latent diffusion (mini Stable Diffusion) implementation in PyTorch — VAE, U-Net with cross-attention, fast samplers, and text-to-image generation.
diffusion-models image-generation latent-diffusion machine-learning pytorch research stable-diffusion text-to-image variational-autoencoder
Last synced: 16 days ago
- Host: GitHub
- URL: https://github.com/vlvink/latent-diffusion-core
- Owner: vlvink
- Created: 2025-09-07T10:17:50.000Z (28 days ago)
- Default Branch: main
- Last Pushed: 2025-09-13T20:34:48.000Z (21 days ago)
- Last Synced: 2025-09-13T22:21:04.100Z (21 days ago)
- Topics: diffusion-models, image-generation, latent-diffusion, machine-learning, pytorch, research, stable-diffusion, text-to-image, variational-autoencoder
- Language: Python
- Homepage:
- Size: 36.1 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Latent Diffusion Core
This repository contains a minimal PyTorch implementation of a latent diffusion model (a mini Stable Diffusion).
## Project Description
This project demonstrates:
- Training Variational Autoencoder (VAE) to compress images into a compact latent space.
- Training a U-Net denoiser in this latent space using a diffusion process.
- Using a text encoder (CLIP) and cross-attention to generate images based on text prompts.
- Support for fast samplers (DDIM/PNDM) and classifier-free guidance.

The final goal is to generate 256×256 images from a text prompt with quality approaching Stable Diffusion.
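The core idea of the pipeline above is that diffusion runs in the VAE's compact latent space rather than in pixel space. A minimal sketch of the forward (noising) process on latents, assuming a standard DDPM-style linear schedule — the schedule values, latent channel count, and shapes here are illustrative, not taken from this repository:

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)           # illustrative linear noise schedule
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

def q_sample(z0, t, noise):
    """Sample z_t ~ q(z_t | z_0) = N(sqrt(a_bar_t) * z_0, (1 - a_bar_t) * I)."""
    a_bar = alphas_cumprod[t].view(-1, 1, 1, 1)
    return a_bar.sqrt() * z0 + (1.0 - a_bar).sqrt() * noise

# A 256x256 RGB image compressed by a stride-8 VAE yields a 32x32 latent
# (the 4-channel count is an assumption for illustration).
z0 = torch.randn(2, 4, 32, 32)                  # batch of clean latents
t = torch.randint(0, T, (2,))                   # random timesteps per sample
noise = torch.randn_like(z0)
zt = q_sample(z0, t, noise)                     # noisy latents fed to the U-Net
```

During training, the U-Net is then asked to predict `noise` from `zt` and `t`; working at 32×32 instead of 256×256 is what makes latent diffusion cheap.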
## Installation
To run this project, you'll need to set up a Python environment and install the necessary dependencies.

### Prerequisites
Make sure you have Python 3.11 installed.

1. Clone the repository:
```bash
git clone https://github.com/vlvink/latent-diffusion-core.git
cd latent-diffusion-core
```

2. Install the requirements:
```bash
poetry install
```

3. Activate the Poetry environment:
```bash
poetry shell
```

### Downloading the dataset
```bash
curl -L -o ./data/coco-2017-dataset.zip \
  https://www.kaggle.com/api/v1/datasets/download/awsaf49/coco-2017-dataset
unzip -o ./data/coco-2017-dataset.zip -d ./data
mv ./data/coco2017/* ./data/
rmdir ./data/coco2017
rm ./data/coco-2017-dataset.zip
```

## Running the Code
### Training VAE
```bash
python train_vae.py \
--epochs 50 \
--batch-size 64
```

### Training Diffusion Model
```bash
python train_diffusion.py \
--epochs 200 \
--batch-size 32 \
--text-encoder clip-vit
```

### Image Generation
```bash
python sampling.py \
--prompt "A futuristic cityscape at sunset" \
--steps 50 \
--guidance-scale 7.5 \
--output out.png
```
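The `--guidance-scale` flag controls classifier-free guidance: at each sampling step the U-Net's noise prediction is evaluated with and without the text conditioning, and the two are blended. A hedged sketch of that blend — `toy_unet` is a stand-in for demonstration, not this repo's model, which conditions on CLIP embeddings via cross-attention:

```python
import torch

def guided_noise(unet, z_t, t, text_emb, null_emb, guidance_scale=7.5):
    """Classifier-free guidance: eps = eps_uncond + s * (eps_cond - eps_uncond)."""
    eps_cond = unet(z_t, t, text_emb)      # prediction with the text prompt
    eps_uncond = unet(z_t, t, null_emb)    # prediction with an empty prompt
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

# Toy stand-in "model" so the sketch runs end to end.
def toy_unet(z_t, t, emb):
    return z_t * emb.mean()

z_t = torch.randn(1, 4, 32, 32)
eps = guided_noise(toy_unet, z_t, torch.tensor([10]),
                   text_emb=torch.tensor([1.0]), null_emb=torch.tensor([0.0]),
                   guidance_scale=7.5)
```

At `guidance_scale=1.0` this reduces to the plain conditional prediction; larger values (7.5 is a common default) push samples to follow the prompt more closely at some cost in diversity.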