{"id":32692926,"url":"https://github.com/msmrexe/pytorch-diffusion-sprites","last_synced_at":"2026-05-12T23:37:30.615Z","repository":{"id":321794907,"uuid":"1087194246","full_name":"msmrexe/pytorch-diffusion-sprites","owner":"msmrexe","description":"A PyTorch from-scratch implementation of Denoising Diffusion Probabilistic Models (DDPM) and Denoising Diffusion Implicit Models (DDIM) sampling to generate 16x16 pixel art sprites.","archived":false,"fork":false,"pushed_at":"2025-10-31T14:57:02.000Z","size":45,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-10-31T16:13:54.990Z","etag":null,"topics":["course-project","ddim","ddpm","denoising","denoising-diffusion-probabilistic-models","diffusion-models","generative-models","image-generation","pytorch","u-net","university-project"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/msmrexe.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-10-31T14:17:35.000Z","updated_at":"2025-10-31T15:33:49.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/msmrexe/pytorch-diffusion-sprites","commit_stats":null,"previous_names":["msmrexe/pytorch-ddpm-sprites"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/msmrexe/pytorch-diffusion-sprites","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/msmrexe%2Fpytorch-diffusion-sprites","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/msmrexe%2Fpytorch-diffusion-sprites/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/msmrexe%2Fpytorch-diffusion-sprites/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/msmrexe%2Fpytorch-diffusion-sprites/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/msmrexe","download_url":"https://codeload.github.com/msmrexe/pytorch-diffusion-sprites/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/msmrexe%2Fpytorch-diffusion-sprites/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":282166184,"owners_count":26625195,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-11-01T02:00:06.759Z","response_time":61,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["course-project","ddim","ddpm","denoising","denoising-diffusion-probabilistic-models","diffusion-models","generative-models","image-generation","pytorch","u-net","university-project"],"created_at":"2025-11-01T16:02:14.492Z","updated_at":"2025-11-01T16:05:06.300Z","avatar_url":"https://github.com/msmrexe.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Scratch-Built Diffusion Models: DDPM \u0026 DDIM for Sprites\n\nDiffusion models have rapidly become a cornerstone of modern generative AI, known for their ability to produce stunningly high-fidelity results. This project provides a complete **from-scratch PyTorch implementation** exploring the core mechanics of these powerful models. It implements the foundational **Denoising Diffusion Probabilistic Model (DDPM)** and its faster, deterministic counterpart, the **Denoising Diffusion Implicit Model (DDIM)**. Developed for the M.S. course Generative Models, this repository breaks down the complex theory into clean, modular code for generating 16x16 pixel art sprites.\n\n## Features\n\n* **Denoising Diffusion Probabilistic Model (DDPM)**: Full implementation from scratch.\n* **Denoising Diffusion Implicit Models (DDIM)**: Includes a faster, deterministic DDIM sampling loop.\n* **U-Net Noise Predictor**: A U-Net architecture designed to predict the noise added at any timestep.\n* **Modular Code**: All logic is organized into a clean, importable `src/` package.\n* **Evaluation**: Built-in script to calculate the Fréchet Inception Distance (FID) score.\n\n## Core Concepts \u0026 Techniques\n\n* **Generative Modeling**: Learning a data distribution $p(x)$ to generate new samples.\n* **Diffusion Models**: A class of models that work by systematically destroying data structure (forward process) and then learning to reverse the process (reverse process).\n* **Forward (Noising) Process**: A Markov process that gradually adds Gaussian noise to an image $\\mathbf{x}_0$ over $T$ timesteps, producing a sequence of noisy images $\\mathbf{x}_1, ..., \\mathbf{x}_T$.\n* **Reverse (Denoising) Process**: A learned Markov process $p_{\\theta}(\\mathbf{x}_{t-1} | \\mathbf{x}_t)$ that denoises an image from $\\mathbf{x}_T \\sim \\mathcal{N}(0, \\mathbf{I})$ back to a clean image $\\mathbf{x}_0$.\n* **U-Net Architecture**: Using skip connections to preserve high-resolution features, making it ideal for image-to-image tasks like noise prediction.\n\n---\n\n## How It Works\n\nThis project trains a model, $\\epsilon_{\\theta}$, to reverse a diffusion process. The process is broken into two parts: the fixed forward process and the learned reverse process.\n\n### 1. The Forward (Noising) Process\n\nThe forward process, $q$, gradually adds Gaussian noise to a clean image $\\mathbf{x}\\_0$ according to a variance schedule $\\beta_t$. We define $\\alpha_t = 1 - \\beta_t$ and $\\bar{\\alpha}\\_t = \\prod_{i=1}^{t} \\alpha_i$.\n\nA key property of this process is that we can sample $\\mathbf{x}_t$ at any arbitrary timestep $t$ in a closed-form equation, without having to iterate through all $t$ steps:\n\n$$q(\\mathbf{x}_t | \\mathbf{x}_0) = \\mathcal{N}(\\mathbf{x}_t; \\sqrt{\\bar{\\alpha}_t} \\mathbf{x}_0, (1 - \\bar{\\alpha}_t)\\mathbf{I})$$\n\nThis means we can generate a training pair $(\\mathbf{x}_t, t)$ by picking a random image $\\mathbf{x}_0$, a random timestep $t$, and sampling a noise vector $\\epsilon \\sim \\mathcal{N}(0, \\mathbf{I})$. The noised image is then:\n\n$$\\mathbf{x}_t = \\sqrt{\\bar{\\alpha}_t} \\mathbf{x}_0 + \\sqrt{1 - \\bar{\\alpha}_t} \\epsilon$$\n\n### 2. The Learned Reverse (Denoising) Process\n\nThe goal of the model is to learn the reverse process $p_{\\theta}(\\mathbf{x}\\_{t-1} | \\mathbf{x}\\_t)$. It can be shown that if $\\beta_t$ is small, this reverse transition is also Gaussian. The model $\\epsilon_{\\theta}(\\mathbf{x}\\_t, t)$ is trained to predict the noise $\\epsilon$ that was added to create $\\mathbf{x}_t$.\n\nThe training loss is a simple Mean Squared Error (MSE) between the predicted noise and the actual noise:\n\n$$L = \\mathbb{E}_{t, \\mathbf{x}_0, \\epsilon} \\left[ ||\\epsilon - \\epsilon_{\\theta}(\\mathbf{x}_t, t)||^2 \\right]$$\n\n### 3. Sampling (DDPM vs. DDIM)\n\nOnce the model $\\epsilon_{\\theta}$ is trained, we can generate new images by starting with pure noise $\\mathbf{x}\\_T \\sim \\mathcal{N}(0, \\mathbf{I})$ and iteratively sampling $\\mathbf{x}_{t-1}$ from $\\mathbf{x}_t$ for $t = T, ..., 1$.\n\n#### Algorithm 1: DDPM Sampling\n\nThe original DDPM paper derives the following equation for sampling $\\mathbf{x}_{t-1}$:\n\n$$\\mathbf{x}_{t-1} = \\frac{1}{\\sqrt{\\alpha_t}} \\left( \\mathbf{x}_t - \\frac{1 - \\alpha_t}{\\sqrt{1 - \\bar{\\alpha}_t}} \\epsilon_{\\theta}(\\mathbf{x}_t, t) \\right) + \\sigma_t \\mathbf{z}$$\n\nwhere $\\mathbf{z} \\sim \\mathcal{N}(0, \\mathbf{I})$ (if $t \u003e 1$) and $\\sigma_t^2 = \\beta_t$. This is a **stochastic** process, as new noise $\\mathbf{z}$ is added at each step. It requires all $T$ steps (e.g., 1000) to generate an image.\n\n#### Algorithm 2: DDIM Sampling\n\nDDIM provides a more general sampling process that is **deterministic** when $\\eta = 0$. It also allows for \"jumps,\" sampling in far fewer steps (e.g., 50-100) while achieving high-quality results.\n\nThe DDIM update rule is:\n\n$$\\mathbf{x}_{t-1} = \\sqrt{\\bar{\\alpha}_{t-1}} \\mathbf{x}_0^{\\text{pred}} + \\sqrt{1 - \\bar{\\alpha}_{t-1} - \\sigma_t^2} \\epsilon_{\\theta}(\\mathbf{x}_t, t) + \\sigma_t \\mathbf{z}$$\n\nwhere $\\mathbf{x}_0^{\\text{pred}}$ is the model's prediction of the *original* clean image, and $\\sigma_t$ is a parameter controlled by $\\eta$. When $\\eta=0$, $\\sigma_t=0$, and the process becomes deterministic.\n\n### 4. Model Architecture (U-Net)\n\nThe noise predictor $\\epsilon_{\\theta}(\\mathbf{x}_t, t)$ is a U-Net.\n* **Input:** A noised image $\\mathbf{x}_t$ (shape `[B, 3, 16, 16]`) and its timestep $t$.\n* **Output:** The predicted noise $\\epsilon$ (shape `[B, 3, 16, 16]`).\n* **Architecture:** It consists of a down-sampling path (encoder) and an up-sampling path (decoder) with skip connections. The timestep $t$ and context labels $c$ are embedded and injected into the model at various resolutions. This implementation uses `ResidualConvBlock`s and fixes a critical inefficiency from the original notebook where a shortcut layer was re-initialized on every forward pass.\n\n---\n\n## Project Structure\n\n```\npytorch-diffusion-sprites/\n├── .gitignore             # Ignores data, logs, outputs, and pycache\n├── LICENSE                # MIT License file\n├── README.md              # You are here!\n├── requirements.txt       # Project dependencies\n├── notebooks/\n│   └── run.ipynb          # Jupyter notebook to run the full pipeline\n├── scripts/\n│   ├── download_data.sh   # Script to download the .npy dataset\n│   ├── train.py           # Main training script\n│   ├── sample.py          # Script to generate sample images\n│   └── evaluate.py        # Script to generate images and run FID evaluation\n└── src/\n    ├── __init__.py        # Makes 'src' a Python package\n    ├── config.py          # All hyperparameters and file paths\n    ├── data_loader.py     # CustomDataset and get_dataloaders function\n    ├── model.py           # U-Net model architecture (Unet, ResidualConvBlock, etc.)\n    ├── diffusion.py       # DiffusionScheduler class (holds DDPM/DDIM logic)\n    └── utils.py           # Utility functions (logging, plotting, saving images)\n```\n\n## How to Use\n\n1.  **Clone the Repository:**\n    ```bash\n    git clone https://github.com/msmrexe/pytorch-diffusion-sprites.git\n    cd pytorch-diffusion-sprites\n    ```\n\n2.  **Install Requirements:**\n    ```bash\n    pip install -r requirements.txt\n    ```\n\n3.  **Download the Data:**\n    Run the download script. This will create a `data/` folder and place the `.npy` files inside.\n    ```bash\n    bash scripts/download_data.sh\n    ```\n\n4.  **Train the Model:**\n    Run the training script. The model will be trained according to the settings in `src/config.py`. The best model (based on validation loss) will be saved to `outputs/models/ddpm_sprite_best.pth`. A loss plot will be saved to `outputs/loss_plot.png`.\n    ```bash\n    python scripts/train.py\n    ```\n\n5.  **Generate Samples:**\n    After training, you can generate a grid of sample images.\n\n    * **Using DDPM (1000 steps, stochastic):**\n        ```bash\n        python scripts/sample.py --n-samples 16 --method ddpm\n        ```\n    * **Using DDIM (50 steps, deterministic):**\n        ```bash\n        python scripts/sample.py --n-samples 16 --method ddim --n-ddim-steps 50 --eta 0.0\n        ```\n    This will save a file to `outputs/samples/`.\n\n6.  **Evaluate the Model (FID Score):**\n    This script will generate 3000 real images and 3000 fake images, save them to `outputs/eval/`, and then compute the FID score.\n    ```bash\n    python scripts/evaluate.py --n-samples 3000 --method ddim --n-ddim-steps 100\n    ```\n\u003c!---\n    *Expected Output:*\n    ```\n    [...]\n    [2025-10-31 10:11:18] [INFO] Calculating FID score...\n    [2025-10-31 10:12:18] [INFO] *** FID Score: [some_value] ***\n    [2025-10-31 10:12:18] [INFO] Evaluation complete.\n    ```\n---\u003e\n\n---\n\n## Author\n\nFeel free to connect or reach out if you have any questions!\n\n* **Maryam Rezaee**\n* **GitHub:** [@msmrexe](https://github.com/msmrexe)\n* **Email:** [ms.maryamrezaee@gmail.com](mailto:ms.maryamrezaee@gmail.com)\n\n---\n\n## License\n\nThis project is licensed under the MIT License. See the [LICENSE](LICENSE) file for full details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmsmrexe%2Fpytorch-diffusion-sprites","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmsmrexe%2Fpytorch-diffusion-sprites","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmsmrexe%2Fpytorch-diffusion-sprites/lists"}