{"id":50506487,"url":"https://github.com/addo561/stable-diffusion","last_synced_at":"2026-06-02T16:31:01.113Z","repository":{"id":358095834,"uuid":"1239995007","full_name":"addo561/stable-diffusion","owner":"addo561","description":"A from-scratch PyTorch implementation of Stable Diffusion focused on understanding the mathematics, architecture, and engineering behind latent diffusion models. Built by manually implementing the UNet, attention mechanisms, schedulers, CFG, and latent denoising pipeline while supporting pretrained weight injection.","archived":false,"fork":false,"pushed_at":"2026-05-22T19:41:49.000Z","size":2207,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-05-22T22:28:01.240Z","etag":null,"topics":["clip","cross-attention","sampling","unet","vae"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/addo561.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-05-15T16:53:56.000Z","updated_at":"2026-05-22T19:41:53.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/addo561/stable-diffusion","commit_stats":null,"previous_names":["addo561/stable-diffusion"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/addo561/stable-diffusion","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/addo561%2Fsta
ble-diffusion","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/addo561%2Fstable-diffusion/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/addo561%2Fstable-diffusion/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/addo561%2Fstable-diffusion/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/addo561","download_url":"https://codeload.github.com/addo561/stable-diffusion/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/addo561%2Fstable-diffusion/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33831622,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-02T02:00:07.132Z","response_time":109,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["clip","cross-attention","sampling","unet","vae"],"created_at":"2026-06-02T16:31:00.075Z","updated_at":"2026-06-02T16:31:01.102Z","avatar_url":"https://github.com/addo561.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 🎨 Stable Diffusion Pipeline\n\nA hands-on implementation of Stable Diffusion v1.4 inference with custom DDIM sampling,\nclassifier-free guidance, and inpainting — built piece by piece to understand what's actually happening under the hood.\n\n---\n\n## 🖼️ 
Inpainting Result\n\n\u003e Full walkthrough in [`in-painting.ipynb`](./in-painting.ipynb)\n\n\u003c!-- Replace with your inpainting output image --\u003e\n\u003cimg width=\"512\" height=\"512\" alt=\"__results___9_12\" src=\"https://github.com/user-attachments/assets/b4df05f5-3ee7-4b81-9470-2fb43227cd13\" /\u003e\n\n\n*Mask-based latent blending: the original image is preserved outside the mask, and new content is diffused inside it.*\n\n---\n\n## 🚀 What This Project Does\n\nGenerate images from text prompts using a custom-built sampling pipeline, with working inpainting on top.\n\n---\n\n## 🛠️ What I Built From Scratch\n\n- **DDIM Sampler** — complete noise scheduling and denoising loop\n- **Classifier-Free Guidance** — custom conditional/unconditional steering\n- **VAE Interface** — latent encoding/decoding with proper scaling\n- **CLIP Text Pipeline** — tokenization and embedding extraction\n- **Inpainting Logic** — mask-based latent blending (`in-painting.ipynb`)\n\n## 🤝 What's Integrated\n\n- **UNet Backbone** — `UNet2DConditionModel` from 🤗 Diffusers (pre-trained weights)\n\n---\n\n## 💡 Why This Approach\n\nInitial work focused on injecting weights into a fully custom UNet architecture.\n684/686 layers loaded successfully, but architectural mismatches (GEGLU vs GELU activations,\nupsampling order) prevented coherent outputs. Rather than paper over the issue,\nthe pragmatic call was to use the proven Diffusers UNet as a stable backbone while keeping\nevery other component custom — quality without sacrificing what was learned.\n\n\u003e See [`stable-diffusion.ipynb`](./stable-diffusion.ipynb) for that experiment.\n\n---\n\n## 🏗️ Architecture\n\n### Inference Loop\n\n\u003cimg width=\"680\" height=\"582\" alt=\"sd_inference_loop\" src=\"https://github.com/user-attachments/assets/06010fd7-035d-467f-a56d-f835ab1801d9\" /\u003e\n\n---\n\n### Custom UNet vs. 
Diffusers UNet\n\nPrompt: *\"an astronaut riding a horse\"* — 35 steps each\n\n| My Custom UNet (weight injection attempt) | Diffusers UNet (final pipeline) |\n|:-----------------------------------------:|:-------------------------------:|\n| \u003cimg width=\"512\" height=\"512\" alt=\"custom unet output\" src=\"https://github.com/user-attachments/assets/706cfb48-bda0-44b5-b8ba-47b1c613ef4a\" /\u003e | \u003cimg width=\"512\" height=\"512\" alt=\"diffusers unet output\" src=\"https://github.com/user-attachments/assets/38f4f630-d950-4c8d-a9d8-bd1849e78466\" /\u003e |\n| *Garbled / incoherent output* | *Coherent, prompt-following output* |\n\n---\n\n## 📦 Features\n\n- ✅ Text-to-image generation\n- ✅ Configurable steps and guidance scale\n- ✅ Custom DDIM sampling loop\n- ✅ Inpainting with custom masks (`in-painting.ipynb`)\n\n---\n\n## 🔧 Usage\n\n```bash\npython inference.py -c \"your prompt\" -s 50 -g 7.5\n```\n\n\u003cimg width=\"512\" height=\"512\" alt=\"__results___9_12\" src=\"https://github.com/user-attachments/assets/c5168e58-7eee-426e-99fd-4c7c80b13542\" /\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faddo561%2Fstable-diffusion","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Faddo561%2Fstable-diffusion","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Faddo561%2Fstable-diffusion/lists"}