{"id":50337357,"url":"https://github.com/fdb/latent-diffusion-from-scratch","last_synced_at":"2026-05-29T14:30:47.148Z","repository":{"id":342588185,"uuid":"889427586","full_name":"fdb/latent-diffusion-from-scratch","owner":"fdb","description":null,"archived":false,"fork":false,"pushed_at":"2026-03-06T16:05:25.000Z","size":8263,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-03-06T17:34:32.760Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/fdb.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2024-11-16T11:03:46.000Z","updated_at":"2026-03-06T16:05:29.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/fdb/latent-diffusion-from-scratch","commit_stats":null,"previous_names":["fdb/latent-diffusion-from-scratch"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/fdb/latent-diffusion-from-scratch","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fdb%2Flatent-diffusion-from-scratch","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fdb%2Flatent-diffusion-from-scratch/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fdb%2Flatent-diffusion-from-scratch/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/r
epositories/fdb%2Flatent-diffusion-from-scratch/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/fdb","download_url":"https://codeload.github.com/fdb/latent-diffusion-from-scratch/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fdb%2Flatent-diffusion-from-scratch/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33657690,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-05-29T02:00:06.066Z","response_time":107,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2026-05-29T14:30:45.837Z","updated_at":"2026-05-29T14:30:47.137Z","avatar_url":"https://github.com/fdb.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Latent Diffusion Experiments\n\nPaired conditional diffusion models that generate images from pose skeleton inputs. Includes both pixel-space (256x256) and latent-space (32x32x4 via SD 1.5 VAE) variants.\n\n## Installation\n\nInstall [uv](https://docs.astral.sh/uv/getting-started/installation/) first. All commands use `uv run` — no need to activate a virtualenv.\n\n## Latent-Space Paired Diffusion (Recommended)\n\nUses a pretrained Stable Diffusion 1.5 VAE to compress images to 32x32x4 latent space before training. 
The UNet operates on 48x fewer values per image than the pixel-space version (32x32x4 = 4,096 vs. 256x256x3 = 196,608), which speeds up training and inference.\n\n### 1. Training\n\nTraining images should be paired JPGs (target on left, source/skeleton on right) in a single directory.\n\n```bash\n# Basic training\nuv run python train_latent_paired.py --train_dir datasets/research-week-2025\n\n# With custom settings\nuv run python train_latent_paired.py \\\n  --train_dir datasets/research-week-2025 \\\n  --num_epochs 50 \\\n  --batch_size 8 \\\n  --learning_rate 1e-4\n\n# Resume from checkpoint\nuv run python train_latent_paired.py \\\n  --resume_from output/train_latent_paired_.../checkpoints/checkpoint-0010\n\n# Force re-encode images through VAE (e.g. after changing dataset)\nuv run python train_latent_paired.py --recache\n```\n\nOn first run, all images are encoded through the frozen VAE and cached to `_latent_cache.pt` in the dataset directory. Subsequent runs load the cached latents instead of re-encoding.\n\n### 2. Inference\n\n```bash\nuv run python inference_latent_paired.py \\\n  --checkpoint output/train_latent_paired_.../checkpoints/checkpoint-0010 \\\n  --input example-pose.png \\\n  --output result.png \\\n  --steps 20\n```\n\n### 3. ONNX Export\n\nExports three ONNX models for deployment (e.g. in Figment):\n\n```bash\nuv run python export_latent_onnx.py \\\n  --checkpoint_dir output/train_latent_paired_.../checkpoints/checkpoint-0010\n\n# Optional: also export fp16 versions\nuv run python export_latent_onnx.py \\\n  --checkpoint_dir output/train_latent_paired_.../checkpoints/checkpoint-0010 \\\n  --fp16\n```\n\nThis produces:\n- `vae_encoder.onnx` — encodes 256x256 RGB to 32x32x4 latent\n- `unet.onnx` — 8-channel latent UNet\n- `vae_decoder.onnx` — decodes 32x32x4 latent back to 256x256 RGB\n\nThe VAE scaling factor (0.18215) is baked into the encoder/decoder ONNX models.\n\n### 4. Figment Node\n\nOpen `latent-paired-diffusion.fgmt` in [Figment](https://figmentapp.com) and configure the three ONNX model paths. 
The node runs VAE encoding, DDIM denoising, and VAE decoding entirely on the GPU via WebGPU.\n\n## Pixel-Space Paired Diffusion (Legacy)\n\nThe original pixel-space variant operates at 256x256x3 with a 6-channel UNet.\n\n```bash\n# Training\nuv run python train_paired_256.py --num_epochs 50 --batch_size 4\n\n# Inference\nuv run python inference_paired.py \\\n  --checkpoint output/train_paired_.../checkpoints/checkpoint-0010 \\\n  --input example-pose.png\n\n# ONNX export (single UNet model)\nuv run python export_unet_onnx.py \\\n  --checkpoint_dir output/train_paired_.../checkpoints/checkpoint-0010\n```\n\nFigment node: `paired-diffusion.fgmt`\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffdb%2Flatent-diffusion-from-scratch","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffdb%2Flatent-diffusion-from-scratch","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffdb%2Flatent-diffusion-from-scratch/lists"}