https://github.com/Merserk/ComfyUI-PiD
ComfyUI custom node for NVIDIA PiD pixel diffusion decoding and upscale workflows
https://github.com/Merserk/ComfyUI-PiD
comfyui comfyui-custom-node comfyui-node diffusion-models flux flux2 latent-decoder nvidia-pid pid pixel-diffusion sd3 vae-decoder z-image
Last synced: 6 days ago
JSON representation
ComfyUI custom node for NVIDIA PiD pixel diffusion decoding and upscale workflows
- Host: GitHub
- URL: https://github.com/Merserk/ComfyUI-PiD
- Owner: Merserk
- License: mit
- Created: 2026-05-25T08:27:25.000Z (about 1 month ago)
- Default Branch: main
- Last Pushed: 2026-06-12T20:13:19.000Z (14 days ago)
- Last Synced: 2026-06-12T22:11:43.871Z (14 days ago)
- Topics: comfyui, comfyui-custom-node, comfyui-node, diffusion-models, flux, flux2, latent-decoder, nvidia-pid, pid, pixel-diffusion, sd3, vae-decoder, z-image
- Language: Python
- Homepage:
- Size: 500 KB
- Stars: 85
- Watchers: 3
- Forks: 6
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-comfyui - **ComfyUI-PiD**
README
# ComfyUI-PiD
Compact ComfyUI nodes for **NVIDIA PiD / PixelDiT** using ComfyUI-native `Comfy-Org/PixelDiT` model loading.


PiD is a latent-conditioned pixel diffusion decoder/upscaler:
```text
LATENT + caption + sigma -> PiD -> IMAGE
```
## Install
```bash
cd ComfyUI/custom_nodes
git clone https://github.com/Merserk/ComfyUI-PiD.git
cd ComfyUI-PiD
python -m pip install -r requirements.txt
```
Restart ComfyUI.
Requirements: recent ComfyUI with native PixelDiT/PiD support, Python `>=3.10`, NVIDIA CUDA GPU recommended.
## Models
Most nodes can download required files automatically when `auto_download=true`.
| Use | Source | Local folder |
| --- | --- | --- |
| PiD diffusion + Gemma text encoder | `Comfy-Org/PixelDiT` | `ComfyUI/models/diffusion_models/nvidia_pid/` and `ComfyUI/models/text_encoders/nvidia_pid/` |
| Caption Creator | `Qwen/Qwen3.5-0.8B` | `ComfyUI/models/text_encoders/nvidia_pid/qwen35_caption/` |
| Upscale VAEs | Flux/Z-Image, Flux2, SD3 VAE files | `ComfyUI/models/vae/nvidia_pid/` |
Use `model_precision=bf16` for best quality. `fp8` is available only for Flux1-family `2k/2kto4k` and Flux2-family `2k`; Flux2 `2kto4k`, SD3, SDXL, and Qwen-Image must use `bf16`.
## Nodes
| Node | Output | Purpose |
| --- | --- | --- |
| **PiD Decode** | `IMAGE` | One-node PiD decode from latent + caption + sigma. |
| **PiD Text Prompt** | `text`, `caption` | One prompt for normal text encoding and PiD caption input. |
| **PiD Caption Creator** | `text`, `caption` | Creates a caption from an input image with local Qwen. |
| **PiD Empty Latent Image** | `LATENT` | Backbone-aware empty latent with correct channels/downscale. |
| **PiD KSampler Capture** | `final_latent`, `pid_latent`, `pid_sigma` | KSampler-compatible sampler that captures the PiD latent and sigma. |
| **PiD Prepare** | `PID_PREP` | Moves/validates latent data and resolves PiD model assets. |
| **PiD Sample** | `PID_SAMPLES` | Runs native PiD sampling. |
| **PiD Finalize** | `IMAGE` | Converts PiD samples to a ComfyUI image. |
| **PiD Upscale** | `IMAGE` | Image-only tiled PiD upscaler with `2x/4x/6x/8x` output. |
Recommended PiD sampling: `pid_steps=4`, `cfg_scale=1.0`, `scale=0` or `4`.
## Supported Backbones
| Backbone value | PiD family | Checkpoints | Latent | PiD Upscale |
| --- | --- | --- | --- | --- |
| `zimage` | Flux1 | `2k`, `2kto4k` | 16ch / 8x | yes |
| `zimage-turbo` | Flux1 | `2k`, `2kto4k` | 16ch / 8x | yes |
| `flux` | Flux1 | `2k`, `2kto4k` | 16ch / 8x | yes |
| `flux2` | Flux2 | `2k`, `2kto4k` | 128ch / 16x | yes |
| `flux2-klein-4b` | Flux2 | `2k`, `2kto4k` | 128ch / 16x | yes |
| `flux2-klein-9b` | Flux2 | `2k`, `2kto4k` | 128ch / 16x | yes |
| `sd3` | SD3 | `2k`, `2kto4k` | 16ch / 8x | yes |
| `sdxl` | SDXL | `2kto4k` only | 4ch / 8x | no |
| `qwenimage` | Qwen-Image | `2kto4k` only | 16ch / 8x | no |
| `qwenimage-2512` | Qwen-Image | `2kto4k` only | 16ch / 8x | no |
`dinov2` and `siglip` are not supported by the native Comfy-Org PiD model set.
## Output Size Guide
Released PiD checkpoints use native `4x` scale.
| `pid_ckpt_type` | Base latent/image size | Final PiD output | Valid base presets |
| --- | --- | --- | --- |
| `2k` | 512-class | base × 4, e.g. `512x512 -> 2048x2048` | `512x512`, `576x432`, `432x576`, `624x416`, `416x624`, `672x384`, `384x672`, `784x336`, `336x784` |
| `2kto4k` | 1024-class | base × 4, e.g. `1024x1024 -> 4096x4096` | `1024x1024`, `1024x768`, `768x1024`, `1008x672`, `672x1008`, `1024x576`, `576x1024`, `1008x432`, `432x1008` |
Latent size depends on backbone downscale. Example: Flux2 `1024x1024` uses a `128 × 64 × 64` latent.
## PiD Upscale
`PiD Upscale` accepts `IMAGE` and returns `IMAGE`. It is separate from latent decode: the node cuts the image into tiles, encodes each tile with the matching VAE, runs native 4-step PiD, blends tiles, then resizes to the selected final factor.
| Setting | Values / behavior |
| --- | --- |
| `pid_ckpt_type` | `2k` uses 512px tiles; `2kto4k` uses 1024px tiles. |
| `backbone` | `zimage`, `zimage-turbo`, `flux`, `flux2`, `flux2-klein-4b`, `flux2-klein-9b`, `sd3`. |
| `model_precision` | Same limits as PiD decode; use `bf16` for best quality. |
| `upscale_factor` | Final output size: `2x`, `4x`, `6x`, or `8x`. |
| `strength` | PiD detail regeneration sigma, `0.0` to `1.0`; default `0.4`. |
| `caption` | Optional string input; connect `PiD Caption Creator` or `PiD Text Prompt`. |
| Profile | Tile size | Overlap | Small-image prepass |
| --- | ---: | ---: | ---: |
| `2k` | 512 | 64 | Resize long edge to 512, PiD once, then tiled upscale. |
| `2kto4k` | 1024 | 128 | Resize long edge to 1024, PiD once, then tiled upscale. |
Upscale VAEs are required because image tiles must be encoded into each backbone latent format:
| Backbone family | Accepted VAE names |
| --- | --- |
| Flux1 / Z-Image | `ae.safetensors` |
| Flux2 / Flux2-Klein | `flux2_ae.safetensors`, `flux2-vae.safetensors` |
| SD3 | `sd3_vae.safetensors`, `diffusion_pytorch_model.safetensors` |
Final upscale size is always based on the original input image: `width × factor`, `height × factor`. SDXL and Qwen-Image are not available in `PiD Upscale` because this implementation only maps image VAEs for Flux1/Z-Image, Flux2/Flux2-Klein, and SD3.
## Recommended Capture Settings
| Backbone | LDM steps | Capture step | Sampler / scheduler |
| --- | ---: | ---: | --- |
| `flux`, `sd3` | 28 | 24 | `euler` / `flowmatch_euler_discrete` |
| `sdxl` | 30 | 26 | `euler` / `normal` |
| `flux2` | 50 | 46 | `euler` / `flowmatch_euler_discrete` |
| `flux2-klein-4b`, `flux2-klein-9b` | 4 | 4 | `euler` / `flowmatch_euler_discrete` |
| `qwenimage`, `qwenimage-2512` | 50 | 44 | `euler` / `flowmatch_euler_discrete` |
| `zimage` | 50 | 46 | `euler` / `flowmatch_euler_discrete`, `flowmatch_shift=3.0` |
| `zimage-turbo` | 9 | 9 | `euler` / `flowmatch_euler_discrete`, `flowmatch_shift=3.0` |
## Main Workflows
### Text-to-image / generation
```text
PiD Text Prompt -> normal text encode + PiD caption
PiD Empty Latent Image -> model sampler
PiD KSampler Capture pid_latent + pid_sigma -> PiD Prepare
PiD Prepare -> PiD Sample -> PiD Finalize -> Save Image
```
### Direct decode
```text
LATENT + caption + sigma -> PiD Decode -> Save Image
```
### Image-to-image clean decode
```text
Load Image -> Resize -> VAE Encode -> PiD Prepare -> PiD Sample -> PiD Finalize -> Save Image
```
### Tiled upscale
```text
Load Image -> PiD Caption Creator -> PiD Upscale -> Save Image
```
## Example Workflows
Included in `example_workflows/`:
```text
pid_flux_complete.json
pid_flux2_complete.json
pid_flux2_klein_4b_complete.json
pid_flux2_klein_9b_complete.json
pid_qwenimage_complete.json
pid_qwenimage_2512_complete.json
pid_sd3_complete.json
pid_sdxl_complete.json
pid_zimage_complete.json
pid_zimage_turbo_complete.json
pid_image_to_image_2k_complete.json
pid_image_to_image_2kto4k_complete.json
pid_upscale_complete.json
```
## License
MIT