{"id":48577656,"url":"https://github.com/ericrollei/eric_qwen_edit_experiments","last_synced_at":"2026-04-08T16:03:15.650Z","repository":{"id":345384563,"uuid":"1184872882","full_name":"EricRollei/Eric_Qwen_Edit_Experiments","owner":"EricRollei","description":"Edit Images with Qwen at higher resolution (up to 17mp) and Generate up to 60mp","archived":false,"fork":false,"pushed_at":"2026-03-31T19:59:18.000Z","size":18180,"stargazers_count":1,"open_issues_count":0,"forks_count":1,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-03-31T21:38:55.830Z","etag":null,"topics":["comfy","comfyui-custom-node","custom-nodes","diffusers","firered","flowmatch","generative-ai","iamge-edit","image-fusion","qwen-image","qwen-image-edit-2511","spectrum","workflow"],"latest_commit_sha":null,"homepage":"https://historic.camera","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/EricRollei.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-03-18T02:31:52.000Z","updated_at":"2026-03-31T19:59:22.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/EricRollei/Eric_Qwen_Edit_Experiments","commit_stats":null,"previous_names":["ericrollei/eric_qwen_edit_experiments"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/EricRollei/Eric_Qwen_Edit_Experiments","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EricRollei%2FEric_Qwen
_Edit_Experiments","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EricRollei%2FEric_Qwen_Edit_Experiments/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EricRollei%2FEric_Qwen_Edit_Experiments/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EricRollei%2FEric_Qwen_Edit_Experiments/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/EricRollei","download_url":"https://codeload.github.com/EricRollei/Eric_Qwen_Edit_Experiments/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/EricRollei%2FEric_Qwen_Edit_Experiments/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31562697,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-08T14:31:17.711Z","status":"ssl_error","status_checked_at":"2026-04-08T14:31:17.202Z","response_time":54,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["comfy","comfyui-custom-node","custom-nodes","diffusers","firered","flowmatch","generative-ai","iamge-edit","image-fusion","qwen-image","qwen-image-edit-2511","spectrum","workflow"],"created_at":"2026-04-08T16:03:14.661Z","updated_at":"2026-04-08T16:03:15.642Z","avatar_url":"https://github.com/EricRollei.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":" # Eric Qwen-Edit \u0026 Qwen-Image Nodes\n\n## 🖼️ Up to 17 MP image editing · 50 MP+ text-to-image generation\n\nComfyUI custom nodes for **Qwen-Image-Edit-2511** (image editing) and **Qwen-Image-2512** (text-to-image generation) — 20-billion-parameter MMDiT models by Qwen (Alibaba).  
\n30 nodes covering loading, single-image editing, multi-image fusion, style transfer, inpainting, inpaint-with-transfer, LoRA, Spectrum acceleration, delta overlay, mask utilities, **text-to-image generation**, multi-stage generation, prompt rewriting, **2× VAE super-resolution upscaling**, **ControlNet-guided generation**, and **ControlNet inpainting** *(experimental)*.\n\n![8 MP image editing in just a few nodes](examples/FireRed11-8mp.png)\n*Edit images at up to 16 MP resolution — just a loader, LoRA, and edit node.*\n\n![Advanced Qwen-Edit workflow up to 16 MP](workflows/Qwen-Edit-HiRes-Adv.png)\n*Advanced Qwen-Edit workflow: just drop it into ComfyUI. It uses a lot of my own custom nodes, which are all available, but you can use the basic workflow too; check the workflows folder.*\n\n## Features\n\n- **Text-to-image generation** — Generate images from text prompts using Qwen-Image-2512\n- **Preserves input resolution** — No forced upscaling to fill a pixel budget (edit nodes)\n- **Configurable max_mp cap** — Control maximum output size for VRAM safety\n- **Resolution presets** — Quick selection of common aspect ratios for generation\n- **VAE tiling** — Automatic high-resolution decode without OOM\n- **Supports up to 16 MP** — Edit or generate large images directly\n- **True CFG** — Two full transformer forward passes per step (conditional + unconditional)\n- **Dual conditioning paths** — VL path (~384 px semantic tokens via Qwen2.5-VL) + VAE/ref path (output-resolution pixel latents), individually controllable per image (edit nodes)\n- **Multi-stage generation** — Progressive upscale + re-denoise across up to 3 stages with per-stage control over steps, CFG, denoise, and sigma schedule\n- **UltraGen** — Quality-focused v2 multi-stage node with Qwen-Image-2512 best practices, per-stage seeds, sigma schedules, and upscale VAE integration\n- **ControlNet-guided generation** — UltraGen CN node with 
[InstantX/Qwen-Image-ControlNet-Union](https://huggingface.co/InstantX/Qwen-Image-ControlNet-Union) for Canny, SoftEdge, Depth, and Pose guided generation up to 50 MP+, with auto-scaling CN strength\n- **Spectrum acceleration** — Training-free CVPR 2026 Chebyshev feature forecaster for ~3–5× speedup (both edit and generation)\n- **Prompt rewriting** — Local or remote LLM-powered prompt enhancement via any OpenAI-compatible API (Ollama, LM Studio, DeepSeek, etc.)\n- **LoRA support** — Apply and unload LoRAs on both edit and generation pipelines with chainable weight control\n- **2× VAE super-resolution** — Optional [Wan2.1-VAE-upscale2x](https://huggingface.co/spacepxl/Wan2.1-VAE-upscale2x) integration for free 2× upscale during VAE decode, with inter-stage and final-decode modes\n- **Extended prompt token length** — Configurable `max_sequence_length` (up to 1024 tokens) in UltraGen for highly detailed prompts — not exposed by other Qwen-Image nodes or workflows\n- **Progress bars** — Native ComfyUI progress display during denoising on every generation/edit node\n\n## What Makes This Different\n\nMost ComfyUI Qwen nodes decompose the model into ComfyUI's generic UNET → Scheduler → Sampler graph. These nodes take a fundamentally different approach — running the **real Hugging Face diffusers pipeline** end-to-end, with targeted patches that unlock capabilities no other Qwen node set provides.\n\n### 1. Native-resolution editing up to 17 MP\n\nThe stock diffusers `QwenImageEditPlusPipeline` forces all outputs to ~1 MP regardless of input size — a 12 MP photo gets crushed to 1 MP and fine details are lost. This node set patches the pipeline to **preserve your input resolution** (aligned to 32 px) up to a configurable cap (default 16 MP, supports 17 MP). 
No other ComfyUI Qwen-Edit implementation does this.\n\n| Input | Stock Pipeline | Eric Qwen-Edit (max_mp=16) |\n|-------|----------------|----------------------------|\n| 2 MP  | 1 MP output    | 2 MP output                |\n| 6 MP  | 1 MP output    | 6 MP output                |\n| 20 MP | 1 MP output    | 16 MP output (capped)      |\n\n### 2. Real FlowMatch diffusers pipeline — not the UNET/KSampler abstraction\n\nComfyUI's native approach treats every diffusion model as a generic UNET with a separate sampler and scheduler. Users must manually add an **\"Aura Flow Shift\"** node and guess shift values. This loses model-specific details and produces inferior results.\n\nThese nodes call the **`FlowMatchEulerDiscreteScheduler`** pipeline directly, so every sigma shift, timestep, and conditioning step matches exactly what the model was trained with:\n\n| Aspect | ComfyUI native (UNET + KSampler) | Eric Qwen-Edit / Qwen-Image (diffusers) |\n|--------|-----------------------------------|------------------------------------------|\n| Sigma shifting | Manual — requires an extra \"Aura Flow Shift\" node with a user-chosen shift value | Automatic — `FlowMatchEulerDiscreteScheduler` with `use_dynamic_shifting` reads parameters from the model config |\n| Resolution-aware | No — fixed shift regardless of output size | Yes — time-shift μ is interpolated from the output resolution's latent sequence length |\n| Shift formula | `α·t / (1 + (α-1)·t)` with a single hand-tuned α | Exponential: `exp(μ) / (exp(μ) + (1/t - 1))` + terminal stretch, where μ adapts per resolution |\n| Dual conditioning | Lost — UNET abstraction has no concept of separate VL + VAE/ref paths | Preserved — VL path (~384 px semantic tokens via Qwen2.5-VL) + VAE/ref path (output-resolution pixel latents), individually controllable per image |\n| Configuration | User must wire shift nodes and pick values | Zero-config — parameters come from `scheduler_config.json` shipped with the model |\n\n**You do not 
need any extra shift nodes with these nodes.**\n\n### 3. Spectrum acceleration — training-free 3–5× speedup (CVPR 2026)\n\nImplements adaptive spectral feature forecasting from a CVPR 2026 paper. Instead of running all transformer blocks on every denoising step, Spectrum predicts outputs on skipped steps using Chebyshev polynomial regression with Newton forward-difference blending. The flexible-window schedule caches more aggressively in later steps where changes are smaller. Applies to both edit and generation nodes. No other ComfyUI node set ships this.\n\n### 4. Cross-architecture 2× upscale VAE — 50 MP+ generation\n\nExploits a discovery that the **Wan2.1** and **Qwen-Image** VAEs are architecturally identical (`AutoencoderKLWan` / `AutoencoderKLQwenImage`) and share the same latent space. This lets us use [spacepxl's Wan2.1-VAE-upscale2x](https://huggingface.co/spacepxl/Wan2.1-VAE-upscale2x) decoder on Qwen-Image latents for free 2× super-resolution during VAE decode. Four modes:\n\n| Mode | Effect |\n|------|--------|\n| `disabled` | Standard Qwen VAE decode |\n| `inter_stage` | Decode S2 at 2× via upscale VAE, re-encode for S3 input |\n| `final_decode` | Replace final VAE decode with 2× upscale VAE |\n| `both` | Inter-stage + final decode — stacks for **4× total** (50+ MP output) |\n\n### 5. Extended prompt token length (`max_sequence_length` up to 1024)\n\nThe `max_sequence_length` parameter is buried inside the diffusers pipeline and hardcoded to 512 everywhere else. UltraGen exposes it with range 128–1024. Padding positions are masked out via attention masks, so there's **zero quality penalty** for setting it higher — only a negligible compute increase (~8 MB VRAM). Long, detailed prompts (~200 words) that would be silently truncated at 512 tokens now reach the model in full.\n\n### 6. 
Built-in LLM prompt rewriting\n\nA dedicated node that calls any **OpenAI-compatible API** (Ollama, LM Studio, DeepSeek, OpenAI) to auto-expand terse prompts into rich ~200-word descriptions following Qwen's own recommended prompt methodology. API keys are loaded securely from environment variables or `api_keys.ini` — never stored in the workflow JSON. Includes language selection (English/Chinese), temperature control, custom instructions, and a passthrough toggle for A/B testing.\n\n### 7. Multi-stage progressive generation with per-stage control\n\nUp to 3 stages of progressive upscale → re-denoise, each with independent control over steps, CFG scale, denoise strength, sigma schedule (`linear` / `balanced` / `karras`), and seed mode (`same_all_stages` / `offset_per_stage` / `random_per_stage`). The UltraGen node combines all of this with tuned defaults that incorporate Qwen's official best practices — including the Chinese negative prompt that materially improves results.\n\n### 8. True CFG with norm-preserving guidance\n\nTwo full transformer forward passes per step (conditional + unconditional) for genuine classifier-free guidance — not the approximations that single-pass \"CFG-like\" implementations use. UltraGen uses norm-preserving CFG rescaling that makes high CFG values (8–10) safe at low resolution for locking in composition, with lower CFG (2–4) at higher resolution stages for refinement.\n\n### 9. Automatic VRAM management\n\nTransformer is automatically offloaded to CPU before upscale VAE decode at every exit point. Tiled decode is used for large images. The pipeline manages device placement so you don't have to wire manual offload nodes.\n\n### 10. Chainable LoRA with weight control\n\nApply multiple LoRAs in sequence with independent weight control (−2.0 to 2.0), and cleanly unload all LoRAs to restore the base model. 
Works on both edit and generation pipelines.\n\n## Installation\n\n### Option 1: ComfyUI Manager\n\nSearch for \"Eric Qwen-Edit\" in ComfyUI Manager.\n\n### Option 2: Manual\n\n```bash\ncd ComfyUI/custom_nodes\ngit clone https://github.com/EricRollei/Eric_Qwen_Edit_Experiments.git\n```\n\n## Requirements\n\n- **Edit Model**: Download Qwen-Image-Edit-2511 (recommended) or 2509\n  - https://huggingface.co/Qwen/Qwen-Image-Edit-2511\n- **Generation Model**: Download Qwen-Image-2512 (recommended) or Qwen-Image\n  - https://huggingface.co/Qwen/Qwen-Image-2512\n  - https://huggingface.co/Qwen/Qwen-Image\n- **Upscale VAE** *(optional)*: spacepxl/Wan2.1-VAE-upscale2x (~0.5 GB)\n  - https://huggingface.co/spacepxl/Wan2.1-VAE-upscale2x\n  - Only needed for the 2× VAE super-resolution feature in UltraGen\n- **ControlNet** *(optional)*: InstantX/Qwen-Image-ControlNet-Union (~2.3 GB)\n  - https://huggingface.co/InstantX/Qwen-Image-ControlNet-Union\n  - Canny, SoftEdge, Depth, Pose guided generation\n- **VRAM**:\n  - 24 GB for up to 2 MP\n  - 48 GB for up to 6 MP\n  - 96 GB for up to 16 MP\n\n---\n\n## Nodes\n\n### Eric Qwen-Edit Load Model\n\nLoads the Qwen-Image-Edit pipeline from a local directory.\n\n| Parameter | Type | Default | Description |\n|-----------|------|---------|-------------|\n| `model_path` | STRING | — | Path to the Qwen-Image-Edit model directory |\n| `precision` | COMBO | `bf16` | Weight precision: bf16 (recommended), fp16, fp32 |\n| `device` | COMBO | `cuda` | Device: cuda, cuda:0, cuda:1, cpu |\n| `keep_in_vram` | BOOLEAN | `True` | Cache pipeline between runs to avoid reload |\n| `offload_vae` | BOOLEAN | `False` | Move VAE to CPU when not in use (saves ~1 GB) |\n| `attention_slicing` | BOOLEAN | `False` | Trade speed for lower peak VRAM |\n| `sequential_offload` | BOOLEAN | `False` | Extreme VRAM savings via sequential CPU offload |\n\n**Output:** `QWEN_EDIT_PIPELINE`\n\n---\n\n### Eric Qwen-Edit Component Loader\n\nAdvanced loader that lets you swap 
individual sub-models (transformer, VAE, or text encoder) from different directories. Useful for testing fine-tuned components without duplicating the full ~54 GB model.\n\n\u003e **Important — architecture constraints:** Every component must be architecture-compatible with Qwen-Image-Edit. The text encoder is **Qwen2.5-VL** (`Qwen2_5_VLForConditionalGeneration`), **not** CLIP. You cannot plug in a Stable Diffusion UNet, a standard CLIP model, or an unrelated VAE. You *can* use different fine-tuned or quantised versions of the same Qwen-Image-Edit components.\n\n\u003e **`base_pipeline_path` is always required**, even if you override all three components. The base path provides the scheduler config, tokenizer, and processor files that have no separate override.\n\n#### What the base path must contain\n\nThe minimum viable `base_pipeline_path` folder needs these files (the small config/tokenizer files, not the large weights):\n\n```\nbase_pipeline_path/\n├── model_index.json                 ← pipeline class mapping (required)\n├── scheduler/\n│   └── scheduler_config.json        ← FlowMatchEulerDiscreteScheduler config\n├── tokenizer/\n│   ├── vocab.json\n│   ├── merges.txt\n│   ├── tokenizer_config.json\n│   ├── added_tokens.json\n│   ├── special_tokens_map.json\n│   └── chat_template.jinja\n└── processor/\n    ├── tokenizer.json\n    ├── preprocessor_config.json\n    ├── video_preprocessor_config.json\n    ├── vocab.json\n    ├── merges.txt\n    ├── tokenizer_config.json\n    ├── added_tokens.json\n    ├── special_tokens_map.json\n    └── chat_template.jinja\n```\n\nIf you don't override a component, its weights are also loaded from the base path.\n\n#### Component folder structures\n\nEach override path must contain a `config.json` plus the weight files for that component:\n\n**Transformer** (~38 GB, `QwenImageTransformer2DModel` — 20B-parameter MMDiT):\n```\ntransformer_path/\n├── config.json\n├── diffusion_pytorch_model.safetensors.index.json\n├── 
diffusion_pytorch_model-00001-of-00005.safetensors\n├── diffusion_pytorch_model-00002-of-00005.safetensors\n├── diffusion_pytorch_model-00003-of-00005.safetensors\n├── diffusion_pytorch_model-00004-of-00005.safetensors\n└── diffusion_pytorch_model-00005-of-00005.safetensors\n```\nAlso accepts: a parent folder with a `transformer/` subfolder, or a single `.safetensors` file (loaded as state dict into the base architecture).\n\n**VAE** (~0.24 GB, `AutoencoderKLQwenImage`):\n```\nvae_path/\n├── config.json\n└── diffusion_pytorch_model.safetensors\n```\nAlso accepts a parent folder with a `vae/` subfolder.\n\n**Text Encoder** (~15.5 GB, `Qwen2_5_VLForConditionalGeneration` — Qwen2.5-VL 7B):\n```\ntext_encoder_path/\n├── config.json\n├── generation_config.json\n├── model.safetensors.index.json\n├── model-00001-of-00004.safetensors\n├── model-00002-of-00004.safetensors\n├── model-00003-of-00004.safetensors\n└── model-00004-of-00004.safetensors\n```\nAlso accepts a parent folder with a `text_encoder/` subfolder.\n\n#### Typical use cases\n\n| Scenario | What to set |\n|----------|-------------|\n| Fine-tuned transformer only | `base_pipeline_path` = full model, `transformer_path` = fine-tune dir |\n| Quantised text encoder | `base_pipeline_path` = full model, `text_encoder_path` = quantised dir |\n| Everything stock | Just use the standard **Load Model** node instead |\n\n#### Node parameters\n\n| Parameter | Type | Default | Description |\n|-----------|------|---------|-------------|\n| `base_pipeline_path` | STRING | — | Path to complete Qwen-Image-Edit model (always required — provides scheduler, tokenizer, processor, and defaults for unset components) |\n| `transformer_path` | STRING | *(empty)* | Optional override — transformer weights directory or single `.safetensors` file |\n| `vae_path` | STRING | *(empty)* | Optional override — VAE weights directory |\n| `text_encoder_path` | STRING | *(empty)* | Optional override — text encoder weights directory |\n| 
`precision` | COMBO | `bf16` | bf16, fp16, fp32 |\n| `device` | COMBO | `cuda` | cuda, cuda:0, cuda:1, cpu |\n| `keep_in_vram` | BOOLEAN | `True` | Cache between runs |\n| `offload_vae` | BOOLEAN | `False` | Offload VAE to CPU when idle |\n| `attention_slicing` | BOOLEAN | `False` | Attention slicing for lower VRAM |\n| `sequential_offload` | BOOLEAN | `False` | Sequential CPU offload |\n\n**Output:** `QWEN_EDIT_PIPELINE`\n\n\u003e **Note for ComfyUI users:** The standard ComfyUI \"Load Diffusion Model\" / \"Load CLIP\" / \"Load VAE\" nodes produce ComfyUI-internal model wrappers and **will not work** with these nodes. Qwen-Image-Edit requires the diffusers `from_pretrained` loading path, which is what both the Load Model and Component Loader nodes provide.\n\n---\n\n### Eric Qwen-Edit Unload\n\nFree VRAM by unloading the pipeline. Connect after the last generation node.\n\n| Parameter | Type | Default | Description |\n|-----------|------|---------|-------------|\n| `pipeline` | QWEN_EDIT_PIPELINE | *(optional)* | Pipeline to unload |\n| `images` | IMAGE | *(optional)* | Passthrough — connect to trigger unload after generation |\n\n**Output:** `status` (STRING)\n\n---\n\n### Eric Qwen-Edit Image\n\nEdit a single image using a text prompt.\n\n| Parameter | Type | Default | Description |\n|-----------|------|---------|-------------|\n| `pipeline` | QWEN_EDIT_PIPELINE | — | From any loader node |\n| `image` | IMAGE | — | Image to edit |\n| `prompt` | STRING | — | Describe the edit |\n| `negative_prompt` | STRING | *(empty)* | What to avoid |\n| `steps` | INT | `8` | Inference steps (8 for lightning LoRA, 50 for base model) |\n| `true_cfg_scale` | FLOAT | `4.0` | True CFG strength (1.0–20.0) |\n| `seed` | INT | `0` | Random seed |\n| `max_mp` | FLOAT | `8.0` | Maximum output megapixels (0.5–16.0) |\n\n**Output:** `IMAGE`\n\n---\n\n### Eric Qwen-Edit Inpaint\n\nInpaint masked regions of an image. 
The model has no native mask input — this node blanks the masked area, lets the model regenerate it, then composites the result back onto the original with feathered blending.\n\n**Strategy:** blank masked region → model sees hole and prompt → post-composite with Gaussian-feathered mask.\n\n| Parameter | Type | Default | Description |\n|-----------|------|---------|-------------|\n| `pipeline` | QWEN_EDIT_PIPELINE | — | From loader |\n| `image` | IMAGE | — | Image to inpaint |\n| `mask` | MASK | — | White = inpaint, black = keep |\n| `prompt` | STRING | — | Describe what to generate in masked area |\n| `mask_mode` | COMBO | `blank_white` | How to blank the mask: blank_white, blank_gray, color_overlay |\n| `feather` | INT | `8` | Gaussian blur radius for mask edge blending |\n| `negative_prompt` | STRING | *(empty)* | What to avoid |\n| `steps` | INT | `8` | Inference steps |\n| `true_cfg_scale` | FLOAT | `4.0` | True CFG strength |\n| `seed` | INT | `0` | Random seed |\n| `max_mp` | FLOAT | `8.0` | Maximum output megapixels |\n\n**Output:** `IMAGE`\n\n---\n\n### Eric Qwen-Edit Inpaint Transfer\n\nTransfer content from a reference image into the masked region of the original. Combines pre-compositing, model harmonisation, and post-compositing for seamless results.\n\n**Strategy:**\n1. Scale the transfer image (+ optional transfer mask) proportionally so the source region fits inside the target mask bounding box\n2. Pre-composite the transfer into the masked area — model sees content already in place\n3. Model harmonises lighting, color, and edges via the prompt\n4. 
Post-composite with feathered mask to preserve the original outside the mask\n\n| Parameter | Type | Default | Description |\n|-----------|------|---------|-------------|\n| `pipeline` | QWEN_EDIT_PIPELINE | — | From loader |\n| `image` | IMAGE | — | Original image (target) |\n| `mask` | MASK | — | Target region (white = where to place transfer) |\n| `transfer_image` | IMAGE | — | Reference image containing the content to transfer |\n| `prompt` | STRING | — | Describe what you want (e.g. \"harmonise the pasted element with its surroundings\") |\n| `transfer_mask` | MASK | *(optional)* | Mark which part of the transfer image to use (white = keep). When provided, both masks' bounding boxes are used for proportional scaling. |\n| `transfer_vl_ref` | BOOLEAN | `True` | Also send full transfer image as a VL semantic reference |\n| `blend_strength` | FLOAT | `1.0` | Pre-composite alpha (0.0–1.0) |\n| `feather` | INT | `8` | Gaussian blur radius for post-composite blending |\n| `negative_prompt` | STRING | *(empty)* | What to avoid |\n| `steps` | INT | `8` | Inference steps |\n| `true_cfg_scale` | FLOAT | `4.0` | True CFG strength |\n| `seed` | INT | `0` | Random seed |\n| `max_mp` | FLOAT | `8.0` | Maximum output megapixels |\n\n**Output:** `IMAGE`\n\n---\n\n### Eric Qwen-Edit Multi-Image Fusion\n\nCombine 2–4 images with composition modes and per-image conditioning control over both the VL (semantic) and VAE/ref (pixel) paths.\n\n| Parameter | Type | Default | Description |\n|-----------|------|---------|-------------|\n| `pipeline` | QWEN_EDIT_PIPELINE | — | From loader |\n| `image_1` – `image_4` | IMAGE | — (2 required) | Input images (image_3, image_4 optional) |\n| `prompt` | STRING | — | Describe the desired composition |\n| `composition_mode` | COMBO | `group` | group / scene / merge / raw |\n| `subject_label` | STRING | *(empty)* | Optional label for subject identification |\n| `main_image` | COMBO | `image_1` | Which image seeds the output resolution and 
denoising |\n| `vae_target_size` | INT | `0` | VAE encoding resolution for ref images (0 = match output) |\n| `vl_1` – `vl_4` | BOOLEAN | `True` | Include each image in the VL semantic path |\n| `ref_1` | BOOLEAN | `True` | Include image_1 in the VAE/ref pixel path |\n| `ref_2` – `ref_4` | BOOLEAN | `False` | Include secondary images in VAE/ref path (default off — VL-only) |\n| `negative_prompt` | STRING | *(empty)* | What to avoid |\n| `steps` | INT | `8` | Inference steps |\n| `true_cfg_scale` | FLOAT | `4.0` | True CFG strength |\n| `seed` | INT | `0` | Random seed |\n| `max_mp` | FLOAT | `8.0` | Maximum output megapixels |\n\n**Output:** `IMAGE`\n\n---\n\n### Eric Qwen-Edit Style Transfer\n\nApply the visual style of one image to the content of another, with fine-grained control over which aspects of style are transferred.\n\n| Parameter | Type | Default | Description |\n|-----------|------|---------|-------------|\n| `pipeline` | QWEN_EDIT_PIPELINE | — | From loader |\n| `style_image` | IMAGE | — | Reference providing the aesthetic |\n| `content_image` | IMAGE | — | Image to restyle |\n| `style_mode` | COMBO | `full_style` | full_style / color_palette / lighting / artistic_medium / texture / custom |\n| `custom_prompt` | STRING | *(empty)* | When non-empty, always overrides the style_mode template |\n| `additional_guidance` | STRING | *(empty)* | Extra instructions appended to the auto-generated prompt |\n| `style_strength` | FLOAT | `1.0` | Scales CFG for stronger/weaker style (0.1–3.0) |\n| `vae_target_size` | INT | `1024` | VAE encoding resolution for style image |\n| `vl_style` | BOOLEAN | `True` | Style image in VL semantic path |\n| `vl_content` | BOOLEAN | `True` | Content image in VL semantic path |\n| `ref_style` | BOOLEAN | `False` | Style image in VAE/ref pixel path (off by default — avoids pixel bleed) |\n| `ref_content` | BOOLEAN | `True` | Content image in VAE/ref pixel path |\n| `negative_prompt` | STRING | *(empty)* | What to avoid |\n| `steps` 
| INT | `8` | Inference steps |\n| `true_cfg_scale` | FLOAT | `4.0` | True CFG strength |\n| `seed` | INT | `0` | Random seed |\n| `max_mp` | FLOAT | `8.0` | Maximum output megapixels |\n\n**Output:** `IMAGE`\n\n---\n\n### Eric Qwen-Edit Spectrum Accelerator\n\nTraining-free diffusion acceleration based on the **Spectrum** method (CVPR 2026). Uses Chebyshev polynomial feature forecasting to skip redundant transformer forward passes, achieving ~3–5× speedup with minimal quality loss.\n\nAttach this node between the loader and any generation node. The config is stored on the pipeline and takes effect during the next denoising run. Automatically disabled when total steps \u003c `min_steps`.\n\n| Parameter | Type | Default | Description |\n|-----------|------|---------|-------------|\n| `pipeline` | QWEN_EDIT_PIPELINE | — | From loader |\n| `enable` | BOOLEAN | `True` | Toggle acceleration on/off |\n| `warmup_steps` | INT | `3` | Full-compute warm-up steps before forecasting begins |\n| `window_size` | INT | `2` | History window for Chebyshev polynomial fitting |\n| `flex_window` | FLOAT | `0.75` | Fraction of remaining steps to recompute vs. forecast (0.0–1.0) |\n| `w` | FLOAT | `0.5` | Blend weight between forecast and previous features |\n| `lam` | FLOAT | `0.1` | Regularisation coefficient for the forecaster |\n| `M` | INT | `4` | Chebyshev polynomial degree |\n| `min_steps` | INT | `15` | Spectrum auto-disables below this step count |\n\n**Output:** `QWEN_EDIT_PIPELINE` (same pipeline with spectrum config attached)\n\n---\n\n### Eric Qwen-Edit LoRA\n\nLoad a LoRA adapter into the pipeline. 
Use the lightning LoRA for 8-step inference.\n\n| Parameter | Type | Default | Description |\n|-----------|------|---------|-------------|\n| `pipeline` | QWEN_EDIT_PIPELINE | — | From loader |\n| `lora_name` | COMBO | — | Dropdown of `.safetensors` files in `ComfyUI/models/loras/` |\n| `weight` | FLOAT | `1.0` | LoRA scale (0.0–2.0) |\n| `lora_path_override` | STRING | *(empty, optional)* | Full path to a LoRA file outside the standard loras folder |\n\n**Output:** `QWEN_EDIT_PIPELINE`\n\n---\n\n### Eric Qwen-Edit Unload LoRA\n\nRemove all LoRA adapters from the pipeline, restoring base weights.\n\n| Parameter | Type | Default | Description |\n|-----------|------|---------|-------------|\n| `pipeline` | QWEN_EDIT_PIPELINE | — | Pipeline with LoRA loaded |\n\n**Output:** `QWEN_EDIT_PIPELINE`\n\n---\n\n### Eric Qwen-Edit Delta Overlay\n\nCompare an edited image with the original, extract a change mask, and composite the edit onto the original only where changes occurred. Useful for upscaling an edit at full resolution and applying it precisely.\n\n| Parameter | Type | Default | Description |\n|-----------|------|---------|-------------|\n| `original_image` | IMAGE | — | Original (before edit) |\n| `edited_image` | IMAGE | — | Edited (after edit) — may be a different resolution |\n| `threshold` | FLOAT | `0.05` | Minimum per-pixel difference to count as a change (0.0–1.0) |\n| `blur_radius` | INT | `5` | Gaussian blur on the change mask for softer edges |\n| `expand_mask` | INT | `3` | Dilate the mask by this many pixels |\n| `upscale_method` | COMBO | `lanczos` | Resampling method when resizing: lanczos, bicubic, bilinear, nearest |\n| `input_mask` | MASK | *(optional)* | If provided, intersected with the auto-detected change mask |\n\n**Outputs:**\n| Name | Type | Description |\n|------|------|-------------|\n| `composite` | IMAGE | Original with edit applied only where changes were detected |\n| `change_mask` | MASK | Binary mask of detected changes |\n| 
`upscaled_edit` | IMAGE | Edited image resized to match original resolution |\n\n---\n\n### Eric Qwen-Edit Apply Mask\n\nSimple mask-based compositing utility. Blends a foreground and background image using a mask.\n\n| Parameter | Type | Default | Description |\n|-----------|------|---------|-------------|\n| `foreground` | IMAGE | — | Image shown in white areas of the mask |\n| `background` | IMAGE | — | Image shown in black areas of the mask |\n| `mask` | MASK | — | Blend mask: white = foreground, black = background |\n| `blur_mask` | INT | `0` | *(optional)* Additional Gaussian blur on the mask (0–50) |\n\n**Output:** `IMAGE`\n\n---\n\n## Qwen-Image Generation Nodes\n\nThese nodes use **Qwen-Image / Qwen-Image-2512** for text-to-image generation. They share the same 20B MMDiT transformer and VAE architecture as the edit model, but take only text input — no source image required.\n\n\u003e Generation nodes use a **separate pipeline type** (`QWEN_IMAGE_PIPELINE`) that is not interchangeable with the edit pipeline (`QWEN_EDIT_PIPELINE`). 
You need separate loader nodes for each.\n\n### Eric Qwen-Image Load Model\n\nLoads the Qwen-Image-2512 (or Qwen-Image) text-to-image pipeline.\n\n| Parameter | Type | Default | Description |\n|-----------|------|---------|-------------|\n| `model_path` | STRING | — | Path to the Qwen-Image model directory |\n| `precision` | COMBO | `bf16` | Weight precision: bf16 (recommended), fp16, fp32 |\n| `device` | COMBO | `cuda` | Device: cuda, cuda:0, cuda:1, cpu |\n| `keep_in_vram` | BOOLEAN | `True` | Cache pipeline between runs |\n| `offload_vae` | BOOLEAN | `False` | Move VAE to CPU when not in use |\n| `attention_slicing` | BOOLEAN | `False` | Trade speed for lower peak VRAM |\n| `sequential_offload` | BOOLEAN | `False` | Sequential CPU offload for extreme VRAM savings |\n\n**Output:** `QWEN_IMAGE_PIPELINE`\n\n---\n\n### Eric Qwen-Image Component Loader\n\nAdvanced loader that lets you swap individual sub-models (transformer, VAE, or text encoder) from different directories.\n\n\u003e The generation pipeline has **no processor** component (unlike the edit pipeline). 
The base path must provide `model_index.json`, `scheduler/`, and `tokenizer/`.\n\n| Parameter | Type | Default | Description |\n|-----------|------|---------|-------------|\n| `base_pipeline_path` | STRING | — | Path to complete Qwen-Image model (always required) |\n| `transformer_path` | STRING | *(empty)* | Optional override — transformer weights directory or `.safetensors` |\n| `vae_path` | STRING | *(empty)* | Optional override — VAE weights directory |\n| `text_encoder_path` | STRING | *(empty)* | Optional override — text encoder weights directory |\n| `precision` | COMBO | `bf16` | bf16, fp16, fp32 |\n| `device` | COMBO | `cuda` | cuda, cuda:0, cuda:1, cpu |\n| `keep_in_vram` | BOOLEAN | `True` | Cache between runs |\n| `offload_vae` | BOOLEAN | `False` | Offload VAE to CPU when idle |\n| `attention_slicing` | BOOLEAN | `False` | Attention slicing for lower VRAM |\n| `sequential_offload` | BOOLEAN | `False` | Sequential CPU offload |\n\n**Output:** `QWEN_IMAGE_PIPELINE`\n\n---\n\n### Eric Qwen-Image Generate\n\nGenerate images from text prompts. 
Choose a resolution preset or set custom dimensions.\n\n| Parameter | Type | Default | Description |\n|-----------|------|---------|-------------|\n| `pipeline` | QWEN_IMAGE_PIPELINE | — | From any generation loader node |\n| `prompt` | STRING | — | Describe the image to generate |\n| `negative_prompt` | STRING | *(empty)* | What to avoid |\n| `resolution` | COMBO | `1024×1024 (1:1)` | Resolution preset (9 common aspect ratios, or \"custom\") |\n| `width` | INT | `1024` | Custom width — only used when resolution = \"custom\" |\n| `height` | INT | `1024` | Custom height — only used when resolution = \"custom\" |\n| `steps` | INT | `50` | Inference steps |\n| `true_cfg_scale` | FLOAT | `4.0` | True CFG strength (\u003e1 enables dual forward passes) |\n| `seed` | INT | `0` | Random seed (0 = random) |\n| `max_mp` | FLOAT | `1.0` | Maximum output megapixels |\n\n**Resolution presets available:**\n`1024×1024 (1:1)`, `1152×896 (9:7)`, `896×1152 (7:9)`, `1216×832 (19:13)`, `832×1216 (13:19)`, `1344×768 (7:4)`, `768×1344 (4:7)`, `1536×640 (12:5)`, `640×1536 (5:12)`, `custom`\n\n**Output:** `IMAGE`\n\n---\n\n### Eric Qwen-Image Unload\n\nFree VRAM by unloading the generation pipeline.\n\n| Parameter | Type | Default | Description |\n|-----------|------|---------|-------------|\n| `pipeline` | QWEN_IMAGE_PIPELINE | *(optional)* | Pipeline to unload |\n| `images` | IMAGE | *(optional)* | Passthrough — connect to trigger unload after generation |\n\n**Output:** `status` (STRING)\n\n---\n\n### Eric Qwen-Image Apply LoRA\n\nApply a LoRA to the Qwen-Image generation pipeline. Loads LoRA weights onto the transformer. Multiple Apply LoRA nodes can be chained to stack several LoRAs with different weights. 
LoRAs are loaded from `ComfyUI/models/loras/`.\n\n| Parameter | Type | Default | Description |\n|-----------|------|---------|-------------|\n| `pipeline` | QWEN_IMAGE_PIPELINE | — | From any generation loader node |\n| `lora_name` | COMBO | — | Select LoRA from `ComfyUI/models/loras/` |\n| `weight` | FLOAT | `1.0` | LoRA weight strength (−2.0 to 2.0, step 0.05). 1.0 = full, 0.5 = half |\n| `lora_path_override` | STRING | *(empty)* | Optional: custom path override (leave empty to use dropdown) |\n\n**Output:** `QWEN_IMAGE_PIPELINE`\n\n---\n\n### Eric Qwen-Image Unload LoRA\n\nUnload all LoRAs from the Qwen-Image generation pipeline. Use it to reset the model to its base state before applying different LoRAs, or to free memory.\n\n| Parameter | Type | Default | Description |\n|-----------|------|---------|-------------|\n| `pipeline` | QWEN_IMAGE_PIPELINE | — | Pipeline with LoRAs to unload |\n\n**Output:** `QWEN_IMAGE_PIPELINE`\n\n---\n\n### Eric Qwen-Image Multi-Stage Generate\n\nProgressive multi-stage text-to-image generation with full per-stage control. Supports up to 3 stages, each with independent steps, CFG, resolution, and denoise settings. 
Latents are upscaled between stages via bislerp and re-noised according to the per-stage denoise strength before re-sampling.\n\n- Set `upscale_to_stage2 = 0` → output Stage 1 only (single-stage).\n- Set `upscale_to_stage3 = 0` → stop after Stage 2 (two-stage).\n\n| Parameter | Type | Default | Description |\n|-----------|------|---------|-------------|\n| `pipeline` | QWEN_IMAGE_PIPELINE | — | From any generation loader node |\n| `prompt` | STRING | — | Describe the image you want to generate |\n| `negative_prompt` | STRING | *(empty)* | What to avoid in the output |\n| `aspect_ratio` | COMBO | `1:1 Square` | Aspect ratio applied at every stage |\n| `seed` | INT | `0` | Random seed (0 = random) |\n| **Stage 1** | | | |\n| `s1_mp` | FLOAT | `0.5` | Stage 1 resolution in megapixels (0.3–2.0) |\n| `s1_steps` | INT | `15` | Stage 1 inference steps (txt2img from noise) |\n| `s1_cfg` | FLOAT | `8.0` | Stage 1 true CFG scale |\n| **Stage 2** | | | |\n| `upscale_to_stage2` | FLOAT | `2.0` | Upscale factor (area) S1→S2. 0 = skip S2 \u0026 S3, output S1 |\n| `s2_steps` | INT | `20` | Stage 2 inference steps |\n| `s2_cfg` | FLOAT | `4.0` | Stage 2 true CFG scale |\n| `s2_denoise` | FLOAT | `1.0` | Stage 2 denoise (1.0 = full, lower preserves prior detail) |\n| **Stage 3** | | | |\n| `upscale_to_stage3` | FLOAT | `2.0` | Upscale factor (area) S2→S3. 0 = skip S3, output S2 |\n| `s3_steps` | INT | `15` | Stage 3 inference steps |\n| `s3_cfg` | FLOAT | `2.0` | Stage 3 true CFG scale |\n| `s3_denoise` | FLOAT | `1.0` | Stage 3 denoise |\n\n**Output:** `IMAGE`\n\n---\n\n### Eric Qwen-Image UltraGen\n\nQuality-focused multi-stage text-to-image generation (v2). 
Incorporates all Qwen-Image-2512 best practices: official Chinese negative prompt as default, `max_sequence_length` up to 1024 for detailed prompts, Spectrum acceleration on Stage 1, tuned defaults (0.5 MP s1 → 4× upscale → 26-step s2 refinement), per-stage seed modes, sigma schedule selection, and optional upscale VAE for 2× super-resolution decode.\n\n| Parameter | Type | Default | Description |\n|-----------|------|---------|-------------|\n| `pipeline` | QWEN_IMAGE_PIPELINE | — | From any generation loader node |\n| `prompt` | STRING | — | Describe the image. For best results ~200 words. Connect Prompt Rewriter to auto-enhance. |\n| `negative_prompt` | STRING | *(official Chinese default)* | Official Qwen-Image-2512 negative prompt |\n| `aspect_ratio` | COMBO | `1:1 Square` | Aspect ratio applied at every stage |\n| `seed` | INT | `0` | Random seed (0 = random) |\n| `seed_mode` | COMBO | `offset_per_stage` | `same_all_stages`, `offset_per_stage` (S2=seed+1, S3=seed+2), or `random_per_stage` |\n| `max_sequence_length` | INT | `1024` | Max prompt token length (128–1024, step 64). Full capacity by default. |\n| **Stage 1** | | | |\n| `s1_mp` | FLOAT | `0.5` | Stage 1 resolution in megapixels |\n| `s1_steps` | INT | `15` | Stage 1 inference steps |\n| `s1_cfg` | FLOAT | `10.0` | Stage 1 true CFG. High CFG at low res locks in composition. |\n| **Stage 2** | | | |\n| `upscale_to_stage2` | FLOAT | `4.0` | Upscale factor (area) S1→S2. 0 = skip S2 \u0026 S3. |\n| `s2_steps` | INT | `26` | Stage 2 inference steps (main refinement) |\n| `s2_cfg` | FLOAT | `4.0` | Stage 2 true CFG (matches official recommendation) |\n| `s2_denoise` | FLOAT | `0.85` | Stage 2 denoise |\n| `s2_sigma_schedule` | COMBO | `linear` | `linear`, `balanced` (Karras ρ=3), or `karras` (Karras ρ=7) |\n| **Stage 3** | | | |\n| `upscale_to_stage3` | FLOAT | `2.0` | Upscale factor (area) S2→S3. 0 = disabled. 
|\n| `s3_steps` | INT | `18` | Stage 3 inference steps |\n| `s3_cfg` | FLOAT | `2.0` | Stage 3 true CFG |\n| `s3_denoise` | FLOAT | `0.45` | Stage 3 denoise (0.3–0.5 recommended for final polish) |\n| `s3_sigma_schedule` | COMBO | `karras` | Sigma schedule for S3 (karras recommended for fine micro-texture) |\n| **Upscale VAE** | | | |\n| `upscale_vae` | UPSCALE_VAE | *(optional)* | From Eric Qwen Upscale VAE Loader |\n| `upscale_vae_mode` | COMBO | `both` | `disabled`, `inter_stage`, `final_decode`, or `both` (see Upscale VAE section below) |\n\n**Output:** `IMAGE`\n\n---\n\n### Eric Qwen-Image ControlNet Loader\n\nLoads an InstantX Qwen-Image ControlNet model. Supports both the **Union** model (Canny, SoftEdge, Depth, Pose) and the **Inpainting** model. The model is kept on CPU and moved to GPU automatically when called by UltraGen CN or UltraGen Inpaint CN.\n\n| Parameter | Type | Default | Description |\n|-----------|------|---------|-------------|\n| `model_path` | STRING | `InstantX/Qwen-Image-ControlNet-Union` | HuggingFace model ID or local path |\n| `dtype` | COMBO | `bfloat16` | Model precision: bfloat16, float16, float32 |\n\n**Output:** `QWEN_IMAGE_CONTROLNET`\n\n\u003e **Models:**\n\u003e - [InstantX/Qwen-Image-ControlNet-Union](https://huggingface.co/InstantX/Qwen-Image-ControlNet-Union) — Canny, SoftEdge, Depth, Pose (recommended for generation)\n\u003e - [InstantX/Qwen-Image-ControlNet-Inpainting](https://huggingface.co/InstantX/Qwen-Image-ControlNet-Inpainting) — Mask-based inpainting (experimental, see below)\n\n---\n\n### Eric Qwen-Image UltraGen CN\n\nControlNet-guided multi-stage text-to-image generation. Same architecture as UltraGen but uses the [InstantX/Qwen-Image-ControlNet-Union](https://huggingface.co/InstantX/Qwen-Image-ControlNet-Union) model on Stage 1 (and optionally Stage 2) to guide composition and structure from a control image. Supports Canny edge maps, SoftEdge/HED, depth maps, and OpenPose skeletons. 
Output up to **50 MP+** with upscale VAE.\n\nIncludes ControlNet auto-scaling that calibrates CN signal magnitude to match the transformer's hidden states, so the same `cn_target_strength` value works across different fine-tuned transformers without manual scale hunting.\n\n| Parameter | Type | Default | Description |\n|-----------|------|---------|-------------|\n| `pipeline` | QWEN_IMAGE_PIPELINE | — | From any generation loader node |\n| `controlnet` | QWEN_IMAGE_CONTROLNET | — | From the ControlNet Loader |\n| `control_image` | IMAGE | — | Control image (Canny, depth, pose, or soft edge map) |\n| `cn_type` | COMBO | `canny` | ControlNet type: `canny`, `soft_edge`, `depth`, `pose` |\n| `prompt` | STRING | — | Describe the image |\n| `negative_prompt` | STRING | *(official default)* | Negative prompt |\n| **ControlNet** | | | |\n| `cn_auto_scale` | BOOLEAN | `True` | Auto-calibrate CN strength to transformer magnitude |\n| `cn_target_strength` | FLOAT | `1.0` | CN influence (1.0 = standard, higher = stronger guidance) |\n| `controlnet_conditioning_scale` | FLOAT | `1.0` | Manual CN scale (when auto-scale OFF) |\n| `control_guidance_start` | FLOAT | `0.0` | When CN guidance begins (fraction of steps) |\n| `control_guidance_end` | FLOAT | `1.0` | When CN guidance ends |\n| **S2 ControlNet** | | | |\n| `s2_cn_scale` | FLOAT | `1.0` | CN strength on Stage 2 (0 = disable CN for S2) |\n| `s2_cn_start` | FLOAT | `0.0` | S2 CN guidance start |\n| `s2_cn_end` | FLOAT | `1.0` | S2 CN guidance end |\n| **Stages** | | | *(Same stage parameters as UltraGen — s1_mp, s1_steps, s1_cfg, upscale_to_stage2, s2_steps, etc.)* |\n| **Upscale VAE** | | | *(Same upscale VAE parameters as UltraGen)* |\n\n**Output:** `IMAGE`\n\n---\n\n### Eric Qwen-Image Spectrum Accelerator\n\nTraining-free diffusion sampling speedup using adaptive spectral feature forecasting (CVPR 2026). 
Predicts transformer outputs on skipped steps via Chebyshev polynomial regression instead of running all transformer blocks. Best for ≥20 inference steps and true CFG runs (2× transformer passes per step → double the savings). Wire between the Image Loader and any generation node.\n\n| Parameter | Type | Default | Description |\n|-----------|------|---------|-------------|\n| `pipeline` | QWEN_IMAGE_PIPELINE | — | Pipeline to accelerate |\n| `enable` | BOOLEAN | `True` | Enable/disable Spectrum acceleration |\n| `warmup_steps` | INT | `3` | Initial denoising steps that always run the full transformer (2–4 recommended) |\n| `window_size` | INT | `2` | Base period between actual transformer evaluations. 2 = every other step cached. |\n| `flex_window` | FLOAT | `0.75` | Window growth rate. Later steps change less, so larger windows are safe. 0 = fixed window. |\n| `w` | FLOAT | `0.5` | Blend between Chebyshev predictor (1.0) and Newton forward-difference predictor (0.0) |\n| `lam` | FLOAT | `0.1` | Ridge regularization for Chebyshev regression. Higher = smoother predictions. |\n| `M` | INT | `4` | Chebyshev polynomial degree (1–8). Higher captures complex trajectories but risks overfitting. |\n| `min_steps` | INT | `15` | Auto-disable when `num_inference_steps` \u003c this (low step counts don't benefit) |\n\n**Output:** `QWEN_IMAGE_PIPELINE`\n\n---\n\n### Eric Qwen Prompt Rewriter\n\nEnhance image prompts using a local or remote LLM. Rewrites terse prompts into rich ~200-word descriptions following Qwen-Image-2512 recommended methodology. Connects to any OpenAI-compatible API (Ollama, LM Studio, DeepSeek, OpenAI, etc.). API keys are loaded securely from environment variables or `api_keys.ini` — never stored in the workflow file. 
Output connects to the prompt input of any generation node.\n\n| Parameter | Type | Default | Description |\n|-----------|------|---------|-------------|\n| `prompt` | STRING | — | Original image description to enhance |\n| `api_url` | STRING | `http://localhost:11434/v1` | OpenAI-compatible API base URL |\n| `model` | STRING | `qwen3:8b` | Model name on the API server |\n| `language` | COMBO | `English` | Language for the rewritten prompt (`English` or `Chinese`) |\n| `temperature` | FLOAT | `0.7` | LLM temperature — lower = more faithful, higher = more creative |\n| `max_tokens` | INT | `2048` | Max tokens for LLM response |\n| `custom_instructions` | STRING | *(empty)* | Additional instructions appended to the system prompt |\n| `lora_triggers` | STRING | *(empty)* | LoRA trigger words/phrases, one per line or comma-separated |\n| `trigger_mode` | COMBO | `off` | How to apply trigger words: `incorporate`, `prepend`, `append`, or `off` |\n| `passthrough` | BOOLEAN | `False` | Skip rewriting and pass prompt through unchanged (for A/B testing) |\n\n**Output:** `enhanced_prompt` (STRING)\n\n#### LoRA Trigger Words\n\nMany LoRAs require specific trigger words or phrases in the prompt to activate their trained style or concept. The `lora_triggers` and `trigger_mode` inputs let you inject these automatically:\n\n| Mode | Behavior |\n|------|----------|\n| `off` | Trigger words are ignored |\n| `incorporate` | The LLM is instructed to weave the trigger words **verbatim** into the rewritten prompt naturally. Falls back to `prepend` when `passthrough` is enabled (no LLM call). |\n| `prepend` | Trigger words are prepended to the prompt (before the rewritten text). Works even in passthrough mode. |\n| `append` | Trigger words are appended to the prompt (after the rewritten text). Works even in passthrough mode. |\n\n**Usage:** Enter one trigger per line, or separate with commas. 
For example:\n```\nohwx woman\ncinematic lighting\nfilm grain\n```\n\nWhen using `incorporate` mode, the LLM receives an additional system instruction requiring the trigger words to appear verbatim in the output, so they blend naturally into the description rather than being awkwardly tacked on.\n\n---\n\n## ⚠️ Experimental: ControlNet Inpainting Nodes\n\n\u003e **Motivation:** Qwen-Image-Edit redraws the *entire* image on every edit, which progressively degrades areas outside the edit region — fine details, textures, and sharpness are lost across the whole canvas. A true inpainting pipeline would regenerate *only* the masked region while leaving the rest of the image completely untouched, preserving full original quality. That is the goal of these ControlNet inpainting nodes.\n\u003e\n\u003e **Status: Experimental — not fully working.** These nodes are functional but produce visible halos and ghosting artifacts from double-sampling at mask boundaries. The multi-stage pipeline generates the full image from noise while the ControlNet conditions on the masked source, but compositing the result back onto the original creates noticeable seams that the harmonization pass has not yet fully resolved. We believe techniques from the Qwen-Edit inpaint nodes (which use a fundamentally different conditioning approach) may help, but this has not been explored yet.\n\u003e\n\u003e **Alternative:** The **Eric Qwen-Edit Inpaint** node provides a separate experimental approach to masked inpainting using the Qwen-Image-Edit model. It blanks out the masked region before sending the image through both the VL and VAE encoders, then composites the generated output back onto the original using the mask with edge feathering. 
This approach currently produces better results than the ControlNet inpainting nodes, though it still relies on Qwen-Edit (which reprocesses the full image internally) and is itself experimental.\n\u003e\n\u003e These nodes are included for experimentation.\n\n### Eric Qwen-Image UltraGen Inpaint CN\n\nControlNet-guided multi-stage inpainting and outpainting using the [InstantX/Qwen-Image-ControlNet-Inpainting](https://huggingface.co/InstantX/Qwen-Image-ControlNet-Inpainting) model. Uses `QwenImageControlNetInpaintPipeline` with 17-channel conditioning (16ch VAE-encoded masked image + 1ch mask). Supports object replacement, background replacement, text modification, and outpainting.\n\n**Architecture:** Up to 3 stages — S1 (CN draft), S2A (CN refine + dilated mask), S2B (whole-image harmonize, no CN), S3 (polish upscale, no CN). Smart stage selection (`auto_stages`) skips S1 when input is already large enough. Final feathered composite preserves original pixels outside the mask.\n\n**Known issues:**\n- Halo artifacts at mask boundaries due to double-sampling\n- Ghosting where generated content overlaps original pixels\n- Harmonization pass (S2B) reduces but does not eliminate boundary artifacts\n\n### Eric Qwen Inpaint Prompt Rewriter\n\nVLM-powered prompt rewriter for inpainting. Analyzes the source image and mask to generate short, change-focused prompts (40–80 words) describing the desired edit. Uses mask outline overlay for spatial awareness.\n\n### Eric Qwen ControlNet Prompt Rewriter\n\nVLM-powered prompt rewriter for ControlNet-guided generation. Generates full scene descriptions (200–400 words) with CN-type awareness (Canny, SoftEdge, Depth, Pose). Outputs both the prompt and a `cn_type_index` integer.\n\n---\n\n## 2× Upscale VAE (Super-Resolution Decode)\n\nThe **Wan2.1-VAE-upscale2x** by [spacepxl](https://huggingface.co/spacepxl) is a decoder-only finetune of the Wan2.1 VAE that outputs 12 channels instead of 3. 
After decode, `pixel_shuffle(12→3, 2×)` produces a **2× upscaled image** — effectively free super-resolution during VAE decode with no extra diffusion steps.\n\nThe Wan2.1 and Qwen-Image VAEs are architecturally identical (`AutoencoderKLWan` / `AutoencoderKLQwenImage`) and share the same latent space, so the upscale VAE works directly with Qwen-Image latents.\n\n### How it works\n\n1. **Load the upscale VAE** with the **Eric Qwen Upscale VAE Loader** node\n2. **Connect it** to the `upscale_vae` input on the **Eric Qwen-Image UltraGen** node\n3. **Choose a mode** via the `upscale_vae_mode` dropdown\n\n### Upscale VAE Modes\n\n| Mode | Description |\n|------|-------------|\n| `disabled` | Upscale VAE ignored even if connected (safe default) |\n| `inter_stage` | Decode S2 latents at 2× via the upscale VAE, re-encode back to latents, and feed the 2× canvas to S3. Replaces the bislerp inter-stage upscale with a higher-quality decode→2×→re-encode round trip. Requires 3 active stages. |\n| `final_decode` | Replace the final stage's normal VAE decode with the 2× upscale decode. The output image is 2× the resolution of the final denoising stage. |\n| `both` | Inter-stage S2→S3 **and** 2× final decode. These stack: S3 runs on a 2× canvas from inter-stage, then the output gets another 2× from final decode = **4× total** vs. S2. |\n\n### VRAM Management\n\nThe upscale VAE is kept on CPU until needed. Before decode, the diffusion transformer is automatically offloaded to CPU to free VRAM. 
For large images, **tiled VAE decoding** is automatically enabled when latent spatial dimensions exceed 128 (roughly ≥1024 px per side before the 2× upscale).\n\n### Typical workflow\n\n```\n[Qwen-Image Loader] → [LoRA] → [Spectrum] → [Upscale VAE Loader] → [UltraGen]\n                                                     ↑                    ↑\n                                              upscale_vae ──────── upscale_vae\n                                                              upscale_vae_mode = final_decode\n```\n\n---\n\n### Eric Qwen Upscale VAE Loader\n\nLoad the Wan2.1 2× upscale VAE. The model is kept on CPU until decode is requested.\n\n| Parameter | Type | Default | Description |\n|-----------|------|---------|-------------|\n| `model_path` | STRING | `spacepxl/Wan2.1-VAE-upscale2x` | HuggingFace model ID or local path |\n| `subfolder` | STRING | `diffusers/Wan2.1_VAE_upscale2x_imageonly_real_v1` | Subfolder within the repo containing config.json + weights. Leave blank if model_path already points to the correct directory. |\n| `dtype` | COMBO | `bfloat16` | Model precision: bfloat16 (recommended), float16, float32 |\n\n**Output:** `UPSCALE_VAE`\n\nThe first run downloads the model from HuggingFace (~0.5 GB). Subsequent runs load from the local HuggingFace cache. You can also download the model manually and point `model_path` to the local directory.\n\n\u003e **Model source:** [spacepxl/Wan2.1-VAE-upscale2x](https://huggingface.co/spacepxl/Wan2.1-VAE-upscale2x) — a decoder-only finetune of the Wan2.1 VAE by spacepxl. The specific subfolder used is `diffusers/Wan2.1_VAE_upscale2x_imageonly_real_v1` (image-only variant, trained on real images).\n\n---\n\n## Architecture Notes\n\nQwen-Image-Edit-2511 uses a **dual conditioning path**:\n\n1. **VL path** — Each input image is processed by the built-in Qwen2.5-VL vision-language encoder at ~384 px to produce semantic token embeddings. These tell the model *what* is in each image.\n2. 
**VAE/ref path** — Each input image is VAE-encoded at output resolution to produce pixel-level latents. These tell the model *how* to render the pixels.\n\nMost multi-image nodes expose per-image `vl_*` and `ref_*` toggles so you can control which path each image participates in. For example, in Style Transfer, the style image defaults to VL-only (semantic style cues) while the content image defaults to both VL + ref (preserving pixel structure).\n\n## Example Workflows\n\nAll workflow PNGs below have the full ComfyUI workflow embedded — drag them directly into ComfyUI to load.\n\n### Qwen-Edit Hi-Res — Simple\n\nA minimal editing workflow: loader → LoRA → edit node → output. Quick to set up and great for getting started.\n\n![Qwen-Edit Hi-Res Simple workflow](workflows/Qwen-Edit-HiRes-Simple.png)\n\n### Qwen-Edit Hi-Res — Advanced\n\nA full-featured editing workflow with multi-stage generation, Spectrum acceleration, upscale VAE, and fine-grained stage controls.\n\n![Qwen-Edit Hi-Res Advanced workflow](workflows/Qwen-Edit-HiRes-Adv.png)\n\n### Qwen-Image Hi-Res with ControlNet\n\nText-to-image generation guided by ControlNet (Canny, SoftEdge, Depth, or Pose) using the InstantX Union model, with multi-stage UltraGen upscaling.\n\n![Qwen-Image Hi-Res ControlNet workflow](workflows/Qwen-Image-HiRes-Controlnet.png)\n\n### Qwen-Image UltraGen Hi-Res (30 MP+)\n\nText-to-image generation without ControlNet using the UltraGen multi-stage pipeline. 
Produces 30 MP+ output with Spectrum acceleration and upscale VAE.\n\n![Qwen-Image UltraGen Hi-Res 30MP+ workflow](workflows/Qwen-image-UltraGen-HiRes-30mp-plus.png)\n\n### Qwen-Image UltraGen — Advanced\n\nAdvanced UltraGen workflow with Prompt Rewriter, selective sharpening, and several other features for high-quality text-to-image generation.\n\n![Qwen-Image UltraGen Advanced workflow](workflows/Qwen-UltraGen-Adv.png)\n\nSee the `examples/` and `workflows/` folders for additional workflow files and screenshots.\n\n## Example Prompts\n\n- \"Change the background to a sunset over the ocean\"\n- \"Make the person smile\"\n- \"Add a red hat to the person\"\n- \"Change the car color from blue to red\"\n- \"Remove the text from the image\"\n- \"Make it look like a painting\"\n- \"Harmonise the pasted element with its surroundings\" *(inpaint transfer)*\n- \"Apply the watercolor style of Picture 1 to Picture 2\" *(style transfer)*\n\n## Tips\n\n1. **Start with a lower `max_mp`** (4–6) to test edits, then increase it\n2. **Use the lightning LoRA** with 8 steps for fast iteration on edit nodes (versus 50 steps without it)\n3. **Use negative prompts** to avoid unwanted elements\n4. **VAE tiling is automatic** — no configuration needed\n5. **Progress bars** appear in ComfyUI during denoising on all edit and generation nodes\n6. **Spectrum accelerator** can speed up generation by 3–5× with ≥15 steps (works with both edit and generation pipelines)\n7. **For inpaint transfer**, provide a `transfer_mask` to select exactly which part of the reference image to use. The node handles all scaling and positioning automatically.\n8. **Delta Overlay** is great for up-res workflows: edit at low resolution, upscale the original, then apply only the changed pixels at full resolution.\n9. **Generation resolution presets** let you quickly choose common aspect ratios without doing pixel math.\n10. **Edit and generation pipelines are separate** — you can load both simultaneously if you have enough VRAM.\n11. 
**Increase `max_sequence_length`** in UltraGen if you use very detailed prompts or the Prompt Rewriter node (see below).\n\n### Extended Prompt Token Length (`max_sequence_length`)\n\nMost Qwen-Image ComfyUI workflows and the default diffusers pipeline hard-code the prompt token budget at 512 tokens. The UltraGen node exposes this as a configurable parameter (`max_sequence_length`, 128–1024), a feature **not available in other Qwen-Image nodes or workflows**.\n\n**How it works:** After the Qwen2.5-VL text encoder produces token embeddings from your prompt, the sequence is truncated to `max_sequence_length` before being fed to the transformer. If your prompt is shorter than the limit, the extra positions are zero-padded and ignored via the attention mask, so there is no quality penalty for setting it higher than needed.\n\n| Consideration | Impact |\n|---------------|--------|\n| **Prompt fidelity** | Higher values preserve more detail from long prompts. At 512, prompts over ~200 words may be silently truncated. |\n| **Generation time** | Slightly more attention compute per step. Negligible for most prompts, because the image latent sequence dominates. |\n| **VRAM** | ~8 MB extra per batch item at 1024 vs. 512 (trivial next to the 38 GB transformer). |\n| **Quality** | No degradation: unused positions are masked out. |\n\n**Recommendation:** **512** is enough for typical prompts. Use **768–1024** when using the Prompt Rewriter node or manually writing very detailed descriptions (300+ words). 
The maximum is 1024 (hard limit in the model architecture).\n\n## Credits\n\n- **Qwen-Image-Edit / Qwen-Image**: Developed by Qwen Team (Alibaba)\n- **Wan2.1-VAE-upscale2x**: 2× super-resolution VAE by [spacepxl](https://huggingface.co/spacepxl) — model weights: [Apache-2.0](https://huggingface.co/spacepxl/Wan2.1-VAE-upscale2x), reference code: [MIT](https://github.com/spacepxl/ComfyUI-VAE-Utils)\n- **Spectrum**: Han *et al.*, \"Adaptive Spectral Feature Forecasting for Diffusion Sampling Acceleration\" (CVPR 2026)\n- **ComfyUI Nodes**: Eric Hiss (GitHub: [EricRollei](https://github.com/EricRollei))\n\n## License\n\nDual licensed: **CC BY-NC 4.0** for non-commercial use, separate commercial license available. See [LICENSE.txt](LICENSE.txt) for full terms.\n\nContact: eric@rollei.us / eric@historic.camera\n\n## Related\n\n- [Eric UniPic3 Nodes](https://github.com/EricRollei/Eric_UniPic3) — Similar nodes for UniPic3 model\n- [Qwen-Image GitHub](https://github.com/QwenLM/Qwen-Image)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fericrollei%2Feric_qwen_edit_experiments","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fericrollei%2Feric_qwen_edit_experiments","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fericrollei%2Feric_qwen_edit_experiments/lists"}