{"id":44035650,"url":"https://github.com/D-Ogi/ComfyUI-Attention-Optimizer","last_synced_at":"2026-02-19T16:01:03.831Z","repository":{"id":334464565,"uuid":"1141482131","full_name":"D-Ogi/ComfyUI-Attention-Optimizer","owner":"D-Ogi","description":"Automatically benchmark and optimize attention in diffusion models. 1.5-2x speedup on RTX 4090.","archived":false,"fork":false,"pushed_at":"2026-02-09T12:56:32.000Z","size":17,"stargazers_count":25,"open_issues_count":1,"forks_count":5,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-02-09T17:49:01.206Z","etag":null,"topics":["attention","comfyui","comfyui-custom-node","diffusion","flash-attention","flux","optimization","performance","sageattention","stable-diffusion"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/D-Ogi.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-01-24T23:02:20.000Z","updated_at":"2026-02-09T12:56:35.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/D-Ogi/ComfyUI-Attention-Optimizer","commit_stats":null,"previous_names":["d-ogi/comfyui-attention-optimizer"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/D-Ogi/ComfyUI-Attention-Optimizer","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/D-Ogi%2FComfyUI-Attention-Optimizer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposi
tories/D-Ogi%2FComfyUI-Attention-Optimizer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/D-Ogi%2FComfyUI-Attention-Optimizer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/D-Ogi%2FComfyUI-Attention-Optimizer/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/D-Ogi","download_url":"https://codeload.github.com/D-Ogi/ComfyUI-Attention-Optimizer/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/D-Ogi%2FComfyUI-Attention-Optimizer/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29621883,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-19T13:04:20.082Z","status":"ssl_error","status_checked_at":"2026-02-19T13:03:33.775Z","response_time":117,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["attention","comfyui","comfyui-custom-node","diffusion","flash-attention","flux","optimization","performance","sageattention","stable-diffusion"],"created_at":"2026-02-07T20:00:19.834Z","updated_at":"2026-02-19T16:01:03.825Z","avatar_url":"https://github.com/D-Ogi.png","language":"Python","funding_links":[],"categories":["Workflows (4120) sorted by GitHub Stars"],"sub_categories":[],"readme":"# ComfyUI Attention Optimizer\n\n**Automatically benchmark and 
optimize the attention mechanism in diffusion models for maximum generation speed.**\n\n## Why This Matters\n\n### The Problem\n\nModern diffusion models (SDXL, Flux, WAN, LTX-V, Hunyuan Video) are built on the **transformer architecture**. The core operation - **attention** - computes relationships between all elements in the image/video latent space. This is:\n\n- **The most expensive operation** - attention takes 40-70% of total generation time\n- **O(n²) complexity** - cost grows quadratically with the number of latent tokens, which scales with resolution and frame count\n- **GPU-dependent** - different GPUs perform best with different implementations\n\n### The Solution\n\nMultiple optimized attention backends exist:\n- **PyTorch SDPA** - built-in, always available\n- **Flash Attention** - CUDA kernels, memory efficient\n- **SageAttention** - INT8 quantization, up to 2-4x faster\n- **xFormers** - memory efficient attention\n\n**But which one is fastest for YOUR specific GPU and model?**\n\nThis plugin **benchmarks all available backends** and **automatically applies the fastest one**.\n\n## Real-World Speedups\n\nTested on RTX 4090 with head_dim=128 (SDXL, Flux), time per attention call:\n\n| Backend | Time | Speedup |\n|---------|------|---------|\n| PyTorch SDPA | 5.0ms | 1.0x (baseline) |\n| Flash Attention | 5.4ms | 0.93x |\n| **SageAttention** | **2.7ms** | **1.9x** |\n\n**Result: 1.9x faster attention** just by switching the attention backend. The end-to-end gain depends on how much of each step is spent in attention.\n\nFor video models (WAN, Hunyuan) with longer sequences, speedups can reach **2-4x**.\n\n## Installation\n\n### Option 1: ComfyUI Manager (Recommended)\n\n1. Open ComfyUI Manager\n2. Click **\"Install via Git URL\"**\n3. Paste: `https://github.com/D-Ogi/ComfyUI-Attention-Optimizer.git`\n4. Restart ComfyUI\n\n### Option 2: Manual Installation\n\n```bash\ncd ComfyUI/custom_nodes\ngit clone https://github.com/D-Ogi/ComfyUI-Attention-Optimizer.git\n```\n\nRestart ComfyUI.\n\n### Optional: Install Optimized Backends\n\nThe plugin works out-of-the-box with PyTorch SDPA. 
For better performance, install additional backends:\n\n```bash\n# SageAttention - recommended for RTX 30xx/40xx (1.5-2x speedup)\npip install sageattention\n\n# Flash Attention - alternative for Ampere+ GPUs\npip install flash-attn\n\n# xFormers - memory efficient option\npip install xformers\n```\n\n\u003e **Note:** On Windows, Flash Attention requires building from source or using prebuilt wheels.\n\u003e SageAttention is easier to install and often faster on consumer GPUs.\n\n## Usage\n\n### Basic Usage\n\n1. Add **\"Attention Optimizer\"** node to your workflow (category: `model_patches`)\n2. Connect your model to the `model` input\n3. Run - it benchmarks once, caches results, and auto-applies the fastest backend\n\n### How It Works\n\n```\n┌─────────────────┐     ┌──────────────────────────┐     ┌─────────────┐\n│ Load Checkpoint │────▶│ Attention Optimizer      │────▶│ KSampler    │\n└─────────────────┘     │                          │     └─────────────┘\n                        │ 1. Detect model params   │\n                        │ 2. Check cache           │\n                        │ 3. Benchmark (if needed) │\n                        │ 4. 
Clone model \u0026 apply   │\n                        │    attention override    │\n                        └──────────────────────────┘\n```\n\n**First run:** Benchmarks all backends (~5-10 seconds), saves to cache.\n**Subsequent runs:** Loads from cache (instant), applies optimal backend.\n\n### Node Inputs\n\n| Input | Type | Default | Description |\n|-------|------|---------|-------------|\n| `model` | MODEL | required | The diffusion model to optimize |\n| `attention_backend` | dropdown | `auto` | `auto` = benchmark \u0026 select best, or force specific backend |\n| `force_refresh` | bool | False | Re-run benchmark even if cached |\n| `auto_apply` | bool | True | Apply the selected backend to this model |\n| `seq_len` | int | 8192 | Sequence length for benchmark |\n| `num_heads` | int | 24 | Number of attention heads |\n\n### Node Outputs\n\n| Output | Type | Description |\n|--------|------|-------------|\n| `model` | MODEL | Cloned model with optimized attention applied |\n| `best_attention` | STRING | Name of applied backend |\n| `kjnodes_mode` | STRING | Compatible mode for KJNodes PatchSageAttention |\n| `impl_type` | STRING | Implementation type (cuda/triton/pytorch) |\n| `speedup` | FLOAT | Speedup vs PyTorch SDPA baseline |\n| `time_ms` | FLOAT | Time per attention call in milliseconds |\n| `head_dim` | INT | Detected head dimension from model |\n| `report` | STRING | Full benchmark report text |\n\n## Supported Backends\n\n| Backend | Implementation | Best For |\n|---------|---------------|----------|\n| `pytorch` | PyTorch SDPA | Always available, baseline |\n| `xformers` | xFormers CUDA | Memory efficiency |\n| `sage_auto` | SageAttention auto | General use (auto-selects best variant) |\n| `sage_cuda` | SageAttention CUDA | RTX 30xx/40xx |\n| `sage_triton` | SageAttention Triton | When CUDA kernel unavailable |\n| `sage_fp8_cuda` | SageAttention FP8 | Maximum speed, slight quality trade-off |\n| `sage_fp8_cuda_fast` | SageAttention FP8++ | Even 
faster FP8 |\n| `sage3` | SageAttention 3 | RTX 50xx (Blackwell) only |\n| `flash` | Flash Attention 2 | H100, A100, RTX 30xx/40xx |\n\n## Model Compatibility\n\n| Model | Status | Notes |\n|-------|--------|-------|\n| SDXL | ✅ Full | head_dim=128, SageAttention optimal |\n| SD 1.5 | ✅ Full | head_dim=64 |\n| SD 3 | ✅ Full | |\n| Flux | ✅ Full | Per-model attention override |\n| LTX-V | ✅ Full | head_dim=160 |\n| WAN 2.1/2.2 | ✅ Full | Per-model attention override |\n| Hunyuan Video | ✅ Full | Per-model attention override |\n| Cosmos | ✅ Full | Per-model attention override |\n| SeedVR2 | ❌ N/A | Uses own attention system, not affected |\n\n## GPU Recommendations\n\n| GPU | Recommended Backend | Expected Speedup |\n|-----|---------------------|------------------|\n| RTX 4090/4080 | `sage_auto` or `sage_fp8_cuda_fast` | 1.5-2.0x |\n| RTX 3090/3080 | `sage_auto` or `flash` | 1.3-1.8x |\n| RTX 50xx (Blackwell) | `sage3` | 2-4x |\n| H100/A100 | `flash` | 1.5-2.0x |\n| AMD (ROCm) | `pytorch` | 1.0x (baseline) |\n\n## Example Benchmark Report\n\n```\n=================================================================\nBENCHMARK REPORT\n=================================================================\ndtype: float16 | head_dim: 128 | seq_len: 8192 | CUDA: 12.4 | Triton: 3.0.0\nSageAttention: v2.1.1\n\n\u003e\u003e\u003e BEST: sage_fp8_cuda_fast (1.89x speedup) \u003c\u003c\u003c\n    impl: cuda | kjnodes mode: sageattn_qk_int8_pv_fp8_cuda++\n\nResults (fastest first):\n-----------------------------------------------------------------\n [v] sage_fp8_cuda_fast       2.671ms   1.89x  (cuda) \u003c\u003c\u003c\n [v] sage_auto                2.679ms   1.88x  (auto)\n [v] sage_fp8_cuda            3.100ms   1.63x  (cuda)\n [v] sage_triton              3.446ms   1.47x  (triton)\n [v] sage_cuda                3.947ms   1.28x  (cuda)\n [v] pytorch                  5.049ms   1.00x  (pytorch)\n [v] xformers                 5.194ms   0.97x  (cuda/triton)\n [v] flash                    
5.430ms   0.93x  (cuda)\n [ ] sage3                    ---       (N/A) Not installed\n-----------------------------------------------------------------\n[v] = validated (tested underlying library directly)\n=================================================================\n```\n\n## Technical Details\n\n### Why Different Backends?\n\n**PyTorch SDPA** uses cuDNN/cuBLAS - general purpose, always works.\n\n**Flash Attention** fuses the attention operations into a single CUDA kernel, reducing memory traffic. Great for long sequences.\n\n**SageAttention** quantizes Q/K to INT8, reducing memory and compute. Works best for head_dim ≤ 128.\n\n**xFormers** is similar to Flash Attention and offers good memory efficiency.\n\n### head_dim Matters\n\nModels have different attention head dimensions:\n- **SD 1.5:** head_dim=64\n- **SDXL, Flux:** head_dim=128\n- **LTX-V:** head_dim=160\n\nSageAttention works best with head_dim ≤ 128. For larger dimensions, SDPA or Flash Attention may be faster.\n\n### Cache System\n\nBenchmark results are cached in `benchmark_db.json` based on:\n- Model hash (architecture + weights)\n- head_dim\n- seq_len / num_heads parameters\n\nThe cache is per-machine, since different GPUs may have different optimal backends.\n\n## Troubleshooting\n\n### \"Backend X not available\"\nInstall the missing package:\n```bash\npip install sageattention  # for sage_*\npip install flash-attn     # for flash\npip install xformers       # for xformers\n```\n\n### No speedup observed\n1. Check that `auto_apply` is enabled\n2. Try `force_refresh=True` to re-benchmark\n3. Check the console for the `[Benchmark] Applied: X` message\n\n### Model not affected\nSome models (like SeedVR2) use their own attention implementation and won't be affected by this plugin. 
Check the compatibility table above.\n\n## License\n\nMIT License - see [LICENSE](LICENSE)\n\n## Credits\n\n- [SageAttention](https://github.com/thu-ml/SageAttention) - THU-ML\n- [Flash Attention](https://github.com/Dao-AILab/flash-attention) - Dao-AILab\n- [xFormers](https://github.com/facebookresearch/xformers) - Meta\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FD-Ogi%2FComfyUI-Attention-Optimizer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FD-Ogi%2FComfyUI-Attention-Optimizer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FD-Ogi%2FComfyUI-Attention-Optimizer/lists"}