{"id":40184537,"url":"https://github.com/kantan-kanto/ComfyUI-MultiModal-Prompt-Nodes","last_synced_at":"2026-01-28T11:00:40.696Z","repository":{"id":332236459,"uuid":"1132683898","full_name":"kantan-kanto/ComfyUI-MultiModal-Prompt-Nodes","owner":"kantan-kanto","description":"Multimodal prompt generator nodes for ComfyUI, designed to generate prompts for QwenImageEdit and Wan2.2. Supports local LLM / local GGUF models (Qwen3-VL, Qwen-VL) and Qwen API for image and video prompt generation and enhancement.","archived":false,"fork":false,"pushed_at":"2026-01-26T09:47:21.000Z","size":89,"stargazers_count":0,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-01-26T23:55:29.446Z","etag":null,"topics":["comfy-ui","comfyui-custom-node","comfyui-custom-nodes","comfyui-nodes","gguf","llama-cpp-python","llm-tools","prompt-generator","qwen","qwen-image-edit","qwen3-vl","vision-language-models","wan"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kantan-kanto.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":"AUTHORS.md","dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-01-12T10:01:22.000Z","updated_at":"2026-01-26T09:42:50.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/kantan-kanto/ComfyUI-MultiModal-Prompt-Nodes","commit_stats":null,"previous_names":["kantan-kanto/comfyui-multimodal-prompt-nodes"],"tags_count":3
,"template":false,"template_full_name":null,"purl":"pkg:github/kantan-kanto/ComfyUI-MultiModal-Prompt-Nodes","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kantan-kanto%2FComfyUI-MultiModal-Prompt-Nodes","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kantan-kanto%2FComfyUI-MultiModal-Prompt-Nodes/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kantan-kanto%2FComfyUI-MultiModal-Prompt-Nodes/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kantan-kanto%2FComfyUI-MultiModal-Prompt-Nodes/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kantan-kanto","download_url":"https://codeload.github.com/kantan-kanto/ComfyUI-MultiModal-Prompt-Nodes/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kantan-kanto%2FComfyUI-MultiModal-Prompt-Nodes/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28844406,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-28T10:53:21.605Z","status":"ssl_error","status_checked_at":"2026-01-28T10:53:20.789Z","response_time":57,"last_error":"SSL_read: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["comfy-ui","comfyui-custom-node","comfyui-custom-nodes","comfyui-nodes","gguf","llama-cpp-python","llm-tools","prompt-generator","qwen","qwen-image-edit","qwen3-vl","vision-language-models","wan"],"created_at":"2026-01-19T19:00:25.839Z","updated_at":"2026-01-28T11:00:40.690Z","avatar_url":"https://github.com/kantan-kanto.png","language":"Python","funding_links":[],"categories":["Workflows created in 7 days"],"sub_categories":[],"readme":"# ComfyUI-MultiModal-Prompt-Nodes\n\n**Version:** 1.0.7  \n**License:** GPL-3.0\n\nMultimodal prompt generator nodes for ComfyUI, designed to generate prompts for **QwenImageEdit** and **Wan2.2**.  \nSupports **local LLM / local GGUF models** (Qwen3-VL, Qwen-VL) and **Qwen API** for image and video prompt generation and enhancement.\n\n---\n\n## Important Notes\n\n### Language Recommendation for Optimal Results\nBased on extensive testing, **Wan2.2** and **Qwen-Image-Edit** respond **significantly better to Chinese prompts than English prompts**. \n\n**Recommendation:** Set `target_language` to **\"zh\"** (Chinese) for best results with these models, even if your input is in English. In my tests, this produces more coherent outputs that follow instructions more closely.\n\n### Vision Input Compatibility\nVision input support varies by model and llama-cpp-python version. See the Installation section for detailed compatibility information. 
Results may vary based on your specific environment.\n\n### Local GGUF Model Stability\nStarting from **v1.0.6**, internal GGUF model handling has been improved to ensure stable behavior\nwhen switching between different Qwen3-VL models (e.g. 8B ↔ 4B), with mmproj files now being\nproperly reloaded as part of the model switching process.\n\nThese changes are internal and do **not** affect node interfaces or workflows.\n\n---\n\n## Features\n\n### 1. Vision LLM Node\n- **Local GGUF support**: Run Qwen2.5-VL and Qwen3-VL models locally\n- **Multi-image input**: Support batch image input via ComfyUI's batch nodes (e.g., Images Batch Multiple)\n- **Flexible prompting styles**: \n  - `raw`: Direct LLM response without system prompt\n  - `default`: Balanced prompt enhancement\n  - `detailed`: Rich visual details (colors, textures, lighting, atmosphere)\n  - `concise`: Minimal keywords, focused on core elements\n  - `creative`: Artistic interpretation with unique perspectives\n- **Device selection**: Simple CPU/GPU dropdown for hardware control\n- **Auto-detect mmproj**: Automatic detection or manual selection for Qwen3-VL\n\n### 2. Qwen Image Edit Prompt Generator\n- **Dynamic model selection**: Auto-detect local GGUF models and cloud API models\n- **Image editing prompts**: Specialized for Qwen-Image-Edit tasks\n- **Manual mmproj selection**: Choose specific mmproj files or use auto-detect\n- **Multi-image support**: Up to 3 images via optional inputs (image2/image3)\n- **Unified interface**: Consistent parameter ordering and naming\n- **API key management**: Centralized configuration via `api_key.txt`\n- **Device control**: CPU/GPU selection for local models\n\n### 3. 
Wan Video Prompt Generator\n- **Video generation prompts**: Optimized for Wan2.2 text-to-video and image-to-video\n- **Local Qwen3-VL integration**: Use local models for prompt enhancement\n- **Task-specific optimization**: Separate prompts for T2V and I2V workflows\n- **Extended token limit**: 2048 tokens to support longer Chinese prompts (600+ characters)\n- **Device selection**: CPU/GPU dropdown for local model execution\n- **Optimized for Chinese**: Better performance with Chinese language prompts\n\n---\n\n## Installation\n\n### 1. Clone Repository\n\nClone this repository into your ComfyUI custom_nodes folder:\n```bash\ncd ComfyUI/custom_nodes\ngit clone https://github.com/kantan-kanto/ComfyUI-MultiModal-Prompt-Nodes.git\n```\n\n### 2. Install Dependencies\n\n```bash\ncd ComfyUI-MultiModal-Prompt-Nodes\npip install -r requirements.txt\n```\n\n**Alternative manual installation:**\n```bash\npip install dashscope pillow numpy\n```\n\n### 3. Install llama-cpp-python (REQUIRED for local models)\n\n**Important:** Model compatibility varies by llama-cpp-python version. Based on my testing environment:\n\n| Version | Qwen2.5-VL (Text) | Qwen2.5-VL (Vision) | Qwen3-VL | \n|---------|-------------------|---------------------|----------|\n| 0.3.16 (official) | ✅ | ❌ | ❌ |\n| 0.3.21+ (JamePeng fork) | ✅ | ❌* | ✅ |\n\n\\***Note:** Vision input support may vary depending on your environment and configuration. 
In my setup, I have not been able to get vision input working with Qwen2.5-VL even with the JamePeng fork.\n\n**Recommended Installation (JamePeng fork for Qwen3-VL support):**  \nPlease follow the build and installation instructions provided in the JamePeng fork repository, as this fork requires a custom build and cannot be reliably installed via a simple `pip install`.\n\n**Source:** https://github.com/JamePeng/llama-cpp-python\n\n**My Environment Results:**\n- Official llama-cpp-python 0.3.16: Qwen2.5-VL works with text input only (no vision input); Qwen3-VL fails to load\n- JamePeng fork 0.3.21+: Qwen3-VL works with vision input; Qwen2.5-VL works with text input, but vision input is still unavailable\n\n⚠️ **Disclaimer:** Your results may differ depending on system configuration, GPU drivers, and other factors. If you encounter issues, please verify your environment setup and consider reporting compatibility details via GitHub Issues.\n\n**Note:** When using Qwen3-VL GGUF models, switching between different model sizes\n(e.g. 8B ↔ 4B) is supported and stable as of v1.0.6.\n\n### 4. Place Models\n\nPlace your GGUF models in `ComfyUI/models/LLM/`:\n```\nComfyUI/models/LLM/\n├── Qwen3VL-4B-Q4_K_M.gguf\n├── Qwen3VL-4B-Q8_0.gguf\n├── mmproj-qwen3vl-4b-f16.gguf\n└── ...\n```\n\n### 5. 
Configure API Key (Optional, for cloud models)\n\nFor cloud API usage, create `api_key.txt` in the node folder:\n```\nComfyUI/custom_nodes/ComfyUI-MultiModal-Prompt-Nodes/api_key.txt\n```\n\nAdd your Alibaba Cloud Dashscope API key to this file.\n\n---\n\n## Usage\n\n### Vision LLM Node\n\n**Inputs:**\n- `prompt`: Text prompt to rewrite/enhance\n- `style`: Prompt rewriting style\n  - `raw`: Direct LLM response without system prompt (useful for custom prompting)\n  - `default`: Balanced prompt enhancement\n  - `detailed`: Rich visual details\n  - `concise`: Minimal, focused keywords\n  - `creative`: Artistic interpretation\n- `target_language`: Output language (auto/en/zh)\n- `model`: Select from auto-detected local GGUF models\n- `mmproj`: mmproj file selection\n  - `(Auto-detect)`: Automatically search for matching mmproj\n  - `(Not required)`: For Qwen2.5-VL or text-only mode\n  - Specific file: Manually select mmproj file\n- `max_tokens`: Maximum tokens to generate (default: 512)\n- `temperature`: Sampling temperature (0.0-2.0, default: 0.7)\n- `device`: CPU or GPU execution\n- `image` (optional): Input image for vision-language processing\n\n**Example workflow:**\n1. Load Vision LLM Node\n2. Enter basic prompt: \"a cat sitting on a windowsill\"\n3. Attach image via batch node (optional)\n4. Select Qwen3-VL model\n5. Choose `(Auto-detect)` for mmproj or select specific file\n6. Select style: `default`\n7. Set device: `CPU` or `GPU`\n8. 
Run to get enhanced prompt\n\n### Qwen Image Edit Prompt Generator\n\n**Inputs:**\n- `image`: Primary input image (required)\n- `prompt`: Edit instruction or image description\n- `prompt_style`: \n  - `Qwen-Image-Edit`: For image editing tasks\n  - `Qwen-Image`: For general image understanding\n- `target_language`: Output language (auto/zh/en)\n- `llm_model`: Model selection\n  - `Local: xxx`: Local GGUF models (auto-detected)\n  - API models: qwen-vl-max, qwen-plus, etc.\n- `mmproj`: mmproj file (required for local Qwen3-VL)\n  - `(Auto-detect)`: Automatic detection\n  - `(Not required)`: For API models or Qwen2.5-VL\n  - Specific file: Manual selection\n- `max_retries`: Retry attempts for API calls (default: 3)\n- `device`: CPU/GPU selection for local models\n- `save_tokens`: Compress images to save API tokens\n- `image2/image3` (optional): Additional context images\n\n**Use cases:**\n- Image editing prompt generation\n- Multi-image context prompts\n- Style transfer descriptions\n- Visual question answering\n\n**Recommended settings:**\n- For best results: Set `target_language` to `zh` (Chinese)\n- Use local models for privacy, API models for quality\n- Enable `save_tokens` when using API models\n\n### Wan Video Prompt Generator\n\n**Inputs:**\n- `prompt`: Video scene description\n- `task_type`: \n  - `Text-to-Video`: Generate video from text description\n  - `Image-to-Video`: Generate video from image + text\n- `target_language`: Output language (auto/zh/en)\n- `llm_model`: Model selection\n  - `Local: xxx`: Local GGUF models\n  - API models: qwen-vl-max (for I2V), qwen-plus, etc.\n- `mmproj`: mmproj selection (same as other nodes)\n- `max_retries`: API retry attempts\n- `device`: CPU/GPU for local models\n- `save_tokens`: Image compression for API\n- `image` (optional): Reference frame for I2V tasks\n\n**Optimized for:**\n- Wan2.2 video generation\n- Temporal coherence descriptions\n- Camera movement instructions\n- Scene transitions\n\n**Important notes:**\n- 
**Use Chinese prompts** (`target_language: zh`) for best results\n- Handles prompts of 600+ Chinese characters (2048-token limit)\n- For I2V tasks, use `qwen-vl-*` models\n\n**Example T2V workflow:**\n1. Enter prompt: \"一只猫在窗台上看风景\" (A cat looking at scenery on a windowsill)\n2. Set `task_type`: Text-to-Video\n3. Set `target_language`: zh\n4. Select model (local or API)\n5. Run to get optimized video prompt\n\n**Example I2V workflow:**\n1. Attach input image\n2. Enter motion description: \"镜头慢慢推进\" (Camera slowly zooms in)\n3. Set `task_type`: Image-to-Video\n4. Set `target_language`: zh\n5. Ensure model supports vision (qwen-vl-*)\n6. Run to get I2V prompt\n\n---\n\n## Model Compatibility\n\n### Qwen2.5-VL (Integrated mmproj)\n- ✅ Qwen2.5-VL-2B: Text-only in my environment\n- ✅ Qwen2.5-VL-7B: Text-only in my environment\n- ⚠️ mmproj integrated but vision input unavailable in my setup\n\n### Qwen3-VL (Separate mmproj)\n- ✅ Qwen3-VL-4B: Full vision support with JamePeng fork\n- ✅ Qwen3-VL-8B: Full vision support with JamePeng fork\n- ✅ Requires matching mmproj file\n\n### Recommended Quantization\n- **Q4_K_M**: Balanced quality/size (recommended for most users)\n- **Q5_K_M**: Higher quality, larger size\n- **Q8_0**: Maximum quality, largest size\n\n### Model Sources\n- Qwen models: https://huggingface.co/Qwen\n- GGUF conversions: https://huggingface.co/models?search=qwen+gguf\n- mmproj files: Usually bundled with GGUF conversions\n\n---\n\n## Configuration\n\n### System Requirements\n- **RAM**: 8GB+ (16GB recommended for 7B models)\n- **Storage**: 3-8GB per model (depending on quantization)\n- **GPU**: Optional (CPU execution supported)\n  - NVIDIA GPU: CUDA support via llama-cpp-python\n  - AMD GPU: ROCm support (requires specific build)\n  - Intel Arc: Limited support, CPU recommended\n\n### Performance Tips\n1. **Use Q4_K_M quantization** for faster inference and lower memory usage\n2. **Reduce max_tokens** if hitting memory limits\n3. 
**Enable GPU** if you have compatible hardware (select `GPU` in device dropdown)\n4. **Use CPU for stability** if encountering GPU issues\n5. **Batch multiple requests** when possible for efficiency\n6. **Close other applications** to free up RAM during inference\n\n### Memory Usage Guide\n| Model | Quantization | RAM Usage |\n|-------|--------------|-----------|\n| Qwen3-VL-4B | Q4_K_M | ~4-5GB |\n| Qwen3-VL-4B | Q8_0 | ~7-8GB |\n| Qwen3-VL-8B | Q4_K_M | ~6-7GB |\n| Qwen3-VL-8B | Q8_0 | ~12-14GB |\n\n---\n\n## Troubleshooting\n\n### Installation Issues\n\n**Q: \"No module named 'llama_cpp'\" error**  \nA: llama-cpp-python is not installed in the Python environment that ComfyUI uses. For Qwen3-VL support, build and install the JamePeng fork (0.3.21+) following the instructions in its repository; the official release (0.3.16 in my testing) does not load Qwen3-VL.\n\n**Q: pip install fails with \"externally-managed-environment\"**  \nA: Install into a virtual environment (or the Python environment ComfyUI runs in) instead of the system Python. The `--break-system-packages` flag also bypasses this error, but it can break packages managed by your operating system.\n\n**Q: \"Failed to load model\" with Qwen3-VL**  \nA: Ensure you're using llama-cpp-python 0.3.21+ (JamePeng fork). Version 0.3.16 doesn't support Qwen3-VL.\n\n### Runtime Issues\n\n**Q: \"mmproj not specified\" error**  \nA: Select an mmproj file (or choose `(Auto-detect)`) in the mmproj dropdown for Qwen3-VL models\n\n**Q: \"No models found\" in model dropdown**  \nA: \n1. Place GGUF models in `ComfyUI/models/LLM/`\n2. Restart ComfyUI\n3. Verify file extensions are `.gguf`\n\n**Q: Vision input not working with Qwen2.5-VL**  \nA: This is a known issue in my environment, where Qwen2.5-VL works with text input only. Use Qwen3-VL for vision capabilities.\n\n**Q: Out of memory errors**  \nA: \n1. Use a smaller quantization (Q4_K_M instead of Q8_0)\n2. Reduce `max_tokens` parameter\n3. Close other applications\n4. Use a smaller model (4B instead of 7B)\n\n**Q: Slow inference on CPU**  \nA: Normal for large models. Consider:\n1. Q4_K_M quantization (faster than Q8_0)\n2. Smaller models (4B faster than 7B)\n3. 
GPU acceleration if available\n\n**Q: \"API_KEY is not set\" error with local models**  \nA: This error should only appear when using API models. If you are using a local model (name starting with \"Local:\"), this is a bug; please report it.\n\n### Output Quality Issues\n\n**Q: Wan2.2 output is incoherent or doesn't follow instructions**  \nA: Set `target_language` to `zh` (Chinese). Wan2.2 performs **significantly better** with Chinese prompts, even if your input is in English.\n\n**Q: Qwen-Image-Edit not understanding my edits**  \nA: \n1. Use `target_language: zh` for better results\n2. Be specific in edit instructions\n3. Try using reference examples in your prompt\n\n**Q: Output is cut off or incomplete**  \nA: In the Vision LLM Node, increase the `max_tokens` parameter. The other nodes use fixed limits (512 tokens for the Qwen Image Edit Prompt Generator, 2048 for the Wan Video Prompt Generator).\n\n### Device Selection Issues\n\n**Q: How do I choose between CPU and GPU?**  \nA: \n- **GPU**: Faster inference, requires compatible hardware (NVIDIA with CUDA)\n- **CPU**: Universal compatibility, slower but stable\n- **Recommendation**: Start with CPU, switch to GPU if available and working\n\n**Q: GPU selected but still using CPU**  \nA: Your GPU or your llama-cpp-python build may not support GPU execution. Check that:\n1. You have an NVIDIA GPU with CUDA support\n2. llama-cpp-python was built with CUDA support\n3. Your GPU drivers are installed correctly\n\n---\n\n## API Key Management\n\n### For Cloud API Models\n\n1. Create `api_key.txt` in the node directory:\n```\nComfyUI/custom_nodes/ComfyUI-MultiModal-Prompt-Nodes/api_key.txt\n```\n\n2. Add your Alibaba Cloud Dashscope API key (single line, no quotes)\n\n3. 
The key is loaded automatically by the Qwen and Wan nodes when you use cloud API models\n\n### Security Notes\n- Never commit `api_key.txt` to version control\n- The file is listed in `.gitignore` by default\n- API keys are only loaded when using cloud API models\n- Local models don't require API keys\n\n---\n\n## Examples\n\nSee the [examples/](examples/) directory for:\n- Basic prompt enhancement workflows\n- Multi-image vision processing\n- Image editing prompt generation\n- Video prompt generation (T2V and I2V)\n- Style-specific optimizations\n\n---\n\n## License\n\nThis project is licensed under the **GNU General Public License v3.0**.\n\n**Copyright (C) 2026 kantan-kanto**  \nGitHub: https://github.com/kantan-kanto\n\nThis program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.\n\nThis program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.\n\nYou should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/.\n\n**Note:** GPL-3.0 is required because this project is derived from GPL-3.0 licensed projects (see Credits).\n\nFor full details, see the [LICENSE](LICENSE) file and [AUTHORS.md](AUTHORS.md).\n\n---\n\n## Internal Structure Notes (for Advanced Users)\n\nThis repository may introduce internal structural changes over time\n(e.g. 
extracting Local GGUF or Cloud API implementations into separate modules)\nto improve maintainability and stability.\n\n- Node interfaces (INPUT / RETURN types) are intended to remain stable\n- Internal refactors will be documented in the changelog\n- The `backends/` directory added in v1.0.6 is a **non-functional placeholder**\n  for future internal refactoring\n\nNo user action is required.\n\n---\n\n## Credits\n\n### Derived From / Inspirations\nThis project is a restructured and extended ComfyUI custom node collection, derived from the following GPL-3.0 licensed projects:\n\n- **ComfyUI-QwenPromptRewriter**: [lihaoyun6](https://github.com/lihaoyun6/ComfyUI-QwenPromptRewriter) (GPL-3.0)\n- **ComfyUI-QwenVL**: [1038lab](https://github.com/1038lab/ComfyUI-QwenVL) (GPL-3.0)\n\nFor detailed attribution, file-level mapping, and contribution notes, see **[AUTHORS.md](AUTHORS.md)**.\n\n### Key Dependencies / Providers\n- **llama-cpp-python**: Andrei Betlen  \n- **Qwen3-VL support**: JamePeng's llama-cpp-python fork  \n- **Qwen models**: Alibaba Cloud Qwen Team  \n- **Dashscope API**: Alibaba Cloud\n\n---\n\n## Contributing\n\nContributions are welcome! 
Please see [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.\n\nAreas needing help:\n- Testing on different hardware configurations\n- Documenting vision input compatibility across environments\n- Additional workflow examples\n- Performance optimizations\n\n---\n\n## Support\n\n- **Issues**: Report bugs or request features via GitHub Issues\n- **Documentation**: See [CHANGELOG.md](CHANGELOG.md) for version history\n- **Examples**: Check [examples/](examples/) for workflow templates\n\n---\n\n## Changelog\n\nSee [CHANGELOG.md](CHANGELOG.md) for detailed version history.\n\n### Current Version: 1.0.7\n- Fixed incorrect detection of Qwen3-VL when mmproj is set to (Not required).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkantan-kanto%2FComfyUI-MultiModal-Prompt-Nodes","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkantan-kanto%2FComfyUI-MultiModal-Prompt-Nodes","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkantan-kanto%2FComfyUI-MultiModal-Prompt-Nodes/lists"}