https://github.com/ShmuelRonen/ComfyUI-Janus_pro_vision
A ComfyUI custom node extension that integrates the Janus-Pro-7B vision-language model from DeepSeek AI, enabling powerful image understanding and multi-turn conversation capabilities.
- Host: GitHub
- URL: https://github.com/ShmuelRonen/ComfyUI-Janus_pro_vision
- Owner: ShmuelRonen
- License: mit
- Created: 2025-01-29T07:44:35.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2025-01-29T08:03:10.000Z (5 months ago)
- Last Synced: 2025-01-29T08:37:19.565Z (5 months ago)
- Language: Python
- Size: 0 Bytes
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-comfyui - **ComfyUI Janus Pro Vision** - Pro-7B vision-language model from DeepSeek AI on your local computer, enabling powerful image understanding and multi-turn conversation capabilities. (All Workflows Sorted by GitHub Stars)
README
# ComfyUI Janus Pro Vision
A ComfyUI custom node extension that runs the Janus-Pro-7B vision-language model from DeepSeek AI on your local computer, enabling image understanding and multi-turn conversations about images.
#### Vision Mode (One or two images)
#### Chat Mode (One or two images)
## Features
- 🖼️ **Advanced Image Analysis**: Leverages Janus-Pro-7B's capabilities for detailed image understanding and description
- 💬 **Multi-turn Chat**: Supports interactive conversations about images with context awareness
- 🔄 **Dual Image Support**: Can analyze relationships between two images simultaneously
- 🚀 **Automatic Model Download**: Downloads model files automatically on first use
- ⚙️ **Flexible Configuration**: Customizable parameters for generation and image processing
- 🎯 **ComfyUI Integration**: Seamless integration with ComfyUI workflow

## Installation
1. Clone this repository into your ComfyUI custom nodes folder:
```bash
cd ComfyUI/custom_nodes
git clone https://github.com/ShmuelRonen/ComfyUI-Janus_pro_vision.git
```
2. Install required dependencies:
```bash
pip install requests
pip install tqdm
```
3. The model files are downloaded automatically on first use from DeepSeek's Hugging Face repository.
4. If the automatic download fails, you can download the files manually into the `models\Janus-Pro` folder:
```bash
git clone https://huggingface.co/deepseek-ai/Janus-Pro-7B
```

## Available Nodes
### 1. Janus-7b-Pro Model Loader (Upload)
Handles model loading and management.
- Input: None (uses default model path)
- Output: JANUS_MODEL (model object for use in analyzer)

### 2. Janus Vision 7b Pro (Chat)
Main analysis node with chat capabilities.

Inputs:
- `janus_model`: Model object from loader node
- `image_a`: Primary image for analysis
- `image_b`: (Optional) Secondary image for comparison
- `prompt`: Text prompt/question about the image(s)
- `chat_mode`: Enable/disable chat functionality
- `seed`: Random seed for generation
- `temperature`: Generation temperature (0.0 - 2.0)
- `top_p`: Top-p sampling parameter (0.0 - 1.0)
- `max_tokens`: Maximum generation length
- `image_size`: Target image size for processing (512-2048)
- `frame_size`: Border thickness for image display (1-10)
- `reset_chat`: Clear chat history

Outputs:
- `response`: Model's response text
- `chat_history`: Formatted chat history (in chat mode)

## Configuration
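The ranges documented for the node inputs and in this section can be gathered into a small validation helper. What follows is a minimal sketch under stated assumptions, not code from the extension: `NodeSettings`, `clamp`, and `normalized` are hypothetical names, the `temperature` value is only the example value quoted in this README (not a confirmed default), and the rule for snapping `image_size` to a 64-pixel step is an assumption about how the step is applied.

```python
from dataclasses import dataclass

# Ranges copied from this README; every name below is hypothetical.
IMAGE_SIZE_MIN, IMAGE_SIZE_MAX, IMAGE_SIZE_STEP = 512, 2048, 64
FRAME_SIZE_MIN, FRAME_SIZE_MAX = 1, 10


def clamp(value, low, high):
    """Limit value to the closed interval [low, high]."""
    return max(low, min(high, value))


@dataclass
class NodeSettings:
    image_size: int = 1024     # documented default
    frame_size: int = 2        # documented default
    temperature: float = 0.7   # example value from this README, not a confirmed default
    top_p: float = 0.95        # value this README recommends

    def normalized(self) -> "NodeSettings":
        """Return a copy with every field forced into its documented range."""
        size = clamp(self.image_size, IMAGE_SIZE_MIN, IMAGE_SIZE_MAX)
        # Assumption: sizes snap down to the nearest 64-pixel step above the minimum.
        steps = (size - IMAGE_SIZE_MIN) // IMAGE_SIZE_STEP
        size = IMAGE_SIZE_MIN + steps * IMAGE_SIZE_STEP
        return NodeSettings(
            image_size=size,
            frame_size=clamp(self.frame_size, FRAME_SIZE_MIN, FRAME_SIZE_MAX),
            temperature=clamp(self.temperature, 0.0, 2.0),
            top_p=clamp(self.top_p, 0.0, 1.0),
        )
```

Calling `NodeSettings(image_size=3000, frame_size=0).normalized()` would pull both values back to the nearest documented limits, which can be handy when settings are produced by a script rather than typed into the node.
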
### Image Processing Parameters
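As a rough illustration of the `image_size` scaling rule documented in this section, the sketch below computes the target dimensions for an image while keeping its aspect ratio. The function name `fit_within` is hypothetical, and two behaviours are assumptions rather than confirmed details of the node: results are rounded to the nearest pixel, and the longer side is always scaled to `image_size` (the node may instead leave smaller images untouched).

```python
from typing import Tuple


def fit_within(width: int, height: int, image_size: int = 1024) -> Tuple[int, int]:
    """Scale (width, height) so the longer side equals image_size, keeping aspect ratio."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    scale = image_size / max(width, height)
    # Assumption: round to the nearest pixel and never go below 1 pixel.
    return max(1, round(width * scale)), max(1, round(height * scale))


# The README's own example: a 2000x1000 image with image_size=1024.
print(fit_within(2000, 1000, 1024))  # (1024, 512)
```
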
- `image_size`: Controls the maximum dimension while maintaining aspect ratio (default: 1024)
  - Range: 512 to 2048 pixels
  - Steps: 64 pixels
  - Example: If image is 2000x1000px and image_size=1024:
    - Width will be scaled to 1024
    - Height will be scaled proportionally to 512
- `frame_size`: Border thickness for visual separation (default: 2)
  - Range: 1 to 10 pixels
  - Example values:
    - frame_size=1: Thin border
    - frame_size=2: Standard border
    - frame_size=5: Thick border
    - frame_size=10: Very thick border

### Generation Parameters
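The `temperature`, `top_p`, and `seed` settings in this section follow the usual conventions for sampling from a language model. The sketch below is a generic, self-contained illustration of temperature scaling and nucleus (top-p) filtering on a toy set of logits. It is not the extension's or the model's actual decoding code, and the function names and the toy vocabulary are hypothetical.

```python
import math
import random
from typing import Dict


def next_token_distribution(logits: Dict[str, float], temperature: float,
                            top_p: float) -> Dict[str, float]:
    """Apply temperature scaling, then nucleus (top-p) filtering, to toy logits."""
    if temperature <= 0:
        # Treat zero temperature as greedy decoding: all mass on the best token.
        best = max(logits, key=logits.get)
        return {best: 1.0}
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())  # subtract the peak for numerical stability
    weights = {tok: math.exp(v - peak) for tok, v in scaled.items()}
    total = sum(weights.values())
    probs = {tok: w / total for tok, w in weights.items()}

    # Keep the smallest set of most likely tokens whose cumulative mass reaches top_p.
    kept, cumulative = {}, 0.0
    for tok, p in sorted(probs.items(), key=lambda item: item[1], reverse=True):
        kept[tok] = p
        cumulative += p
        if cumulative >= top_p:
            break
    kept_total = sum(kept.values())
    return {tok: p / kept_total for tok, p in kept.items()}


def sample(distribution: Dict[str, float], seed: int) -> str:
    """Draw one token; the same seed always yields the same choice."""
    rng = random.Random(seed)
    tokens = list(distribution)
    return rng.choices(tokens, weights=[distribution[t] for t in tokens], k=1)[0]


toy_logits = {"cat": 2.0, "dog": 1.0, "car": 0.0}
focused = next_token_distribution(toy_logits, temperature=0.1, top_p=0.95)
varied = next_token_distribution(toy_logits, temperature=0.7, top_p=0.95)
```

With these toy logits, the low-temperature distribution keeps only the single most likely token, while the higher temperature leaves two candidates inside the nucleus, which is why higher values tend to give more varied responses.
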
- `temperature`: Controls response randomness
  - 0.1: More focused and deterministic
  - 0.7: More creative and varied
- `top_p`: Nucleus sampling parameter (0.95 recommended)
- `max_tokens`: Maximum length of generated response

## Model Information
This extension uses the Janus-Pro-7B model from DeepSeek AI, which offers:
- Strong image understanding capabilities
- Multi-turn conversation support
- High-quality natural language generation
- Support for image comparison and analysis

## Requirements
- ComfyUI
- Python 3.8+
- PyTorch
- Transformers library
- requests
- tqdm

## License
This project is MIT licensed. The Janus-Pro-7B model has its own license from DeepSeek AI.
## Acknowledgments
- DeepSeek AI for the Janus-Pro-7B model
- ComfyUI community for the framework and support

## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.