https://github.com/neverbiasu/comfyui-image-captioner

A ComfyUI extension for generating captions of images.

comfyui comfyui-nodes comfyui-workflow

Last synced: 3 months ago

Host: GitHub
URL: https://github.com/neverbiasu/comfyui-image-captioner
Owner: neverbiasu
License: mit
Created: 2024-07-26T08:18:09.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2025-05-12T16:09:03.000Z (5 months ago)
Last Synced: 2025-07-02T14:48:47.304Z (3 months ago)
Topics: comfyui, comfyui-nodes, comfyui-workflow
Language: Python
Homepage:
Size: 410 KB
Stars: 14
Watchers: 1
Forks: 5
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# ComfyUI ImageCaptioner

A [ComfyUI](https://github.com/comfyanonymous/ComfyUI) extension for generating captions for your images. The node runs inside your own ComfyUI installation and generates captions by calling the DashScope API, so a DashScope API key is required (see Requirements).

Uses vision-language models (VLMs), accessed through the DashScope API, to generate captions for images. You can give instructions or ask questions in natural language.

Try asking for:

* captions or long descriptions
* whether a person or object is in the image, and how many
* lists of keywords or tags
* a description of the opposite of the image

![workflow](assets/workflow.png)

## Installation

1. `git clone https://github.com/neverbiasu/ComfyUI-ImageCaptioner` into your `custom_nodes` folder
   - e.g. `custom_nodes\ComfyUI-ImageCaptioner`
2. Open a console (Command Prompt, Terminal, etc.)
3. Change to the `custom_nodes/ComfyUI-ImageCaptioner` folder you just created
   - e.g. `cd C:\ComfyUI_windows_portable\ComfyUI\custom_nodes\ComfyUI-ImageCaptioner` or wherever you have it installed
4. Run `pip install -r requirements.txt`

## Usage

Add the node via `image` -> `ImageCaptioner`.

Supports batched inputs: each image in a batch is captioned or tagged, and multiple results are output.

- **image**: The image (or batch of images) you want to caption.
- **api**: Your DashScope API key.
- **use_prompt**: The prompt (an instruction or question in natural language) sent to the VLM.
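
To make the three inputs above concrete, here is a minimal sketch of how a node with these inputs could be declared using ComfyUI's custom-node conventions (`INPUT_TYPES`, `RETURN_TYPES`, `FUNCTION`, `CATEGORY`). This is not the extension's actual source: the class name `ImageCaptionerSketch`, the default prompt, and the `_caption_one` helper are hypothetical, and the helper returns a placeholder string instead of calling a VLM.

```python
class ImageCaptionerSketch:
    """Hypothetical node skeleton; NOT the extension's actual implementation."""

    @classmethod
    def INPUT_TYPES(cls):
        # Mirrors the three inputs listed in this README.
        return {
            "required": {
                "image": ("IMAGE",),
                "api": ("STRING", {"default": "", "multiline": False}),
                "use_prompt": (
                    "STRING",
                    # The default prompt is an assumption for illustration.
                    {"default": "Describe this image.", "multiline": True},
                ),
            }
        }

    RETURN_TYPES = ("STRING",)
    OUTPUT_IS_LIST = (True,)  # one caption per image in the batch
    FUNCTION = "caption"
    CATEGORY = "image"

    def caption(self, image, api, use_prompt):
        # ComfyUI passes IMAGE as a batch; iterating yields one image at a time.
        captions = [self._caption_one(frame, api, use_prompt) for frame in image]
        return (captions,)

    def _caption_one(self, frame, api, use_prompt):
        # Placeholder: a real node would send the image and prompt to the VLM here.
        return f"[caption for prompt: {use_prompt}]"


# ComfyUI discovers nodes through this mapping in the package's __init__.py.
NODE_CLASS_MAPPINGS = {"ImageCaptionerSketch": ImageCaptionerSketch}
```

Returning a list with `OUTPUT_IS_LIST` is one way to hand back a caption per batched image; the real node may combine results differently.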

## Requirements

You need a DashScope API key. See the [documentation](https://help.aliyun.com/zh/dashscope/developer-reference/acquisition-and-configuration-of-api-key?spm=a2c4g.11186623.0.0.7a32fa70GIg3tt) for how to obtain and configure one.
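
If you would rather not paste the key into a workflow, one option is to keep it in an environment variable and resolve it in a small helper. The sketch below assumes the variable is named `DASHSCOPE_API_KEY`; this README does not say whether the extension reads that variable, and the function `resolve_api_key` is hypothetical.

```python
import os


def resolve_api_key(api: str = "") -> str:
    """Return the explicit key if given, else fall back to the environment.

    Assumption: the key is stored in DASHSCOPE_API_KEY. Adjust the name
    if your setup differs.
    """
    key = api.strip() or os.environ.get("DASHSCOPE_API_KEY", "").strip()
    if not key:
        raise ValueError(
            "No DashScope API key found: pass it explicitly or set DASHSCOPE_API_KEY."
        )
    return key
```

An explicit value takes precedence, so a key typed into the node's **api** field would override the environment.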

## See also

* [ComfyUI-WD14-Tagger](https://github.com/pythongosssss/ComfyUI-WD14-Tagger)
* [ComfyUI-LLaVA-Captioner](https://github.com/ceruleandeep/ComfyUI-LLaVA-Captioner)
* [IELTSDuck](https://github.com/neverbiasu/IELTSDuck)
