https://github.com/fofr/cog-batch-image-captioning
Caption images for LoRA training
- Host: GitHub
- URL: https://github.com/fofr/cog-batch-image-captioning
- Owner: fofr
- License: MIT
- Created: 2024-08-13T09:33:08.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-08-13T12:52:23.000Z (over 1 year ago)
- Last Synced: 2025-03-24T11:56:55.922Z (11 months ago)
- Topics: ai, anthropic, captioning, claude, cog, gemini, openai, replicate
- Language: Python
- Homepage: https://replicate.com/fofr/batch-image-captioning
- Size: 17.6 KB
- Stars: 8
- Watchers: 1
- Forks: 5
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
# cog-batch-image-captioning
A Cog model for batch image captioning using AI models from OpenAI, Anthropic, and Google:
https://replicate.com/fofr/batch-image-captioning
## Features
- Process multiple images from a ZIP archive
  - Supports PNG, JPG, JPEG, and WebP
- Optional image resizing for more cost-effective captioning
- Customizable caption prefixes and suffixes
- Support for multiple AI models:
  - OpenAI: GPT-4 and variants
  - Anthropic: Claude-3.5, Claude-3 variants
  - Google: Gemini-1.5 variants
- Flexible system and message prompts
- Error handling and retry mechanism
- Output is a ZIP file containing one caption per image, named to match the image filename, plus a CSV summary
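
The flow described by the features above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the repository's actual implementation. The function `caption_archive`, the `caption_fn` callback (a stand-in for a call to an OpenAI, Anthropic, or Google model), the `.txt` caption extension, the `captions.csv` filename, and the retry count are all assumptions made for the example.

```python
import csv
import io
import zipfile
from pathlib import PurePosixPath
from typing import Callable

# Image types listed in the features above.
SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp"}


def caption_archive(
    zip_bytes: bytes,
    caption_fn: Callable[[bytes], str],  # hypothetical: wraps a model API call
    prefix: str = "",
    suffix: str = "",
    max_attempts: int = 3,  # assumed retry budget per image
) -> bytes:
    """Caption each supported image in a ZIP; return a ZIP of captions plus a CSV."""
    rows = []
    out_buffer = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as src:
        with zipfile.ZipFile(out_buffer, "w") as dst:
            for name in sorted(src.namelist()):
                path = PurePosixPath(name)
                if name.endswith("/") or path.suffix.lower() not in SUPPORTED_EXTENSIONS:
                    continue  # skip directories and unsupported file types
                image_bytes = src.read(name)

                # Simple retry loop: re-raise only after the last attempt fails.
                for attempt in range(max_attempts):
                    try:
                        caption = caption_fn(image_bytes)
                        break
                    except Exception:
                        if attempt == max_attempts - 1:
                            raise

                full_caption = f"{prefix}{caption}{suffix}"
                # Caption file shares the image's base name (assumed .txt extension).
                dst.writestr(path.with_suffix(".txt").name, full_caption)
                rows.append((path.name, full_caption))

            # CSV summary of every image and its caption (assumed filename).
            csv_buffer = io.StringIO()
            writer = csv.writer(csv_buffer)
            writer.writerow(["filename", "caption"])
            writer.writerows(rows)
            dst.writestr("captions.csv", csv_buffer.getvalue())
    return out_buffer.getvalue()
```

In this sketch the prefix and suffix are concatenated verbatim, so any separating spaces or punctuation belong in the strings themselves, for example `prefix="a photo of TOK, "`.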